Part nine of the Linux Container series
Linux Container Primitives: Memory, CPU, Freezer and Device Control Groups
After discussing the Network and Block I/O controllers, this post considers the Memory, CPU, Freezer and Device controllers. The following list shows the topics of all scheduled blog posts. It will be updated with the corresponding links once new posts are released.
- An Introduction to Linux Containers
- Linux Capabilities
- An Introduction to Namespaces
- The Mount Namespace and a Description of a Related Information Leak in Docker
- The PID and Network Namespaces
- The User Namespace
- Namespaces Kernel View and Usage in Containerization
- An Introduction to Control Groups
- The Network and Block I/O Controllers
- The Memory, CPU, Freezer and Device Controllers
- Control Groups Kernel View and Usage in Containerization
The Memory Controller (v2)
The memory usage of processes can be both accounted for and limited. The following types of memory usage are currently tracked:
- Memory consumed in user space
- Memory used in kernel space, such as kernel data structures
- TCP socket buffers
The memory consumed by a control group can be read from the memory.current file. One of the most common ways to limit memory usage is to set memory.max, which defines an upper memory limit for all processes residing in a control group. If the group's usage reaches this limit and cannot be reduced by reclaiming memory, the OOM killer is invoked within the group.
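The following Python sketch reads memory.current and writes memory.max. It assumes a cgroup v2 hierarchy mounted at /sys/fs/cgroup; the group directory is passed in as a parameter, and the helper names are illustrative rather than part of any library.

```python
from pathlib import Path
from typing import Optional

# Assumption: cgroup v2 is mounted at /sys/fs/cgroup, so a group named "demo"
# would live in /sys/fs/cgroup/demo. Every helper takes the group directory.


def read_memory_current(group: Path) -> int:
    """Return the memory currently charged to the group, in bytes."""
    return int((group / "memory.current").read_text().strip())


def set_memory_max(group: Path, limit: Optional[int]) -> None:
    """Write an upper limit in bytes to memory.max; None writes "max" (no limit)."""
    value = "max" if limit is None else str(limit)
    (group / "memory.max").write_text(value + "\n")


def read_memory_max(group: Path) -> Optional[int]:
    """Return the limit in bytes, or None if memory.max contains "max"."""
    raw = (group / "memory.max").read_text().strip()
    return None if raw == "max" else int(raw)


def add_process(group: Path, pid: int) -> None:
    """Move a process into the group by writing its PID to cgroup.procs."""
    (group / "cgroup.procs").write_text(f"{pid}\n")
```

Usage, as root and assuming the memory controller is enabled in the parent's cgroup.subtree_control: create a directory such as /sys/fs/cgroup/demo (a hypothetical name), call set_memory_max(Path("/sys/fs/cgroup/demo"), 256 * 1024 * 1024), move a process in with add_process() and poll read_memory_current() to watch its charge grow.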
A control group is charged for memory when it is allocated on behalf of one of its processes. In turn, this charge is removed once the previously allocated memory is released again, for example via free() or similar mechanisms. When a process that has allocated memory in the name of a control group is moved to another group, the original group remains charged for that memory, since existing charges are not transferred along with the process.
The CPU Controller (v1/v2)
Various controllers exist in versions 1 and 2 of the control group implementation to manage CPU utilization. There are three controllers for version 1: cpu, cpuset and cpuacct. With cpu, a control group is supplied with a guaranteed minimum of CPU time. The cpuset controller allows specifying the set of processors on which the processes of a group are allowed to run. For the current process, this can be examined with cat /proc/$$/status | grep Cpus. Changes to this setting propagate to all descendants in the control group hierarchy. Accounting is performed with cpuacct; for example, the file cpuacct.usage reports the CPU time consumed by all processes in a control group, in nanoseconds.
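The cat/grep pipeline above only prints the raw Cpus_allowed_list line. A small Python sketch that expands such a list into individual CPU numbers, and converts a cpuacct.usage value from nanoseconds to seconds, could look like this. The function names are hypothetical, and the parser assumes the usual comma-separated range format (e.g. 0-3,6).

```python
from typing import Set


def parse_cpu_list(text: str) -> Set[int]:
    """Expand a kernel CPU list such as "0-3,6" into {0, 1, 2, 3, 6}."""
    cpus: Set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-")
            cpus.update(range(int(start), int(end) + 1))
        else:
            cpus.add(int(part))
    return cpus


def allowed_cpus(status_path: str = "/proc/self/status") -> Set[int]:
    """Read the Cpus_allowed_list line, the one `grep Cpus` also shows."""
    with open(status_path) as f:
        for line in f:
            if line.startswith("Cpus_allowed_list:"):
                return parse_cpu_list(line.split(":", 1)[1])
    raise ValueError("Cpus_allowed_list not found")


def cpuacct_seconds(usage_ns: int) -> float:
    """Convert a cpuacct.usage value (nanoseconds) to seconds."""
    return usage_ns / 1_000_000_000
```

On Linux, allowed_cpus() should normally agree with os.sched_getaffinity(0), which reports the same affinity mask through a system call instead of procfs.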
Control group version 2 supports both a weight-based and an absolute CPU limiting model with the cpu subsystem. In contrast to v1, the newer control group version does not support controlling real-time processes. Therefore, all real-time processes have to be moved to the root control group before the cpu v2 subsystem can be enabled in the cgroup tree.
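The kernel's cgroup v2 documentation exposes these two models through the files cpu.weight (a relative weight in the range 1 to 10000, default 100) and cpu.max (a quota and a period in microseconds, default "max 100000"). The Python sketch below only builds and parses those values; writing them into a group's directory works like any other control file. The helper names are illustrative.

```python
from typing import Optional, Tuple

DEFAULT_PERIOD_US = 100_000  # default cpu.max period: 100 ms


def format_cpu_max(cpus: Optional[float], period_us: int = DEFAULT_PERIOD_US) -> str:
    """Build a cpu.max value capping the group at `cpus` CPUs' worth of time.

    cpus=0.5 allows 50 ms of CPU time per 100 ms period; None means no limit.
    """
    if cpus is None:
        return f"max {period_us}"
    if cpus <= 0:
        raise ValueError("cpus must be positive")
    quota_us = round(cpus * period_us)
    return f"{quota_us} {period_us}"


def parse_cpu_max(line: str) -> Tuple[Optional[int], int]:
    """Split a cpu.max value into (quota in us or None for "max", period in us)."""
    quota, period = line.split()
    return (None if quota == "max" else int(quota)), int(period)


def format_cpu_weight(weight: int) -> str:
    """Validate a relative cpu.weight value (1..10000, default 100)."""
    if not 1 <= weight <= 10_000:
        raise ValueError("cpu.weight must be within 1..10000")
    return str(weight)
```

For example, writing format_cpu_max(0.5) to a group's cpu.max caps its processes at half a CPU no matter how idle the machine is, while cpu.weight only matters when groups actually compete for CPU time.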
The Freezer Controller (v1)
This controller neither limits nor accounts for resource usage; instead, it allows freezing processes. Freezing stops the execution of the processes in a group and suspends them. This allows analyzing the current state of a process and unfreezing it afterwards to continue execution without side effects. Because a frozen group provides a stable state for creating a checkpoint, the freezer subsystem also makes it possible to move an entire running process, including its children, to another machine or to restart a process from a specific state.
For this, the virtual file freezer.state exists, which accepts either FROZEN or THAWED as input to freeze or unfreeze the processes of a group. This works by walking down the control group hierarchy and marking all descendant groups with the desired state. Additionally, all processes managed by the affected groups have to be moved into or out of the frozen state, depending on the desired freezing state. Freezing itself is done by sending a signal to the affected processes. Also, the freezer has to follow all child processes of the affected processes that may result from calling fork and freeze these as well to prevent freeze escapes.
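A minimal Python sketch of freezing and thawing a group through freezer.state, assuming a v1 freezer hierarchy mounted at /sys/fs/cgroup/freezer. While tasks are still being stopped, reading the file can report the intermediate state FREEZING, so the helper rewrites the requested state and polls until it is reported. The function name and the timeout are assumptions for illustration.

```python
import time
from pathlib import Path

VALID_STATES = ("FROZEN", "THAWED")


def set_freezer_state(group: Path, state: str, timeout: float = 5.0) -> None:
    """Write FROZEN or THAWED to freezer.state and wait until it is reached."""
    if state not in VALID_STATES:
        raise ValueError("state must be FROZEN or THAWED")
    state_file = group / "freezer.state"
    deadline = time.monotonic() + timeout
    while True:
        # Re-requesting the state is harmless and lets a stalled freeze retry.
        state_file.write_text(state + "\n")
        if state_file.read_text().strip() == state:
            return
        if time.monotonic() > deadline:
            raise TimeoutError(f"{group} did not reach {state}")
        time.sleep(0.01)
```

For instance, set_freezer_state(Path("/sys/fs/cgroup/freezer/demo"), "FROZEN") would suspend every task in the hypothetical group demo, and passing "THAWED" resumes them.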
The Devices Controller (v1)
This controller allows implementing access control for devices. Both whitelist and blacklist approaches can be used, with exceptions that block or allow very specific accesses. Child control groups are forced to have either the exact same exception list as their parent or a subset of it. This speeds up checking whether a rule can be added to an exception list, because only the list of the child has to be checked and not the whole group tree. This controller is one of the few that make use of the hierarchical organization to pass configuration information down to child groups.
For the following example, the devices controller will be used to restrict a process from accessing /dev/null.
To limit the usage of devices, their major and minor numbers have to be used. These numbers identify a device file in the filesystem tree. The major number identifies the driver that the kernel uses to access a specific device. The minor number is used by that driver to distinguish between the individual logical or physical devices it manages. For /dev/null, these numbers can be identified using stat -c "major: %t minor: %T" /dev/null, which yields 1 for major and 3 for minor (stat prints both values in hexadecimal, which makes no difference for such small numbers).
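The same numbers can be read programmatically. This Python sketch uses os.stat together with os.major and os.minor; unlike the %t and %T format sequences of stat, it returns decimal values. The function name is illustrative.

```python
import os
import stat
from typing import Tuple


def device_numbers(path: str) -> Tuple[int, int]:
    """Return (major, minor) of a character or block device file."""
    st = os.stat(path)
    if not (stat.S_ISCHR(st.st_mode) or stat.S_ISBLK(st.st_mode)):
        raise ValueError(f"{path} is not a device file")
    # st_rdev encodes the device numbers; os.major/os.minor decode them.
    return os.major(st.st_rdev), os.minor(st.st_rdev)
```

On a typical Linux system, device_numbers("/dev/null") returns (1, 3), matching the stat output.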
First, a device control group has to be created with cgcreate -g devices:nodevnull, where nodevnull is the identifier of the control group. To add the current shell process to this group, the command cgclassify -g devices:nodevnull $$ is invoked; $$ expands to the process identifier of the current shell. To finally deny access to the /dev/null device, the following command is executed: cgset -r devices.deny="c 1:3 rwm" nodevnull. The parameter for devices.deny has the format "type major:minor access": the type is c for a character device, b for a block device or a for all devices, and the access string combines r (read), w (write) and m (mknod).
The device type is determined by using the first character of the output of ls -la /dev/null, which shows that it’s a character device.
Now access to the device is blocked, even for processes running as root:
root@box:~# echo "a" > /dev/zero # Allowed
root@box:~# echo "a" > /dev/null # Denied
bash: /dev/null: Operation not permitted
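The cgcreate, cgclassify and cgset calls in this example wrap plain file operations in the cgroup v1 filesystem. The Python sketch below performs roughly the same steps, assuming the devices hierarchy is mounted at /sys/fs/cgroup/devices. It derives the rule from the device file itself; both helpers are hypothetical and not part of libcgroup.

```python
import os
import stat
from pathlib import Path


def device_rule(path: str, access: str = "rwm") -> str:
    """Build a devices.deny/devices.allow entry such as "c 1:3 rwm"."""
    if not access or set(access) - set("rwm"):
        raise ValueError("access must combine r, w and m")
    st = os.stat(path)
    if stat.S_ISCHR(st.st_mode):
        dev_type = "c"
    elif stat.S_ISBLK(st.st_mode):
        dev_type = "b"
    else:
        raise ValueError(f"{path} is not a device file")
    return f"{dev_type} {os.major(st.st_rdev)}:{os.minor(st.st_rdev)} {access}"


def deny_device(group: Path, pid: int, path: str, access: str = "rwm") -> None:
    """Create the group, move `pid` into it and deny access to `path`."""
    group.mkdir(exist_ok=True)                       # like cgcreate
    (group / "cgroup.procs").write_text(f"{pid}\n")  # like cgclassify
    rule = device_rule(path, access)
    (group / "devices.deny").write_text(rule + "\n")  # like cgset
```

Running deny_device(Path("/sys/fs/cgroup/devices/nodevnull"), os.getpid(), "/dev/null") as root would apply the rule c 1:3 rwm to the calling process.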
Next post in series
- The next post in this series ‘Control Groups Kernel View and Usage in Containerization’ will be published soon.
Credits: The elaboration and software project associated to this subject are results of a Master’s thesis created at SCHUTZWERK in collaboration with Aalen University by Philipp Schmied.