Part nine of the Linux Container series

Linux Container Primitives: Memory, CPU, Freezer and Device Control Groups

After discussing the Network and Block I/O controllers, this post considers the Memory, CPU, Freezer and Device controllers. The following list shows the topics of all scheduled blog posts. It will be updated with the corresponding links as new posts are released.

  1. An Introduction to Linux Containers
  2. Linux Capabilities
  3. An Introduction to Namespaces
  4. The Mount Namespace and a Description of a Related Information Leak in Docker
  5. The PID and Network Namespaces
  6. The User Namespace
  7. Namespaces Kernel View and Usage in Containerization
  8. An Introduction to Control Groups
  9. The Network and Block I/O Controllers
  10. The Memory, CPU, Freezer and Device Controllers
  11. Control Groups Kernel View and Usage in Containerization

The Memory Controller (v2)

The memory usage of processes can be accounted for and limited. The following types of memory usage are currently tracked [1]:

  • Consumed memory in user-space
  • Memory usage in kernel-space, as in kernel data structures
  • TCP socket buffers

The memory consumed by a control group can be read from the memory.current file. One of the most common ways to limit memory usage is to set the memory.max value, which defines a hard upper limit for all processes residing in a control group.

A cgroup is charged for memory at allocation time. The charge is removed once the memory is released again, for example through free() or a similar mechanism. When a process that has allocated memory on behalf of one control group is moved to another group, the original group remains charged for that memory, because existing charges are not transferred along with the process.
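The two files mentioned above can be read and written with ordinary shell tools. The following is a minimal sketch, assuming cgroup v2 is mounted at /sys/fs/cgroup and the memory controller is enabled for the group. The helper names and the group name demo are hypothetical.

```shell
# Sketch only: helper names are illustrative, not part of any tool.
# Assumption: the group directory lives below a cgroup v2 mount,
# for example /sys/fs/cgroup/demo.

# Print the current memory usage of a group in MiB (integer division).
mem_usage_mib() {
    local group_dir="$1"
    local bytes
    bytes=$(cat "$group_dir/memory.current")
    echo $(( bytes / 1024 / 1024 ))
}

# Set the upper memory limit of a group.
# The value is a byte count, or the string "max" to remove the limit.
set_mem_limit() {
    local group_dir="$1"
    local limit="$2"
    echo "$limit" > "$group_dir/memory.max"
}
```

A call such as set_mem_limit /sys/fs/cgroup/demo 268435456 would cap the group at 256 MiB, and mem_usage_mib /sys/fs/cgroup/demo would then report how much of that is in use. Writing to memory.max usually requires root privileges.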

The CPU Controller (v1/v2)

Various controllers exist for version 1 and 2 of the control group implementation to manage CPU utilization. There are three controllers for v1:

  • cpu: supplies a control group with a guaranteed minimum share of CPU time.
  • cpuset: specifies the set of processors a process is allowed to run on. For the current process, this can be examined with cat /proc/$$/status | grep Cpus. Changes to this setting propagate to all descendants in the control group hierarchy.
  • cpuacct: performs accounting. For example, the file cpuacct.usage reports the CPU time consumed by all processes in a control group in nanoseconds.

Control group version 2 supports weight-based and absolute CPU limiting models with the cpu subsystem. In contrast to v1, the newer control group version does not support real-time processes. Therefore, all real-time processes have to be moved to the root control group before the cpu v2 subsystem can be activated in the cgroup tree.
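The cpuset and cpuacct values described above can be extracted with a few lines of shell. This is a rough sketch with hypothetical helper names. The path of the cpuacct.usage file depends on where the v1 hierarchy is mounted, so it is passed in as an argument.

```shell
# Sketch only: helper names are illustrative.

# Print the list of CPUs a process may run on, taken from the
# Cpus_allowed_list line of a /proc/<pid>/status file.
# Defaults to the status file of the current shell.
allowed_cpus() {
    local status_file="${1:-/proc/$$/status}"
    awk '/^Cpus_allowed_list:/ { print $2 }' "$status_file"
}

# Convert the content of a cpuacct.usage file (nanoseconds)
# to whole seconds.
cpuacct_seconds() {
    local usage_file="$1"
    echo $(( $(cat "$usage_file") / 1000000000 ))
}
```

On a system where the v1 cpuacct hierarchy is mounted at /sys/fs/cgroup/cpuacct (an assumption), cpuacct_seconds /sys/fs/cgroup/cpuacct/cpuacct.usage would print the CPU time consumed so far by that group.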
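Both v2 models map to one file each: cpu.weight for the relative model and cpu.max for the absolute one. The sketch below assumes a cgroup v2 group directory with the cpu controller enabled. The helper names are hypothetical.

```shell
# Sketch only: helper names are illustrative.

# Absolute model: cpu.max holds "<quota> <period>" in microseconds.
# A quota of 50000 with a period of 100000 allows 50% of one CPU.
set_cpu_percent() {
    local group_dir="$1"
    local percent="$2"
    local period="${3:-100000}"
    local quota=$(( period * percent / 100 ))
    echo "$quota $period" > "$group_dir/cpu.max"
}

# Weight model: cpu.weight accepts values from 1 to 10000 (default 100).
# CPU time is distributed among sibling groups in proportion to it.
set_cpu_weight() {
    local group_dir="$1"
    local weight="$2"
    if [ "$weight" -lt 1 ] || [ "$weight" -gt 10000 ]; then
        echo "weight must be between 1 and 10000" >&2
        return 1
    fi
    echo "$weight" > "$group_dir/cpu.weight"
}
```

The weight only matters when groups compete for CPU time, whereas the cpu.max limit applies even on an otherwise idle system.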

The Freezer Controller (v1)

This controller does not limit or account for resource usage. Instead, it allows freezing a process, which suspends its execution. This makes it possible to analyze the current state of a process and to unfreeze it afterwards so that it continues without side effects. By creating a checkpoint with the freezer subsystem it’s also possible to move an entire running process, including its children, to another machine or to restart a process from a specific state [2].

For this purpose, the virtual file freezer.state accepts either FROZEN or THAWED as input values to freeze and unfreeze a process. This works by walking down the control group hierarchy and marking all descendants of a process with the desired state. Additionally, all processes managed by the affected groups have to be moved in or out of the freezer group, depending on the desired freezing state. Freezing itself is done by sending a signal to the affected processes. The freezer also has to follow all child processes that the affected processes create by calling fork and freeze these as well in order to prevent freeze escapes [3].
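Driving freezer.state from the shell can be sketched as follows. The helper name is hypothetical, and the group directory is assumed to live in a v1 freezer hierarchy, for example /sys/fs/cgroup/freezer/demo.

```shell
# Sketch only: the helper name is illustrative.

# Write the requested state to freezer.state and print the state
# that is reported afterwards.
set_freezer_state() {
    local group_dir="$1"
    local state="$2"
    case "$state" in
        FROZEN|THAWED) ;;
        *) echo "state must be FROZEN or THAWED" >&2; return 1 ;;
    esac
    echo "$state" > "$group_dir/freezer.state"
    cat "$group_dir/freezer.state"
}
```

On a real system the file may briefly report FREEZING after FROZEN has been written, because the state only changes once all tasks in the group have stopped.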

The Devices Controller (v1)

This controller type makes it possible to implement access controls for devices. Whitelist and blacklist approaches can be used to block or allow very specific accesses by defining exceptions. Child control groups are forced to have the exact same exception list as their parent or a subset of it. This results in faster checks of whether a rule can be added to the exception list, because only the list of the child has to be checked and not the whole group tree. This controller is one of the few that make use of the hierarchical organization in order to pass configuration information to child groups.

For the following example, the devices controller is used to prevent a process from accessing /dev/null.

To limit the usage of devices, their major and minor numbers have to be used. These numbers are the respective identifiers of a device in the filesystem tree. The major number identifies the driver that the kernel requires in order to access a specific device. The minor number is used by the device driver to distinguish between the individual logical and physical devices it manages. For /dev/null these numbers can be identified using stat -c "major: %t minor: %T" /dev/null, which yields the values 1 for major and 3 for minor. Note that stat prints both numbers in hexadecimal, which makes no difference for values this small.
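Because the %t and %T format specifiers of stat print hexadecimal values, larger device numbers have to be converted before they are used in a rule. A minimal sketch with hypothetical helper names, assuming GNU stat:

```shell
# Sketch only: helper names are illustrative.

# Convert a hexadecimal major/minor pair to decimal "major:minor".
hex_pair_to_dec() {
    printf '%d:%d\n' "0x$1" "0x$2"
}

# Print the decimal "major:minor" numbers of a device node.
dev_numbers() {
    # Word splitting of the stat output into $1 and $2 is intended.
    set -- $(stat -c '%t %T' "$1")
    hex_pair_to_dec "$1" "$2"
}
```

For /dev/null, dev_numbers should print 1:3, matching the values given above. A pair such as 88 and a would be converted to 136:10.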

First, a device control group with the identifier nodevnull has to be created using cgcreate -g devices:nodevnull. The current shell process, whose process identifier is $$, is then added to this group with cgclassify -g devices:nodevnull $$. Finally, access to the /dev/null device is denied with cgset -r devices.deny="c 1:3 rwm" nodevnull. The format of the parameter for devices.deny is as follows:

  • Type: a (all devices), b (block device) or c (character device)
  • Major and minor number, separated by a colon, where an asterisk matches all numbers
  • Access: any combination of r (read), w (write) and m (mknod)

The device type is determined by using the first character of the output of ls -la /dev/null, which shows that it’s a character device.
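Instead of reading the type from the ls output by eye, the shell's file tests can be used to assemble a rule like the one passed to devices.deny above. The helper names in this sketch are hypothetical.

```shell
# Sketch only: helper names are illustrative.

# Print the type letter of a device node: c for character devices,
# b for block devices.
dev_type() {
    if [ -c "$1" ]; then
        echo c
    elif [ -b "$1" ]; then
        echo b
    else
        echo "not a device node: $1" >&2
        return 1
    fi
}

# Assemble a rule string from type, major, minor and access flags.
# The access flags default to rwm.
device_rule() {
    echo "$1 $2:$3 ${4:-rwm}"
}
```

With these helpers, device_rule "$(dev_type /dev/null)" 1 3 produces the string c 1:3 rwm used in the example.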

Access to the device is now blocked, even for processes running as the root user:

root@box:~# echo "a" > /dev/zero # Allowed
root@box:~# echo "a" > /dev/null # Denied
bash: /dev/null: Operation not permitted
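To undo the restriction, the same rule can be written to the devices.allow file of the group. The sketch below writes to the virtual file directly instead of going through cgset. It assumes that the v1 devices hierarchy is mounted at /sys/fs/cgroup/devices, which may differ between systems. The helper name is hypothetical.

```shell
# Sketch only: the helper name is illustrative.
# Assumption: the group directory is located below the v1 devices
# hierarchy, for example /sys/fs/cgroup/devices/nodevnull.

# Add a rule to the allow list of a devices control group.
allow_device() {
    local group_dir="$1"
    local rule="$2"
    echo "$rule" > "$group_dir/devices.allow"
}
```

Under that assumption, running allow_device /sys/fs/cgroup/devices/nodevnull "c 1:3 rwm" as root should make /dev/null accessible again for the processes in the group.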

Next post in series

  • The next post in this series ‘Control Groups Kernel View and Usage in Containerization’ will be published soon.

Credits

The elaboration and software project associated with this subject are results of a Master’s thesis created at SCHUTZWERK in collaboration with Aalen University by Philipp Schmied.

References

Philipp Schmied