Resource constraints

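By default, a container has no resource constraints and can use as much of a given resource as the host's kernel scheduler allows. Docker provides ways to control how much memory or CPU a container can use by setting runtime configuration flags of the docker run command. This section provides details on when you should set such limits and the possible implications of setting them.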

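Many of these features require your kernel to support Linux capabilities. To check for support, you can use the docker info command. If a capability is disabled in your kernel, you may see a warning at the end of the output like the following: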

WARNING: No swap limit support

Consult your operating system's documentation to enable them. See also the Docker Engine troubleshooting guide for more information.
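To scan for such warnings in a script, one minimal approach is to keep only the lines that start with WARNING:. The function name below is hypothetical. Newer Docker CLI versions may write these warnings to stderr instead of stdout, so the usage example redirects stderr into the pipe.

```shell
# Hypothetical helper: print only the WARNING lines from `docker info`
# output read on stdin.
docker_info_warnings() {
  grep '^WARNING:' || true   # finding no warnings is not an error
}

# Usage on a Docker host:
#   docker info 2>&1 | docker_info_warnings
```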

Memory

Understand the risks of running out of memory

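It's important not to allow a running container to consume too much of the host machine's memory. On Linux hosts, if the kernel detects that there isn't enough memory to perform important system functions, it throws an OOME (Out Of Memory Exception) and starts killing processes to free up memory. Any process is subject to killing, including Docker and other important applications. If the wrong process is killed, this can effectively bring the entire system down.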

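Docker attempts to mitigate these risks by adjusting the OOM priority of the Docker daemon so that it's less likely to be killed than other processes on the system. The OOM priority of containers isn't adjusted. As a result, an individual container is more likely to be killed than the Docker daemon or other system processes. Don't try to circumvent these safeguards by manually setting --oom-score-adj to an extreme negative number on the daemon or a container, or by setting --oom-kill-disable on a container.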
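You can inspect the adjustment the kernel applies to any process, such as the daemon or a container's main process, by reading the standard Linux file /proc/<PID>/oom_score_adj. The sketch below does that. The function name is hypothetical, and the usage line assumes pidof is installed and exactly one dockerd process is running.

```shell
# Print the OOM score adjustment of a process. Values range from -1000 to
# 1000; the lower the value, the less likely the kernel's OOM killer is to
# pick that process.
oom_score_adj() {
  cat "/proc/$1/oom_score_adj"
}

# Usage on a Docker host (assumes exactly one dockerd process):
#   oom_score_adj "$(pidof dockerd)"
```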

For more information about the Linux kernel's OOM management, see Out of Memory Management.

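You can mitigate the risk of system instability due to OOME by:

Performing tests to understand the memory requirements of your application before placing it into production.
Ensuring that your application runs only on hosts with adequate resources.
Limiting the amount of memory your container can use, as described below.
Being mindful when configuring swap on your Docker hosts. Swap is slower than memory but can provide a buffer against running out of system memory.
Considering converting your container to a service, and using service-level constraints and node labels to ensure that the application runs only on hosts with enough memory.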

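Limit a container's access to memory

Most of the following options take a positive integer followed by a suffix of b, k, m, or g, indicating bytes, kilobytes, megabytes, or gigabytes.

Option	Description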
`-m` or `--memory=`	The maximum amount of memory the container can use. If you set this option, the minimum allowed value is `6m` (6 megabytes). That is, you must set the value to at least 6 megabytes.
`--memory-swap`	The amount of memory this container is allowed to swap to disk. See `--memory-swap` details.
`--memory-swappiness`	By default, the host kernel can swap out a percentage of anonymous pages used by a container. You can set `--memory-swappiness` to a value between 0 and 100 to tune this percentage. See `--memory-swappiness` details.
`--memory-reservation`	Allows you to specify a soft limit smaller than `--memory` which is activated when Docker detects contention or low memory on the host machine. If you use `--memory-reservation`, it must be set lower than `--memory` for it to take precedence. Because it is a soft limit, it doesn't guarantee that the container doesn't exceed the limit.
`--kernel-memory`	The maximum amount of kernel memory the container can use. The minimum allowed value is `6m`. Because kernel memory can't be swapped out, a container that is starved of kernel memory may block host machine resources, which can have side effects on the host machine and on other containers. See `--kernel-memory` details.
`--oom-kill-disable`	By default, if an out-of-memory (OOM) error occurs, the kernel kills processes in a container. To change this behavior, use the `--oom-kill-disable` option. Only disable the OOM killer on containers where you have also set the `-m/--memory` option. If the `-m` flag isn't set, the host can run out of memory and the kernel may need to kill the host system's processes to free memory.
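As a sketch of how these size values work, the hypothetical helpers below convert a value with a single-letter b, k, m, or g suffix into bytes. They use 1024-based units and accept whole numbers only. They also check a --memory value against the 6m minimum given in the table. Docker's own parser accepts more spellings than these helpers handle.

```shell
# Hypothetical helper: convert a whole-number size with an optional
# b, k, m or g suffix into bytes (1024-based units).
mem_to_bytes() {
  num=${1%[bBkKmMgG]}            # strip a single trailing suffix letter
  case $1 in
    *[kK]) echo $((num * 1024)) ;;
    *[mM]) echo $((num * 1024 * 1024)) ;;
    *[gG]) echo $((num * 1024 * 1024 * 1024)) ;;
    *)     echo "$num" ;;        # plain number or b suffix: already bytes
  esac
}

# Hypothetical helper: fail if a --memory value is below the 6m minimum.
check_memory_flag() {
  if [ "$(mem_to_bytes "$1")" -lt "$(mem_to_bytes 6m)" ]; then
    echo "--memory=$1 is below the 6m minimum" >&2
    return 1
  fi
}
```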

`--memory-swap` details

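--memory-swap is a modifier flag that only has meaning if --memory is also set. Using swap allows the container to write excess memory requirements to disk when the container has exhausted all the RAM that's available to it. Applications that swap memory to disk often pay a performance penalty.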

Its setting can have complicated effects:

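If --memory-swap is set to a positive integer, then both --memory and --memory-swap must be set. --memory-swap represents the total amount of memory and swap that can be used, and --memory controls the amount of non-swap memory. So if --memory="300m" and --memory-swap="1g", the container can use 300m of memory and roughly 700m of swap (1g - 300m, which is exactly 724m because Docker treats 1g as 1024m).
If --memory-swap is set to 0, the setting is ignored, and the value is treated as unset.
If --memory-swap is set to the same value as --memory, and --memory is set to a positive integer, the container doesn't have access to swap. See Prevent a container from using swap.
If --memory-swap is unset and --memory is set, the container can use as much swap as the --memory setting, provided the host has swap memory configured. For instance, if --memory="300m" and --memory-swap is not set, the container can use 600m in total of memory and swap.
If --memory-swap is explicitly set to -1, the container is allowed to use unlimited swap, up to the amount available on the host system.
Inside the container, tools like free report the host's available swap, not what's available inside the container. Don't rely on the output of free or similar tools to determine whether swap is present.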
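The rules above reduce to a small calculation. The hypothetical function below is a sketch of it. It takes --memory and --memory-swap in megabytes (1g = 1024m) and prints how much swap the container may use. It assumes the host has swap configured, and it treats an empty or 0 second argument as "unset".

```shell
# Hypothetical helper: swap (in MB) available to a container, given
# --memory and --memory-swap in MB. Empty or 0 means --memory-swap is unset.
swap_allowance_mb() {
  mem=$1
  total=${2:-0}
  if [ "$total" -eq -1 ]; then
    echo unlimited              # bounded only by the host's swap
  elif [ "$total" -eq 0 ]; then
    echo "$mem"                 # unset: as much swap as --memory
  elif [ "$total" -lt "$mem" ]; then
    echo "--memory-swap must be >= --memory" >&2
    return 1
  else
    echo $((total - mem))       # total minus the non-swap part
  fi
}
```

For example, swap_allowance_mb 300 300 prints 0, which is the "no swap" case covered in the next section.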

Prevent a container from using swap

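If --memory and --memory-swap are set to the same value, the container can't use any swap. This is because --memory-swap is the combined amount of memory and swap that can be used, while --memory is only the amount of physical memory that can be used.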

`--memory-swappiness` details

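A value of 0 turns off anonymous page swapping.
A value of 100 sets all anonymous pages as swappable.
By default, if you don't set --memory-swappiness, the value is inherited from the host machine.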
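A minimal sketch for validating a value before passing it to --memory-swappiness, with a helper that reads the host-wide vm.swappiness setting. Both function names are hypothetical. Treating /proc/sys/vm/swappiness as the source of the inherited default is an assumption; on hosts with nested cgroups the value may come from a parent cgroup instead.

```shell
# Hypothetical helper: succeed only if $1 is an integer from 0 to 100.
valid_swappiness() {
  case $1 in
    ''|*[!0-9]*) return 1 ;;    # empty, negative, or non-numeric
  esac
  [ "$1" -le 100 ]
}

# Hypothetical helper: the host's vm.swappiness sysctl (assumed to be the
# usual origin of the inherited default).
host_swappiness() {
  cat /proc/sys/vm/swappiness
}
```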

`--kernel-memory` details

Kernel memory limits are expressed in terms of the overall memory allocated to a container. Consider the following scenarios:

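Unlimited memory, unlimited kernel memory: This is the default behavior.
Unlimited memory, limited kernel memory: This is appropriate when the amount of memory needed by all cgroups is greater than the amount of memory that actually exists on the host machine. You can configure the kernel memory to never exceed what's available on the host machine, and containers that need more memory must wait for it.
Limited memory, unlimited kernel memory: The overall memory is limited, but the kernel memory isn't.
Limited memory, limited kernel memory: Limiting both user and kernel memory can be useful for debugging memory-related problems. If a container uses an unexpected amount of either type of memory, it runs out of memory without affecting other containers or the host machine. In this setup, if the kernel memory limit is lower than the user memory limit, running out of kernel memory causes the container to experience an OOM error. If the kernel memory limit is higher than the user memory limit, the kernel limit doesn't cause the container to experience an OOM.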

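When you turn on kernel memory limits, the host machine tracks "high water mark" statistics on a per-process basis, so you can track which processes (in this case, containers) are using excess memory. You can see these statistics for each process by viewing /proc/<PID>/status on the host machine.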
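As an illustration, the status file contains a VmHWM field, which is the kernel's peak resident set size ("high water mark") for the process. Whether this is exactly the statistic the paragraph means is an assumption. The hypothetical helper below prints that field. The usage line uses docker inspect to find a container's main PID and assumes a container named web, which is a hypothetical name.

```shell
# Hypothetical helper: print the VmHWM (peak resident set size) of a
# process from /proc/<PID>/status, e.g. "3456 kB".
peak_rss() {
  awk '/^VmHWM:/ { print $2, $3 }' "/proc/$1/status"
}

# Usage on the host for a container's main process ("web" is hypothetical):
#   peak_rss "$(docker inspect --format '{{.State.Pid}}' web)"
```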

CPU

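By default, each container's access to the host machine's CPU cycles is unlimited. You can set various constraints to limit a given container's access to the host machine's CPU cycles. Most users use and configure the default CFS scheduler. You can also configure the real-time scheduler.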

Configure the default CFS scheduler

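The CFS is the Linux kernel CPU scheduler for normal Linux processes. Several runtime flags let you configure how much access to CPU resources your container has. When you use these settings, Docker modifies the settings for the container's cgroup on the host machine.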

Option	Description
`--cpus=<value>`	Specify how much of the available CPU resources a container can use. For instance, if the host machine has two CPUs and you set `--cpus="1.5"`, the container can use at most one and a half of the CPUs. This is the equivalent of setting `--cpu-period="100000"` and `--cpu-quota="150000"`.
`--cpu-period=<value>`	Specify the CPU CFS scheduler period, which is used alongside `--cpu-quota`. Defaults to 100000 microseconds (100 milliseconds). Most users don't change this from the default. For most use-cases, `--cpus` is a more convenient alternative.
`--cpu-quota=<value>`	Impose a CPU CFS quota on the container: the number of microseconds per `--cpu-period` that the container can run before it's throttled, which acts as the effective ceiling. For most use-cases, `--cpus` is a more convenient alternative.
`--cpuset-cpus`	Limit the specific CPUs or cores a container can use. A comma-separated list or hyphen-separated range of CPUs a container can use, if you have more than one CPU. The first CPU is numbered 0. A valid value might be `0-3` (to use the first, second, third, and fourth CPU) or `1,3` (to use the second and fourth CPU).
`--cpu-shares`	Set this flag to a value greater or less than the default of 1024 to increase or reduce the container's weight, and give it access to a greater or lesser proportion of the host machine's CPU cycles. This is only enforced when CPU cycles are constrained. When plenty of CPU cycles are available, all containers use as much CPU as they need. In that way, this is a soft limit. `--cpu-shares` doesn't prevent containers from being scheduled in Swarm mode. It prioritizes container CPU resources for the available CPU cycles. It doesn't guarantee or reserve any specific CPU access.
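Because --cpu-shares is a relative weight, a container's share of CPU time under full contention is roughly its weight divided by the sum of the weights of all busy containers. The hypothetical helper below does that arithmetic. It is an approximation that ignores processes outside the containers and nested cgroups.

```shell
# Hypothetical helper: approximate percentage of contended CPU time each
# container receives, given the --cpu-shares values as arguments.
share_percentages() {
  awk 'BEGIN {
    for (i = 1; i < ARGC; i++) total += ARGV[i]
    for (i = 1; i < ARGC; i++) printf "%.1f\n", 100 * ARGV[i] / total
  }' "$@"
}

# Usage: a container with the default 1024 shares next to one with 512
#   share_percentages 1024 512
```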

If you have 1 CPU, each of the following commands limits the container to at most 50% of the CPU every second.

$ docker run -it --cpus=".5" ubuntu /bin/bash

This is equivalent to manually specifying --cpu-period and --cpu-quota:

$ docker run -it --cpu-period=100000 --cpu-quota=50000 ubuntu /bin/bash
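Both commands rely on the relationship quota = cpus * period. The hypothetical helper below sketches that conversion with awk, rounding to the nearest microsecond. The default period of 100000 microseconds matches the --cpu-period default in the table.

```shell
# Hypothetical helper: convert a --cpus value into the equivalent
# --cpu-quota for a given --cpu-period (default 100000 microseconds).
cpus_to_quota() {
  awk -v cpus="$1" -v period="${2:-100000}" \
    'BEGIN { printf "%d\n", cpus * period + 0.5 }'   # +0.5 rounds to nearest
}

# Usage:
#   cpus_to_quota .5     # the --cpu-quota used in the command above
```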

Configure the real-time scheduler

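You can configure your container to use the real-time scheduler for tasks that can't use the CFS scheduler. You need to make sure the host machine's kernel is configured correctly before you can configure the Docker daemon or configure individual containers.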

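Warning
CPU scheduling and prioritization are advanced kernel-level features. Most users don't need to change these values from their defaults. Setting these values incorrectly can cause your host system to become unstable or unusable.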

Configure the host machine's kernel

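Verify that CONFIG_RT_GROUP_SCHED is enabled in the Linux kernel by running zcat /proc/config.gz | grep CONFIG_RT_GROUP_SCHED or by checking for the existence of the file /sys/fs/cgroup/cpu.rt_runtime_us. For guidance on configuring the kernel real-time scheduler, consult the documentation for your operating system.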
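The two checks above can be combined into one function. This is a sketch with some caveats. The function name is hypothetical. It uses the cgroup path given above, but other cgroup layouts may place the file elsewhere. /proc/config.gz exists only on kernels built with CONFIG_IKCONFIG_PROC.

```shell
# Hypothetical helper: succeed if the kernel appears to support real-time
# group scheduling, using the two checks described above.
rt_group_sched_enabled() {
  if [ -e /sys/fs/cgroup/cpu.rt_runtime_us ]; then
    return 0
  fi
  # Fall back to the embedded kernel config, if the kernel exposes it.
  [ -r /proc/config.gz ] &&
    zcat /proc/config.gz | grep '^CONFIG_RT_GROUP_SCHED=y' >/dev/null
}

# Usage:
#   rt_group_sched_enabled && echo "RT group scheduling available"
```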

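Configure the Docker daemon

To run containers using the real-time scheduler, run the Docker daemon with the --cpu-rt-runtime flag set to the maximum number of microseconds reserved for real-time tasks per runtime period. For instance, with the default period of 1000000 microseconds (1 second), setting --cpu-rt-runtime=950000 ensures that containers using the real-time scheduler can run for 950000 microseconds in every 1000000-microsecond period, leaving at least 50000 microseconds available for non-real-time tasks. To make this configuration permanent on systems that use systemd, create a systemd unit file for the docker service. For an example, see the instructions on how to configure the daemon to use a proxy with a systemd unit file.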
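The arithmetic in that example is the period minus the real-time runtime. The hypothetical helper below computes the time left for non-real-time tasks. It assumes the default 1000000-microsecond period unless a second argument is given.

```shell
# Hypothetical helper: microseconds per period left for non-real-time tasks.
non_rt_budget_us() {
  runtime=$1
  period=${2:-1000000}          # default real-time period: 1 second
  if [ "$runtime" -gt "$period" ]; then
    echo "runtime exceeds period" >&2
    return 1
  fi
  echo $((period - runtime))
}

# Usage:
#   non_rt_budget_us 950000
```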

Configure individual containers

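You can pass several flags to control a container's CPU priority when you start the container using docker run. Consult your operating system's documentation or the ulimit command for information on appropriate values.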

Option	Description
`--cap-add=sys_nice`	Grants the container the `CAP_SYS_NICE` capability, which allows the container to raise process `nice` values, set real-time scheduling policies, set CPU affinity, and other operations.
`--cpu-rt-runtime=<value>`	The maximum number of microseconds the container can run at real-time priority within the Docker daemon's real-time scheduler period. You also need the `--cap-add=sys_nice` flag.
`--ulimit rtprio=<value>`	The maximum real-time priority allowed for the container. You also need the `--cap-add=sys_nice` flag.

The following example command sets each of these three flags on a debian:jessie container.

$ docker run -it \
    --cpu-rt-runtime=950000 \
    --ulimit rtprio=99 \
    --cap-add=sys_nice \
    debian:jessie

If the kernel or Docker daemon isn't configured correctly, an error occurs.

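GPU

Access an NVIDIA GPU

To use NVIDIA GPUs in containers, the host needs the NVIDIA drivers and the nvidia-container-toolkit installed.

Expose GPUs for use

Include the --gpus flag when you start a container to access GPU resources. For example:

$ docker run -it --rm --gpus all ubuntu nvidia-smi

This exposes all available GPUs and returns a result similar to the following: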

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.130                 Driver Version: 384.130                  |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GRID K520           Off  | 00000000:00:03.0 Off |                  N/A |
| N/A   36C    P0    39W / 125W |      0MiB /  4036MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Use the device option to specify GPUs. For example:

$ docker run -it --rm --gpus device=GPU-3a23c669-1f69-c64e-cf85-44e9b07e7a2a ubuntu nvidia-smi

Exposes that specific GPU.

$ docker run -it --rm --gpus '"device=0,2"' ubuntu nvidia-smi

Exposes the first and third GPUs.
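The nested quoting in the last command matters. As far as we understand, Docker splits the --gpus value into comma-separated fields, so a list of devices must be wrapped in literal double quotes to stay one field. The hypothetical helper below builds such a value from device indexes or UUIDs. It is a sketch and not part of Docker.

```shell
# Hypothetical helper: build a --gpus value such as "device=0,2" (including
# the literal double quotes) from device indexes or UUIDs.
gpus_arg() {
  devices=$(printf '%s,' "$@")      # "0,2," for arguments 0 2
  printf '"device=%s"\n' "${devices%,}"
}

# Usage (on a host with NVIDIA GPUs and the container toolkit):
#   docker run -it --rm --gpus "$(gpus_arg 0 2)" ubuntu nvidia-smi
```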

Note
NVIDIA GPUs can only be accessed by systems running a single engine.

Set NVIDIA capabilities

You can set capabilities manually. For example, on Ubuntu you can run the following:

$ docker run --gpus 'all,capabilities=utility' --rm ubuntu nvidia-smi

This enables the utility driver capability, which adds the nvidia-smi tool to the container.

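Capabilities, as well as other configurations, can be set in images via environment variables. More information on valid variables can be found in the nvidia-container-toolkit documentation. These variables can be set in a Dockerfile.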

You can also use CUDA images, which set these variables automatically. See the official CUDA images NGC catalog page.
