mirror of
https://github.com/kemko/nomad.git
synced 2026-01-06 02:15:43 +03:00
docs: expand on recommendations for CPU resource reservation (#25964)
Add some prescriptive guidance to the CPU concepts document around when to use
`resources.cores` vs `resources.cpu`. Extend some of the text to cover cgroups
v2.

Ref: https://hashicorp.atlassian.net/browse/NMD-297
Ref: https://go.hashi.co/rfc/nmd-211
Ref: https://github.com/hashicorp/nomad/pull/25963
@@ -106,8 +106,9 @@ available for scheduling of Nomad tasks.
 
 When scheduling jobs, a Task must specify how much CPU resource should be
 allocated on its behalf. This can be done in terms of bandwidth in MHz with the
-`cpu` attribute. This MHz value is translated directly into [cpushares][] on
-Linux systems.
+`cpu` attribute. On Linux under cgroups v1, Nomad maps this MHz value directly
+into a [cpu.share][]. On Linux under cgroups v2, Nomad converts the MHz value to
+a [cpu.weight][] proportionally the same as the cgroups v1 `cpu.share`.
 
 ```hcl
 task {
@@ -125,6 +126,42 @@ task drivers enable limiting a task to use only the amount of bandwidth
 allocated to the task, described in the [CPU Hard Limits](#cpu-hard-limits)
 section below.
 
+### Relative CPU shares/weights on Linux
+
+Linux cgroups are hierarchical, and the `cpu.share`/`cpu.weight` values reflect
+relative weights within a given subtree. Nomad creates its own cgroup subtree
+(`nomad.slice`) on startup, and all `cpu.share`/`cpu.weight` values that Nomad
+writes are relative between processes within that slice. The `nomad.slice`
+subtree is itself relative to another subtree on the host. For example, a host
+running systemd might have the following slices:
+
+```
+/sys/fs/cgroup
+├── nomad.slice
+│   ├── reserve.slice
+│   └── share.slice
+│       ├── 912dcc05-61e1-53cb-5489-a976a1231960.task.scope
+│       ├── 247e706a-6df8-4123-89b3-1bcf2846b503.task.scope
+│       └── 586c0c58-3d50-4730-b4ad-022076d3c6a4.task.scope
+├── system.slice
+│   ├── journald.service
+│   └── (various system services, etc.)
+└── user-1000.slice
+    ├── session-1.scope
+    └── (various user services, etc.)
+```
+
+If the task `912dcc05` has `resources.cpu = 1024` and tasks `247e706a` and
+`586c0c58` have `resources.cpu = 512`, then `912dcc05` will get 50% of the CPU
+resources available to the `nomad.slice` and tasks `247e706a` and `586c0c58`
+will get 25% each. (The `reserve.slice` and `share.slice` are passthrough for
+cpu shares here.) But together they'll get 33% of the total host's CPU resources
+unless the `nomad.slice`, `system.slice`, or `user-1000.slice` have something
+other than the default 1024 shares. The 1024 value is only meaningful within the
+context of the Nomad slice.
+
+### Allocating cores
+
 On Linux systems, Nomad supports reserving whole CPU cores specifically for a
 task. No task will be allowed to run on a CPU core reserved for another task.
 
@@ -136,6 +173,11 @@ task {
 }
 ```
 
+We recommend using `resources.cores` for tasks that require high CPU performance
+to give those tasks exclusive access to CPU bandwidth. Sidecar tasks in the same
+allocations can use `resources.cpu` to get a proportional share of the remaining
+CPU on the node.
+
 Nomad Enterprise supports NUMA aware scheduling, which enables operators to
 more finely control which CPU cores may be reserved for tasks.
 
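The recommendation added in the hunk above could look like the following in a job file. This is a minimal sketch, not taken from the commit: the job, group, and task names, the `docker` driver, the image names, and every numeric value are hypothetical and only illustrate pairing `cores` on a CPU-heavy task with `cpu` on a sidecar in the same allocation.

```hcl
# Hypothetical job: names, images, and sizes are illustrative assumptions.
job "example" {
  datacenters = ["dc1"]

  group "app" {
    # CPU-heavy main task: reserve whole cores for exclusive use.
    task "main" {
      driver = "docker"

      config {
        image = "example/app:1.0"
      }

      resources {
        cores  = 2
        memory = 512
      }
    }

    # Sidecar: request a proportional share (in MHz) of the remaining CPU.
    task "log-shipper" {
      driver = "docker"

      config {
        image = "example/log-shipper:1.0"
      }

      resources {
        cpu    = 200
        memory = 128
      }
    }
  }
}
```

Each task has its own `resources` block, so using `cores` in one task and `cpu` in another does not trip the rule against setting both in the same block.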
@@ -381,5 +423,6 @@ When running on a virtualized host such as Amazon EC2 Nomad makes use of the
 require installing the `dmidecode` package manually.
 
 [cpuset]: https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v1/cpusets.html
-[cpushares]: https://www.redhat.com/sysadmin/cgroups-part-two
+[cpu.share]: https://www.redhat.com/sysadmin/cgroups-part-two
+[cpu.weight]: https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#weights
 [numa_wiki]: https://en.wikipedia.org/wiki/Non-uniform_memory_access
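To make the relative-weight arithmetic from the "Relative CPU shares/weights on Linux" text concrete, the sketch below shows only the `resources` blocks of three tasks running on the same client. The task names are hypothetical. The percentages in the comments assume that all three tasks are competing for CPU at the same time and that `nomad.slice`, `system.slice`, and `user-1000.slice` all keep the same default weight.

```hcl
# Hypothetical tasks; only the resources blocks are shown.
task "api" {
  resources {
    cpu = 1024 # 1024 / (1024 + 512 + 512) = 50% of the CPU available to nomad.slice
  }
}

task "worker" {
  resources {
    cpu = 512 # 512 / 2048 = 25% of the CPU available to nomad.slice
  }
}

task "metrics" {
  resources {
    cpu = 512 # 512 / 2048 = 25% of the CPU available to nomad.slice
  }
}
```

Under those assumptions the three tasks together receive about one third of the host's CPU, so the `api` task ends up with roughly 50% of 33%, or about 17% of the host, when every slice is busy.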
@@ -68,9 +68,10 @@ The following examples only show the `resources` blocks. Remember that the
 
 ### Cores
 
-This example specifies that the task requires 2 reserved cores. With this block, Nomad will find
-a client with enough spare capacity to reserve 2 cores exclusively for the task. Unlike the `cpu` field, the
-task will not share cpu time with any other tasks managed by Nomad on the client.
+This example specifies that the task requires 2 reserved cores. With this block,
+Nomad finds a client with enough spare capacity to reserve 2 cores exclusively
+for the task. Unlike the `cpu` field, the task does not share CPU time with any
+other tasks managed by Nomad on the client.
 
 ```hcl
 resources {
@@ -78,7 +79,12 @@ resources {
 }
 ```
 
-If `cores` and `cpu` are both defined in the same resource block, validation of the job will fail.
+If `cores` and `cpu` are both defined in the same resource block, validation of
+the job fails.
+
+Refer to [How Nomad Uses CPU][concepts-cpu] for more details on Nomad's
+reservation of CPU resources.
+
 
 ### Memory
 
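As an illustration of the validation rule in the hunk above, the following sketch contrasts a `resources` block that sets both fields with one that sets only `cores`. The numeric values are arbitrary assumptions, and the exact validation error message is not reproduced here.

```hcl
# Invalid: `cores` and `cpu` are both defined in the same resources block,
# so validation of the job fails.
resources {
  cores = 2
  cpu   = 500
}

# Valid: choose one of the two fields. Here the task reserves whole cores.
resources {
  cores = 2
}
```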
@@ -160,3 +166,4 @@ resource utilization and considering the following suggestions:
 [quota_spec]: /nomad/docs/other-specifications/quota
 [numa]: /nomad/docs/job-specification/numa 'Nomad NUMA Job Specification'
 [`secrets/`]: /nomad/docs/runtime/environment#secrets
+[concepts-cpu]: /nomad/docs/concepts/cpu