docs: expand on recommendations for CPU resource reservation (#25964)

Add some prescriptive guidance to the CPU concepts document around when to use
`resources.cores` vs `resources.cpu`. Extend some of the text to cover cgroups
v2.

Ref: https://hashicorp.atlassian.net/browse/NMD-297
Ref: https://go.hashi.co/rfc/nmd-211
Ref: https://github.com/hashicorp/nomad/pull/25963
Author: Tim Gross
Date: 2025-06-03 15:57:04 -04:00
Committer: GitHub
Parent: ac31a3c629
Commit: 6c630c4bfa
2 changed files with 57 additions and 7 deletions


@@ -106,8 +106,9 @@ available for scheduling of Nomad tasks.
When scheduling jobs, a Task must specify how much CPU resource should be
allocated on its behalf. This can be done in terms of bandwidth in MHz with the
`cpu` attribute. On Linux under cgroups v1, Nomad maps this MHz value directly
into a [cpu.share][]. On Linux under cgroups v2, Nomad converts the MHz value
into a [cpu.weight][] that preserves the same proportions as the cgroups v1
`cpu.share`.
```hcl
task {
@@ -125,6 +126,42 @@ task drivers enable limiting a task to use only the amount of bandwidth
allocated to the task, described in the [CPU Hard Limits](#cpu-hard-limits)
section below.
### Relative CPU shares/weights on Linux
Linux cgroups are hierarchical, and the `cpu.share`/`cpu.weight` values reflect
relative weights within a given subtree. Nomad creates its own cgroup subtree
(`nomad.slice`) on startup, and all `cpu.share`/`cpu.weight` values that Nomad
writes are relative between processes within that slice. The `nomad.slice`
subtree is itself relative to another subtree on the host. For example, a host
running systemd might have the following slices:
```
/sys/fs/cgroup
├── nomad.slice
│ ├── reserve.slice
│ └── share.slice
│ ├── 912dcc05-61e1-53cb-5489-a976a1231960.task.scope
│ ├── 247e706a-6df8-4123-89b3-1bcf2846b503.task.scope
│ └── 586c0c58-3d50-4730-b4ad-022076d3c6a4.task.scope
├── system.slice
│ ├── journald.service
│ └── (various system services, etc.)
└── user-1000.slice
├── session-1.scope
└── (various user services, etc.)
```
If task `912dcc05` has `resources.cpu = 1024` and tasks `247e706a` and
`586c0c58` have `resources.cpu = 512`, then `912dcc05` gets 50% of the CPU
resources available to the `nomad.slice`, and tasks `247e706a` and `586c0c58`
get 25% each. (The `reserve.slice` and `share.slice` pass through CPU shares
unchanged here.) But together the three tasks get only 33% of the host's total
CPU resources, unless the `nomad.slice`, `system.slice`, or `user-1000.slice`
has something other than the default 1024 shares. The 1024 value is only
meaningful within the context of the Nomad slice.
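
To make the arithmetic concrete, here is a minimal sketch of the corresponding
job specification fragments (task names are illustrative, and in practice the
three tasks above would typically belong to separate allocations or jobs) that
declares the same 2:1:1 ratio:

```hcl
task "primary" {
  resources {
    cpu = 1024 # 50% of the CPU available to nomad.slice
  }
}

task "helper-1" {
  resources {
    cpu = 512 # 25% of the CPU available to nomad.slice
  }
}

task "helper-2" {
  resources {
    cpu = 512 # 25% of the CPU available to nomad.slice
  }
}
```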
### Allocating cores
On Linux systems, Nomad supports reserving whole CPU cores specifically for a
task. No task will be allowed to run on a CPU core reserved for another task.
@@ -136,6 +173,11 @@ task {
}
```
We recommend using `resources.cores` for tasks that require high CPU performance
to give those tasks exclusive access to CPU bandwidth. Sidecar tasks in the same
allocation can use `resources.cpu` to get a proportional share of the remaining
CPU on the node.
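
For example, here is a sketch (task names and values are illustrative) of a
group that pairs a CPU-intensive task using `cores` with a sidecar task that
uses `cpu`:

```hcl
group "app" {

  # CPU-intensive workload: reserve two whole cores exclusively.
  task "server" {
    resources {
      cores = 2
    }
  }

  # Sidecar: a proportional share of the node's remaining CPU bandwidth.
  task "log-shipper" {
    lifecycle {
      hook    = "poststart"
      sidecar = true
    }
    resources {
      cpu = 100
    }
  }
}
```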
Nomad Enterprise supports NUMA aware scheduling, which enables operators to
more finely control which CPU cores may be reserved for tasks.
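
As a sketch of what that looks like in the job specification, based on the
separate `numa` block documentation (the `affinity` value shown is one of its
documented options):

```hcl
resources {
  cores = 4

  # Require that all four reserved cores come from the same NUMA node.
  numa {
    affinity = "require"
  }
}
```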
@@ -381,5 +423,6 @@ When running on a virtualized host such as Amazon EC2 Nomad makes use of the
require installing the `dmidecode` package manually.
[cpuset]: https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v1/cpusets.html
[cpu.share]: https://www.redhat.com/sysadmin/cgroups-part-two
[cpu.weight]: https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#weights
[numa_wiki]: https://en.wikipedia.org/wiki/Non-uniform_memory_access


@@ -68,9 +68,10 @@ The following examples only show the `resources` blocks. Remember that the
### Cores
This example specifies that the task requires 2 reserved cores. With this block,
Nomad finds a client with enough spare capacity to reserve 2 cores exclusively
for the task. Unlike the `cpu` field, the task does not share CPU time with any
other tasks managed by Nomad on the client.
```hcl
resources {
@@ -78,7 +79,12 @@ resources {
}
```
If `cores` and `cpu` are both defined in the same resource block, validation of
the job fails.
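
For example, a `resources` block like this sketch is rejected when the job is
validated:

```hcl
resources {
  # Invalid: `cores` and `cpu` are mutually exclusive.
  cores = 2
  cpu   = 500
}
```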
Refer to [How Nomad Uses CPU][concepts-cpu] for more details on Nomad's
reservation of CPU resources.
### Memory
@@ -160,3 +166,4 @@ resource utilization and considering the following suggestions:
[quota_spec]: /nomad/docs/other-specifications/quota
[numa]: /nomad/docs/job-specification/numa 'Nomad NUMA Job Specification'
[`secrets/`]: /nomad/docs/runtime/environment#secrets
[concepts-cpu]: /nomad/docs/concepts/cpu