docs: expand on allocation GC details (#26792)

Expand on the documentation of allocation garbage collection:
* Explain that server-side GC of allocations is tied to the GC of the
evaluation that spawned the allocation.
* Explain that server-side GC of allocations will force them to be immediately
GC'd on the client regardless of the client-side configurations.

Ref: https://github.com/hashicorp/nomad/issues/26765

Co-authored-by: Aimee Ukasick <Aimee.Ukasick@ibm.com>
Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>
This commit is contained in:
Tim Gross
2025-09-19 12:17:17 -04:00
committed by GitHub
parent 377674f93e
commit b5530128df
3 changed files with 45 additions and 22 deletions

@@ -154,17 +154,22 @@ client {
- `gc_interval` `(string: "1m")` - Specifies the interval at which Nomad
attempts to garbage collect terminal allocation directories.
-- `gc_disk_usage_threshold` `(float: 80)` - Specifies the disk usage percent which
-Nomad tries to maintain by garbage collecting terminal allocations.
+- `gc_disk_usage_threshold` `(float: 80)` - Specifies the disk usage percent
+which Nomad tries to maintain by garbage collecting terminal allocations. Note
+that Nomad immediately garbage collects terminal allocations if garbage
+collected on the server.
- `gc_inode_usage_threshold` `(float: 70)` - Specifies the inode usage percent
-which Nomad tries to maintain by garbage collecting terminal allocations.
+which Nomad tries to maintain by garbage collecting terminal allocations. Note
+that Nomad immediately garbage collects terminal allocations if garbage
+collected on the server.
- `gc_max_allocs` `(int: 50)` - Specifies the maximum number of allocations
which a client will track before triggering a garbage collection of terminal
allocations. This will _not_ limit the number of allocations a node can run at
a time, however after `gc_max_allocs` every new allocation will cause terminal
-allocations to be GC'd.
+allocations to be GC'd. Note that Nomad immediately garbage collects terminal
+allocations if garbage collected on the server.
- `gc_parallel_destroys` `(int: 2)` - Specifies the maximum number of
parallel destroys allowed by the garbage collector. This value should be

@@ -103,14 +103,21 @@ server {
- `eval_gc_threshold` `(string: "1h")` - Specifies the minimum time an
evaluation must be in the terminal state before it is eligible for garbage
collection. This is specified using a label suffix like "30s" or "1h". Note
-that batch job evaluations are controlled via `batch_eval_gc_threshold`.
+that batch job evaluations are controlled via
+`batch_eval_gc_threshold`. Nomad garbage collects allocations with their
+evaluations, so this field also controls server garbage collection of
+allocations. Evaluations with non-terminal allocations cannot be garbage
+collected.
- `batch_eval_gc_threshold` `(string: "24h")` - Specifies the minimum time an
evaluation stemming from a batch job must be in the terminal state before it is
eligible for garbage collection. This is specified using a label suffix like
"30s" or "1h". Note that the threshold is a necessary but insufficient condition
for collection, and the most recent evaluation won't be garbage collected even if
-it breaches the threshold.
+it breaches the threshold. Allocations are garbage collected with their
+evaluations, so this field also controls server garbage collection of
+allocations. Evaluations with non-terminal allocations cannot be garbage
+collected.
- `deployment_gc_threshold` `(string: "1h")` - Specifies the minimum time a
deployment must be in the terminal state before it is eligible for garbage

@@ -42,9 +42,11 @@ Nomad garbage collects those objects, either as part of the job garbage
collection process or by each object's own garbage collection processes running
immediately after. Nomad's scheduled garbage collection processes only garbage
collect objects after they are terminal for at least the specified time
-threshold and no longer needed for future scheduling decisions. Note that when
-you force garbage collection by running the `nomad system gc` command, Nomad
-ignores the specified time threshold.
+threshold and no longer needed for future scheduling decisions. This also means
+that if a job is stopped it can't be garbage collected until its allocations are
+terminal. Note that when you force garbage collection by running the `nomad
+system gc` command, Nomad ignores the specified time threshold, but all
+other conditions still apply.
## Server-side garbage collection
@@ -73,6 +75,11 @@ for objects without a configurable interval setting.
| **Node** | 5 minutes | [`node_gc_threshold`](/nomad/docs/configuration/server#node_gc_threshold)<br/>Default: 24 hours |
| **Volume** | [`csi_volume_claim_gc_interval`](/nomad/docs/configuration/server#csi_volume_claim_gc_interval)<br/>Default: 5 minutes| [`csi_volume_claim_gc_threshold`](/nomad/docs/configuration/server#csi_volume_claim_gc_threshold)<br/>Default: 1 hour |
+Allocations do not have an independent threshold configuration. Because
+specific evaluations create allocations, Nomad garbage collects allocations
+when garbage collecting their evaluations. This also means that Nomad
+cannot garbage collect evaluations until their allocations are terminal.
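
To illustrate the paragraph above, here is a minimal sketch of a server block, assuming an otherwise default agent (`enabled = true` is an assumption about the surrounding setup; the two threshold values are the defaults quoted in the server configuration reference). There is no allocation-specific setting to add. Tuning the evaluation thresholds is what changes how long the server keeps terminal allocations.

```hcl
server {
  # Assumed: this agent runs as a server.
  enabled = true

  # Non-batch evaluations, together with the allocations they created,
  # become eligible for server GC after being terminal for this long.
  eval_gc_threshold = "1h"

  # Evaluations from batch jobs, and their allocations, use a separate threshold.
  batch_eval_gc_threshold = "24h"
}
```

Raising either value keeps terminal allocations visible on the server for longer, provided every allocation of the evaluation is already terminal.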
### Triggers
The server garbage collection processes wake up at configured intervals to scan
@@ -107,10 +114,10 @@ Refer to the [client block in agent configuration
reference](/nomad/docs/configuration/client) for complete parameter descriptions
and examples.
-Note that there is no time-based retention setting for allocations. Unlike jobs
-or evaluations, you cannot specify a time to keep allocations alive before
-garbage collection. As soon as an allocation is terminal, it becomes eligible
-for cleanup if the configured thresholds demand it.
+Note that there is no time-based retention setting for allocations on the
+client. As soon as an allocation is terminal, it becomes eligible for cleanup if
+the configured thresholds demand it or if the allocation is garbage collected on
+the server.
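
As a sketch of the thresholds this paragraph refers to, the client block below sets each garbage collection parameter to the default listed in the client configuration reference (`enabled = true` is an assumption about the surrounding setup). None of these values delays cleanup of an allocation that the server has already garbage collected.

```hcl
client {
  # Assumed: this agent runs as a client.
  enabled = true

  # How often the client checks whether terminal allocations need removal.
  gc_interval = "1m"

  # Disk and inode usage percentages the client tries to stay under.
  gc_disk_usage_threshold  = 80
  gc_inode_usage_threshold = 70

  # Number of tracked allocations after which each new allocation
  # triggers garbage collection of terminal allocations.
  gc_max_allocs = 50
}
```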
### Triggers
@@ -119,24 +126,28 @@ Nomad's client runs allocation garbage collection based on these triggers:
- Scheduled interval
The garbage collection process launches a ticker based on the configured
-`gc_interval`. On each tick, the garbage collection process checks to see if it needs to remove terminal allocations.
+`gc_interval`. On each tick, the garbage collection process checks to see if
+it needs to remove terminal allocations.
- Terminal state
-When an allocation transitions to a terminal state, Nomad marks
-the allocation for garbage collection and then signals the garbage collection
-process to run immediately.
+When an allocation transitions to a terminal state, Nomad marks the allocation
+for garbage collection and then signals the garbage collection process to run
+immediately. As with the scheduled interval, this may be a no-op if the
+thresholds have not been reached.
- Allocation placement
Nomad may preemptively run garbage collection to make room for new
-allocations. The client garbage collects older, terminal allocations if adding new allocations would exceed the `gc_max_allocs` limit.
+allocations. The client garbage collects older, terminal allocations if adding
+new allocations would exceed the `gc_max_allocs` limit.
-- Forced garbage collection
+- Server garbage collection
-When you force garbage collection by running the `nomad system gc` command,
-the garbage collection process removes all terminal objects on all servers and
-clients, ignoring thresholds.
+When the server garbage collects an allocation or you force garbage collection
+by running the `nomad system gc` command, the garbage collection process
+removes all terminal objects on all servers and clients, ignoring all
+threshold settings cluster-wide.
Nomad does not continuously monitor disk or inode usage to trigger garbage
collection. Instead, Nomad only checks disk and inode thresholds when one of the