mirror of
https://github.com/kemko/nomad.git
synced 2026-01-01 16:05:42 +03:00
docs: expand on allocation GC details (#26792)
Expand on the documentation of allocation garbage collection:

* Explain that server-side GC of allocations is tied to the GC of the evaluation that spawned the allocation.
* Explain that server-side GC of allocations will force them to be immediately GC'd on the client regardless of the client-side configurations.

Ref: https://github.com/hashicorp/nomad/issues/26765

Co-authored-by: Aimee Ukasick <Aimee.Ukasick@ibm.com>
Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>
@@ -154,17 +154,22 @@ client {
- `gc_interval` `(string: "1m")` - Specifies the interval at which Nomad
attempts to garbage collect terminal allocation directories.

- `gc_disk_usage_threshold` `(float: 80)` - Specifies the disk usage percent
which Nomad tries to maintain by garbage collecting terminal allocations. Note
that once the server garbage collects a terminal allocation, the client
garbage collects it immediately, regardless of this threshold.

- `gc_inode_usage_threshold` `(float: 70)` - Specifies the inode usage percent
which Nomad tries to maintain by garbage collecting terminal allocations. Note
that once the server garbage collects a terminal allocation, the client
garbage collects it immediately, regardless of this threshold.

- `gc_max_allocs` `(int: 50)` - Specifies the maximum number of allocations
which a client will track before triggering a garbage collection of terminal
allocations. This does _not_ limit the number of allocations a node can run at
a time; however, once the client tracks `gc_max_allocs` allocations, every new
allocation causes terminal allocations to be garbage collected. Note that once
the server garbage collects a terminal allocation, the client garbage collects
it immediately, regardless of this limit.

- `gc_parallel_destroys` `(int: 2)` - Specifies the maximum number of
parallel destroys allowed by the garbage collector. This value should be

@@ -103,14 +103,21 @@ server {
- `eval_gc_threshold` `(string: "1h")` - Specifies the minimum time an
evaluation must be in the terminal state before it is eligible for garbage
collection. This is specified using a label suffix like "30s" or "1h". Note
that batch job evaluations are controlled via `batch_eval_gc_threshold`.
Nomad garbage collects allocations with their evaluations, so this field also
controls server garbage collection of allocations. Evaluations with
non-terminal allocations cannot be garbage collected.

- `batch_eval_gc_threshold` `(string: "24h")` - Specifies the minimum time an
evaluation stemming from a batch job must be in the terminal state before it is
eligible for garbage collection. This is specified using a label suffix like
"30s" or "1h". Note that the threshold is a necessary but not sufficient
condition for collection: the most recent evaluation won't be garbage collected
even if it breaches the threshold. Allocations are garbage collected with their
evaluations, so this field also controls server garbage collection of
allocations. Evaluations with non-terminal allocations cannot be garbage
collected.

- `deployment_gc_threshold` `(string: "1h")` - Specifies the minimum time a
deployment must be in the terminal state before it is eligible for garbage

@@ -42,9 +42,11 @@ Nomad garbage collects those objects, either as part of the job garbage
collection process or by each object's own garbage collection processes running
immediately after. Nomad's scheduled garbage collection processes only garbage
collect objects after they are terminal for at least the specified time
threshold and no longer needed for future scheduling decisions. This also means
that if you stop a job, Nomad can't garbage collect it until its allocations are
terminal. Note that when you force garbage collection by running the
`nomad system gc` command, Nomad ignores the specified time threshold, but all
other conditions still apply.

## Server-side garbage collection
@@ -73,6 +75,11 @@ for objects without a configurable interval setting.
| **Node** | 5 minutes | [`node_gc_threshold`](/nomad/docs/configuration/server#node_gc_threshold)<br/>Default: 24 hours |
| **Volume** | [`csi_volume_claim_gc_interval`](/nomad/docs/configuration/server#csi_volume_claim_gc_interval)<br/>Default: 5 minutes| [`csi_volume_claim_gc_threshold`](/nomad/docs/configuration/server#csi_volume_claim_gc_threshold)<br/>Default: 1 hour |

Allocations do not have an independent threshold configuration. Because
specific evaluations create allocations, Nomad garbage collects allocations
when garbage collecting their evaluations. This also means that Nomad
cannot garbage collect evaluations until their allocations are terminal.

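Allocations have no threshold of their own, so the server settings that control how long terminal allocations stay in server state are the evaluation thresholds. The following is a minimal sketch of a `server` block that uses the documented defaults as placeholder values; adjust them for your cluster. Raising these values only delays server-side collection. An evaluation with non-terminal allocations is never collected, and clients can still remove allocation directories when their own thresholds demand it.

```hcl
server {
  enabled = true

  # Evaluations other than batch job evaluations, together with the
  # allocations they created, become eligible for garbage collection
  # after one hour in a terminal state.
  eval_gc_threshold = "1h"

  # Batch job evaluations (and their allocations) are kept for at least
  # 24 hours. The most recent evaluation won't be collected even past
  # this threshold.
  batch_eval_gc_threshold = "24h"
}
```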
### Triggers

The server garbage collection processes wake up at configured intervals to scan

@@ -107,10 +114,10 @@ Refer to the [client block in agent configuration
reference](/nomad/docs/configuration/client) for complete parameter descriptions
and examples.

Note that there is no time-based retention setting for allocations on the
client. As soon as an allocation is terminal, it becomes eligible for cleanup if
the configured thresholds demand it or if the server garbage collects the
allocation.

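The following is a minimal sketch of a `client` block that sets the client garbage collection parameters explicitly. It uses the documented defaults as placeholder values. None of these settings keeps an allocation around after the server has garbage collected it, because the client removes such allocations immediately.

```hcl
client {
  enabled = true

  # How often the client checks whether it needs to remove terminal
  # allocation directories.
  gc_interval = "1m"

  # Disk and inode usage percentages the client tries to stay under by
  # garbage collecting terminal allocations.
  gc_disk_usage_threshold  = 80
  gc_inode_usage_threshold = 70

  # Number of allocations the client tracks before each new allocation
  # triggers garbage collection of terminal allocations. This does not
  # cap how many allocations can run on the node.
  gc_max_allocs = 50

  # Maximum number of parallel destroys allowed by the garbage collector.
  gc_parallel_destroys = 2
}
```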
### Triggers
@@ -119,24 +126,28 @@ Nomad's client runs allocation garbage collection based on these triggers:
- Scheduled interval

The garbage collection process launches a ticker based on the configured
`gc_interval`. On each tick, the garbage collection process checks to see if
it needs to remove terminal allocations.

- Terminal state

When an allocation transitions to a terminal state, Nomad marks the allocation
for garbage collection and then signals the garbage collection process to run
immediately. As with the scheduled interval, this may be a no-op if the
thresholds have not been reached.

- Allocation placement

Nomad may preemptively run garbage collection to make room for new
allocations. The client garbage collects older, terminal allocations if adding
new allocations would exceed the `gc_max_allocs` limit.

- Server garbage collection

When the server garbage collects an allocation, the client immediately garbage
collects that allocation, ignoring its own threshold settings. When you force
garbage collection by running the `nomad system gc` command, the garbage
collection process removes all terminal objects on all servers and clients,
ignoring thresholds.

Nomad does not continuously monitor disk or inode usage to trigger garbage
collection. Instead, Nomad only checks disk and inode thresholds when one of the