docs: expand on allocation GC details (#26792)

Expand on the documentation of allocation garbage collection:

* Explain that server-side GC of allocations is tied to the GC of the
  evaluation that spawned the allocation.
* Explain that server-side GC of allocations will force them to be
  immediately GC'd on the client regardless of the client-side
  configurations.

Ref: https://github.com/hashicorp/nomad/issues/26765

Co-authored-by: Aimee Ukasick <Aimee.Ukasick@ibm.com>
Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>
2026-01-01 16:05:42 +03:00 · 2025-09-19 12:17:17 -04:00
parent 377674f93e
commit b5530128df
3 changed files with 45 additions and 22 deletions
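
For orientation, the settings this change documents could be written out as the following agent configuration. This is a minimal sketch, not part of the commit: it assumes a single agent file that carries both a `server` and a `client` block (in most clusters these live on separate agents), and it only restates the default values quoted in the parameter descriptions in the diff.

```hcl
# Sketch only: values are the defaults quoted in the parameter descriptions.

server {
  enabled = true

  # Allocations are garbage collected together with their evaluations, so
  # these thresholds also govern server-side collection of allocations.
  eval_gc_threshold       = "1h"
  batch_eval_gc_threshold = "24h"
}

client {
  enabled = true

  # Client-side triggers for removing terminal allocations. Once the server
  # garbage collects an allocation, the client removes it regardless of
  # these settings.
  gc_interval              = "1m"
  gc_disk_usage_threshold  = 80
  gc_inode_usage_threshold = 70
  gc_max_allocs            = 50
  gc_parallel_destroys     = 2
}
```

With these values, a terminal allocation of a service job can stay on a client until its evaluation has been terminal for `eval_gc_threshold` (1 hour), unless a client-side threshold forces earlier cleanup.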
--- a/website/content/docs/configuration/client.mdx
+++ b/website/content/docs/configuration/client.mdx
@@ -154,17 +154,22 @@ client {
 - `gc_interval` `(string: "1m")` - Specifies the interval at which Nomad
  attempts to garbage collect terminal allocation directories.

-- `gc_disk_usage_threshold` `(float: 80)` - Specifies the disk usage percent which
-  Nomad tries to maintain by garbage collecting terminal allocations.
+- `gc_disk_usage_threshold` `(float: 80)` - Specifies the disk usage percent
+  which Nomad tries to maintain by garbage collecting terminal allocations. Note
+  that the client immediately garbage collects a terminal allocation once the
+  server garbage collects it, regardless of this threshold.

 - `gc_inode_usage_threshold` `(float: 70)` - Specifies the inode usage percent
-  which Nomad tries to maintain by garbage collecting terminal allocations.
+  which Nomad tries to maintain by garbage collecting terminal allocations. Note
+  that the client immediately garbage collects a terminal allocation once the
+  server garbage collects it, regardless of this threshold.

 - `gc_max_allocs` `(int: 50)` - Specifies the maximum number of allocations
  which a client will track before triggering a garbage collection of terminal
  allocations. This will _not_ limit the number of allocations a node can run at
  a time, however after `gc_max_allocs` every new allocation will cause terminal
-  allocations to be GC'd.
+  allocations to be GC'd. Note that the client immediately garbage collects any
+  terminal allocation that the server has garbage collected.

 - `gc_parallel_destroys` `(int: 2)` - Specifies the maximum number of
  parallel destroys allowed by the garbage collector. This value should be
--- a/website/content/docs/configuration/server.mdx
+++ b/website/content/docs/configuration/server.mdx
@@ -103,14 +103,21 @@ server {
 - `eval_gc_threshold` `(string: "1h")` - Specifies the minimum time an
  evaluation must be in the terminal state before it is eligible for garbage
  collection. This is specified using a label suffix like "30s" or "1h". Note
-  that batch job evaluations are controlled via `batch_eval_gc_threshold`.
+  that batch job evaluations are controlled via
+  `batch_eval_gc_threshold`. Nomad garbage collects allocations with their
+  evaluations, so this field also controls server garbage collection of
+  allocations. Evaluations with non-terminal allocations cannot be garbage
+  collected.

 - `batch_eval_gc_threshold` `(string: "24h")` - Specifies the minimum time an
  evaluation stemming from a batch job must be in the terminal state before it is
  eligible for garbage collection. This is specified using a label suffix like
  "30s" or "1h". Note that the threshold is a necessary but insufficient condition
  for collection, and the most recent evaluation won't be garbage collected even if
-  it breaches the threshold.
+  it breaches the threshold. Allocations are garbage collected with their
+  evaluations, so this field also controls server garbage collection of
+  allocations. Evaluations with non-terminal allocations cannot be garbage
+  collected.

 - `deployment_gc_threshold` `(string: "1h")` - Specifies the minimum time a
  deployment must be in the terminal state before it is eligible for garbage
--- a/website/content/docs/manage/garbage-collection.mdx
+++ b/website/content/docs/manage/garbage-collection.mdx
@@ -42,9 +42,11 @@ Nomad garbage collects those objects, either as part of the job garbage
 collection process or by each object's own garbage collection processes running
 immediately after. Nomad's scheduled garbage collection processes only garbage
 collect objects after they are terminal for at least the specified time
-threshold and no longer needed for future scheduling decisions. Note that when
-you force garbage collection by running the `nomad system gc` command, Nomad
-ignores the specified time threshold.
+threshold and no longer needed for future scheduling decisions. This also means
+that a stopped job can't be garbage collected until its allocations are
+terminal. Note that when you force garbage collection by running the
+`nomad system gc` command, Nomad ignores the specified time threshold, but all
+other conditions still apply.

 ## Server-side garbage collection

@@ -73,6 +75,11 @@ for objects without a configurable interval setting.
 | **Node** | 5 minutes | [`node_gc_threshold`](/nomad/docs/configuration/server#node_gc_threshold)<br/>Default: 24 hours |
 | **Volume** | [`csi_volume_claim_gc_interval`](/nomad/docs/configuration/server#csi_volume_claim_gc_interval)<br/>Default: 5 minutes| [`csi_volume_claim_gc_threshold`](/nomad/docs/configuration/server#csi_volume_claim_gc_threshold)<br/>Default: 1 hour |

+Allocations do not have an independent threshold configuration. Because
+specific evaluations create allocations, Nomad garbage collects allocations
+when garbage collecting their evaluations. This also means that Nomad
+cannot garbage collect evaluations until their allocations are terminal.
+
 ### Triggers

 The server garbage collection processes wake up at configured intervals to scan
@@ -107,10 +114,10 @@ Refer to the [client block in agent configuration
 reference](/nomad/docs/configuration/client) for complete parameter descriptions
 and examples.

-Note that there is no time-based retention setting for allocations. Unlike jobs
-or evaluations, you cannot specify a time to keep allocations alive before
-garbage collection. As soon as an allocation is terminal, it becomes eligible
-for cleanup if the configured thresholds demand it.
+Note that there is no time-based retention setting for allocations on the
+client. As soon as an allocation is terminal, it becomes eligible for cleanup if
+the configured thresholds demand it or if the allocation is garbage collected on
+the server.

 ### Triggers

@@ -119,24 +126,28 @@ Nomad's client runs allocation garbage collection based on these triggers:
 - Scheduled interval

  The garbage collection process launches a ticker based on the configured
-  `gc_interval`. On each tick, the garbage collection process checks to see if it needs to remove terminal allocations.
+  `gc_interval`. On each tick, the garbage collection process checks to see if
+  it needs to remove terminal allocations.

 - Terminal state

-  When an allocation transitions to a terminal state, Nomad marks
-  the allocation for garbage collection and then signals the garbage collection
-  process to run immediately.
+  When an allocation transitions to a terminal state, Nomad marks the allocation
+  for garbage collection and then signals the garbage collection process to run
+  immediately. As with the scheduled interval, this may be a no-op if the
+  thresholds have not been reached.

 - Allocation placement

  Nomad may preemptively run garbage collection to make room for new
-  allocations. The client garbage collects older, terminal allocations if adding new allocations would exceed the `gc_max_allocs` limit.
+  allocations. The client garbage collects older, terminal allocations if adding
+  new allocations would exceed the `gc_max_allocs` limit.

-- Forced garbage collection
+- Server garbage collection

-  When you force garbage collection by running the `nomad system gc` command,
-  the garbage collection process removes all terminal objects on all servers and
-  clients, ignoring thresholds.
+  When the server garbage collects an allocation, the client removes it
+  immediately, regardless of client thresholds. When you force garbage
+  collection by running the `nomad system gc` command, Nomad removes all terminal
+  objects on all servers and clients, ignoring all threshold settings cluster-wide.

 Nomad does not continuously monitor disk or inode usage to trigger garbage
 collection. Instead, Nomad only checks disk and inode thresholds when one of the