Add docs for disconnected block (#20147)

Expand the job settings to include the disconnect block and set as deprecated the fields that will be replaced by it.
2026-01-01 16:05:42 +03:00 · 2024-03-20 10:08:16 +01:00
parent 7b27bc344b
commit 56bf253474
6 changed files with 286 additions and 18 deletions
--- a/contributing/architecture-eval-triggers.md
+++ b/contributing/architecture-eval-triggers.md
@@ -75,7 +75,7 @@ The list below covers each trigger and what can trigger it.
 * **job-scaling**: Scaling a Job will result in 1 Evaluation created, plus any
  follow-up Evaluations associated with scheduling, planning, or deployments.
 * **max-disconnect-timeout**: When an Allocation is in the `unknown` state for
-  longer than the [`max_client_disconnect`][] window, the scheduler will create
+  longer than the [`disconnect.lost_after`][] window, the scheduler will create
  1 Evaluation.
 * **reconnect**: When a Node in the `disconnected` state reconnects, Nomad will
  create 1 Evaluation per job with an allocation on the reconnected Node.
@@ -256,4 +256,4 @@ and eventually need to be garbage collected.
 [`structs.go`]: https://github.com/hashicorp/nomad/blob/v1.4.0-beta.1/nomad/structs/structs.go#L10857-L10875
 [`update`]: https://developer.hashicorp.com/nomad/docs/job-specification/update
 [`restart` attempts]: https://developer.hashicorp.com/nomad/docs/job-specification/restart
-[`max_client_disconnect`]: https://developer.hashicorp.com/nomad/docs/job-specification/group#max-client-disconnect
+[`disconnect.lost_after`]: https://developer.hashicorp.com/nomad/docs/job-specification/disconnect#lost_after
--- a/website/content/docs/configuration/server.mdx
+++ b/website/content/docs/configuration/server.mdx
@@ -424,7 +424,7 @@ server {
 Nomad Clients periodically heartbeat to Nomad Servers to confirm they are
 operating as expected. Nomad Clients which do not heartbeat in the specified
 amount of time are considered `down` and their allocations are marked as `lost`
-or `disconnected` (if [`max_client_disconnect`][max_client_disconnect] is set)
+or `disconnected` (if [`disconnect.lost_after`][disconnect.lost_after] is set)
 and rescheduled.

 The various heartbeat related parameters allow you to tune the following
@@ -509,6 +509,6 @@ work.
 [`nomad operator gossip keyring generate`]: /nomad/docs/commands/operator/gossip/keyring-generate
 [search]: /nomad/docs/configuration/search
 [encryption key]: /nomad/docs/operations/key-management
-[max_client_disconnect]: /nomad/docs/job-specification/group#max-client-disconnect
+[disconnect.lost_after]: /nomad/docs/job-specification/disconnect#lost_after
 [herd]: https://en.wikipedia.org/wiki/Thundering_herd_problem
 [wi]: /nomad/docs/concepts/workload-identity
--- a/website/content/docs/job-specification/disconnect.mdx
+++ b/website/content/docs/job-specification/disconnect.mdx
@@ -0,0 +1,201 @@
+---
+layout: docs
+page_title: disconnect Block - Job Specification
+description: |-
+  The "disconnect" block describes the behavior of both the Nomad server and 
+  client in case of a network partition, as well as how to reconcile the workloads
+  in case of a reconnection.
+---
+
+# `disconnect` Block
+
+<Placement groups={['job', 'group', 'disconnect']} />
+
+The `disconnect` block describes the system's behavior in case of a network 
+partition. By default, without a `disconnect` block, if an allocation is on a 
+node that misses heartbeats, the allocation will be marked `lost` and will be 
+rescheduled.
+
+```hcl
+job "docs" {
+  group "example" {
+    disconnect {
+      lost_after = "6h"
+      stop_after = "2h"
+      replace    = false
+      reconcile  = "keep_original"
+    }
+  }
+}
+```
+
+## `disconnect` Parameters
+
+- `lost_after` `(string: "")` - Specifies a duration during which a Nomad client
+  will attempt to reconnect allocations after it fails to heartbeat
+  in the [`heartbeat_grace`][] window.  It defaults to "" which is equivalent to 
+  having the disconnect block be nil.
+  
+  See [the example code below](#lost-after) for more details. This setting cannot
+  be used with [`stop_after`].
+
+- `replace` `(bool: false)` - Specifies if the disconnected allocation should 
+  be replaced by a new one rescheduled on a different node. If false and the 
+  node it is running on becomes disconnected or goes down, this allocation
+  won't be rescheduled and will be reported as `unknown` until the node reconnects, 
+  or until the allocation is manually stopped:
+
+  ```plaintext
+  nomad alloc stop <alloc ID>
+  ```
+
+  If true, a new alloc will be placed immediately upon the node becoming 
+  disconnected.
+
+- `stop_after` `(string: "")` - Specifies a duration after which a disconnected 
+  Nomad client will stop its allocations. Setting `stop_after` shorter than 
+  `lost_after` and `replace = false` at the same time is not permitted and 
+  will cause a validation error, because this would lead to a state where no 
+  allocations can be scheduled.
+
+  The Nomad client process must be running for this to occur. This setting
+  cannot be used with [`lost_after`].
+
+- `reconcile` `(string: "best_score")` - Specifies which allocation to keep once
+  the previously disconnected node regains connectivity.
+  It has four possible values which are described below:
+
+    - `keep_original`: Always keep the original allocation. Bear in mind
+    that the original allocation may have crashed while the client was
+    disconnected.
+    - `keep_replacement`: Always keep the allocation that was rescheduled 
+    to replace the disconnected one.
+    - `best_score`: Keep the allocation running on the node with the best 
+    score.
+    - `longest_running`: Keep the allocation that has been up and running 
+    continuously for the longest time.
+
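+As an illustration of how these parameters might be combined, the following is
+a minimal sketch rather than a recommended configuration. The group name and
+the one hour duration are arbitrary assumptions chosen for the example.
+
+```hcl
+# Hypothetical group: the name and duration are illustrative assumptions.
+group "cache" {
+  disconnect {
+    # Keep allocations on a disconnected client as "unknown" for up to 1h.
+    lost_after = "1h"
+    # Place a replacement while the original allocation is unreachable.
+    replace    = true
+    # On reconnect, keep the replacement rather than the original.
+    reconcile  = "keep_replacement"
+  }
+}
+```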
+
+## `disconnect` Examples
+
+The following examples only show the `disconnect` blocks. Remember that the
+`disconnect` block is only valid in the placements listed above.
+
+### Stop After
+
+This example shows how `stop_after` interacts with
+other blocks. For the `first` group, after the default 10 second
+[`heartbeat_grace`] window expires and 90 more seconds pass, the
+server will reschedule the allocation. The client will wait 90 seconds
+before sending a stop signal (`SIGTERM`) to the `first-task`
+task. After 15 more seconds because of the task's `kill_timeout`, the
+client will send `SIGKILL`. The `second` group does not have
+`stop_after`, so the server will reschedule the
+allocation after the 10 second [`heartbeat_grace`] expires. It will
+not be stopped on the client, regardless of how long the client is out
+of touch.
+
+Note that if the servers' clocks are not closely synchronized with
+each other, the server may reschedule the group before the client has
+stopped the allocation. Operators should ensure that clock drift
+between servers is as small as possible.
+
+Note also that a group using this feature will be stopped on the
+client if the Nomad server cluster fails, since the client will be
+unable to contact any server in that case. Groups opting in to this
+feature are therefore exposed to an additional runtime dependency and
+potential point of failure.
+
+```hcl
+group "first" {
+  disconnect {
+    stop_after = "90s"
+  }
+
+  task "first-task" {
+    kill_timeout = "15s"
+  }
+}
+
+group "second" {
+
+  task "second-task" {
+    kill_timeout = "5s"
+  }
+}
+```
+
+### Lost After
+
+By default, allocations running on a client that fails to heartbeat will be
+marked "lost". When a client reconnects, its allocations, which may still be
+healthy, will restart because they have been marked "lost". This can cause
+issues with stateful tasks or tasks with long restart times.
+
+Instead, an operator may desire that these allocations reconnect without a
+restart. When `lost_after` is specified, the Nomad server will mark
+clients that fail to heartbeat as "disconnected" rather than "down", and will
+mark allocations on a disconnected client as "unknown" rather than "lost".
+These allocations may continue to run on the disconnected client. Replacement
+allocations will be scheduled according to the allocations' `replace` settings
+until the disconnected client reconnects. Once a disconnected client reconnects,
+Nomad will compare the "unknown" allocations with their replacements and will
+decide which ones to keep according to the `reconcile` setting. 
+If the `lost_after` duration expires before the client reconnects, 
+the allocations will be marked "lost". Clients that contain "unknown" 
+allocations will transition to "disconnected" rather than "down" until the last
+`lost_after` duration has expired.
+
+In the example code below, if both of these task groups were placed on the same
+client and that client experienced a network outage, both groups'
+allocations would be marked as "unknown" at two minutes because of the
+server's `heartbeat_grace` value of "2m". If the network outage continued for
+eight hours, and the client continued to fail to heartbeat, the client would
+remain in a "disconnected" state, as the first group's `lost_after`
+is twelve hours. Once all groups' `lost_after` durations are
+exceeded, in this case in twelve hours, the client node will be marked as "down"
+and the allocations will be marked as "lost". If the client had reconnected
+before twelve hours had passed, the allocations would gracefully reconnect
+using the strategy defined by [`reconcile`].
+
+The `lost_after` setting is useful for edge deployments, or scenarios when
+operators want zero on-client downtime due to node connectivity issues. This
+setting cannot be used with [`stop_after`].
+
+```hcl
+# server_config.hcl
+
+server {
+  enabled         = true
+  heartbeat_grace = "2m"
+}
+```
+
+```hcl
+# jobspec.nomad
+
+group "first" {
+  disconnect {
+    lost_after = "12h"
+    reconcile = "best_score"
+  }
+
+  task "first-task" {
+    ...
+  }
+}
+
+group "second" {
+  disconnect {
+    lost_after = "12h"
+    reconcile = "keep_original"
+  }
+
+  task "second-task" {
+    ...
+  }
+}
+```
+
+[`heartbeat_grace`]: /nomad/docs/configuration/server#heartbeat_grace
+[`stop_after`]: /nomad/docs/job-specification/disconnect#stop_after
+[`lost_after`]: /nomad/docs/job-specification/disconnect#lost_after
+[`reconcile`]: /nomad/docs/job-specification/disconnect#reconcile
--- a/website/content/docs/job-specification/group.mdx
+++ b/website/content/docs/job-specification/group.mdx
@@ -48,6 +48,12 @@ job "docs" {
  ephemeral disk requirements of the group. Ephemeral disks can be marked as
  sticky and support live data migrations.

+- `disconnect` <code>([disconnect][]: nil)</code> - Specifies the disconnect 
+  strategy for the server and client for all tasks in this group in case of a 
+  network partition. The tasks can be left unconnected, stopped or replaced 
+  when the client disconnects. The policy for reconciliation in case the client
+  regains connectivity is also specified here.
+
 - `meta` <code>([Meta][]: nil)</code> - Specifies a key-value map that annotates
  with user-defined metadata.

@@ -59,10 +65,6 @@ job "docs" {
  requirements and configuration, including static and dynamic port allocations,
  for the group.

- `reschedule` <code>([Reschedule][]: nil)</code> - Allows to specify a
-  rescheduling strategy. Nomad will then attempt to schedule the task on another
-  node if any of the group allocation statuses become "failed".
-
 - `prevent_reschedule_on_lost` `(bool: false)` - Defines the reschedule behaviour
  of an allocation when the node it is running on misses heartbeats.
  When enabled, if the node it is running on becomes disconnected
@@ -82,6 +84,13 @@ job "docs" {
  Setting `max_client_disconnect` and `prevent_reschedule_on_lost = true` at the
  same time requires that [rescheduling is disabled entirely][`disable_rescheduling`].

+  This field is deprecated in favour of `replace` on the [`disconnect`][disconnect]
+  block. See the [example below][disconect_migration] for details on migrating.
+
+- `reschedule` <code>([Reschedule][]: nil)</code> - Allows to specify a
+  rescheduling strategy. Nomad will then attempt to schedule the task on another
+  node if any of the group allocation statuses become "failed".
+
 - `restart` <code>([Restart][]: nil)</code> - Specifies the restart policy for
  all tasks in this group. If omitted, a default policy exists for each job
  type, which can be found in the [restart block documentation][restart].
@@ -115,12 +124,16 @@ job "docs" {
  The Nomad client process must be running for this to occur. This setting
  cannot be used with [`max_client_disconnect`].

+  This field is deprecated in favour of `stop_after` on the [`disconnect`][disconnect] block.
+
 - `max_client_disconnect` `(string: "")` - Specifies a duration during which a
  Nomad client will attempt to reconnect allocations after it fails to heartbeat
  in the [`heartbeat_grace`] window. See [the example code
  below][max-client-disconnect] for more details. This setting cannot be used
  with [`stop_after_client_disconnect`].

+  This field is deprecated in favour of `lost_after` on the [`disconnect`][disconnect] block.
+
 - `task` <code>([Task][]: &lt;required&gt;)</code> - Specifies one or more tasks to run
  within this group. This can be specified multiple times, to add a task as part
  of the group.
@@ -285,17 +298,20 @@ healthy, will restart because they have been marked "lost". This can cause
 issues with stateful tasks or tasks with long restart times.

 Instead, an operator may desire that these allocations reconnect without a
-restart. When `max_client_disconnect` is specified, the Nomad server will mark
-clients that fail to heartbeat as "disconnected" rather than "down", and will
-mark allocations on a disconnected client as "unknown" rather than "lost". These
-allocations may continue to run on the disconnected client. Replacement
-allocations will be scheduled according to the allocations' reschedule policy
-until the disconnected client reconnects. Once a disconnected client reconnects,
-Nomad will compare the "unknown" allocations with their replacements and keep
-the one with the best node score. If the `max_client_disconnect` duration
-expires before the client reconnects, the allocations will be marked "lost".
+restart. When `max_client_disconnect` or `disconnect.lost_after` is specified,
+the Nomad server will mark clients that fail to heartbeat as "disconnected" 
+rather than "down", and will mark allocations on a disconnected client as
+"unknown" rather than "lost". These allocations may continue to run on the
+disconnected client. Replacement allocations will be scheduled according to the
+allocations' `disconnect.replace` settings until the disconnected client
+reconnects. Once a disconnected client reconnects, Nomad will compare the "unknown"
+allocations with their replacements and will decide which ones to keep according
+to the `disconnect.reconcile` setting. If the `max_client_disconnect` or
+`disconnect.lost_after` duration expires before the client reconnects,
+the allocations will be marked "lost".
 Clients that contain "unknown" allocations will transition to "disconnected"
-rather than "down" until the last `max_client_disconnect` duration has expired.
+rather than "down" until the last `max_client_disconnect` or `disconnect.lost_after` 
+duration has expired.

 In the example code below, if both of these task groups were placed on the same
 client and that client experienced a network outage, both of the group's
@@ -371,6 +387,45 @@ If [`max_client_disconnect`](#max_client_disconnect) is set and
 the node will be transition from `disconnected` to `down`. The allocation
 will remain as `unknown` and won't be rescheduled.

+#### Migration to `disconnect` block
+
+The new configuration fields in the disconnect block work the same way as the
+ones they are replacing:
+  * `stop_after_client_disconnect` is replaced by `stop_after`
+  * `max_client_disconnect` is replaced by `lost_after`
+  * `prevent_reschedule_on_lost` is replaced by `replace`, with the boolean
+    inverted: `prevent_reschedule_on_lost = true` corresponds to `replace = false`
+
+To keep the same behaviour as the old configuration upon reconnection, the
+`reconcile` option should be set to `best_score`.
+
+The following example shows how to migrate from the old configuration to the new one:
+
+```hcl
+job "docs" {
+  group "example" {
+    max_client_disconnect        = "6h"
+    stop_after_client_disconnect = "2h"
+    prevent_reschedule_on_lost   = true
+  }
+}
+```
+Can be directly translated to:
+
+```hcl
+job "docs" {
+  group "example" {
+    disconnect {
+      lost_after = "6h"
+      stop_after = "2h"
+      replace    = false
+      reconcile  = "best_score"
+    }
+  }
+}
+```
+
+All usage constraints still apply with the disconnect block as they did before:
+ - `stop_after` and `lost_after` can't be used together.
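+
+For instance, a group that only used `stop_after_client_disconnect` might be
+migrated as sketched below. The group name is a hypothetical placeholder, and
+the two blocks are alternatives that are not meant to appear in the same job.
+
+```hcl
+# Before: deprecated group-level field.
+group "example" {
+  stop_after_client_disconnect = "2h"
+}
+
+# After: the same duration expressed inside the disconnect block.
+group "example" {
+  disconnect {
+    stop_after = "2h"
+  }
+}
+```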

 [task]: /nomad/docs/job-specification/task 'Nomad task Job Specification'
 [job]: /nomad/docs/job-specification/job 'Nomad job Job Specification'
@@ -389,6 +444,7 @@ will remain as `unknown` and won't be rescheduled.
 [migrate]: /nomad/docs/job-specification/migrate 'Nomad migrate Job Specification'
 [network]: /nomad/docs/job-specification/network 'Nomad network Job Specification'
 [reschedule]: /nomad/docs/job-specification/reschedule 'Nomad reschedule Job Specification'
+[disconnect]: /nomad/docs/job-specification/disconnect 'Nomad disconnect Job Specification'
 [restart]: /nomad/docs/job-specification/restart 'Nomad restart Job Specification'
 [service]: /nomad/docs/job-specification/service 'Nomad service Job Specification'
 [service_discovery]: /nomad/docs/integrations/consul-integration#service-discovery 'Nomad Service Discovery'
@@ -396,3 +452,4 @@ will remain as `unknown` and won't be rescheduled.
 [vault]: /nomad/docs/job-specification/vault 'Nomad vault Job Specification'
 [volume]: /nomad/docs/job-specification/volume 'Nomad volume Job Specification'
 [`consul.name`]: /nomad/docs/configuration/consul#name
+[disconect_migration]: /nomad/docs/job-specification/group#migration_to_disconnect_block
--- a/website/content/docs/upgrade/upgrade-specific.mdx
+++ b/website/content/docs/upgrade/upgrade-specific.mdx
@@ -14,6 +14,12 @@ their upgrades as a result of new features or changed behavior. This page is
 used to document those details separately from the standard upgrade flow.

 ## Nomad 1.8.0
+Nomad 1.8.0 introduces a `disconnect` block that groups all the configuration
+options related to the behavior of disconnected clients and of the servers,
+deprecating the fields `stop_after_client_disconnect`, `max_client_disconnect`
+and `prevent_reschedule_on_lost`. This block also introduces new options for
+reconciling allocations if the client regains connectivity.
+

 #### Removal of `raw_exec` option `no_cgroups`

--- a/website/data/docs-nav-data.json
+++ b/website/data/docs-nav-data.json
@@ -1703,6 +1703,10 @@
        "title": "expose",
        "path": "job-specification/expose"
      },
+      {
+        "title": "disconnect",
+        "path": "job-specification/disconnect"
+      },
      {
        "title": "gateway",
        "path": "job-specification/gateway"