nomad/website/content/docs/job-specification/disconnect.mdx
Tim Gross dc58f247ed docs: clarify reschedule, migrate, and replacement terminology (#24929)
Our vocabulary around scheduler behaviors outside of the `reschedule` and
`migrate` blocks leaves room for confusion around whether the reschedule tracker
should be propagated between allocations. There are effectively five different
behaviors we need to cover:

* restart: when the tasks of an allocation fail and we try to restart the tasks
  in place.

* reschedule: when the `restart` block runs out of attempts (or the allocation
  fails before tasks even start), and we need to move
  the allocation to another node to try again.

* migrate: when the user has asked to drain a node and we need to move the
  allocations. These are not failures, so we don't want to propagate the
  reschedule tracker.

* replacement: when a node is lost, we don't count that against the `reschedule`
  tracker for the allocations on the node (it's not the allocation's "fault",
  after all). We don't want to run the `migrate` machinery here either, as we
  can't contact the down node. To the scheduler, this is effectively the same as
  if we bumped the `group.count`.

* replacement for `disconnect.replace = true`: this is a replacement, but the
  replacement is intended to be temporary, so we propagate the reschedule tracker.

Add a section to the `reschedule`, `migrate`, and `disconnect` blocks explaining
when each item applies. Update the use of the word "reschedule" in several
places where "replacement" is correct, and vice-versa.

Fixes: https://github.com/hashicorp/nomad/issues/24918
Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>
2025-02-18 09:31:03 -05:00


---
layout: docs
page_title: disconnect Block - Job Specification
description: |-
  The "disconnect" block describes the behavior of both the Nomad server and
  client in case of a network partition, as well as how to reconcile the workloads
  in case of a reconnection.
---
# `disconnect` Block

<Placement groups={['job', 'group', 'disconnect']} />

The `disconnect` block describes the system's behavior in case of a network
partition. By default, without a `disconnect` block, if an allocation is on a
node that misses heartbeats, the allocation will be marked `lost` and will be
replaced.

Replacement happens when a node is lost. When a node is drained, Nomad
[migrates][] the allocations instead, and Nomad ignores the `disconnect`
block. When a Nomad agent fails to set up the allocation or the tasks of an
allocation fail more than their [`restart`][] block allows, Nomad
[reschedules][] the allocations and ignores the `disconnect` block.
```hcl
job "docs" {
  group "example" {
    disconnect {
      lost_after = "6h"
      replace    = false
      reconcile  = "keep_original"
    }
  }

  group "example2" {
    disconnect {
      stop_on_client_after = "12h"
      replace              = false
      reconcile            = "keep_original"
    }
  }
}
```
~> Note that you cannot use both `lost_after` and `stop_on_client_after` in the
same `disconnect` block.
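
A single task group can carry every block that applies in these different
situations. The following sketch is illustrative only: the job, group, and task
names, the `busybox` image, and all values are assumptions rather than
recommended settings. The comments summarize which scenario each block governs,
based on the description at the top of this page.

```hcl
job "docs" {
  group "example" {
    # Failing tasks are first restarted in place on the same node.
    # mode = "fail" lets the allocation fail once attempts run out.
    restart {
      attempts = 2
      interval = "30m"
      delay    = "15s"
      mode     = "fail"
    }

    # When restarts are exhausted (or the allocation fails before its tasks
    # start), Nomad reschedules the allocation onto another node.
    reschedule {
      attempts  = 3
      interval  = "1h"
      delay     = "30s"
      unlimited = false
    }

    # When an operator drains the node, Nomad migrates the allocation
    # instead, and the disconnect block is ignored.
    migrate {
      max_parallel     = 1
      health_check     = "task_states"
      min_healthy_time = "10s"
      healthy_deadline = "5m"
    }

    # When the node misses heartbeats, the disconnect block applies. With
    # replace = true, the replacement counts against the reschedule block.
    disconnect {
      lost_after = "1h"
      replace    = true
      reconcile  = "best_score"
    }

    task "app" {
      driver = "docker"

      config {
        image   = "busybox:1.36"
        command = "sleep"
        args    = ["infinity"]
      }
    }
  }
}
```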
## `disconnect` Parameters
- `lost_after` `(string: "")` - Specifies a duration during which a Nomad client
  will attempt to reconnect allocations after it fails to heartbeat in the
  [`heartbeat_grace`][] window. It defaults to `""`, which is equivalent to
  having no `disconnect` block.

  You cannot use `lost_after` and `stop_on_client_after` in the same
  `disconnect` block.

  Refer to [the Lost After section][lost-after] for more details.

- `replace` `(bool: false)` - Specifies if Nomad should replace the disconnected
  allocation with a new one placed on a different node. Nomad considers the
  replacement allocation a reschedule and obeys the job's [`reschedule`][]
  block. If false and the node the allocation is running on disconnects
  or goes down, Nomad does not replace this allocation and reports `unknown`
  until the node reconnects, or until you manually stop the allocation with
  the following command:

  ```plaintext
  nomad alloc stop <alloc ID>
  ```

  If true, Nomad places a new allocation immediately when the node becomes
  disconnected.

- `stop_on_client_after` `(string: "")` - Specifies a duration after which a
  disconnected Nomad client will stop its allocations. Setting
  `stop_on_client_after` shorter than `lost_after` and `replace = false` at the
  same time is not permitted and will cause a validation error, because this
  would lead to a state where no allocations can be scheduled.

  The Nomad client process must be running for this to occur.

  You cannot use `stop_on_client_after` and `lost_after` in the same
  `disconnect` block.

  Refer to [the Stop After section][stop-after] for more details.

- `reconcile` `(string: "best_score")` - Specifies which allocation to keep once
  the previously disconnected node regains connectivity. It has four possible
  values:

  - `keep_original`: Always keep the original allocation. Bear in mind when
    choosing this option that the original allocation may have crashed while
    the client was disconnected.
  - `keep_replacement`: Always keep the allocation that was placed to replace
    the disconnected one.
  - `best_score`: Keep the allocation running on the node with the best
    score.
  - `longest_running`: Keep the allocation that has been up and running
    continuously for the longest time.
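
The `replace` and `reconcile` parameters work together: `reconcile` only has two
allocations to choose between when a replacement was placed while the node was
disconnected. The following sketch contrasts two groups. The group and task
names, the image, the `count`, and the durations are illustrative assumptions,
not recommended values.

```hcl
job "docs" {
  # Hypothetical stateless workers: place a replacement as soon as the node
  # disconnects. The replacement is treated as a reschedule, so the job's
  # reschedule block applies. On reconnection, keep whichever copy has been
  # running continuously for the longest time.
  group "workers" {
    count = 3

    disconnect {
      lost_after = "2h"
      replace    = true
      reconcile  = "longest_running"
    }

    task "worker" {
      driver = "docker"

      config {
        image   = "busybox:1.36"
        command = "sleep"
        args    = ["infinity"]
      }
    }
  }

  # Hypothetical stateful group: no replacement is placed, so the allocation
  # reports "unknown" while the node is disconnected, and the original is
  # kept if the node reconnects before lost_after expires.
  group "cache" {
    disconnect {
      lost_after = "2h"
      replace    = false
      reconcile  = "keep_original"
    }

    task "cache" {
      driver = "docker"

      config {
        image   = "busybox:1.36"
        command = "sleep"
        args    = ["infinity"]
      }
    }
  }
}
```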
## `disconnect` Examples
The following examples only show the `disconnect` blocks. Remember that the
`disconnect` block is only valid in the placements listed previously.
### Stop After
This example shows how `stop_on_client_after` interacts with
other blocks. For the `first` group, after the default 10 second
[`heartbeat_grace`] window expires and 90 more seconds pass, the
server replaces the allocation. The client waits 90 seconds
before sending a stop signal (`SIGTERM`) to the `first-task`
task. If the task has not exited 15 seconds later, as set by the task's
`kill_timeout`, the client sends `SIGKILL`. The `second` group does not set
`stop_on_client_after`, so the server replaces the
allocation after the 10 second [`heartbeat_grace`] window expires. That
allocation is not stopped on the client, regardless of how long the client is
out of touch.

Note that if the servers' clocks are not closely synchronized with
each other, a server may replace the allocation before the client has
stopped it. Operators should keep clock drift between servers as small as
possible.

Note also that a group using this feature will be stopped on the
client if the Nomad server cluster fails, since the client will be
unable to contact any server in that case. Groups opting in to this
feature are therefore exposed to an additional runtime dependency and
potential point of failure.
```hcl
group "first" {
  disconnect {
    stop_on_client_after = "90s"
  }

  task "first-task" {
    kill_timeout = "15s"
  }
}

group "second" {
  task "second-task" {
    kill_timeout = "5s"
  }
}
```
### Lost After
By default, allocations running on a client that fails to heartbeat will be
marked "lost". When the client reconnects, its allocations, which may still be
healthy, will restart because they have been marked "lost". This can cause
issues with stateful tasks or tasks with long restart times.

Instead, an operator may want these allocations to reconnect without a
restart. When `lost_after` is specified, the Nomad server will mark
clients that fail to heartbeat as "disconnected" rather than "down", and will
mark allocations on a disconnected client as "unknown" rather than "lost".
These allocations may continue to run on the disconnected client. Replacement
allocations will be scheduled according to the allocations' `replace` settings
until the disconnected client reconnects. Once a disconnected client reconnects,
Nomad will compare the "unknown" allocations with their replacements and will
decide which ones to keep according to the `reconcile` setting.

If the `lost_after` duration expires before the client reconnects,
the allocations will be marked "lost". Clients that contain "unknown"
allocations will transition to "disconnected" rather than "down" until the last
`lost_after` duration has expired.

In the example code below, if both of these task groups were placed on the same
client and that client experienced a network outage, the client would be marked
"disconnected" and both groups' allocations would be marked "unknown" after two
minutes because of the server's `heartbeat_grace` value of "2m". If the network
outage continued for eight hours, and the client continued to fail to heartbeat,
the client would remain in a "disconnected" state, because both groups'
`lost_after` durations are twelve hours. Once all groups' `lost_after` durations
are exceeded, in this case after twelve hours, the client node will be marked as
"down" and the allocations will be marked as "lost". If the client had
reconnected before twelve hours had passed, the allocations would gracefully
reconnect using the strategy defined by [`reconcile`].

`lost_after` is useful for edge deployments, or scenarios when
operators want zero on-client downtime due to node connectivity issues. This
setting cannot be used with `stop_on_client_after`.
```hcl
# server_config.hcl
server {
  enabled         = true
  heartbeat_grace = "2m"
}
```
```hcl
# jobspec.nomad
group "first" {
  disconnect {
    lost_after = "12h"
    reconcile  = "best_score"
  }

  task "first-task" {
    # ...
  }
}

group "second" {
  disconnect {
    lost_after = "12h"
    reconcile  = "keep_original"
  }

  task "second-task" {
    # ...
  }
}
```
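
Because a client remains "disconnected" until the last `lost_after` duration
among its allocations expires, groups with different durations on the same
client are marked "lost" at different times. The following variation is a sketch
with assumed names, image, and durations, and it assumes the server
`heartbeat_grace` of "2m" shown in this section. Based on the behavior described
above, if both groups were placed on the same client and the outage lasted more
than twelve hours, the `second` group's allocations would be marked "lost" once
its four-hour `lost_after` expired, while the client would stay "disconnected"
and the `first` group's allocations would stay "unknown" until the twelve-hour
`lost_after` expired.

```hcl
job "docs" {
  # Longest lost_after on the client: it decides when the client is
  # finally marked "down".
  group "first" {
    disconnect {
      lost_after = "12h"
      reconcile  = "best_score"
    }

    task "first-task" {
      driver = "docker"

      config {
        image   = "busybox:1.36"
        command = "sleep"
        args    = ["infinity"]
      }
    }
  }

  # Shorter lost_after: these allocations are marked "lost" first, while
  # the client itself stays "disconnected".
  group "second" {
    disconnect {
      lost_after = "4h"
      reconcile  = "keep_original"
    }

    task "second-task" {
      driver = "docker"

      config {
        image   = "busybox:1.36"
        command = "sleep"
        args    = ["infinity"]
      }
    }
  }
}
```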
[`heartbeat_grace`]: /nomad/docs/configuration/server#heartbeat_grace
[stop-after]: /nomad/docs/job-specification/disconnect#stop-after
[lost-after]: /nomad/docs/job-specification/disconnect#lost-after
[`reconcile`]: /nomad/docs/job-specification/disconnect#reconcile
[migrates]: /nomad/docs/job-specification/migrate
[`restart`]: /nomad/docs/job-specification/restart
[reschedules]: /nomad/docs/job-specification/reschedule
[`reschedule`]: /nomad/docs/job-specification/reschedule