---
layout: docs
page_title: disconnect Block - Job Specification
description: |-
  The "disconnect" block describes the behavior of both the Nomad server and
  client in case of a network partition, as well as how to reconcile the workloads
  in case of a reconnection.
---

# `disconnect` Block

<Placement groups={['job', 'group', 'disconnect']} />

The `disconnect` block describes the system's behavior in case of a network
partition. By default, without a `disconnect` block, if an allocation is on a
node that misses heartbeats, the allocation will be marked `lost` and will be
replaced.

Replacement happens when a node is lost. When a node is drained, Nomad
[migrates][] the allocations instead, and Nomad ignores the `disconnect`
block. When a Nomad agent fails to set up the allocation or the tasks of an
allocation fail more than their [`restart`][] block allows, Nomad
[reschedules][] the allocations and ignores the `disconnect` block.

```hcl
job "docs" {
  group "example" {
    disconnect {
      lost_after = "6h"
      replace    = false
      reconcile  = "keep_original"
    }
  }

  group "example2" {
    disconnect {
      stop_on_client_after = "12h"
      replace              = false
      reconcile            = "keep_original"
    }
  }
}
```

~> Note that you cannot use both `lost_after` and `stop_on_client_after` in the
same `disconnect` block.

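
To make the distinction between restarts, reschedules, migrations, and
replacements more concrete, the following is a minimal sketch of a group that
configures all four behaviors side by side. The group name and every value
shown are assumptions chosen for illustration, not recommendations; the
comments restate when each block applies.

```hcl
job "docs" {
  group "example" {
    # restart: applies when tasks fail and Nomad retries them in place
    # on the same node. Values are illustrative.
    restart {
      attempts = 2
      interval = "30m"
      delay    = "15s"
      mode     = "fail"
    }

    # reschedule: applies when the restart attempts are exhausted, or the
    # agent fails to set up the allocation. Values are illustrative.
    reschedule {
      attempts       = 3
      interval       = "1h"
      delay          = "30s"
      delay_function = "exponential"
      max_delay      = "10m"
      unlimited      = false
    }

    # migrate: applies when the node is drained. Values are illustrative.
    migrate {
      max_parallel     = 1
      health_check     = "checks"
      min_healthy_time = "10s"
      healthy_deadline = "5m"
    }

    # disconnect: applies when the node misses heartbeats.
    # Values are illustrative.
    disconnect {
      lost_after = "6h"
      replace    = true
      reconcile  = "best_score"
    }
  }
}
```
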
## `disconnect` Parameters

- `lost_after` `(string: "")` - Specifies a duration during which a Nomad client
  will attempt to reconnect allocations after it fails to heartbeat in the
  [`heartbeat_grace`][] window. It defaults to "", which is equivalent to
  omitting the `disconnect` block.

  You cannot use `lost_after` and `stop_on_client_after` in the same
  `disconnect` block.

  Refer to [the Lost After section][lost-after] for more details.

- `replace` `(bool: false)` - Specifies if Nomad should replace the disconnected
  allocation with a new one rescheduled on a different node. Nomad considers the
  replacement allocation a reschedule and obeys the job's [`reschedule`][]
  block. If false and the node the allocation is running on disconnects
  or goes down, Nomad does not replace this allocation and reports its status
  as `unknown` until the node reconnects, or until you manually stop the
  allocation.

  ```plaintext
  nomad alloc stop <alloc ID>
  ```

  If true, Nomad places a new allocation immediately when the node becomes
  disconnected.

- `stop_on_client_after` `(string: "")` - Specifies a duration after which a
  disconnected Nomad client will stop its allocations. Setting
  `stop_on_client_after` shorter than `lost_after` and `replace = false` at the
  same time is not permitted and will cause a validation error, because this
  would lead to a state where no allocations can be scheduled.

  The Nomad client process must be running for this to occur.

  You cannot use `stop_on_client_after` and `lost_after` in the same
  `disconnect` block.

  Refer to [the Stop After section][stop-after] for more details.

- `reconcile` `(string: "best_score")` - Specifies which allocation to keep once
  the previously disconnected node regains connectivity.
  It has four possible values, which are described below:

  - `keep_original`: Always keep the original allocation. Bear in mind when
    choosing this option that the original allocation may have crashed while
    the client was disconnected.
  - `keep_replacement`: Always keep the allocation that was placed
    to replace the disconnected one.
  - `best_score`: Keep the allocation running on the node with the best
    score.
  - `longest_running`: Keep the allocation that has been up and running
    continuously for the longest time.

## `disconnect` Examples

The following examples only show the `disconnect` blocks. Remember that the
`disconnect` block is only valid in the placements listed previously.

### Stop After

This example shows how `stop_on_client_after` interacts with
other blocks. For the `first` group, after the default 10 second
[`heartbeat_grace`] window expires and 90 more seconds pass, the
server replaces the allocation. The client waits 90 seconds
before sending a stop signal (`SIGTERM`) to the `first-task`
task. After 15 more seconds because of the task's `kill_timeout`, the
client will send `SIGKILL`. The `second` group does not have
`stop_on_client_after`, so the server replaces the
allocation after the 10 second [`heartbeat_grace`] expires. It will
not be stopped on the client, regardless of how long the client is out
of touch.

Note that if the servers' clocks are not closely synchronized with
each other, the server may replace the group before the client has
stopped the allocation. Operators should ensure that clock drift
between servers is as small as possible.

Note also that a group using this feature will be stopped on the
client if the Nomad server cluster fails, since the client will be
unable to contact any server in that case. Groups opting in to this
feature are therefore exposed to an additional runtime dependency and
potential point of failure.

```hcl
group "first" {
  disconnect {
    stop_on_client_after = "90s"
  }

  task "first-task" {
    kill_timeout = "15s"
  }
}

group "second" {

  task "second-task" {
    kill_timeout = "5s"
  }
}
```

### Lost After

By default, allocations running on a client that fails to heartbeat will be
marked "lost". When a client reconnects, its allocations, which may still be
healthy, will restart because they have been marked "lost". This can cause
issues with stateful tasks or tasks with long restart times.

Instead, an operator may desire that these allocations reconnect without a
restart. When `lost_after` is specified, the Nomad server will mark
clients that fail to heartbeat as "disconnected" rather than "down", and will
mark allocations on a disconnected client as "unknown" rather than "lost".
These allocations may continue to run on the disconnected client. Replacement
allocations will be scheduled according to the allocations' `replace` settings
until the disconnected client reconnects. Once a disconnected client reconnects,
Nomad will compare the "unknown" allocations with their replacements and will
decide which ones to keep according to the `reconcile` setting.
If the `lost_after` duration expires before the client reconnects,
the allocations will be marked "lost". Clients that contain "unknown"
allocations will transition to "disconnected" rather than "down" until the last
`lost_after` duration has expired.

In the example code below, if both of these task groups were placed on the same
client and that client experienced a network outage, the client would be marked
as "disconnected" and both groups' allocations as "unknown" at two minutes
because of the server's `heartbeat_grace` value of "2m". If the network outage
continued for eight hours, and the client continued to fail to heartbeat, the
client would remain in a "disconnected" state, as each group's `lost_after`
is twelve hours. Once all groups' `lost_after` durations are
exceeded, in this case in twelve hours, the client node will be marked as "down"
and the allocations will be marked as "lost". If the client had reconnected
before twelve hours had passed, the allocations would gracefully reconnect
using the strategy defined by [`reconcile`].

The `lost_after` setting is useful for edge deployments, or for scenarios where
operators want zero on-client downtime due to node connectivity issues. This
setting cannot be used with `stop_on_client_after`.

```hcl
# server_config.hcl

server {
  enabled         = true
  heartbeat_grace = "2m"
}
```

```hcl
# jobspec.nomad

group "first" {
  disconnect {
    lost_after = "12h"
    reconcile  = "best_score"
  }

  task "first-task" {
    ...
  }
}

group "second" {
  disconnect {
    lost_after = "12h"
    reconcile  = "keep_original"
  }

  task "second-task" {
    ...
  }
}
```

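
As a further illustration, the following sketch combines `lost_after` with
`replace = true`. Under this configuration, Nomad would be expected to place a
replacement while the original allocation is "unknown", and, if the client
reconnects within the `lost_after` window, `reconcile = "keep_replacement"`
would keep the replacement rather than the original. The group name and the
one-hour duration are assumptions chosen for illustration only.

```hcl
# jobspec.nomad (illustrative sketch; names and durations are hypothetical)

group "third" {
  disconnect {
    # Keep the original allocation "unknown" for up to one hour
    # before marking it "lost".
    lost_after = "1h"

    # Place a replacement allocation as soon as the node disconnects.
    replace = true

    # If the node reconnects in time, keep the replacement allocation.
    reconcile = "keep_replacement"
  }
}
```
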
[`heartbeat_grace`]: /nomad/docs/configuration/server#heartbeat_grace
[stop-after]: /nomad/docs/job-specification/disconnect#stop-after
[lost-after]: /nomad/docs/job-specification/disconnect#lost-after
[`reconcile`]: /nomad/docs/job-specification/disconnect#reconcile
[migrates]: /nomad/docs/job-specification/migrate
[`restart`]: /nomad/docs/job-specification/restart
[reschedules]: /nomad/docs/job-specification/reschedule
[`reschedule`]: /nomad/docs/job-specification/reschedule