Add docs for disconnected block (#20147)

Expand the job settings to include the disconnect block and mark the fields it replaces as deprecated.
Juana De La Cuesta
2024-03-20 10:08:16 +01:00
committed by GitHub
parent 7b27bc344b
commit 56bf253474
6 changed files with 286 additions and 18 deletions

@@ -75,7 +75,7 @@ The list below covers each trigger and what can trigger it.
* **job-scaling**: Scaling a Job will result in 1 Evaluation created, plus any
follow-up Evaluations associated with scheduling, planning, or deployments.
* **max-disconnect-timeout**: When an Allocation is in the `unknown` state for
longer than the [`max_client_disconnect`][] window, the scheduler will create
longer than the [`disconnect.lost_after`][] window, the scheduler will create
1 Evaluation.
* **reconnect**: When a Node in the `disconnected` state reconnects, Nomad will
create 1 Evaluation per job with an allocation on the reconnected Node.
@@ -256,4 +256,4 @@ and eventually need to be garbage collected.
[`structs.go`]: https://github.com/hashicorp/nomad/blob/v1.4.0-beta.1/nomad/structs/structs.go#L10857-L10875
[`update`]: https://developer.hashicorp.com/nomad/docs/job-specification/update
[`restart` attempts]: https://developer.hashicorp.com/nomad/docs/job-specification/restart
[`max_client_disconnect`]: https://developer.hashicorp.com/nomad/docs/job-specification/group#max-client-disconnect
[`disconnect.lost_after`]: https://developer.hashicorp.com/nomad/docs/job-specification/disconnect#lost_after

@@ -424,7 +424,7 @@ server {
Nomad Clients periodically heartbeat to Nomad Servers to confirm they are
operating as expected. Nomad Clients which do not heartbeat in the specified
amount of time are considered `down` and their allocations are marked as `lost`
or `disconnected` (if [`max_client_disconnect`][max_client_disconnect] is set)
or `disconnected` (if [`disconnect.lost_after`][disconnect.lost_after] is set)
and rescheduled.
The various heartbeat related parameters allow you to tune the following
@@ -509,6 +509,6 @@ work.
[`nomad operator gossip keyring generate`]: /nomad/docs/commands/operator/gossip/keyring-generate
[search]: /nomad/docs/configuration/search
[encryption key]: /nomad/docs/operations/key-management
[max_client_disconnect]: /nomad/docs/job-specification/group#max-client-disconnect
[disconnect.lost_after]: /nomad/docs/job-specification/disconnect#lost_after
[herd]: https://en.wikipedia.org/wiki/Thundering_herd_problem
[wi]: /nomad/docs/concepts/workload-identity

@@ -0,0 +1,201 @@
---
layout: docs
page_title: disconnect Block - Job Specification
description: |-
The "disconnect" block describes the behavior of both the Nomad server and
client in case of a network partition, as well as how to reconcile the workloads
in case of a reconnection.
---
# `disconnect` Block
<Placement groups={['job', 'group', 'disconnect']} />
The `disconnect` block describes the system's behavior in case of a network
partition. By default, without a `disconnect` block, if an allocation is on a
node that misses heartbeats, the allocation will be marked `lost` and will be
rescheduled.
```hcl
job "docs" {
  group "example" {
    disconnect {
      lost_after = "6h"
      stop_after = "2h"
      replace    = false
      reconcile  = "keep_original"
    }
  }
}
```
## `disconnect` Parameters
- `lost_after` `(string: "")` - Specifies a duration during which a Nomad client
will attempt to reconnect allocations after it fails to heartbeat
in the [`heartbeat_grace`][] window. It defaults to "" which is equivalent to
having the disconnect block be nil.
See [the example code below][lost_after] for more details. This setting cannot
be used with [`stop_after`].
- `replace` `(bool: false)` - Specifies if the disconnected allocation should
be replaced by a new one rescheduled on a different node. If false and the
node it is running on becomes disconnected or goes down, this allocation
won't be rescheduled and will be reported as `unknown` until the node reconnects,
or until the allocation is manually stopped:
```plaintext
nomad alloc stop <alloc ID>
```
If true, a new alloc will be placed immediately upon the node becoming
disconnected.
- `stop_after` `(string: "")` - Specifies a duration after which a disconnected
Nomad client will stop its allocations. Setting `stop_after` shorter than
`lost_after` and `replace = false` at the same time is not permitted and
will cause a validation error, because this would lead to a state where no
allocations can be scheduled.
The Nomad client process must be running for this to occur. This setting
cannot be used with [`lost_after`].
- `reconcile` `(string: "best_score")` - Specifies which allocation to keep once
the previously disconnected node regains connectivity.
It has four possible values which are described below:
- `keep_original`: Always keep the original allocation. Bear in mind
when choosing this option that the original allocation may have crashed
while the client was disconnected.
- `keep_replacement`: Always keep the allocation that was rescheduled
to replace the disconnected one.
- `best_score`: Keep the allocation running on the node with the best
score.
- `longest_running`: Keep the allocation that has been up and running
continuously for the longest time.
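
As a minimal sketch of how `replace` and `reconcile` combine, the fragment below
asks for a replacement as soon as the client disconnects and keeps that
replacement if the original node reconnects before `lost_after` expires. The
group name and the duration are illustrative assumptions, not values taken from
this page.

```hcl
# Hypothetical group; the name "cache" and the "1h" window are assumptions.
group "cache" {
  disconnect {
    lost_after = "1h"               # allocation stays "unknown" for up to 1h
    replace    = true               # place a replacement on another node
    reconcile  = "keep_replacement" # on reconnect, keep the replacement
  }
}
```

Swapping `reconcile` to `keep_original` would instead favor the allocation that
kept running on the disconnected node.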
## `disconnect` Examples
The following examples only show the `disconnect` blocks. Remember that the
`disconnect` block is only valid in the placements listed above.
### Stop After
This example shows how `stop_after` interacts with
other blocks. For the `first` group, after the default 10 second
[`heartbeat_grace`] window expires and 90 more seconds pass, the
server will reschedule the allocation. The client will wait 90 seconds
before sending a stop signal (`SIGTERM`) to the `first-task`
task. After 15 more seconds, because of the task's `kill_timeout`, the
client will send `SIGKILL`. The `second` group does not have
`stop_after`, so the server will reschedule the
allocation after the 10 second [`heartbeat_grace`] expires. It will
not be stopped on the client, regardless of how long the client is out
of touch.
Note that if the servers' clocks are not closely synchronized with
each other, the server may reschedule the group before the client has
stopped the allocation. Operators should ensure that clock drift
between servers is as small as possible.
Note also that a group using this feature will be stopped on the
client if the Nomad server cluster fails, since the client will be
unable to contact any server in that case. Groups opting in to this
feature are therefore exposed to an additional runtime dependency and
potential point of failure.
```hcl
group "first" {
  # Use the disconnect block rather than the deprecated
  # stop_after_client_disconnect group field.
  disconnect {
    stop_after = "90s"
  }

  task "first-task" {
    kill_timeout = "15s"
  }
}

group "second" {
  task "second-task" {
    kill_timeout = "5s"
  }
}
```
### Lost After
By default, allocations running on a client that fails to heartbeat will be
marked "lost". When a client reconnects, its allocations, which may still be
healthy, will restart because they have been marked "lost". This can cause
issues with stateful tasks or tasks with long restart times.
Instead, an operator may desire that these allocations reconnect without a
restart. When `lost_after` is specified, the Nomad server will mark
clients that fail to heartbeat as "disconnected" rather than "down", and will
mark allocations on a disconnected client as "unknown" rather than "lost".
These allocations may continue to run on the disconnected client. Replacement
allocations will be scheduled according to the allocations' `replace` settings
until the disconnected client reconnects. Once a disconnected client reconnects,
Nomad will compare the "unknown" allocations with their replacements and will
decide which ones to keep according to the `reconcile` setting.
If the `lost_after` duration expires before the client reconnects,
the allocations will be marked "lost". Clients that contain "unknown"
allocations will transition to "disconnected" rather than "down" until the last
`lost_after` duration has expired.
In the example code below, if both of these task groups were placed on the same
client and that client experienced a network outage, both groups'
allocations would be marked as "disconnected" at two minutes because of the
server's `heartbeat_grace` value of "2m". If the network outage continued for
eight hours, and the client continued to fail to heartbeat, the client would
remain in a "disconnected" state, as the first group's `lost_after`
is twelve hours. Once all groups' `lost_after` durations are
exceeded, in this case in twelve hours, the client node will be marked as "down"
and the allocations will be marked as "lost". If the client had reconnected
before twelve hours had passed, the allocations would gracefully reconnect
using the strategy defined by [`reconcile`].
Lost After is useful for edge deployments, or scenarios when
operators want zero on-client downtime due to node connectivity issues. This
setting cannot be used with [`stop_after`].
```hcl
# server_config.hcl
server {
  enabled         = true
  heartbeat_grace = "2m"
}
```
```hcl
# jobspec.nomad
group "first" {
  disconnect {
    lost_after = "12h"
    reconcile  = "best_score"
  }

  task "first-task" {
    ...
  }
}

group "second" {
  disconnect {
    lost_after = "12h"
    reconcile  = "keep_original"
  }

  task "second-task" {
    ...
  }
}
```
[`heartbeat_grace`]: /nomad/docs/configuration/server#heartbeat_grace
[`stop_after`]: /nomad/docs/job-specification/disconnect#stop_after
[`lost_after`]: /nomad/docs/job-specification/disconnect#lost_after
[`reconcile`]: /nomad/docs/job-specification/disconnect#reconcile

@@ -48,6 +48,12 @@ job "docs" {
ephemeral disk requirements of the group. Ephemeral disks can be marked as
sticky and support live data migrations.
- `disconnect` <code>([disconnect][]: nil)</code> - Specifies the disconnect
strategy for the server and client for all tasks in this group in case of a
network partition. The tasks can be left unconnected, stopped or replaced
when the client disconnects. The policy for reconciliation in case the client
regains connectivity is also specified here.
- `meta` <code>([Meta][]: nil)</code> - Specifies a key-value map that annotates
with user-defined metadata.
@@ -59,10 +65,6 @@ job "docs" {
requirements and configuration, including static and dynamic port allocations,
for the group.
- `reschedule` <code>([Reschedule][]: nil)</code> - Allows to specify a
rescheduling strategy. Nomad will then attempt to schedule the task on another
node if any of the group allocation statuses become "failed".
- `prevent_reschedule_on_lost` `(bool: false)` - Defines the reschedule behaviour
of an allocation when the node it is running on misses heartbeats.
When enabled, if the node it is running on becomes disconnected
@@ -82,6 +84,13 @@ job "docs" {
Setting `max_client_disconnect` and `prevent_reschedule_on_lost = true` at the
same time requires that [rescheduling is disabled entirely][`disable_rescheduling`].
This field was deprecated in favour of `replace` on the [`disconnect`] block,
see [example below][disconect_migration] for more details about migrating.
- `reschedule` <code>([Reschedule][]: nil)</code> - Allows to specify a
rescheduling strategy. Nomad will then attempt to schedule the task on another
node if any of the group allocation statuses become "failed".
- `restart` <code>([Restart][]: nil)</code> - Specifies the restart policy for
all tasks in this group. If omitted, a default policy exists for each job
type, which can be found in the [restart block documentation][restart].
@@ -115,12 +124,16 @@ job "docs" {
The Nomad client process must be running for this to occur. This setting
cannot be used with [`max_client_disconnect`].
This field was deprecated in favour of `stop_after` on the [`disconnect`] block.
- `max_client_disconnect` `(string: "")` - Specifies a duration during which a
Nomad client will attempt to reconnect allocations after it fails to heartbeat
in the [`heartbeat_grace`] window. See [the example code
below][max-client-disconnect] for more details. This setting cannot be used
with [`stop_after_client_disconnect`].
This field was deprecated in favour of `lost_after` on the [`disconnect`] block.
- `task` <code>([Task][]: &lt;required&gt;)</code> - Specifies one or more tasks to run
within this group. This can be specified multiple times, to add a task as part
of the group.
@@ -285,17 +298,20 @@ healthy, will restart because they have been marked "lost". This can cause
issues with stateful tasks or tasks with long restart times.
Instead, an operator may desire that these allocations reconnect without a
restart. When `max_client_disconnect` is specified, the Nomad server will mark
clients that fail to heartbeat as "disconnected" rather than "down", and will
mark allocations on a disconnected client as "unknown" rather than "lost". These
allocations may continue to run on the disconnected client. Replacement
allocations will be scheduled according to the allocations' reschedule policy
until the disconnected client reconnects. Once a disconnected client reconnects,
Nomad will compare the "unknown" allocations with their replacements and keep
the one with the best node score. If the `max_client_disconnect` duration
expires before the client reconnects, the allocations will be marked "lost".
restart. When `max_client_disconnect` or `disconnect.lost_after` is specified,
the Nomad server will mark clients that fail to heartbeat as "disconnected"
rather than "down", and will mark allocations on a disconnected client as
"unknown" rather than "lost". These allocations may continue to run on the
disconnected client. Replacement allocations will be scheduled according to the
allocations' `disconnect.replace` settings until the disconnected client
reconnects. Once a disconnected client reconnects, Nomad will compare the "unknown"
allocations with their replacements and will decide which ones to keep according
to the `disconnect.reconcile` setting. If the `max_client_disconnect` or
`disconnect.lost_after` duration expires before the client reconnects,
the allocations will be marked "lost".
Clients that contain "unknown" allocations will transition to "disconnected"
rather than "down" until the last `max_client_disconnect` duration has expired.
rather than "down" until the last `max_client_disconnect` or `disconnect.lost_after`
duration has expired.
In the example code below, if both of these task groups were placed on the same
client and that client experienced a network outage, both of the group's
@@ -371,6 +387,45 @@ If [`max_client_disconnect`](#max_client_disconnect) is set and
the node will transition from `disconnected` to `down`. The allocation
will remain as `unknown` and won't be rescheduled.
#### Migration to `disconnect` block
The new configuration fields in the disconnect block work exactly the same as the
ones they are replacing:
* `stop_after_client_disconnect` is replaced by `stop_after`
* `max_client_disconnect` is replaced by `lost_after`
* `prevent_reschedule_on_lost` is replaced by `replace`
To keep the same behaviour as the old configuration upon reconnection, the
`reconcile` option should be set to `best_score`.
The following example shows how to migrate from the old configuration to the new one:
```hcl
job "docs" {
  group "example" {
    max_client_disconnect        = "6h"
    stop_after_client_disconnect = "2h"
    prevent_reschedule_on_lost   = true
  }
}
```
Can be directly translated to:
```hcl
job "docs" {
  group "example" {
    disconnect {
      lost_after = "6h"
      stop_after = "2h"
      replace    = false
      reconcile  = "best_score"
    }
  }
}
```
All usage constraints still apply with the disconnect block as they did before:
- `stop_after` and `lost_after` can't be used together.
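
Given that constraint, one way to lay out a job is to choose a single duration
per group. The fragment below is only a sketch under that reading: the group
names and durations are hypothetical, and each group sets exactly one of the
two fields.

```hcl
# Hypothetical groups; names and durations are illustrative assumptions.
group "stateful" {
  disconnect {
    lost_after = "6h"            # tolerate long partitions, reconnect in place
    reconcile  = "keep_original"
  }
}

group "stateless" {
  disconnect {
    stop_after = "2h"            # client stops the allocation after 2h offline
  }
}
```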
[task]: /nomad/docs/job-specification/task 'Nomad task Job Specification'
[job]: /nomad/docs/job-specification/job 'Nomad job Job Specification'
@@ -389,6 +444,7 @@ will remain as `unknown` and won't be rescheduled.
[migrate]: /nomad/docs/job-specification/migrate 'Nomad migrate Job Specification'
[network]: /nomad/docs/job-specification/network 'Nomad network Job Specification'
[reschedule]: /nomad/docs/job-specification/reschedule 'Nomad reschedule Job Specification'
[disconnect]: /nomad/docs/job-specification/disconnect 'Nomad disconnect Job Specification'
[restart]: /nomad/docs/job-specification/restart 'Nomad restart Job Specification'
[service]: /nomad/docs/job-specification/service 'Nomad service Job Specification'
[service_discovery]: /nomad/docs/integrations/consul-integration#service-discovery 'Nomad Service Discovery'
@@ -396,3 +452,4 @@ will remain as `unknown` and won't be rescheduled.
[vault]: /nomad/docs/job-specification/vault 'Nomad vault Job Specification'
[volume]: /nomad/docs/job-specification/volume 'Nomad volume Job Specification'
[`consul.name`]: /nomad/docs/configuration/consul#name
[disconect_migration]: /nomad/docs/job-specification/group#migration_to_disconnect_block

@@ -14,6 +14,12 @@ their upgrades as a result of new features or changed behavior. This page is
used to document those details separately from the standard upgrade flow.
## Nomad 1.8.0
Nomad 1.8.0 introduces a `disconnect` block meant to group all the configuration
options related to the behavior of disconnected clients and servers, causing the
deprecation of the fields `stop_after_client_disconnect`, `max_client_disconnect`
and `prevent_reschedule_on_lost`. This block also introduces new options for
allocation reconciliation if the client regains connectivity.
#### Removal of `raw_exec` option `no_cgroups`

@@ -1703,6 +1703,10 @@
"title": "expose",
"path": "job-specification/expose"
},
{
"title": "disconnect",
"path": "job-specification/disconnect"
},
{
"title": "gateway",
"path": "job-specification/gateway"