mirror of
https://github.com/kemko/nomad.git
synced 2026-01-01 16:05:42 +03:00
docs: clarify reschedule, migrate, and replacement terminology (#24929)
Our vocabulary around scheduler behaviors outside of the `reschedule` and `migrate` blocks leaves room for confusion around whether the reschedule tracker should be propagated between allocations. There are effectively five different behaviors we need to cover:

* restart: when the tasks of an allocation fail and we try to restart the tasks in place.
* reschedule: when the `restart` block runs out of attempts (or the allocation fails before tasks even start), and we need to move the allocation to another node to try again.
* migrate: when the user has asked to drain a node and we need to move the allocations. These are not failures, so we don't want to propagate the reschedule tracker.
* replacement: when a node is lost, we don't count that against the `reschedule` tracker for the allocations on the node (it's not the allocation's "fault", after all). We don't want to run the `migrate` machinery here either, as we can't contact the down node. To the scheduler, this is effectively the same as if we bumped the `group.count`.
* replacement for `disconnect.replace = true`: this is a replacement, but the replacement is intended to be temporary, so we propagate the reschedule tracker.

Add a section to the `reschedule`, `migrate`, and `disconnect` blocks explaining when each item applies. Update the use of the word "reschedule" in several places where "replacement" is correct, and vice-versa.

Fixes: https://github.com/hashicorp/nomad/issues/24918

Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>
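The five behaviors above map onto distinct job-spec blocks. As a rough sketch (the block names are real Nomad job-spec blocks, but the specific parameter values here are illustrative assumptions, not taken from this commit):

```hcl
job "docs" {
  group "cache" {
    count = 3

    # restart: tried first, in place on the same client.
    restart {
      attempts = 2
      interval = "30m"
      delay    = "15s"
      mode     = "fail"
    }

    # reschedule: used once restart attempts run out; the reschedule
    # tracker counts these attempts across replacement allocations.
    reschedule {
      attempts  = 3
      interval  = "24h"
      unlimited = false
    }

    # migrate: used on node drain; not a failure, so the reschedule
    # tracker is not propagated.
    migrate {
      max_parallel = 1
    }

    # disconnect.replace = true: the replacement is intended to be
    # temporary, so the reschedule tracker is propagated.
    disconnect {
      lost_after = "1h"
      replace    = true
    }
  }
}
```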
@@ -124,18 +124,18 @@ Usage: nomad job restart [options] <job>
   batch. It is also possible to specify additional time to wait between
   batches.
 
-  Allocations can be restarted in-place or rescheduled. When restarting
-  in-place the command may target specific tasks in the allocations, restart
-  only tasks that are currently running, or restart all tasks, even the ones
-  that have already run. Allocations can also be targeted by group. When both
-  groups and tasks are defined only the tasks for the allocations of those
-  groups are restarted.
+  You may restart in-place or migrated allocations. When restarting in-place,
+  the command may target specific tasks in the allocations, restart only tasks
+  that are currently running, or restart all tasks, even the ones that have
+  already run. Groups and tasks can also target allocations. When you define
+  both groups and tasks, Nomad restarts only the tasks for the allocations of
+  those groups.
 
-  When rescheduling, the current allocations are stopped triggering the Nomad
-  scheduler to create replacement allocations that may be placed in different
+  When migrating, Nomad stops the current allocations, triggering the Nomad
+  scheduler to create new allocations that may be placed in different
   clients. The command waits until the new allocations have client status
-  'ready' before proceeding with the remaining batches. Services health checks
-  are not taken into account.
+  'ready' before proceeding with the remaining batches. The command does not
+  consider service health checks.
 
   By default the command restarts all running tasks in-place with one
   allocation per batch.
@@ -183,12 +183,13 @@ Restart Options:
     proceed. If 'fail' the command exits immediately. Defaults to 'ask'.
 
   -reschedule
-    If set, allocations are stopped and rescheduled instead of restarted
+    If set, allocations are stopped and migrated instead of restarted
     in-place. Since the group is not modified the restart does not create a new
     deployment, and so values defined in 'update' blocks, such as
     'max_parallel', are not taken into account. This option cannot be used with
     '-task'. Only jobs of type 'batch', 'service', and 'system' can be
-    rescheduled.
+    migrated. Note that despite the name of this flag, this command migrates but
+    does not reschedule allocations, so it ignores the 'reschedule' block.
 
   -task=<task-name>
     Specify the task to restart. Can be specified multiple times. If groups are
@@ -469,7 +469,8 @@ func (s *GenericScheduler) computeJobAllocs() error {
 	return s.computePlacements(destructive, place, results.taskGroupAllocNameIndexes)
 }
 
-// downgradedJobForPlacement returns the job appropriate for non-canary placement replacement
+// downgradedJobForPlacement returns the previous stable version of the job for
+// downgrading a placement for non-canaries
 func (s *GenericScheduler) downgradedJobForPlacement(p placementResult) (string, *structs.Job, error) {
 	ns, jobID := s.job.Namespace, s.job.ID
 	tgName := p.TaskGroup().Name
@@ -587,8 +588,8 @@ func (s *GenericScheduler) computePlacements(destructive, place []placementResul
 	}
 
 	// Check if we should stop the previous allocation upon successful
-	// placement of its replacement. This allow atomic placements/stops. We
-	// stop the allocation before trying to find a replacement because this
+	// placement of the new alloc. This allow atomic placements/stops. We
+	// stop the allocation before trying to place the new alloc because this
 	// frees the resources currently used by the previous allocation.
 	stopPrevAlloc, stopPrevAllocDesc := missing.StopPreviousAlloc()
 	prevAllocation := missing.PreviousAllocation()
@@ -715,7 +716,7 @@ func (s *GenericScheduler) computePlacements(destructive, place []placementResul
 		// Track the fact that we didn't find a placement
 		s.failedTGAllocs[tg.Name] = s.ctx.Metrics()
 
-		// If we weren't able to find a replacement for the allocation, back
+		// If we weren't able to find a placement for the allocation, back
 		// out the fact that we asked to stop the allocation.
 		if stopPrevAlloc {
 			s.plan.PopUpdate(prevAllocation)
@@ -32,18 +32,17 @@ The command can operate in batches and wait until all restarted or
 rescheduled allocations are running again before proceeding to the next batch.
 It is also possible to specify additional time to wait between batches.
 
-Allocations can be restarted in-place or rescheduled. When restarting
-in-place the command may target specific tasks in the allocations, restart
-only tasks that are currently running, or restart all tasks, even the ones
-that have already run. Allocations can also be targeted by groups and tasks.
-When both groups and tasks are defined only the tasks for the allocations of
-those groups are restarted.
+You may restart in-place or migrated allocations. When restarting in-place, the
+command may target specific tasks in the allocations, restart only tasks that
+are currently running, or restart all tasks, even the ones that have already
+run. Groups and tasks can also target allocations. When you define both groups
+and tasks, Nomad restarts only the tasks for the allocations of those groups.
 
-When rescheduling, the current allocations are stopped triggering the Nomad
-scheduler to create replacement allocations that may be placed in different
-clients. The command waits until the new allocations have client status `ready`
-before proceeding with the remaining batches. Services health checks are not
-taken into account.
+When migrating, Nomad stops the current allocations, triggering the Nomad
+scheduler to create new allocations that may be placed in different clients. The
+command waits until the new allocations have client status `ready` before
+proceeding with the remaining batches. The command does not consider service
+health checks.
 
 By default the command restarts all running tasks in-place with one allocation
 per batch.
@@ -82,12 +81,13 @@ of the exact job ID.
   shutdown or restart. Note that using this flag will result in failed network
   connections to the allocation being restarted.
 
-- `-reschedule`: If set, allocations are stopped and rescheduled instead of
-  restarted in-place. Since the group is not modified the restart does not
-  create a new deployment, and so values defined in [`update`][] blocks, such
-  as [`max_parallel`][], are not taken into account. This option cannot be used
-  with `-task`. Only jobs of type `batch`, `service`, and `system` can be
-  rescheduled.
+- `-reschedule`: If set, Nomad stops and migrates allocations instead of
+  restarting in-place. Since the group is not modified, the restart does not
+  create a new deployment, and so values defined in [`update`][] blocks, such as
+  [`max_parallel`][], are not considered. This option cannot be used with
+  `-task`. You may only migrate jobs of type `batch`, `service`, and `system`.
+  Note that despite the name of this flag, this command migrates but does not
+  reschedule allocations, so it ignores the `reschedule` block.
 
 - `-on-error=<ask|fail>`: Determines what action to take when an error happens
   during a restart batch. If `ask` the command stops and waits for user
@@ -438,17 +438,17 @@ Nomad Clients periodically heartbeat to Nomad Servers to confirm they are
 operating as expected. Nomad Clients which do not heartbeat in the specified
 amount of time are considered `down` and their allocations are marked as `lost`
 or `disconnected` (if [`disconnect.lost_after`][disconnect.lost_after] is set)
-and rescheduled.
+and replaced.
 
 The various heartbeat related parameters allow you to tune the following
 tradeoffs:
 
-- The longer the heartbeat period, the longer a `down` Client's workload will
-  take to be rescheduled.
+- The longer the heartbeat period, the longer Nomad takes to replace a `down`
+  Client's workload.
 - The shorter the heartbeat period, the more likely transient network issues,
   leader elections, and other temporary issues could cause a perfectly
   functional Client and its workloads to be marked as `down` and the work
-  rescheduled.
+  replaced.
 
 While Nomad Clients can connect to any Server, all heartbeats are forwarded to
 the leader for processing. Since this heartbeat processing consumes resources,
@@ -510,7 +510,7 @@ system has for a delay in noticing crashed Clients. For example a
 `failover_heartbeat_ttl` of 30 minutes may give even the slowest clients in the
 largest clusters ample time to heartbeat after an election. However if the
 election was due to a datacenter-wide failure affecting Clients, it will be 30
-minutes before Nomad recognizes that they are `down` and reschedules their
+minutes before Nomad recognizes that they are `down` and replaces their
 work.
 
 [encryption]: /nomad/tutorials/transport-security/security-gossip-encryption 'Nomad Encryption Overview'
@@ -14,7 +14,14 @@ description: |-
 The `disconnect` block describes the system's behavior in case of a network
 partition. By default, without a `disconnect` block, if an allocation is on a
 node that misses heartbeats, the allocation will be marked `lost` and will be
-rescheduled.
+replaced.
+
+Replacement happens when a node is lost. When a node is drained, Nomad
+[migrates][] the allocations instead, and Nomad ignores the `disconnect`
+block. When a Nomad agent fails to set up the allocation or the tasks of an
+allocation fail more than their [`restart`][] block allows, Nomad
+[reschedules][] the allocations and ignores the `disconnect`.
 
 ```hcl
 job "docs" {
@@ -51,11 +58,12 @@ same `disconnect` block.
 
   Refer to [the Lost After section][lost-after] for more details.
 
-- `replace` `(bool: false)` - Specifies if the disconnected allocation should
-  be replaced by a new one rescheduled on a different node. If false and the
-  node it is running on becomes disconnected or goes down, this allocation
-  won't be rescheduled and will be reported as `unknown` until the node reconnects,
-  or until the allocation is manually stopped:
+- `replace` `(bool: false)` - Specifies if Nomad should replace the disconnected
+  allocation with a new one rescheduled on a different node. Nomad considers the
+  replacement allocation a reschedule and obeys the job's [`reschedule`][]
+  block. If false and the node the allocation is running on disconnects
+  or goes down, Nomad does not replace this allocation and reports `unknown`
+  until the node reconnects, or until you manually stop the allocation.
 
   ```plaintext
   `nomad alloc stop <alloc ID>`
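The `replace` semantics described in the hunk above can be sketched as a config fragment (group name and durations are illustrative assumptions):

```hcl
group "example" {
  disconnect {
    # With replace = false, an allocation on a disconnected node is
    # reported as `unknown` and no replacement is scheduled until the
    # node reconnects or the allocation is manually stopped.
    replace = false

    # After this long without a heartbeat, the allocation is marked lost.
    lost_after = "30m"
  }
}
```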
@@ -84,7 +92,7 @@ same `disconnect` block.
   - `keep_original`: Always keep the original allocation. Bear in mind
     when choosing this option, it can have crashed while the client was
     disconnected.
-  - `keep_replacement`: Always keep the allocation that was rescheduled
+  - `keep_replacement`: Always keep the allocation that was replaced
     to replace the disconnected one.
   - `best_score`: Keep the allocation running on the node with the best
     score.
@@ -102,17 +110,17 @@ The following examples only show the `disconnect` blocks. Remember that the
 This example shows how `stop_on_client_after` interacts with
 other blocks. For the `first` group, after the default 10 second
 [`heartbeat_grace`] window expires and 90 more seconds passes, the
-server will reschedule the allocation. The client will wait 90 seconds
+server replaces the allocation. The client waits 90 seconds
 before sending a stop signal (`SIGTERM`) to the `first-task`
 task. After 15 more seconds because of the task's `kill_timeout`, the
 client will send `SIGKILL`. The `second` group does not have
-`stop_on_client_after`, so the server will reschedule the
+`stop_on_client_after`, so the server replaces the
 allocation after the 10 second [`heartbeat_grace`] expires. It will
 not be stopped on the client, regardless of how long the client is out
 of touch.
 
 Note that if the server's clocks are not closely synchronized with
-each other, the server may reschedule the group before the client has
+each other, the server may replace the group before the client has
 stopped the allocation. Operators should ensure that clock drift
 between servers is as small as possible.
@@ -217,3 +225,7 @@ group "second" {
 [stop-after]: /nomad/docs/job-specification/disconnect#stop-after
 [lost-after]: /nomad/docs/job-specification/disconnect#lost-after
 [`reconcile`]: /nomad/docs/job-specification/disconnect#reconcile
+[migrates]: /nomad/docs/job-specification/migrate
+[`restart`]: /nomad/docs/job-specification/restart
+[reschedules]: /nomad/docs/job-specification/reschedule
+[`reschedule`]: /nomad/docs/job-specification/reschedule
@@ -65,14 +65,14 @@ job "docs" {
   requirements and configuration, including static and dynamic port allocations,
   for the group.
 
-- `prevent_reschedule_on_lost` `(bool: false)` - Defines the reschedule behaviour
-  of an allocation when the node it is running on misses heartbeats.
-  When enabled, if the node it is running on becomes disconnected
-  or goes down, this allocations wont be rescheduled and will show up as `unknown`
-  until the node comes back up or it is manually restarted.
+- `prevent_reschedule_on_lost` `(bool: false)` - Defines the replacement
+  behavior of an allocation when the node it is running on misses heartbeats.
+  When enabled, if the node disconnects or goes down,
+  Nomad does not replace this allocation and shows it as `unknown` until the node
+  reconnects or you manually restart the node.
 
-  This behaviour will only modify the reschedule process on the server.
-  To modify the allocation behaviour on the client, see
+  This behavior only modifies the replacement process on the server. To
+  modify the allocation behavior on the client, refer to
   [`stop_after_client_disconnect`](#stop_after_client_disconnect).
 
   The `unknown` allocation has to be manually stopped to run it again.
@@ -84,7 +84,7 @@ job "docs" {
 Setting `max_client_disconnect` and `prevent_reschedule_on_lost = true` at the
 same time requires that [rescheduling is disabled entirely][`disable_rescheduling`].
 
-This field was deprecated in favour of `replace` on the [`disconnect`] block,
+We deprecated this field in favor of `replace` on the [`disconnect`] block,
 see [example below][disconect_migration] for more details about migrating.
 
 - `reschedule` <code>([Reschedule][]: nil)</code> - Allows to specify a
@@ -299,13 +299,13 @@ issues with stateful tasks or tasks with long restart times.
 
 Instead, an operator may desire that these allocations reconnect without a
 restart. When `max_client_disconnect` or `disconnect.lost_after` is specified,
-the Nomad server will mark clients that fail to heartbeat as "disconnected"
+the Nomad server marks clients that fail to heartbeat as "disconnected"
 rather than "down", and will mark allocations on a disconnected client as
 "unknown" rather than "lost". These allocations may continue to run on the
 disconnected client. Replacement allocations will be scheduled according to the
 allocations' `disconnect.replace` settings. until the disconnected client
-reconnects. Once a disconnected client reconnects, Nomad will compare the "unknown"
-allocations with their replacements and will decide which ones to keep according
+reconnects. Once a disconnected client reconnects, Nomad compares the "unknown"
+allocations with their replacements and decides which ones to keep according
 to the `disconnect.replace` setting. If the `max_client_disconnect` or
 `disconnect.losta_after` duration expires before the client reconnects,
 the allocations will be marked "lost".
@@ -22,6 +22,13 @@ If specified at the job level, the configuration will apply to all groups
 within the job. Only service jobs with a count greater than 1 support migrate
 blocks.
 
+Migrating happens when a Nomad node is drained. When a node is lost, Nomad
+[replaces][] the allocations instead and ignores the `migrate` block. When the
+agent fails to set up the allocation or the tasks of an allocation fail more
+than their [`restart`][] block allows, Nomad [reschedules][] the allocations
+instead and ignores the `migrate` block.
+
 ```hcl
 job "docs" {
   migrate {
@@ -78,3 +85,6 @@ on node draining.
 [count]: /nomad/docs/job-specification/group#count
 [drain]: /nomad/docs/commands/node/drain
 [deadline]: /nomad/docs/commands/node/drain#deadline
+[replaces]: /nomad/docs/job-specification/disconnect#replace
+[`restart`]: /nomad/docs/job-specification/restart
+[reschedules]: /nomad/docs/job-specification/reschedule
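As context for the migrate hunks above, a minimal `migrate` block might look like this (group name and values are illustrative assumptions; the parameters are real migrate-block settings):

```hcl
group "web" {
  count = 3

  # Applies only on node drain; lost nodes and failed allocations use
  # replacement and rescheduling instead, and ignore this block.
  migrate {
    max_parallel     = 1
    health_check     = "checks"
    min_healthy_time = "10s"
    healthy_deadline = "5m"
  }
}
```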
@@ -22,15 +22,21 @@ description: >-
   ]}
 />
 
-The `reschedule` block specifies the group's rescheduling strategy. If specified at the job
-level, the configuration will apply to all groups within the job. If the
-reschedule block is present on both the job and the group, they are merged with
-the group block taking the highest precedence and then the job.
+The `reschedule` block specifies the group's rescheduling strategy. If specified
+at the job level, the configuration will apply to all groups within the job. If
+the reschedule block is present on both the job and the group, they are merged
+with the group block taking the highest precedence and then the job.
 
-Nomad will attempt to schedule the allocation on another node if any of its
-task statuses become `failed`. The scheduler prefers to create a replacement
+Nomad will attempt to schedule the allocation on another node if any of its task
+statuses become `failed`. The scheduler prefers to create a replacement
 allocation on a node that was not used by a previous allocation.
 
+Rescheduling happens when the Nomad agent fails to set up the allocation or the
+tasks of an allocation fail more than their [`restart`][] block allows. When a
+node is drained, Nomad [migrates][] the allocations instead and ignores the
+`reschedule` block. When a node is lost, Nomad [replaces][] the allocations
+instead and ignores the `reschedule` block.
+
 ```hcl
 job "docs" {
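For reference alongside the reschedule hunk above, a sketch of a `reschedule` block using the delay-based strategy (group name and values are illustrative assumptions; the parameters are real reschedule-block settings):

```hcl
group "worker" {
  # Applies when tasks fail beyond what `restart` allows; ignored on
  # node drain (migrate) and node loss (replacement).
  reschedule {
    delay          = "30s"
    delay_function = "exponential"
    max_delay      = "1h"
    unlimited      = true
  }
}
```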
@@ -131,3 +137,7 @@ job "docs" {
 ```
 
 [`progress_deadline`]: /nomad/docs/job-specification/update#progress_deadline
+[`restart`]: /nomad/docs/job-specification/restart
+[migrates]: /nomad/docs/job-specification/migrate
+[replaces]: /nomad/docs/job-specification/disconnect#replace
+[reschedules]: /nomad/docs/job-specification/reschedule
@@ -14,7 +14,8 @@ description: The "restart" block configures a group's behavior on task failure.
 />
 
 The `restart` block configures a task's behavior on task failure. Restarts
-happen on the client that is running the task.
+happen on the client that is running the task. Restarts are different from
+[rescheduling][], which happens when the tasks run out of restart attempts.
 
 ```hcl
 job "docs" {
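To ground the restart/reschedule distinction drawn in the hunk above, a sketch of a `restart` block (names and values are illustrative assumptions; the parameters are real restart-block settings):

```hcl
group "example" {
  task "server" {
    # In-place restarts happen on the same client. After `attempts`
    # failures within `interval`, mode = "fail" hands the allocation
    # over to rescheduling.
    restart {
      attempts = 2
      interval = "30m"
      delay    = "15s"
      mode     = "fail"
    }
  }
}
```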
@@ -192,3 +193,4 @@ restart {
 
 [sidecar_task]: /nomad/docs/job-specification/sidecar_task
 [`reschedule`]: /nomad/docs/job-specification/reschedule
+[rescheduling]: /nomad/docs/job-specification/reschedule