---
layout: docs
page_title: group Block - Job Specification
description: |-
  The "group" block defines a series of tasks that should be co-located on the
  same Nomad client. Any task within a group will be placed on the same client.
---

# `group` Block

<Placement groups={['job', 'group']} />

The `group` block defines a series of tasks that should be co-located on the
same Nomad client. Any [task][] within a group will be placed on the same
client.

```hcl
job "docs" {
  group "example" {
    # ...
  }
}
```

## `group` Parameters

- `constraint` <code>([Constraint][]: nil)</code> -
  This can be provided multiple times to define additional constraints.

- `affinity` <code>([Affinity][]: nil)</code> - This can be provided
  multiple times to define preferred placement criteria.

- `spread` <code>([Spread][spread]: nil)</code> - This can be provided
  multiple times to define criteria for spreading allocations across a
  node attribute or metadata. See the
  [Nomad spread reference](/nomad/docs/job-specification/spread) for more
  details, and refer to the Spread example below.

- `count` `(int)` - Specifies the number of instances that should be running
  under this group. This value must be non-negative. This defaults to the
  `min` value specified in the [`scaling`](/nomad/docs/job-specification/scaling)
  block, if present; otherwise, this defaults to `1`.

- `consul` <code>([Consul][consul]: nil)</code> - Specifies Consul configuration
  options specific to the group. These options will be applied to all tasks and
  services in the group unless a task has its own `consul` block.

- `ephemeral_disk` <code>([EphemeralDisk][]: nil)</code> - Specifies the
  ephemeral disk requirements of the group. Ephemeral disks can be marked as
  sticky and support live data migrations. Refer to the Ephemeral Disk example
  below.

- `disconnect` <code>([disconnect][]: nil)</code> - Specifies the disconnect
  strategy for the server and client for all tasks in this group in case of a
  network partition. The tasks can be left unconnected, stopped, or replaced
  when the client disconnects. The policy for reconciliation in case the client
  regains connectivity is also specified here.

- `meta` <code>([Meta][]: nil)</code> - Specifies a key-value map that annotates
  the group with user-defined metadata.

- `migrate` <code>([Migrate][]: nil)</code> - Specifies the group strategy for
  migrating off of draining nodes. Only service jobs with a count greater than
  1 support migrate blocks.

- `network` <code>([Network][]: <optional>)</code> - Specifies the network
  requirements and configuration, including static and dynamic port allocations,
  for the group.

- `prevent_reschedule_on_lost` `(bool: false)` - Defines the replacement
  behavior of an allocation when the node it is running on misses heartbeats.
  When enabled, if the node disconnects or goes down, Nomad does not replace
  this allocation and shows it as `unknown` until the node reconnects or you
  manually restart the node.

  This behavior only modifies the replacement process on the server. To
  modify the allocation behavior on the client, refer to
  [`stop_after_client_disconnect`](#stop_after_client_disconnect).

  You must manually stop the `unknown` allocation to run it again:

  ```plaintext
  nomad alloc stop <alloc ID>
  ```

  Setting `max_client_disconnect` and `prevent_reschedule_on_lost = true` at the
  same time requires that [rescheduling is disabled entirely][`disable_rescheduling`].

  This field is deprecated in favor of `replace` on the [`disconnect`] block.
  Refer to the [example below][disconect_migration] for details about migrating.

- `reschedule` <code>([Reschedule][]: nil)</code> - Specifies a rescheduling
  strategy. Nomad will attempt to schedule the task on another node if any of
  the group's allocation statuses become "failed".

- `restart` <code>([Restart][]: nil)</code> - Specifies the restart policy for
  all tasks in this group. If omitted, a default policy exists for each job
  type, which can be found in the [restart block documentation][restart].

- `service` <code>([Service][]: nil)</code> - Specifies integrations with Nomad
  or [Consul](/nomad/docs/configuration/consul) for service discovery. Nomad
  automatically registers each service when an allocation is started and
  de-registers them when the allocation is destroyed.

- `shutdown_delay` `(string: "0s")` - Specifies the duration to wait when
  stopping a group's tasks. The delay occurs between Consul or Nomad service
  deregistration and sending each task a shutdown signal. Ideally, services
  would fail health checks once they receive a shutdown signal. Alternatively,
  `shutdown_delay` may be set to give in-flight requests time to complete
  before shutting down. A group level `shutdown_delay` will run regardless of
  whether there are any defined group [services](/nomad/docs/job-specification/group#service)
  and only applies to these services. In addition, tasks may have their own
  [`shutdown_delay`](/nomad/docs/job-specification/task#shutdown_delay) which waits
  between de-registering task services and stopping the task. Refer to the
  Shutdown Delay example below.

- `stop_after_client_disconnect` `(string: "")` - Specifies a duration after
  which a Nomad client will stop allocations, if it cannot communicate with the
  servers. By default, a client will not stop an allocation until explicitly
  told to by a server. A client that fails to heartbeat to a server within the
  [`heartbeat_grace`] window will be marked "lost", along with any allocations
  running on it, and Nomad will schedule replacement allocations. The replaced
  allocations will normally continue to run on the non-responsive client. But
  you may want them to stop instead, for example when allocations require
  exclusive access to an external resource. When specified, the Nomad client
  will stop them after this duration.
  The Nomad client process must be running for this to occur. This setting
  cannot be used with [`max_client_disconnect`].

  This field is deprecated in favor of `stop_after` on the [`disconnect`] block.

- `max_client_disconnect` `(string: "")` - Specifies a duration during which a
  Nomad client will attempt to reconnect allocations after it fails to heartbeat
  in the [`heartbeat_grace`] window. See [the example code
  below][max-client-disconnect] for more details. This setting cannot be used
  with [`stop_after_client_disconnect`].

  This field is deprecated in favor of `lost_after` on the [`disconnect`] block.

- `task` <code>([Task][]: <required>)</code> - Specifies one or more tasks to run
  within this group. This can be specified multiple times to add a task as part
  of the group.

- `update` <code>([Update][update]: nil)</code> - Specifies the task's update
  strategy. When omitted, a default update strategy is applied.

- `vault` <code>([Vault][]: nil)</code> - Specifies the set of Vault policies
  required by all tasks in this group. Overrides a `vault` block set at the
  `job` level.

- `volume` <code>([Volume][]: nil)</code> - Specifies the volumes that are
  required by tasks within the group.

## `group` Examples

The following examples only show the `group` blocks. Remember that the
`group` block is only valid in the placements listed above.

### Specifying Count

This example specifies that 5 instances of the tasks within this group should be
running:

```hcl
group "example" {
  count = 5
}
```

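As described for the `count` parameter above, when the group defines a
[`scaling`](/nomad/docs/job-specification/scaling) block and omits `count`, the
count defaults to the scaling block's `min` value. A minimal sketch; the values
are illustrative:

```hcl
group "example" {
  # count is omitted, so it defaults to the scaling block's min value (3).
  scaling {
    min = 3
    max = 10
  }
}
```
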
### Tasks with Constraint

This example shows two abbreviated tasks with a constraint on the group. This
restricts the tasks to nodes with the amd64 CPU architecture.

```hcl
group "example" {
  constraint {
    attribute = "${attr.cpu.arch}"
    value     = "amd64"
  }

  task "cache" {
    # ...
  }

  task "server" {
    # ...
  }
}
```

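### Spread

Where a constraint is a hard placement requirement, the `spread` parameter
described above expresses a soft preference for distributing allocations across
the values of a node attribute or metadata. A minimal sketch, assuming the
standard `${node.datacenter}` attribute; the count and weight are illustrative:

```hcl
group "example" {
  count = 10

  spread {
    # Prefer spreading the group's allocations evenly across datacenters.
    attribute = "${node.datacenter}"
    weight    = 100
  }

  task "server" {
    # ...
  }
}
```
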
### Metadata

This example shows arbitrary user-defined metadata on the group:

```hcl
group "example" {
  meta {
    my-key = "my-value"
  }
}
```

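### Ephemeral Disk

As noted for the `ephemeral_disk` parameter above, the group's ephemeral disk
can be marked as sticky and can request data migration between allocations. A
minimal sketch; the size is illustrative:

```hcl
group "example" {
  ephemeral_disk {
    # Make a best-effort attempt to migrate the ephemeral disk data if the
    # replacement allocation is placed on a different node.
    migrate = true

    # Size of the ephemeral disk in MB.
    size = 500

    # Make a best-effort attempt to place replacement allocations on the
    # same node, so the ephemeral disk data can be reused.
    sticky = true
  }

  task "cache" {
    # ...
  }
}
```
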
### Network

This example shows network constraints as specified in the [network][] block
which uses the `bridge` networking mode, dynamically allocates two ports, and
statically allocates one port:

```hcl
group "example" {
  network {
    mode = "bridge"

    port "http" {}
    port "https" {}

    port "lb" {
      static = "8889"
    }
  }
}
```

### Service Discovery

This example creates a service in Consul. To read more about service discovery
in Nomad, please see the [Nomad service discovery documentation][service_discovery].

```hcl
group "example" {
  network {
    port "api" {}
  }

  service {
    name = "example"
    port = "api"
    tags = ["default"]

    check {
      type     = "tcp"
      interval = "10s"
      timeout  = "2s"
    }
  }

  task "api" { ... }
}
```

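### Shutdown Delay

A minimal sketch of the group and task `shutdown_delay` parameters described
above; the service, durations, and task name are illustrative. The group-level
delay runs between deregistering the group's services and signalling its tasks,
while the task-level delay runs between deregistering that task's services and
stopping the task:

```hcl
group "example" {
  # Wait 10 seconds between deregistering the group's services and sending
  # each task a shutdown signal. Only applies because the group defines a
  # service.
  shutdown_delay = "10s"

  network {
    port "api" {}
  }

  service {
    name = "example"
    port = "api"
  }

  task "api" {
    # Wait 5 seconds between deregistering this task's services and
    # stopping the task.
    shutdown_delay = "5s"

    # ...
  }
}
```
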
### Stop After Client Disconnect

This example shows how `stop_after_client_disconnect` interacts with
other blocks. For the `first` group, after the default 10 second
[`heartbeat_grace`] window expires and 90 more seconds pass, the
server will replace the allocation. The client will wait 90 seconds
before sending a stop signal (`SIGTERM`) to the `first-task`
task. After 15 more seconds, because of the task's `kill_timeout`, the
client will send `SIGKILL`. The `second` group does not have
`stop_after_client_disconnect`, so the server will replace the
allocation after the 10 second [`heartbeat_grace`] expires. It will
not be stopped on the client, regardless of how long the client is out
of touch.

Note that if the servers' clocks are not closely synchronized with
each other, the server may replace the group before the client has
stopped the allocation. Operators should ensure that clock drift
between servers is as small as possible.

Note also that a group using this feature will be stopped on the
client if the Nomad server cluster fails, since the client will be
unable to contact any server in that case. Groups opting in to this
feature are therefore exposed to an additional runtime dependency and
potential point of failure.

```hcl
group "first" {
  stop_after_client_disconnect = "90s"

  task "first-task" {
    kill_timeout = "15s"
  }
}

group "second" {

  task "second-task" {
    kill_timeout = "5s"
  }
}
```

### Max Client Disconnect

`max_client_disconnect` specifies a duration during which a Nomad client will
attempt to reconnect allocations after it fails to heartbeat in the
[`heartbeat_grace`] window.

By default, allocations running on a client that fails to heartbeat will be
marked "lost". When a client reconnects, its allocations, which may still be
healthy, will restart because they have been marked "lost". This can cause
issues with stateful tasks or tasks with long restart times.

Instead, an operator may desire that these allocations reconnect without a
restart. When `max_client_disconnect` or `disconnect.lost_after` is specified,
the Nomad server marks clients that fail to heartbeat as "disconnected"
rather than "down", and will mark allocations on a disconnected client as
"unknown" rather than "lost". These allocations may continue to run on the
disconnected client. Replacement allocations will be scheduled according to the
allocations' `disconnect.replace` settings until the disconnected client
reconnects. Once a disconnected client reconnects, Nomad compares the "unknown"
allocations with their replacements and decides which ones to keep according
to the `disconnect.replace` setting. If the `max_client_disconnect` or
`disconnect.lost_after` duration expires before the client reconnects,
the allocations will be marked "lost".

Clients that contain "unknown" allocations remain "disconnected" rather than
"down" until the last `max_client_disconnect` or `disconnect.lost_after`
duration has expired.

In the example code below, if both of these task groups were placed on the same
client and that client experienced a network outage, both groups' allocations
would be marked as "unknown" after two minutes because of the server's
`heartbeat_grace` value of "2m". If the network outage continued for
eight hours, and the client continued to fail to heartbeat, the client would
remain in a "disconnected" state, as the first group's `max_client_disconnect`
is twelve hours. Once all groups' `max_client_disconnect` durations are
exceeded, in this case in twelve hours, the client node will be marked as "down"
and the allocations will be marked as "lost". If the client had reconnected
before twelve hours had passed, the allocations would gracefully reconnect
without a restart.

Max Client Disconnect is useful for edge deployments, or for scenarios in which
operators want zero on-client downtime due to node connectivity issues. This
setting cannot be used with [`stop_after_client_disconnect`].

```hcl
# server_config.hcl

server {
  enabled         = true
  heartbeat_grace = "2m"
}
```

```hcl
# jobspec.nomad

group "first" {
  max_client_disconnect = "12h"

  task "first-task" {
    ...
  }
}

group "second" {
  max_client_disconnect = "6h"

  task "second-task" {
    ...
  }
}
```

#### Max Client Disconnect and Prevent Reschedule On Lost

Setting `max_client_disconnect` and `prevent_reschedule_on_lost = true` at the
same time requires that [rescheduling is disabled entirely][`disable_rescheduling`].

```hcl
# jobspec.nomad

group "first" {
  max_client_disconnect      = "12h"
  prevent_reschedule_on_lost = true

  reschedule {
    attempts  = 0
    unlimited = false
  }

  task "first-task" {
    ...
  }
}
```

If [`max_client_disconnect`](#max_client_disconnect) is set and
`prevent_reschedule_on_lost = true`, allocations on disconnected nodes will be
`unknown` until the `max_client_disconnect` window expires, at which point
the node will transition from `disconnected` to `down`. The allocation
will remain `unknown` and won't be replaced.

#### Migration to `disconnect` block

The new configuration fields in the `disconnect` block work the same as the
ones they are replacing:

* `stop_after_client_disconnect` is replaced by `stop_after`
* `max_client_disconnect` is replaced by `lost_after`
* `prevent_reschedule_on_lost` is replaced by `replace`, with the meaning
  inverted: `prevent_reschedule_on_lost = true` corresponds to `replace = false`

To keep the same behavior as the old configuration upon reconnection, the
`reconcile` option should be set to `best_score`.

The following example shows how to migrate from the old configuration to the new one:

```hcl
job "docs" {
  group "example" {
    max_client_disconnect        = "6h"
    stop_after_client_disconnect = "2h"
    prevent_reschedule_on_lost   = true
  }
}
```

This configuration translates directly to:

```hcl
job "docs" {
  group "example" {
    disconnect {
      lost_after = "6h"
      stop_after = "2h"
      replace    = false
      reconcile  = "best_score"
    }
  }
}
```

All usage constraints still apply to the `disconnect` block as they did before:

- `stop_after` and `lost_after` can't be used together.

[task]: /nomad/docs/job-specification/task 'Nomad task Job Specification'
[job]: /nomad/docs/job-specification/job 'Nomad job Job Specification'
[constraint]: /nomad/docs/job-specification/constraint 'Nomad constraint Job Specification'
[consul]: /nomad/docs/job-specification/consul
[consul_namespace]: /nomad/docs/commands/job/run#consul-namespace
[spread]: /nomad/docs/job-specification/spread 'Nomad spread Job Specification'
[affinity]: /nomad/docs/job-specification/affinity 'Nomad affinity Job Specification'
[ephemeraldisk]: /nomad/docs/job-specification/ephemeral_disk 'Nomad ephemeral_disk Job Specification'
[`heartbeat_grace`]: /nomad/docs/configuration/server#heartbeat_grace
[`max_client_disconnect`]: /nomad/docs/job-specification/group#max_client_disconnect
[`disable_rescheduling`]: /nomad/docs/job-specification/reschedule#disabling-rescheduling
[max-client-disconnect]: /nomad/docs/job-specification/group#max-client-disconnect 'the example code below'
[`stop_after_client_disconnect`]: /nomad/docs/job-specification/group#stop_after_client_disconnect
[meta]: /nomad/docs/job-specification/meta 'Nomad meta Job Specification'
[migrate]: /nomad/docs/job-specification/migrate 'Nomad migrate Job Specification'
[network]: /nomad/docs/job-specification/network 'Nomad network Job Specification'
[reschedule]: /nomad/docs/job-specification/reschedule 'Nomad reschedule Job Specification'
[disconnect]: /nomad/docs/job-specification/disconnect 'Nomad disconnect Job Specification'
[`disconnect`]: /nomad/docs/job-specification/disconnect 'Nomad disconnect Job Specification'
[restart]: /nomad/docs/job-specification/restart 'Nomad restart Job Specification'
[service]: /nomad/docs/job-specification/service 'Nomad service Job Specification'
[service_discovery]: /nomad/docs/integrations/consul-integration#service-discovery 'Nomad Service Discovery'
[update]: /nomad/docs/job-specification/update 'Nomad update Job Specification'
[vault]: /nomad/docs/job-specification/vault 'Nomad vault Job Specification'
[volume]: /nomad/docs/job-specification/volume 'Nomad volume Job Specification'
[`consul.name`]: /nomad/docs/configuration/consul#name
[disconect_migration]: /nomad/docs/job-specification/group#migration_to_disconnect_block