docs: multiregion deployment feature (#8185)

Tim Gross
2020-06-18 08:41:22 -04:00
committed by GitHub
parent ae4f368c4c
commit 6fd4f3e985
5 changed files with 399 additions and 1 deletions

@@ -86,7 +86,7 @@ export default [
{
category: 'deployment',
content: ['fail', 'list', 'pause', 'promote', 'resume', 'status']
content: ['fail', 'list', 'pause', 'promote', 'resume', 'status', 'unblock']
},
'eval-status',
{
@@ -167,6 +167,7 @@ export default [
'logs',
'meta',
'migrate',
'multiregion',
'network',
'parameterized',
'periodic',

@@ -491,3 +491,49 @@ $ curl \
"Index": 20
}
```
## Unblock Deployment
This endpoint is used to manually mark a blocked multiregion deployment as
successful. A blocked deployment is a multiregion deployment that has
completed in its own region but is waiting on the other federated regions.
Use this endpoint to force such a deployment to complete when a failed peer
region is unable to communicate its failed deployment status to the other
regions.
| Method | Path | Produces |
| ------ | ------------------------------------ | ------------------ |
| `POST` | `/v1/deployment/unblock/:deployment_id` | `application/json` |
The table below shows this endpoint's support for
[blocking queries](/api-docs#blocking-queries) and
[required ACLs](/api-docs#acls).
| Blocking Queries | ACL Required |
| ---------------- | ---------------------- |
| `NO` | `namespace:submit-job` |
### Parameters
- `:deployment_id` `(string: <required>)` - Specifies the UUID of the
  deployment. This must be the full UUID, not the short 8-character one. This
  is specified as part of the path.
### Sample Request
```shell-session
$ curl \
    --request POST \
    https://localhost:4646/v1/deployment/unblock/5456bd7a-9fc0-c0dd-6131-cbee77f57577
```
### Sample Response
```json
{
  "EvalID": "0d834913-58a0-81ac-6e33-e452d83a0c66",
  "EvalCreateIndex": 20,
  "DeploymentModifyIndex": 20,
  "Index": 20
}
```

@@ -0,0 +1,73 @@
---
layout: docs
page_title: 'Commands: deployment unblock'
sidebar_title: unblock
description: |
The deployment unblock command is used to manually unblock a deployment.
---
# Command: deployment unblock
The `deployment unblock` command is used to manually mark a blocked
multiregion deployment as successful. A blocked deployment is a multiregion
deployment that has completed in its own region but is waiting on the other
[federated regions]. Use `deployment unblock` to force such a deployment to
complete when a failed peer region is unable to communicate its failed
deployment status to the other regions.
## Usage
```plaintext
nomad deployment unblock [options] <deployment id>
```
The `deployment unblock` command requires a single argument, a deployment ID or
prefix.
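The command accepts an ID prefix, while the HTTP endpoint needs the full UUID, so a prefix has to resolve to exactly one deployment first. The sketch below shows one way that expansion could work. `resolveDeploymentID` is a hypothetical helper operating on an in-memory list of IDs; it is not the Nomad CLI's implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// resolveDeploymentID expands a user-supplied ID or prefix to exactly one
// full deployment ID from known, failing if the prefix matches nothing or
// is ambiguous.
func resolveDeploymentID(prefix string, known []string) (string, error) {
	var matches []string
	for _, id := range known {
		if strings.HasPrefix(id, prefix) {
			matches = append(matches, id)
		}
	}
	switch len(matches) {
	case 0:
		return "", fmt.Errorf("no deployment with prefix %q", prefix)
	case 1:
		return matches[0], nil
	default:
		return "", fmt.Errorf("prefix %q matches %d deployments", prefix, len(matches))
	}
}
```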
## General Options
@include 'general_options.mdx'
## Unblock Options
- `-detach`: Return immediately instead of monitoring. A new evaluation ID
will be output, which can be used to examine the evaluation using the
[eval status] command.
- `-verbose`: Show full information.
## Examples
Manually mark an ongoing deployment as unblocked. The deployment status shows
an error on the unreachable "east" region.
```shell-session
$ nomad deployment unblock 8990cfbc
Deployment "8990cfbc-28c0-cb28-ca31-856cf691b987" unblocked
==> Monitoring evaluation "a2d97ad5"
    Evaluation triggered by job "example"
    Evaluation within deployment: "8990cfbc"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "a2d97ad5" finished with status "complete"

$ nomad deployment status 8990cfbc
ID          = 8990cfbc
Job ID      = example
Job Version = 2
Status      = successful
Description = Deployment successful

Multi-region Deployment
Region        ID        Status
west          8990cfbc  successful
south         085787e3  blocked
east (error)  <none>    <none>

Deployed
Task Group  Desired  Placed  Healthy  Unhealthy
cache       3        2       1        0
```
[eval status]: /docs/commands/eval-status
[federated regions]: https://learn.hashicorp.com/nomad/operating-nomad/federation

@@ -86,7 +86,21 @@ When a Nomad cluster is at capacity for a given set of placement constraints, an
Preemption enables Nomad's scheduler to automatically evict lower priority allocations of service and batch jobs so that allocations from higher priority jobs can be placed. This behavior ensures that critical workloads can run when resources are limited or when partial outages require workloads to be rescheduled across a smaller set of client nodes.
## Multi-Cluster & Efficiency
Multi-Cluster & Efficiency features are part of an add-on module that enables
an organization to operate Nomad at scale across multiple clusters through
features such as Multiregion Deployments.
### Multiregion Deployments
[Multiregion Deployments] enable an operator to deploy a single job to multiple
federated regions. This includes the ability to control the order of rollouts
and how each region will behave in the event of a deployment failure.
## Try Nomad Enterprise
Click [here](https://www.hashicorp.com/go/nomad-enterprise) to set up a demo or request a trial
of Nomad Enterprise.
[Multiregion Deployments]: /docs/job-specification/multiregion

@@ -0,0 +1,264 @@
---
layout: docs
page_title: multiregion Stanza - Job Specification
sidebar_title: multiregion
description: |-
The "multiregion" stanza specifies that a job will be deployed to multiple federated
regions.
---
# `multiregion` Stanza
<Placement
groups={[
['job', 'multiregion'],
]}
/>
~> **Enterprise Only!** This functionality only exists in Nomad
Enterprise. This is not present in the open source version of Nomad.

The `multiregion` stanza specifies that a job will be deployed to multiple
[federated regions]. If omitted, the job will be deployed to a single region:
the one specified by the `region` field or the `-region` command line flag to
`nomad job run`.

Federated Nomad clusters are members of the same gossip cluster but not the
same Raft cluster; they don't share their data stores. Each region in a
multiregion deployment gets an independent copy of the job, parameterized with
the values of the `region` stanza. Nomad regions coordinate to roll out each
region's deployment using rules determined by the `strategy` stanza.
```hcl
job "docs" {
  multiregion {
    strategy {
      max_parallel = 1
      on_failure   = "fail_all"
    }

    region "west" {
      count       = 2
      datacenters = ["west-1"]
      meta {
        my-key = "my-value-west"
      }
    }

    region "east" {
      count       = 5
      datacenters = ["east-1", "east-2"]
      meta {
        my-key = "my-value-east"
      }
    }
  }
}
```
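To make the per-region copies concrete, here is a minimal sketch of expanding one job into independent regional jobs. The `Job` and `Region` types are hypothetical simplifications that model only the region name and `datacenters`; the fallback to the job's own `datacenters` follows the `region` parameters described below.

```go
package main

// Region and Job are hypothetical, simplified stand-ins for a job
// specification; they are not Nomad's own types.
type Region struct {
	Name        string
	Datacenters []string
}

type Job struct {
	ID          string
	Region      string
	Datacenters []string
}

// expandRegions returns one independent copy of the job per region, in
// stanza order. A region that lists datacenters replaces the job's list;
// otherwise the job's own list is used.
func expandRegions(job Job, regions []Region) []Job {
	copies := make([]Job, 0, len(regions))
	for _, r := range regions {
		c := job
		c.Region = r.Name
		dcs := job.Datacenters
		if len(r.Datacenters) > 0 {
			dcs = r.Datacenters
		}
		// Copy the slice so no two regional jobs share backing storage.
		c.Datacenters = append([]string(nil), dcs...)
		copies = append(copies, c)
	}
	return copies
}
```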
## Multiregion Deployment States
A single-region deployment using one of the various [upgrade strategies]
begins in the `running` state and ends in the `successful` state, the
`cancelled` state (if another deployment supersedes it before it is
complete), or the `failed` state. A failed single-region deployment may
automatically revert to the previous version of the job if its `update`
stanza has the [`auto_revert`][update-auto-revert] setting.

In a multiregion deployment, regions begin in the `pending` state. This allows
Nomad to determine that all regions have accepted the job before continuing.
At this point, up to `max_parallel` regions will enter `running` at a time.
When each region completes its local deployment, it enters a `blocked` state
where it waits until the last region has completed the deployment. The final
region then unblocks all the regions, marking them as `successful`.
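The state flow above can be summarized as a small transition table. This is a simplified model written for illustration: the status names come from this page, but the `allowed` map and the `unblockIfComplete` helper are hypothetical, the transitions into `failed` follow the `on_failure` rules described under the `strategy` parameters, and states such as `cancelled` are left out.

```go
package main

// DeploymentStatus names the per-region states described above.
type DeploymentStatus string

const (
	Pending    DeploymentStatus = "pending"
	Running    DeploymentStatus = "running"
	Blocked    DeploymentStatus = "blocked"
	Successful DeploymentStatus = "successful"
	Failed     DeploymentStatus = "failed"
)

// allowed lists the forward transitions for one region in this simplified
// model. Successful and failed are terminal, so they have no entries.
var allowed = map[DeploymentStatus][]DeploymentStatus{
	Pending: {Running, Failed},
	Running: {Blocked, Failed},
	Blocked: {Successful, Failed},
}

// canTransition reports whether a region may move from one state to another.
func canTransition(from, to DeploymentStatus) bool {
	for _, s := range allowed[from] {
		if s == to {
			return true
		}
	}
	return false
}

// unblockIfComplete marks every region successful once all of them have
// reached blocked, mirroring the final region unblocking its peers. It
// returns false and changes nothing while any region is still in progress.
func unblockIfComplete(regions map[string]DeploymentStatus) bool {
	for _, s := range regions {
		if s != Blocked {
			return false
		}
	}
	for name := range regions {
		regions[name] = Successful
	}
	return true
}
```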
## `multiregion` Parameters
- `strategy` <code>([Strategy](#strategy-parameters): nil)</code> - Specifies
a rollout strategy for the regions.
- `region` <code>([Region](#region-parameters): nil)</code> - Specifies the
parameters for a specific region. This can be specified multiple times to
define the set of regions for the multiregion deployment. Regions are
ordered; depending on the rollout strategy Nomad may roll out to each region
in order or to several at a time.
### `strategy` Parameters
- `max_parallel` `(int: <optional>)` - Specifies the maximum number of region
  deployments that a multiregion deployment will have in the `running` state at
  a time. By default, Nomad will deploy all regions simultaneously.
- `on_failure` `(string: <optional>)` - Specifies the behavior when a region
  deployment fails. Available options are `"fail_all"`, `"fail_local"`, or
  the default (empty `""`). This field and its interactions with the job's
  [`update` stanza] are described in the [examples] below.

  Each region within a multiregion deployment follows the `auto_revert`
  strategy of its own `update` stanza (if any). The multiregion `on_failure`
  field tells Nomad how many other regions should be marked as failed when one
  region's deployment fails:

  - The default behavior is that the failed region and all regions that come
    after it in order are marked as failed.

  - If `on_failure = "fail_all"` is set, all regions will be marked as
    failed. If all regions have already completed their deployments, it's
    possible that a region may transition from `blocked` to `successful` while
    another region is failing. This successful region cannot be rolled back.

  - If `on_failure = "fail_local"` is set, only the failed region will be
    marked as failed. The remaining regions will move on to `blocked` status.
    At this point, you'll need to manually unblock regions to mark them
    successful with the [`nomad deployment unblock`] command, or correct the
    conditions that led to the failure and resubmit the job.

~> For `system` jobs, only [`max_parallel`](#max_parallel) is enforced. The
`system` scheduler will be updated to support `on_failure` when the
[`update` stanza] is fully supported for system jobs in a future release.
### `region` Parameters
The name of a region must match the name of one of the [federated regions].
- `count` `(int: <optional>)` - Specifies a default count for task
groups in the region. If a task group specifies its own `count`, this value
will be ignored. This value must be non-negative.
- `datacenters` `(array<string>: <optional>)` - A list of
datacenters in the region which are eligible for task placement. If not
provided, the `datacenters` field of the job will be used.
- `meta` `(Meta: nil)` - The meta stanza allows for user-defined arbitrary
  key-value pairs. The meta specified for each region will be merged with the
  meta stanza at the job level.

As described above, the parameters for each region override the job's values
for the fields with the same name within that region.
## `multiregion` Examples
The following examples only show the `multiregion` stanza and the other
stanzas it might be interacting with.
### Max Parallel
This example shows the use of `max_parallel`. This job will deploy first to
the "north" and "south" regions. If either "north" or "south" finishes and
enters the `blocked` state, then "east" will be next. At most 2 regions will
be in a `running` state at any given time.
```hcl
multiregion {
  strategy {
    max_parallel = 2
  }

  region "north" {}
  region "south" {}
  region "east" {}
  region "west" {}
}
```
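A rough simulation of the `max_parallel` behavior in this example, written as a hypothetical sketch: regions leave `pending` in stanza order, and each time a running region enters `blocked`, the next pending region starts. `simulateRollout` and its event strings are illustrative only, and a `max_parallel` of 0 stands in for the unset default of deploying all regions at once.

```go
package main

import "fmt"

// simulateRollout models max_parallel. finishOrder lists which running
// region enters blocked next. It returns the sequence of state changes.
func simulateRollout(regions []string, maxParallel int, finishOrder []string) ([]string, error) {
	if maxParallel <= 0 || maxParallel > len(regions) {
		maxParallel = len(regions) // unset: all regions at once
	}
	var events []string
	running := map[string]bool{}
	next := 0
	// startMore moves pending regions to running, in order, up to the limit.
	startMore := func() {
		for next < len(regions) && len(running) < maxParallel {
			running[regions[next]] = true
			events = append(events, "running: "+regions[next])
			next++
		}
	}
	startMore()
	for _, r := range finishOrder {
		if !running[r] {
			return nil, fmt.Errorf("region %q is not running", r)
		}
		delete(running, r)
		events = append(events, "blocked: "+r)
		startMore()
	}
	return events, nil
}
```

Calling it with the four regions, `max_parallel = 2`, and completions in the order south, north, east, west produces `running: north`, `running: south`, `blocked: south`, `running: east`, and so on, so at most two regions are ever running.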
### Rollback Regions
This example shows the default value of `on_failure`. Because
`max_parallel = 1`, the "north" region will deploy first, followed by "south",
and so on. But if the "east" region failed, both the "east" region and the
"west" region would be marked `failed`. Because the job has an `update` stanza
with `auto_revert = true`, both regions would then roll back to the previous
job version. The "north" and "south" regions would remain `blocked` until an
operator intervenes.
```hcl
multiregion {
  strategy {
    on_failure   = ""
    max_parallel = 1
  }

  region "north" {}
  region "south" {}
  region "east" {}
  region "west" {}
}

update {
  auto_revert = true
}
```
### Override Counts
This example shows how the `count` field overrides the default `count` of the
task group. The job then deploys 2 "worker" and 1 "controller" allocations to
the "west" region, and 5 "worker" and 1 "controller" allocations to the "east"
region.
```hcl
multiregion {
  region "west" {
    count = 2
  }

  region "east" {
    count = 5
  }
}

group "worker" {}

group "controller" {
  count = 1
}
```
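The count rule can be written as a small function. This sketch is hypothetical: optional counts are modeled as `*int`, and a task group with neither its own `count` nor a regional `count` is assumed to fall back to the task group default of 1.

```go
package main

// effectiveCount returns the count a task group gets in one region: the
// group's own count wins, then the region's count, then the default of 1.
func effectiveCount(groupCount, regionCount *int) int {
	if groupCount != nil {
		return *groupCount
	}
	if regionCount != nil {
		return *regionCount
	}
	return 1
}

// intPtr is a small helper for building optional counts.
func intPtr(n int) *int { return &n }
```

For "west", `effectiveCount(nil, intPtr(2))` gives 2 workers and `effectiveCount(intPtr(1), intPtr(2))` gives 1 controller, matching the allocations described above.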
### Merging Meta
This example shows how the regional `meta` is merged with the `meta` fields of
the job, group, and task. A task in "west" will have the values
`first-key="regional-west"`, `second-key="group-level"`, whereas a task in
"east" will have the values `first-key="job-level"`,
`second-key="group-level"`.
```hcl
multiregion {
  region "west" {
    meta {
      first-key  = "regional-west"
      second-key = "regional-west"
    }
  }

  region "east" {
    meta {
      second-key = "regional-east"
    }
  }
}

meta {
  first-key = "job-level"
}

group "worker" {
  meta {
    second-key = "group-level"
  }
}
```
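The merge order in this example (job, then region, then group, with later levels winning on conflicting keys) can be sketched as layered map overlays. `mergeMeta` is a hypothetical helper; the precedence is inferred from the example rather than taken from Nomad's source, and task-level `meta` would be passed as a final layer.

```go
package main

// mergeMeta overlays meta maps in increasing precedence order; a key set in
// a later layer replaces the same key from an earlier one.
func mergeMeta(layers ...map[string]string) map[string]string {
	out := map[string]string{}
	for _, layer := range layers {
		for k, v := range layer {
			out[k] = v
		}
	}
	return out
}
```

Merging the job, "west" region, and "worker" group maps in that order yields `first-key = "regional-west"` and `second-key = "group-level"`, as stated above.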
[federated regions]: https://learn.hashicorp.com/nomad/operating-nomad/federation
[`update` stanza]: /docs/job-specification/update
[update-auto-revert]: /docs/job-specification/update#auto_revert
[examples]: #multiregion-examples
[upgrade strategies]: https://learn.hashicorp.com/nomad?track=update-strategies#update-strategies
[`nomad deployment unblock`]: /docs/commands/deployment/unblock