diff --git a/website/data/docs-navigation.js b/website/data/docs-navigation.js index 5964d13c5..afe9623d5 100644 --- a/website/data/docs-navigation.js +++ b/website/data/docs-navigation.js @@ -86,7 +86,7 @@ export default [ { category: 'deployment', - content: ['fail', 'list', 'pause', 'promote', 'resume', 'status'] + content: ['fail', 'list', 'pause', 'promote', 'resume', 'status', 'unblock'] }, 'eval-status', { @@ -167,6 +167,7 @@ export default [ 'logs', 'meta', 'migrate', + 'multiregion', 'network', 'parameterized', 'periodic', diff --git a/website/pages/api-docs/deployments.mdx b/website/pages/api-docs/deployments.mdx index 655445466..b92b1e0cf 100644 --- a/website/pages/api-docs/deployments.mdx +++ b/website/pages/api-docs/deployments.mdx @@ -491,3 +491,49 @@ $ curl \ "Index": 20 } ``` + +## Unblock Deployment + +This endpoint is used to manually mark a blocked multiregion deployment as +successful. A blocked deployment is a multiregion deployment that has +completed within its own region but is waiting on the other federated +regions. The endpoint can be used to force a deployment to complete in cases +where a failed peer region is unable to communicate its failed deployment +status to other regions. + +| Method | Path | Produces | +| ------ | ------------------------------------ | ------------------ | +| `POST` | `/v1/deployment/unblock/:deployment_id` | `application/json` | + +The table below shows this endpoint's support for +[blocking queries](/api-docs#blocking-queries) and +[required ACLs](/api-docs#acls). + +| Blocking Queries | ACL Required | +| ---------------- | ---------------------- | +| `NO` | `namespace:submit-job` | + +### Parameters + +- `:deployment_id` `(string: <required>)` - Specifies the UUID of the deployment. + This must be the full UUID, not the short 8-character one. This is specified + as part of the path. 
+ +### Sample Request + +```shell-session +$ curl \ + --request POST \ + https://localhost:4646/v1/deployment/unblock/5456bd7a-9fc0-c0dd-6131-cbee77f57577 +``` + +### Sample Response + +```json +{ + "EvalID": "0d834913-58a0-81ac-6e33-e452d83a0c66", + "EvalCreateIndex": 20, + "DeploymentModifyIndex": 20, + "Index": 20 +} +``` diff --git a/website/pages/docs/commands/deployment/unblock.mdx b/website/pages/docs/commands/deployment/unblock.mdx new file mode 100644 index 000000000..a3b8a59d4 --- /dev/null +++ b/website/pages/docs/commands/deployment/unblock.mdx @@ -0,0 +1,73 @@ +--- +layout: docs +page_title: 'Commands: deployment unblock' +sidebar_title: unblock +description: | + The deployment unblock command is used to manually unblock a deployment. +--- + +# Command: deployment unblock + +The `deployment unblock` command is used to manually mark a blocked +multiregion deployment as successful. A blocked deployment is a multiregion +deployment that has completed within its own region but is waiting on the +other [federated regions]. The `deployment unblock` command can be used to +force a deployment to complete in cases where a failed peer region is unable +to communicate its failed deployment status to other regions. + +## Usage + +```plaintext +nomad deployment unblock [options] <deployment id> +``` + +The `deployment unblock` command requires a single argument, a deployment ID or +prefix. + +## General Options + +@include 'general_options.mdx' + +## Unblock Options + +- `-detach`: Return immediately instead of monitoring. A new evaluation ID + will be output, which can be used to examine the evaluation using the + [eval status] command. + +- `-verbose`: Show full information. + +## Examples + +Manually mark an ongoing deployment as unblocked. The deployment status shows +an error on the unreachable "east" region. 
+ +```shell-session +$ nomad deployment unblock 8990cfbc +Deployment "8990cfbc-28c0-cb28-ca31-856cf691b987" unblocked + +==> Monitoring evaluation "a2d97ad5" + Evaluation triggered by job "example" + Evaluation within deployment: "8990cfbc" + Evaluation status changed: "pending" -> "complete" +==> Evaluation "a2d97ad5" finished with status "complete" + +$ nomad deployment status 8990cfbc +ID = 8990cfbc +Job ID = example +Job Version = 2 +Status = successful +Description = Deployment successful + +Multi-region Deployment +Region ID Status +west 8990cfbc successful +south 085787e3 blocked +east (error) + +Deployed +Task Group Desired Placed Healthy Unhealthy +cache 3 2 1 0 +``` + +[eval status]: /docs/commands/eval-status +[federated regions]: https://learn.hashicorp.com/nomad/operating-nomad/federation diff --git a/website/pages/docs/enterprise/index.mdx b/website/pages/docs/enterprise/index.mdx index 8cf96255c..704c4cb5f 100644 --- a/website/pages/docs/enterprise/index.mdx +++ b/website/pages/docs/enterprise/index.mdx @@ -86,7 +86,21 @@ When a Nomad cluster is at capacity for a given set of placement constraints, an Preemption enables Nomad's scheduler to automatically evict lower priority allocations of service and batch jobs so that allocations from higher priority jobs can be placed. This behavior ensures that critical workloads can run when resources are limited or when partial outages require workloads to be rescheduled across a smaller set of client nodes. +## Multi-Cluster & Efficiency + +Multi-Cluster & Efficiency features are part of an add-on module that enables +an organization to operate Nomad at scale across multiple clusters through +features such as Multiregion Deployments. + +### Multiregion Deployments + +[Multiregion Deployments] enable an operator to deploy a single job to multiple +federated regions. This includes the ability to control the order of rollouts +and how each region will behave in the event of a deployment failure. 
+ ## Try Nomad Enterprise Click [here](https://www.hashicorp.com/go/nomad-enterprise) to set up a demo or request a trial of Nomad Enterprise. + +[Multiregion Deployments]: /docs/job-specification/multiregion diff --git a/website/pages/docs/job-specification/multiregion.mdx b/website/pages/docs/job-specification/multiregion.mdx new file mode 100644 index 000000000..79902536d --- /dev/null +++ b/website/pages/docs/job-specification/multiregion.mdx @@ -0,0 +1,264 @@ +--- +layout: docs +page_title: multiregion Stanza - Job Specification +sidebar_title: multiregion +description: |- + The "multiregion" stanza specifies that a job will be deployed to multiple federated + regions. +--- + +# `multiregion` Stanza + + + +~> **Enterprise Only!** This functionality only exists in Nomad +Enterprise. This is not present in the open source version of Nomad. + +The `multiregion` stanza specifies that a job will be deployed to multiple +[federated regions]. If omitted, the job will be deployed to a single region: +the one specified by the `region` field or the `-region` command line +flag to `nomad job run`. + +Federated Nomad clusters are members of the same gossip cluster but not the +same raft cluster; they don't share their data stores. Each region in a +multiregion deployment gets an independent copy of the job, parameterized with +the values of the `region` stanza. Nomad regions coordinate to roll out each +region's deployment using rules determined by the `strategy` stanza. 
+ +```hcl +job "docs" { + multiregion { + + strategy { + max_parallel = 1 + on_failure = "fail_all" + } + + region "west" { + count = 2 + datacenters = ["west-1"] + meta { + my-key = "my-value-west" + } + } + + region "east" { + count = 5 + datacenters = ["east-1", "east-2"] + meta { + my-key = "my-value-east" + } + } + } +} +``` + +## Multiregion Deployment States + +A single region deployment using one of the various [upgrade strategies] +begins in the `running` state, and ends in the `successful` state, the +`cancelled` state (if another deployment supersedes it before it's +complete), or the `failed` state. A failed single region deployment may +automatically revert to the previous version of the job if its `update` +stanza has the [`auto_revert`][update-auto-revert] setting. + +In a multiregion deployment, regions begin in the `pending` state. This allows +Nomad to determine that all regions have accepted the job before +continuing. At this point up to `max_parallel` regions will enter `running` at +a time. When each region completes its local deployment, it enters a `blocked` +state where it waits until the last region has completed the deployment. The +final region will unblock the regions to mark them as `successful`. + +## `multiregion` Parameters + +- `strategy` ([Strategy](#strategy-parameters): nil) - Specifies + a rollout strategy for the regions. + +- `region` ([Region](#region-parameters): nil) - Specifies the + parameters for a specific region. This can be specified multiple times to + define the set of regions for the multiregion deployment. Regions are + ordered; depending on the rollout strategy Nomad may roll out to each region + in order or to several at a time. + +### `strategy` Parameters + +- `max_parallel` `(int: )` - Specifies the maximum number + of region deployments that a multiregion will have in a running state at a + time. By default, Nomad will deploy all regions simultaneously. 
+ +- `on_failure` `(string: )` - Specifies the behavior when a region + deployment fails. Available options are `"fail_all"`, `"fail_local"`, or + the default (empty `""`). This field and its interactions with the job's + [`update` stanza] are described in the [examples] below. + + Each region within a multiregion deployment follows the `auto_revert` + strategy of its own `update` stanza (if any). The multiregion `on_failure` + field tells Nomad how many other regions should be marked as failed when one + region's deployment fails: + + - The default behavior is that the failed region and all regions that come + after it in order are marked as failed. + + - If `on_failure: "fail_all"` is set, all regions will be marked as + failed. If all regions have already completed their deployments, it's + possible that a region may transition from `blocked` to `successful` while + another region is failing. This successful region cannot be rolled back. + + - If `on_failure: "fail_local"` is set, only the failed region will be marked + as failed. The remaining regions will move on to `blocked` status. At this + point, you'll need to manually unblock regions to mark them successful + with the [`nomad deployment unblock`] command or correct the conditions + that led to the failure and resubmit the job. + +~> For `system` jobs, only [`max_parallel`](#max_parallel) is enforced. The +`system` scheduler will be updated to support `on_failure` when the +[`update` stanza] is fully supported for system jobs in a future release. + +### `region` Parameters + +The name of a region must match the name of one of the [federated regions]. + +- `count` `(int: )` - Specifies a default count for task + groups in the region. If a task group specifies its own `count`, this value + will be ignored. This value must be non-negative. + +- `datacenters` `(array: )` - A list of + datacenters in the region which are eligible for task placement. 
If not + provided, the `datacenters` field of the job will be used. + +- `meta` - `Meta: nil` - The meta stanza allows for user-defined arbitrary + key-value pairs. The meta specified for each region will be merged with the + meta stanza at the job level. + +As described above, the parameters for each region replace the job's default +values for the fields with the same name. + +## `multiregion` Examples + +The following examples only show the `multiregion` stanza and the other +stanzas it might be interacting with. + +### Max Parallel + +This example shows the use of `max_parallel`. This job will deploy first to +the "north" and "south" regions. If either "north" or "south" finishes and enters the +`blocked` state, then "east" will be next. At most 2 regions will be in a +`running` state at any given time. + +```hcl +multiregion { + + strategy { + max_parallel = 2 + } + + region "north" {} + region "south" {} + region "east" {} + region "west" {} +} +``` + +### Rollback Regions + +This example shows the default value of `on_failure`. Because `max_parallel = 1`, +the "north" region will deploy first, followed by "south", and so on. But +supposing the "east" region failed, both the "east" region and the "west" +region would be marked `failed`. Because the job has an `update` stanza with +`auto_revert=true`, both regions would then roll back to the previous job +version. The "north" and "south" regions would remain `blocked` until an +operator intervenes. + +```hcl +multiregion { + + strategy { + on_failure = "" + max_parallel = 1 + } + + region "north" {} + region "south" {} + region "east" {} + region "west" {} +} + +update { + auto_revert = true +} +``` + +### Override Counts + +This example shows how the `count` field overrides the default `count` of the +task group. The job then deploys 2 "worker" and 1 "controller" allocations to +the "west" region, and 5 "worker" and 1 "controller" allocations to the "east" +region. 
+ +```hcl +multiregion { + + region "west" { + count = 2 + } + + region "east" { + count = 5 + } +} + +group "worker" {} + +group "controller" { + count = 1 +} +``` + +### Merging Meta + +This example shows how the `meta` of each region is merged with the `meta` field of the job, +group, and task. A task in "west" will have the values +`first-key="regional-west"`, `second-key="group-level"`, whereas a task in +"east" will have the values `first-key="job-level"`, +`second-key="group-level"`. + +```hcl +multiregion { + + region "west" { + meta { + first-key = "regional-west" + second-key = "regional-west" + } + } + + region "east" { + meta { + second-key = "regional-east" + } + } +} + +meta { + first-key = "job-level" +} + +group "worker" { + meta { + second-key = "group-level" + } +} +``` + +[federated regions]: https://learn.hashicorp.com/nomad/operating-nomad/federation +[`update` stanza]: /docs/job-specification/update +[update-auto-revert]: /docs/job-specification/update#auto_revert +[examples]: #multiregion-examples +[upgrade strategies]: https://learn.hashicorp.com/nomad?track=update-strategies#update-strategies +[`nomad deployment unblock`]: /docs/commands/deployment/unblock
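
For comparison with the Rollback Regions example in the added `multiregion.mdx` page, the following is a minimal sketch of the same job fragment with `on_failure = "fail_local"`. The page itself does not include this example; the four region names are reused from Rollback Regions and are illustrative only. Going by the `on_failure` description in the page, if the "east" deployment failed here, only "east" would be marked `failed` (and, with `auto_revert = true`, would be expected to roll back), while the remaining regions would move on to `blocked`.

```hcl
# Illustrative sketch only: same regions as the Rollback Regions example,
# but with on_failure switched from the default ("") to "fail_local".
multiregion {

  strategy {
    # Deploy one region at a time, in the order the regions are listed.
    max_parallel = 1

    # Only the region whose deployment fails is marked as failed.
    on_failure = "fail_local"
  }

  region "north" {}
  region "south" {}
  region "east" {}
  region "west" {}
}

update {
  # Each region follows the auto_revert setting of its own update stanza.
  auto_revert = true
}
```

Per the same description, an operator would then either mark each blocked region successful with `nomad deployment unblock`, or correct the conditions that led to the failure and resubmit the job.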