---
layout: docs
page_title: Use spread to increase failure tolerance
description: >-
  Create a job with the spread stanza to prevent application downtime
  as a result of a physical domain failure in a datacenter or rack.
---
# Use spread to increase failure tolerance
The Nomad scheduler uses a bin-packing algorithm when making job placements on
nodes to optimize resource utilization and density of applications. Although bin
packing ensures optimal resource utilization, it can lead to some nodes carrying
a majority of allocations for a given job. This can cause cascading failures
where the failure of a single node or a single data center can lead to
application unavailability.

The [spread stanza][spread-stanza] solves this problem by allowing operators to
distribute their workloads in a customized way based on [attributes] and/or
[client metadata][client-metadata]. By using spread criteria in their job
specification, Nomad job operators can ensure that failures across a domain such
as datacenter or rack don't affect application availability.

Consider a Nomad application that needs to be deployed to multiple datacenters
within a region. Datacenter `dc1` has four nodes while `dc2` has one node. This
application has ten instances, and seven of them must be deployed to `dc1`
because it receives more user traffic and the application must not suffer
downtime caused by too few running instances being available to process
requests. The remaining three allocations can be deployed to `dc2`.

Use the `spread` stanza in the Nomad [job specification][job-specification] to
ensure that 70% of the workload is placed in datacenter `dc1` and 30% is placed
in `dc2`. The Nomad operator can use the [percent] option with a [target] to
customize the spread.
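
For this scenario, the relevant part of the job specification is a `spread`
block like the sketch below; the full job that you will run appears later in
this guide.

```hcl
spread {
  attribute = "${node.datacenter}"
  weight    = 100

  # Attempt to place 70% of the allocations in dc1 and 30% in dc2.
  target "dc1" {
    percent = 70
  }

  target "dc2" {
    percent = 30
  }
}
```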
### Prerequisites
To perform the tasks described in this guide, you need to have a Nomad
environment with Consul installed. You can use this [repository] to provision a
sandbox environment. This guide assumes a cluster with one server node and five
client nodes.

<Tip>

This guide is for demo purposes and uses only a single server node. In a
production cluster, three or five server nodes are recommended.

</Tip>
## Place one of the client nodes in a different datacenter
In this guide, you are going to customize the spread of your job placements
across the datacenters your nodes are located in. Choose one of your client
nodes and edit `/etc/nomad.d/nomad.hcl` to change its location to `dc2`. A
snippet of an example configuration file with the required change is shown
below.
```hcl
data_dir   = "/opt/nomad/data"
bind_addr  = "0.0.0.0"
datacenter = "dc2"

# Enable the client
client {
  enabled = true
  # ...
}
```
After making the change on your chosen client node, restart the Nomad service.
```shell-session
$ sudo systemctl restart nomad
```
If everything is configured correctly, you should be able to run the [`nomad node status`][node-status] command and confirm that one of your nodes is now in
datacenter `dc2`.
```shell-session
$ nomad node status
ID        DC   Name              Class   Drain  Eligibility  Status
5d16d949  dc2  ip-172-31-62-240  <none>  false  eligible     ready
7b381152  dc1  ip-172-31-59-115  <none>  false  eligible     ready
10cc48cc  dc1  ip-172-31-58-46   <none>  false  eligible     ready
93f1e628  dc1  ip-172-31-58-113  <none>  false  eligible     ready
12894b80  dc1  ip-172-31-62-90   <none>  false  eligible     ready
```
## Create a job with the spread stanza
Create a file with the name `redis.nomad.hcl` and place the following content in it:
```hcl
job "redis" {
datacenters = ["dc1", "dc2"]
type = "service"
spread {
attribute = "${node.datacenter}"
weight = 100
target "dc1" {
percent = 70
}
target "dc2" {
percent = 30
}
}
group "cache1" {
count = 10
network {
port "db" {
to = 6379
}
}
task "redis" {
driver = "docker"
config {
image = "redis:latest"
ports = ["db"]
}
service {
name = "redis-cache"
port = "db"
check {
name = "alive"
type = "tcp"
interval = "10s"
timeout = "2s"
}
}
}
}
}
```
Note that the job specifies the `spread` stanza, uses the
[datacenter][attributes] attribute, and targets `dc1` and `dc2` with the
`percent` option. This tells the Nomad scheduler to attempt to distribute 70% of
the workload to `dc1` and 30% of the workload to `dc2`.
## Register the redis.nomad.hcl job
Run the Nomad job with the following command:
```shell-session
$ nomad run redis.nomad.hcl
==> Monitoring evaluation "c3dc5ebd"
    Evaluation triggered by job "redis"
    Allocation "7a374183" created: node "5d16d949", group "cache1"
    Allocation "f4361df1" created: node "7b381152", group "cache1"
    Allocation "f7af42dc" created: node "5d16d949", group "cache1"
    Allocation "0638edf2" created: node "10cc48cc", group "cache1"
    Allocation "49bc6038" created: node "12894b80", group "cache1"
    Allocation "c7e5679a" created: node "5d16d949", group "cache1"
    Allocation "cf91bf65" created: node "7b381152", group "cache1"
    Allocation "d16b606c" created: node "12894b80", group "cache1"
    Allocation "27866df0" created: node "93f1e628", group "cache1"
    Allocation "8531a6fc" created: node "7b381152", group "cache1"
    Evaluation status changed: "pending" -> "complete"
```
Note that three of the ten allocations have been placed on node `5d16d949`. This
is the node configured to be in datacenter `dc2`. The Nomad scheduler has
distributed 30% of the workload to `dc2`, as specified in the `spread` stanza.

Keep in mind that the Nomad scheduler still factors other components into the
overall scoring of nodes when making placements, so you should not expect the
spread stanza to strictly enforce your distribution preferences the way a
[constraint][constraint-stanza] does; the sketch after this paragraph shows the
contrast. The next few steps take a detailed look at that scoring.
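
As a rough illustration, and not part of the job in this guide, a `constraint`
is a hard filter rather than a scoring preference: a group with the following
block could only ever be placed on `dc1` nodes, and allocations that cannot fit
there would stay queued instead of spilling over to `dc2`.

```hcl
# Hypothetical hard placement rule, shown only for contrast with spread.
constraint {
  attribute = "${node.datacenter}"
  value     = "dc1"
}
```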
## Check the status of the job
Check the status of the job and verify where allocations have been placed. Run
the following command:
```shell-session
$ nomad status redis
```
The output should list ten running instances of your job in the `Summary`
section, as shown below:
```plaintext
...
Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
cache1      0       0         10       0       0         0

Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created    Modified
0638edf2  10cc48cc  cache1      0        run      running  2m20s ago  2m ago
27866df0  93f1e628  cache1      0        run      running  2m20s ago  1m57s ago
49bc6038  12894b80  cache1      0        run      running  2m20s ago  1m58s ago
7a374183  5d16d949  cache1      0        run      running  2m20s ago  2m1s ago
8531a6fc  7b381152  cache1      0        run      running  2m20s ago  2m2s ago
c7e5679a  5d16d949  cache1      0        run      running  2m20s ago  1m55s ago
cf91bf65  7b381152  cache1      0        run      running  2m20s ago  1m57s ago
d16b606c  12894b80  cache1      0        run      running  2m20s ago  2m1s ago
f4361df1  7b381152  cache1      0        run      running  2m20s ago  2m3s ago
f7af42dc  5d16d949  cache1      0        run      running  2m20s ago  1m54s ago
```
You can cross-check this output with the results of the `nomad node status`
command to verify that 30% of your workload has been placed on the node in `dc2`
(in this case, that node is `5d16d949`).
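
If you want to count the `dc2` placements quickly, one option is to filter the
job status output for that node's ID. The command below assumes the `dc2` node
ID is `5d16d949`, as in this example; substitute your own node ID.

```shell-session
$ nomad status redis | grep 5d16d949 | wc -l
3
```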
## Obtain detailed scoring information on job placement
The Nomad scheduler will not always spread your workload exactly as specified in
the `spread` stanza, even if the resources are available. This is because spread
scoring is combined with other metrics before a scheduling decision is made. In
this step, you will take a look at some of those other factors.

Using the output from the previous step, take any allocation that has been
placed on a node and use the [`nomad alloc status`][alloc status] command with
the [`-verbose`][verbose] option to obtain detailed scoring information on it.
This example uses allocation ID `0638edf2`; your allocation IDs will be
different.
```shell-session
$ nomad alloc status -verbose 0638edf2
```
The resulting output will show the `Placement Metrics` section at the bottom.
```plaintext
...
Placement Metrics
Node                                  node-affinity  allocation-spread  binpack  job-anti-affinity  node-reschedule-penalty  final score
10cc48cc-2913-af54-74d5-d7559f373ff2  0              0.429              0.33     0                  0                        0.379
93f1e628-e509-b1ab-05b7-0944056f781d  0              0.429              0.515    -0.2               0                        0.248
12894b80-4943-4d5c-5716-c626c6b99be3  0              0.429              0.515    -0.2               0                        0.248
7b381152-3802-258b-4155-6d7dfb344dd4  0              0.429              0.515    -0.2               0                        0.248
5d16d949-85aa-3fd3-b5f4-51094cbeb77a  0              0.333              0.515    -0.2               0                        0.216
```
Note that the results from the `allocation-spread`, `binpack`,
`job-anti-affinity`, `node-reschedule-penalty`, and `node-affinity` columns are
combined to produce the numbers listed in the `final score` column for each
node. The Nomad scheduler uses the final score for each node in deciding where
to make placements.
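
As a rough sanity check on the sample output above, each final score shown here
works out to the average of that node's non-zero component scores. Your numbers
will differ, but the arithmetic looks like this:

```plaintext
93f1e628: (0.429 + 0.515 + (-0.2)) / 3 ≈ 0.248
10cc48cc: (0.429 + 0.33) / 2           ≈ 0.379
```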
## Next steps
Change the values of the `percent` options on your targets in the `spread`
stanza and observe how the placement behavior, along with the final score given
to each node, changes. Use the `nomad alloc status` command as shown in the
previous step. One possible variation is sketched below.
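
For example, splitting the workload evenly across both datacenters only
requires changing the `percent` values from the job you created earlier;
everything else stays the same.

```hcl
spread {
  attribute = "${node.datacenter}"
  weight    = 100

  # An assumed 50/50 split for experimentation.
  target "dc1" {
    percent = 50
  }

  target "dc2" {
    percent = 50
  }
}
```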
### Reference material
- The [spread][spread-stanza] stanza documentation
- [Scheduling][scheduling] with Nomad

[alloc status]: /nomad/commands/alloc/status
[attributes]: /nomad/docs/reference/runtime-variable-interpolation
[client-metadata]: /nomad/docs/configuration/client#meta
[constraint-stanza]: /nomad/docs/job-specification/constraint
[job-specification]: /nomad/docs/job-specification
[node-status]: /nomad/commands/node/status
[percent]: /nomad/docs/job-specification/spread#percent
[repository]: https://github.com/hashicorp/nomad/tree/master/terraform#provision-a-nomad-cluster-in-the-cloud
[scheduling]: /nomad/docs/concepts/scheduling/how-scheduling-works
[spread-stanza]: /nomad/docs/job-specification/spread
[target]: /nomad/docs/job-specification/spread#target
[verbose]: /nomad/commands/alloc/status#verbose