---
layout: docs
page_title: Use spread to increase failure tolerance
description: >-
  Create a job with the spread stanza to prevent application downtime
  as a result of a physical domain failure in a datacenter or rack.
---

# Use spread to increase failure tolerance

The Nomad scheduler uses a bin-packing algorithm when making job placements on
nodes to optimize resource utilization and density of applications. Although
bin packing ensures optimal resource utilization, it can lead to some nodes
carrying a majority of allocations for a given job. This can cause cascading
failures, where the failure of a single node or a single datacenter leads to
application unavailability.

The [spread stanza][spread-stanza] solves this problem by allowing operators to
distribute their workloads in a customized way based on [attributes] and/or
[client metadata][client-metadata]. By using spread criteria in their job
specification, Nomad job operators can ensure that failures across a domain
such as datacenter or rack don't affect application availability.
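
For example, a spread can target client metadata instead of a node attribute.
The following sketch assumes each client's configuration sets a custom `rack`
metadata value such as `r1` or `r2`; that metadata and those rack names are
illustrative only and are not part of the cluster used in this guide.

```hcl
# Illustrative only: assumes clients define meta { rack = "r1" } or
# meta { rack = "r2" } in their configuration files.
spread {
  attribute = "${meta.rack}"
  weight    = 50

  target "r1" {
    percent = 50
  }

  target "r2" {
    percent = 50
  }
}
```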

Consider a Nomad application that needs to be deployed to multiple datacenters
within a region. Datacenter `dc1` has four nodes while `dc2` has one node. This
application has ten instances, and seven of them must be deployed to `dc1`
since it receives more user traffic and you need to make sure the application
doesn't suffer downtime from having too few running instances to process
requests. The remaining three allocations can be deployed to `dc2`.

Use the `spread` stanza in the Nomad [job specification][job-specification] to
ensure that 70% of the workload is placed in datacenter `dc1` and 30% is placed
in `dc2`. The Nomad operator can use the [percent] option with a [target] to
customize the spread.

### Prerequisites

To perform the tasks described in this guide, you need to have a Nomad
environment with Consul installed. You can use this [repository] to provision a
sandbox environment. This guide assumes a cluster with one server node and five
client nodes.

<Tip>

This guide is for demo purposes and is only using a single server node. In a
production cluster, 3 or 5 server nodes are recommended.

</Tip>

## Place one of the client nodes in a different datacenter

In this guide, you are going to customize the spread for job placement between
the datacenters your nodes are located in. Choose one of your client nodes and
edit `/etc/nomad.d/nomad.hcl` to change its location to `dc2`. A snippet of an
example configuration file with the required change is shown below.

```hcl
data_dir   = "/opt/nomad/data"
bind_addr  = "0.0.0.0"
datacenter = "dc2"

# Enable the client
client {
  enabled = true
  # ...
}
```

After making the change on your chosen client node, restart the Nomad service.

```shell-session
$ sudo systemctl restart nomad
```

If everything is configured correctly, you should be able to run the
[`nomad node status`][node-status] command and confirm that one of your nodes
is now in datacenter `dc2`.

```shell-session
$ nomad node status
ID        DC   Name              Class   Drain  Eligibility  Status
5d16d949  dc2  ip-172-31-62-240  <none>  false  eligible     ready
7b381152  dc1  ip-172-31-59-115  <none>  false  eligible     ready
10cc48cc  dc1  ip-172-31-58-46   <none>  false  eligible     ready
93f1e628  dc1  ip-172-31-58-113  <none>  false  eligible     ready
12894b80  dc1  ip-172-31-62-90   <none>  false  eligible     ready
```

## Create a job with the spread stanza

Create a file with the name `redis.nomad.hcl` and place the following content
in it:

```hcl
job "redis" {
  datacenters = ["dc1", "dc2"]
  type        = "service"

  spread {
    attribute = "${node.datacenter}"
    weight    = 100

    target "dc1" {
      percent = 70
    }

    target "dc2" {
      percent = 30
    }
  }

  group "cache1" {
    count = 10

    network {
      port "db" {
        to = 6379
      }
    }

    task "redis" {
      driver = "docker"

      config {
        image = "redis:latest"

        ports = ["db"]
      }

      service {
        name = "redis-cache"
        port = "db"

        check {
          name     = "alive"
          type     = "tcp"
          interval = "10s"
          timeout  = "2s"
        }
      }
    }
  }
}
```

Note that the job specifies the `spread` stanza and uses the
[datacenter][attributes] attribute while targeting `dc1` and `dc2` with the
`percent` options. This tells the Nomad scheduler to attempt to distribute 70%
of the workload to `dc1` and 30% of the workload to `dc2`.
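
If you want to preview where the scheduler would place these allocations before
registering the job, you can optionally perform a dry run with `nomad job plan`,
which reports the proposed placements without changing any cluster state.

```shell-session
$ nomad job plan redis.nomad.hcl
```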

## Register the redis.nomad.hcl job

Run the Nomad job with the following command:

```shell-session
$ nomad run redis.nomad.hcl
==> Monitoring evaluation "c3dc5ebd"
    Evaluation triggered by job "redis"
    Allocation "7a374183" created: node "5d16d949", group "cache1"
    Allocation "f4361df1" created: node "7b381152", group "cache1"
    Allocation "f7af42dc" created: node "5d16d949", group "cache1"
    Allocation "0638edf2" created: node "10cc48cc", group "cache1"
    Allocation "49bc6038" created: node "12894b80", group "cache1"
    Allocation "c7e5679a" created: node "5d16d949", group "cache1"
    Allocation "cf91bf65" created: node "7b381152", group "cache1"
    Allocation "d16b606c" created: node "12894b80", group "cache1"
    Allocation "27866df0" created: node "93f1e628", group "cache1"
    Allocation "8531a6fc" created: node "7b381152", group "cache1"
    Evaluation status changed: "pending" -> "complete"
```

Note that three of the ten allocations have been placed on node `5d16d949`.
This is the node configured to be in datacenter `dc2`. The Nomad scheduler has
distributed 30% of the workload to `dc2` as specified in the `spread` stanza.

Keep in mind that the Nomad scheduler still factors other components into the
overall scoring of nodes when making placements, so you should not expect the
`spread` stanza to strictly implement your distribution preferences like a
[constraint][constraint-stanza]. Now, take a detailed look at the scoring in
the next few steps.

## Check the status of the job

Check the status of the job and verify where allocations have been placed. Run
the following command:

```shell-session
$ nomad status redis
```

The output should list ten running instances of your job in the `Summary`
section as shown below:

```plaintext
...
Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
cache1      0       0         10       0       0         0

Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created    Modified
0638edf2  10cc48cc  cache1      0        run      running  2m20s ago  2m ago
27866df0  93f1e628  cache1      0        run      running  2m20s ago  1m57s ago
49bc6038  12894b80  cache1      0        run      running  2m20s ago  1m58s ago
7a374183  5d16d949  cache1      0        run      running  2m20s ago  2m1s ago
8531a6fc  7b381152  cache1      0        run      running  2m20s ago  2m2s ago
c7e5679a  5d16d949  cache1      0        run      running  2m20s ago  1m55s ago
cf91bf65  7b381152  cache1      0        run      running  2m20s ago  1m57s ago
d16b606c  12894b80  cache1      0        run      running  2m20s ago  2m1s ago
f4361df1  7b381152  cache1      0        run      running  2m20s ago  2m3s ago
f7af42dc  5d16d949  cache1      0        run      running  2m20s ago  1m54s ago
```

You can cross-check this output with the results of the `nomad node status`
command to verify that 30% of your workload has been placed on the node in
`dc2` (in this example, that node is `5d16d949`).
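
If you prefer not to count rows by hand, a short shell pipeline over the same
output can tally allocations per node. This is only a convenience sketch using
standard Unix tools; it assumes the output format shown above, where `running`
appears only in allocation rows and the node ID is the second column.

```shell-session
$ nomad status redis | grep running | awk '{ print $2 }' | sort | uniq -c
   1 10cc48cc
   2 12894b80
   3 5d16d949
   3 7b381152
   1 93f1e628
```

The counts shown here correspond to the example placement above; your exact
counts may differ.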

## Obtain detailed scoring information on job placement

The Nomad scheduler will not always spread your workload in the way you have
specified in the `spread` stanza, even if the resources are available. This is
because spread scoring is factored in along with other metrics before a
scheduling decision is made. In this step, you will take a look at some of
those other factors.

Using the output from the previous step, take any allocation that has been
placed on a node and use the [`nomad alloc status`][alloc status] command with
the [`-verbose`][verbose] option to obtain detailed scoring information on it.
In this example, the guide refers to allocation ID `0638edf2`; your allocation
IDs will be different.

```shell-session
$ nomad alloc status -verbose 0638edf2
```

The resulting output will show the `Placement Metrics` section at the bottom.

```plaintext
...
Placement Metrics
Node                                  node-affinity  allocation-spread  binpack  job-anti-affinity  node-reschedule-penalty  final score
10cc48cc-2913-af54-74d5-d7559f373ff2  0              0.429              0.33     0                  0                        0.379
93f1e628-e509-b1ab-05b7-0944056f781d  0              0.429              0.515    -0.2               0                        0.248
12894b80-4943-4d5c-5716-c626c6b99be3  0              0.429              0.515    -0.2               0                        0.248
7b381152-3802-258b-4155-6d7dfb344dd4  0              0.429              0.515    -0.2               0                        0.248
5d16d949-85aa-3fd3-b5f4-51094cbeb77a  0              0.333              0.515    -0.2               0                        0.216
```

Note that the results from the `allocation-spread`, `binpack`,
`job-anti-affinity`, `node-reschedule-penalty`, and `node-affinity` columns are
combined to produce the numbers listed in the `final score` column for each
node. The Nomad scheduler uses the final score for each node in deciding where
to make placements.
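
In the output above, each value in the `final score` column matches the average
of the components that recorded a non-zero score for that node, so you can
verify the totals by hand. For example:

```plaintext
93f1e628: (0.429 + 0.515 + (-0.2)) / 3 = 0.248
10cc48cc: (0.429 + 0.33) / 2           ≈ 0.379
```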

## Next steps

Change the values of the `percent` options on your targets in the `spread`
stanza and observe how the placement behavior, along with the final score given
to each node, changes (use the `nomad alloc status` command as shown in the
previous step).
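
For example, one variation to try is an even split across both datacenters;
after editing `redis.nomad.hcl`, re-run the job and compare the resulting
placements and scores.

```hcl
# One possible variation to experiment with: an even split.
spread {
  attribute = "${node.datacenter}"
  weight    = 100

  target "dc1" {
    percent = 50
  }

  target "dc2" {
    percent = 50
  }
}
```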

### Reference material

- The [spread][spread-stanza] stanza documentation
- [Scheduling][scheduling] with Nomad

[alloc status]: /nomad/commands/alloc/status
[attributes]: /nomad/docs/reference/runtime-variable-interpolation
[client-metadata]: /nomad/docs/configuration/client#meta
[constraint-stanza]: /nomad/docs/job-specification/constraint
[job-specification]: /nomad/docs/job-specification
[node-status]: /nomad/commands/node/status
[percent]: /nomad/docs/job-specification/spread#percent
[repository]: https://github.com/hashicorp/nomad/tree/master/terraform#provision-a-nomad-cluster-in-the-cloud
[scheduling]: /nomad/docs/concepts/scheduling/how-scheduling-works
[spread-stanza]: /nomad/docs/job-specification/spread
[target]: /nomad/docs/job-specification/spread#target
[verbose]: /nomad/commands/alloc/status#verbose