Files
nomad/website/content/docs/job-specification/spread.mdx
Tim Gross 17093d62f0 docs: describe omitted spread behavior and perf impact (#23184)
Update the documentation for the `spread` block:
* Make it clear that the default behavior within a given job when the `spread`
  block is omitted is to spread out allocs among feasible nodes.
* Describe the difference between the `spread` block and `spread` scheduler
  algorithm.
* Add warnings about the performance impact of using `spread` and how to
  mitigate it.
2024-06-05 13:28:09 -04:00

206 lines
7.5 KiB
Plaintext

---
layout: docs
page_title: spread Block - Job Specification
description: >-
The "spread" block is used to spread placements across a certain node
attributes such as datacenter.
Spread may be specified at the job or group levels for ultimate flexibility.
More than one spread block may be specified with relative weights between
each.
---
# `spread` Block
<Placement
groups={[
['job', 'spread'],
['job', 'group', 'spread'],
]}
/>
The `spread` block allows operators to increase the failure tolerance of their
applications by specifying a node attribute that allocations should be spread
over. This allows operators to spread allocations over attributes such as
datacenter, availability zone, or even rack in a physical datacenter.
By default, when `spread` is omitted, the scheduler will attempt to place
allocations from the same job on different nodes (and binpacked between
jobs). When using `spread` the scheduler will attempt to place allocations
equally among the available values of the given target.
```hcl
job "docs" {
# Spread allocations over all datacenter
spread {
attribute = "${node.datacenter}"
}
group "example" {
# Spread allocations over each rack based on desired percentage
spread {
attribute = "${meta.rack}"
target "r1" {
percent = 60
}
target "r2" {
percent = 40
}
}
}
}
```
Nodes are scored according to how closely they match the desired target
percentage defined in the spread block. Spread scores are combined with other
scoring factors such as bin packing.
A job or task group can have more than one spread criteria, with weights to
express relative preference.
Spread criteria are treated as a soft preference by the Nomad scheduler. If no
nodes match a given spread criteria, placement is still successful. To avoid
scoring every node for every placement, allocations may not be perfectly
spread. Spread works best on attributes with similar number of nodes:
identically configured racks or similarly configured datacenters.
Spread may be expressed on [attributes][interpolation] or [client
metadata][client-meta]. Additionally, spread may be specified at the [job][job]
and [group][group] levels for ultimate flexibility. Job level spread criteria
are inherited by all task groups in the job.
## `spread` Parameters
- `attribute` `(string: "")` - Specifies the name or reference of the attribute
to use. This can be any of the [Nomad interpolated
values](/nomad/docs/runtime/interpolation#interpreted_node_vars).
- `target` <code>([target](#target-parameters): &lt;required&gt;)</code> - Specifies one or more target
percentages for each value of the `attribute` in the spread block. If this is omitted,
Nomad will spread allocations evenly across all values of the attribute.
- `weight` `(integer:0)` - Specifies a weight for the spread block. The weight is used
during scoring and must be an integer between 0 to 100. Weights can be used
when there is more than one spread or affinity block to express relative preference across them.
## `target` Parameters
- `value` `(string:"")` - Specifies a target value of the attribute from a `spread` block.
- `percent` `(integer:0)` - Specifies the percentage associated with the target value.
## Comparison to `spread` Scheduling Algorithm
The `spread` block is not the same concept as setting the [scheduler
algorithm][] to `"spread"` instead of `"binpack"`. Setting the scheduler
algorithm impacts all jobs on a cluster (or node pool), and adjusts the tendency
of the scheduler to place workloads from different jobs on the same set of nodes
or not. The `spread` block impacts how the scheduler places allocations for a
given job.
## Scheduling Performance
Using the `spread` block can have significant impact on scheduling
performance. For each allocation in a `service` and `batch` job, the scheduler
iterates over nodes until it finds a small number of feasible nodes. Those
feasible nodes are then scored to find the best placement.
When `spread` is omitted, this limit is 2 for batch jobs and the log<sub>2</sub>
of the total number of nodes in the datacenter and node pool (with a minimum of
2) for service jobs. When the `spread` block is present, the scheduler instead
scores a number of nodes in the datacenter and node pool equal to the task group
count (with a maximum of 100) per allocation. This can result in
order-of-magnitude increases in scheduling times.
To monitor scheduling times potentially impacted by `spread` blocks, examine the
`nomad.nomad.worker.invoke_scheduler.*` found in the [Key Metrics][] table. You
can reduce scheduling times by avoiding `spread` and instead relying on the
default distribution of a job across multiple nodes. If this is not possible,
you may consider reducing the size of the node pool or datacenter to reduce the
number of nodes available for the scheduler to consider.
## `spread` Examples
The following examples show different ways to use the `spread` block.
### Even Spread Across Data Center
This example shows a spread block across the node's `datacenter` attribute. If we have
two datacenters `us-east1` and `us-west1`, and a task group of `count = 10`,
Nomad will attempt to place 5 allocations in each datacenter.
```hcl
spread {
attribute = "${node.datacenter}"
weight = 100
}
```
### Spread With Target Percentages
This example shows a spread block that specifies one target percentage. If we
have three datacenters `us-east1`, `us-east2`, and `us-west1`, and a task group
of `count = 10`, Nomad will attempt to place 5 of the allocations in "us-east1",
and will spread the remaining among the other two datacenters.
```hcl
spread {
attribute = "${node.datacenter}"
weight = 100
target "us-east1" {
percent = 50
}
}
```
This example shows a spread block that specifies target percentages for two
different datacenters. If we have two datacenters `us-east1` and `us-west1`,
and a task group of `count = 10`, Nomad will attempt to place 6 allocations
in `us-east1` and 4 in `us-west1`.
```hcl
spread {
attribute = "${node.datacenter}"
weight = 100
target "us-east1" {
percent = 60
}
target "us-west1" {
percent = 40
}
}
```
### Spread Across Multiple Attributes
This example shows spread blocks with multiple attributes. Consider a Nomad cluster
where there are two datacenters `us-east1` and `us-west1`, and each datacenter has nodes
with `${meta.rack}` being `r1` or `r2`. With the following spread block used on a job with `count=12`, Nomad
will attempt to place 6 allocations in each datacenter. Within a datacenter, Nomad will
attempt to place 3 allocations in nodes on rack `r1`, and 3 allocations in nodes on rack `r2`.
```hcl
spread {
attribute = "${node.datacenter}"
weight = 50
}
spread {
attribute = "${meta.rack}"
weight = 50
}
```
[job]: /nomad/docs/job-specification/job 'Nomad job Job Specification'
[group]: /nomad/docs/job-specification/group 'Nomad group Job Specification'
[client-meta]: /nomad/docs/configuration/client#meta 'Nomad meta Job Specification'
[task]: /nomad/docs/job-specification/task 'Nomad task Job Specification'
[interpolation]: /nomad/docs/runtime/interpolation 'Nomad interpolation'
[node-variables]: /nomad/docs/runtime/interpolation#node-variables- 'Nomad interpolation-Node variables'
[constraint]: /nomad/docs/job-specification/constraint 'Nomad Constraint job Specification'
[Key Metrics]: /nomad/docs/operations/metrics-reference#key-metrics
[scheduler algorithm]: /nomad/docs/commands/operator/scheduler/set-config#scheduler-algorithm