Added sentence about job anti-affinity; Reflowed

This will create a concrete mention of job anti-affminity in the Nomad documentation. The only place we discuss it currently is in a similar sentence on the website itself. I borrowed liberally from that sentence in crafting this line.
2026-01-06 18:35:44 +03:00 · 2017-08-01 13:13:42 -04:00
parent 589b33bfaf
commit 17ccab2d26
1 changed files with 77 additions and 67 deletions
--- a/website/source/docs/internals/scheduling.html.md
+++ b/website/source/docs/internals/scheduling.html.md
@@ -9,91 +9,101 @@ description: |-
 # Scheduling

 Scheduling is a core function of Nomad. It is the process of assigning tasks
-from jobs to client machines. This process must respect the constraints as declared
-in the job, and optimize for resource utilization. This page documents the details
-of how scheduling works in Nomad to help both users and developers
-build a mental model. The design is heavily inspired by Google's
-work on both [Omega: flexible, scalable schedulers for large compute clusters](https://research.google.com/pubs/pub41684.html)
-and [Large-scale cluster management at Google with Borg](https://research.google.com/pubs/pub43438.html).
+from jobs to client machines. This process must respect the constraints as
+declared in the job, and optimize for resource utilization. This page documents
+the details of how scheduling works in Nomad to help both users and developers
+build a mental model. The design is heavily inspired by Google's work on both
+[Omega: flexible, scalable schedulers for large compute clusters][Omega] and
+[Large-scale cluster management at Google with Borg][Borg].

-~> **Advanced Topic!** This page covers technical details
-of Nomad. You do not need to understand these details to
-effectively use Nomad. The details are documented here for
-those who wish to learn about them without having to go
-spelunking through the source code.
+~> **Advanced Topic!** This page covers technical details of Nomad. You do not
+~> need to understand these details to effectively use Nomad. The details are
+~> documented here for those who wish to learn about them without having to
+~> go spelunking through the source code.

 # Scheduling in Nomad

-[![Nomad Data Model](/assets/images/nomad-data-model.png)](/assets/images/nomad-data-model.png)
+[![Nomad Data Model][img-data-model]][img-data-model]

-There are four primary "nouns" in Nomad; jobs, nodes, allocations, and evaluations.
-Jobs are submitted by users and represent a _desired state_. A job is a declarative description
-of tasks to run which are bounded by constraints and require resources. Tasks can be scheduled on 
-nodes in the cluster running the Nomad client. The mapping of tasks in a job to clients is done
-using allocations. An allocation is used to declare that a set of tasks in a job should be run
-on a particular node. Scheduling is the process of determining the appropriate allocations and
-is done as part of an evaluation.
+There are four primary "nouns" in Nomad; jobs, nodes, allocations, and
+evaluations. Jobs are submitted by users and represent a _desired state_. A job
+is a declarative description of tasks to run which are bounded by constraints
+and require resources. Tasks can be scheduled on  nodes in the cluster running
+the Nomad client. The mapping of tasks in a job to clients is done using
+allocations. An allocation is used to declare that a set of tasks in a job
+should be run on a particular node. Scheduling is the process of determining
+the appropriate allocations and is done as part of an evaluation.

-An evaluation is created any time the external state, either desired or emergent, changes. The desired
-state is based on jobs, meaning the desired state changes if a new job is submitted, an
-existing job is updated, or a job is deregistered. The emergent state is based on the client
-nodes, and so we must handle the failure of any clients in the system. These events trigger
-the creation of a new evaluation, as Nomad must _evaluate_ the state of the world and reconcile
-it with the desired state.
+An evaluation is created any time the external state, either desired or
+emergent, changes. The desired state is based on jobs, meaning the desired
+state changes if a new job is submitted, an existing job is updated, or a job
+is deregistered. The emergent state is based on the client nodes, and so we
+must handle the failure of any clients in the system. These events trigger the
+creation of a new evaluation, as Nomad must _evaluate_ the state of the world
+and reconcile it with the desired state.

 This diagram shows the flow of an evaluation through Nomad:

-[![Nomad Evaluation Flow](/assets/images/nomad-evaluation-flow.png)](/assets/images/nomad-evaluation-flow.png)
+[![Nomad Evaluation Flow][img-eval-flow]][img-eval-flow]

-The lifecycle of an evaluation begins with an event causing the evaluation to be
-created. Evaluations are created in the `pending` state and are enqueued into the
-evaluation broker. There is a single evaluation broker which runs on the leader server.
-The evaluation broker is used to manage the queue of pending evaluations, provide priority ordering,
-and ensure at least once delivery.
+The lifecycle of an evaluation begins with an event causing the evaluation to
+be created. Evaluations are created in the `pending` state and are enqueued
+into the evaluation broker. There is a single evaluation broker which runs on
+the leader server. The evaluation broker is used to manage the queue of pending
+evaluations, provide priority ordering, and ensure at least once delivery.

-Nomad servers run scheduling workers, defaulting to one per CPU core, which are used to
-process evaluations. The workers dequeue evaluations from the broker, and then invoke
-the appropriate scheduler as specified by the job. Nomad ships with a `service` scheduler
-that optimizes for long-lived services, a `batch` scheduler that is used for fast placement
-of batch jobs, a `system` scheduler that is used to run jobs on every node,
-and a `core` scheduler which is used for internal maintenance.
-Nomad can be extended to support custom schedulers as well.
+Nomad servers run scheduling workers, defaulting to one per CPU core, which are
+used to process evaluations. The workers dequeue evaluations from the broker,
+and then invoke the appropriate scheduler as specified by the job. Nomad ships
+with a `service` scheduler that optimizes for long-lived services, a `batch`
+scheduler that is used for fast placement of batch jobs, a `system` scheduler
+that is used to run jobs on every node, and a `core` scheduler which is used
+for internal maintenance. Nomad can be extended to support custom schedulers as
+well.

-Schedulers are responsible for processing an evaluation and generating an allocation _plan_.
-The plan is the set of allocations to evict, update, or create. The specific logic used to
-generate a plan may vary by scheduler, but generally the scheduler needs to first reconcile
-the desired state with the real state to determine what must be done. New allocations need
-to be placed and existing allocations may need to be updated, migrated, or stopped.
+Schedulers are responsible for processing an evaluation and generating an
+allocation _plan_. The plan is the set of allocations to evict, update, or
+create. The specific logic used to generate a plan may vary by scheduler, but
+generally the scheduler needs to first reconcile the desired state with the
+real state to determine what must be done. New allocations need to be placed
+and existing allocations may need to be updated, migrated, or stopped.

-Placing allocations is split into two distinct phases, feasibility
-checking and ranking. In the first phase the scheduler finds nodes that are
-feasible by filtering unhealthy nodes, those missing necessary drivers, and those
-failing the specified constraints.
+Placing allocations is split into two distinct phases, feasibility checking and
+ranking. In the first phase the scheduler finds nodes that are feasible by
+filtering unhealthy nodes, those missing necessary drivers, and those failing
+the specified constraints.

-The second phase is ranking, where the scheduler scores feasible nodes to find the best fit.
-Scoring is primarily based on bin packing, which is used to optimize the resource utilization
-and density of applications, but is also augmented by affinity and anti-affinity rules. One such
-anti-affinity rule exists to avoid colocating instances of the same service to reduce the 
+The second phase is ranking, where the scheduler scores feasible nodes to find
+the best fit. Scoring is primarily based on bin packing, which is used to
+optimize the resource utilization and density of applications, but is also
+augmented by affinity and anti-affinity rules. One anti-affinity rule
+attempts to avoid colocating instances of the same service to reduce the
 probability of correlated failures.

-Once the scheduler has ranked enough nodes, the highest ranking node is selected and
-added to the allocation plan.
+Once the scheduler has ranked enough nodes, the highest ranking node is
+selected and added to the allocation plan.

-When planning is complete, the scheduler submits the plan to the leader which adds
-the plan to the plan queue. The plan queue manages pending plans, provides priority
-ordering, and allows Nomad to handle concurrency races. Multiple schedulers are running
-in parallel without locking or reservations, making Nomad optimistically concurrent.
-As a result, schedulers might overlap work on the same node and cause resource
-over-subscription. The plan queue allows the leader node to protect against this and
-do partial or complete rejections of a plan.
+When planning is complete, the scheduler submits the plan to the leader which
+adds the plan to the plan queue. The plan queue manages pending plans, provides
+priority ordering, and allows Nomad to handle concurrency races. Multiple
+schedulers are running in parallel without locking or reservations, making
+Nomad optimistically concurrent. As a result, schedulers might overlap work on
+the same node and cause resource over-subscription. The plan queue allows the
+leader node to protect against this and do partial or complete rejections of a
+plan.

 As the leader processes plans, it creates allocations when there is no conflict
-and otherwise informs the scheduler of a failure in the plan result. The plan result
-provides feedback to the scheduler, allowing it to terminate or explore alternate plans
-if the previous plan was partially or completely rejected.
+and otherwise informs the scheduler of a failure in the plan result. The plan
+result provides feedback to the scheduler, allowing it to terminate or explore
+alternate plans if the previous plan was partially or completely rejected.

-Once the scheduler has finished processing an evaluation, it updates the status of
-the evaluation and acknowledges delivery with the evaluation broker. This completes
-the lifecycle of an evaluation. Allocations that were created, modified or deleted
-as a result will be picked up by client nodes and will begin execution.
+Once the scheduler has finished processing an evaluation, it updates the status
+of the evaluation and acknowledges delivery with the evaluation broker. This
+completes the lifecycle of an evaluation. Allocations that were created,
+modified or deleted as a result will be picked up by client nodes and will
+begin execution.

+[Omega]: https://research.google.com/pubs/pub41684.html
+[Borg]: https://research.google.com/pubs/pub43438.html
+[img-data-model]: /assets/images/nomad-data-model.png
+[img-eval-flow]: /assets/images/nomad-evaluation-flow.png