Files
nomad/website/content/docs/schedulers.mdx
Aimee Ukasick d9bb241b43 Docs SEO: Update runtime, networking, Nomad vs K8s, Nomad Enterprise, upgrading, release notes, and sectionless pages (#24764)
* Docs SEO: Updates

CE-781,782,785,788

* CE-791 single pages

* CE-786 enterprise section

* CE-789 release notes

* fix content-check error

* Update description and add intro body paragraph when appropriate

* fix typo

* Apply suggestions from Jeff's code review

Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>

---------

Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>
2025-02-03 10:03:36 -06:00

98 lines
4.6 KiB
Plaintext

---
layout: docs
page_title: Schedulers
description: Learn how Nomad's service, batch, system, and system batch job schedulers enable flexible workloads.
---
# Schedulers
This page provides conceptual information about Nomad service, batch, system,
and system batch job schedulers.
## Service
The `service` scheduler is designed for scheduling long lived services that
should never go down. As such, the `service` scheduler ranks a large portion
of the nodes that meet the job's constraints and selects the optimal node to
place a task group on. The `service` scheduler uses a best fit scoring algorithm
influenced by Google's work on [Borg]. Ranking this larger set of candidate
nodes increases scheduling time but provides greater guarantees about the
optimality of a job placement, which given the service workload is highly
desirable.
Service jobs are intended to run until explicitly stopped by an operator. If a
service task exits it is considered a failure and handled according to the job's
[restart] and [reschedule] blocks.
## Batch
Batch jobs are much less sensitive to short term performance fluctuations and
are short lived, finishing in a few minutes to a few days. Although the `batch`
scheduler is very similar to the `service` scheduler, it makes certain
optimizations for the batch workload. The main distinction is that after finding
the set of nodes that meet the job's constraints it uses the power of two
choices described in Berkeley's [Sparrow] scheduler to limit the number of nodes
that are ranked.
Batch jobs are intended to run until they exit successfully. Batch tasks that
exit with an error are handled according to the job's [restart] and [reschedule]
blocks.
## System
The `system` scheduler is used to register jobs that should be run on all
clients that meet the job's constraints. The `system` scheduler is also invoked
when clients join the cluster or transition into the ready state. This means
that all registered `system` jobs will be re-evaluated and their tasks will be
placed on the newly available nodes if the constraints are met.
This scheduler type is extremely useful for deploying and managing tasks that
should be present on every node in the cluster. Since these tasks are
managed by Nomad, they can take advantage of job updating,
service discovery, and more.
The system scheduler will preempt eligible lower priority tasks running on a
node if there isn't enough capacity to place a system job. See [preemption]
for details on how tasks that get preempted are chosen.
Systems jobs are intended to run until explicitly stopped either by an operator
or [preemption]. If a system task exits it is considered a failure and handled
according to the job's [restart] block; system jobs do not have rescheduling.
When used with node pools, system jobs run on all nodes of the pool used by the
job. The built-in node pool `all` allows placing allocations on all clients in
the cluster.
## System Batch
The `sysbatch` scheduler is used to register jobs that should be run to completion
on all clients that meet the job's constraints. The `sysbatch` scheduler will
schedule jobs similarly to the `system` scheduler, but like a `batch` job once a
task exits successfully it is not restarted on that client.
This scheduler type is useful for issuing "one off" commands to be run on every
node in the cluster. Sysbatch jobs can also be created as [periodic] and [parameterized]
jobs. Since these tasks are managed by Nomad, they can take advantage of job
updating, service discovery, monitoring, and more.
The `sysbatch` scheduler will preempt lower priority tasks running on a node if there
is not enough capacity to place the job. See preemption details on how tasks that
get preempted are chosen.
Sysbatch jobs are intended to run until successful completion, explicitly stopped
by an operator, or evicted through [preemption]. Sysbatch tasks that exit with an
error are handled according to the job's [restart] block.
Like the `batch` scheduler, task groups in system batch jobs may have a `count`
greater than 1 to control how many instances are run. Instances that cannot be
immediately placed will be scheduled when resources become available,
potentially on a node that has already run another instance of the same job.
[borg]: https://research.google.com/pubs/pub43438.html
[parameterized]: /nomad/docs/job-specification/parameterized
[periodic]: /nomad/docs/job-specification/periodic
[preemption]: /nomad/docs/concepts/scheduling/preemption
[restart]: /nomad/docs/job-specification/restart
[reschedule]: /nomad/docs/job-specification/reschedule
[sparrow]: https://cs.stanford.edu/~matei/papers/2013/sosp_sparrow.pdf