mirror of
https://github.com/kemko/nomad.git
synced 2026-01-06 18:35:44 +03:00
Merge pull request #5234 from hashicorp/b-preemption-docs
Documentation for preemption
@@ -132,13 +132,12 @@ This endpoint retrieves its latest Autopilot configuration.
|
||||
| `GET` | `/operator/autopilot/configuration` | `application/json` |
|
||||
|
||||
The table below shows this endpoint's support for
|
||||
[blocking queries](/api/index.html#blocking-queries),
|
||||
[consistency modes](/api/index.html#consistency-modes), and
|
||||
[blocking queries](/api/index.html#blocking-queries) and
|
||||
[required ACLs](/api/index.html#acls).
|
||||
|
||||
| Blocking Queries | Consistency Modes | ACL Required |
|
||||
| ---------------- | ----------------- | --------------- |
|
||||
| `NO` | `none` | `operator:read` |
|
||||
| Blocking Queries | ACL Required |
|
||||
| ---------------- | --------------- |
|
||||
| `NO` | `operator:read` |
|
||||
|
||||
### Sample Request
|
||||
|
||||
@@ -175,13 +174,12 @@ This endpoint updates the Autopilot configuration of the cluster.
|
||||
| `PUT` | `/operator/autopilot/configuration` | `application/json` |
|
||||
|
||||
The table below shows this endpoint's support for
|
||||
[blocking queries](/api/index.html#blocking-queries),
|
||||
[consistency modes](/api/index.html#consistency-modes), and
|
||||
[blocking queries](/api/index.html#blocking-queries) and
|
||||
[required ACLs](/api/index.html#acls).
|
||||
|
||||
| Blocking Queries | Consistency Modes | ACL Required |
|
||||
| ---------------- | ----------------- | ---------------- |
|
||||
| `NO` | `none` | `operator:write` |
|
||||
| Blocking Queries | ACL Required |
|
||||
| ---------------- | ---------------- |
|
||||
| `NO` | `operator:write` |
|
||||
|
||||
### Parameters
|
||||
|
||||
@@ -240,13 +238,12 @@ This endpoint queries the health of the autopilot status.
|
||||
| `GET` | `/operator/autopilot/health` | `application/json` |
|
||||
|
||||
The table below shows this endpoint's support for
|
||||
[blocking queries](/api/index.html#blocking-queries),
|
||||
[consistency modes](/api/index.html#consistency-modes), and
|
||||
[blocking queries](/api/index.html#blocking-queries) and
|
||||
[required ACLs](/api/index.html#acls).
|
||||
|
||||
| Blocking Queries | Consistency Modes | ACL Required |
|
||||
| ---------------- | ----------------- | --------------- |
|
||||
| `NO` | `none` | `operator:read` |
|
||||
| Blocking Queries | ACL Required |
|
||||
| ---------------- | --------------- |
|
||||
| `NO` | `operator:read` |
|
||||
|
||||
### Sample Request
|
||||
|
||||
@@ -328,3 +325,95 @@ $ curl \
|
||||
|
||||
The HTTP status code will indicate the health of the cluster. If `Healthy` is true, then a
|
||||
status of 200 will be returned. If `Healthy` is false, then a status of 429 will be returned.
|
||||
|
||||
|
||||
## Read Scheduler Configuration
|
||||
|
||||
This endpoint retrieves the latest Scheduler configuration. This API was introduced in
|
||||
Nomad 0.9 and currently supports enabling/disabling preemption. More options may be added in
|
||||
the future.
|
||||
|
||||
| Method | Path | Produces |
|
||||
| ------ | ---------------------------- | -------------------------- |
|
||||
| `GET` | `/operator/scheduler/configuration` | `application/json` |
|
||||
|
||||
The table below shows this endpoint's support for
|
||||
[blocking queries](/api/index.html#blocking-queries) and
|
||||
[required ACLs](/api/index.html#acls).
|
||||
|
||||
| Blocking Queries | ACL Required |
|
||||
| ---------------- | --------------- |
|
||||
| `NO` | `operator:read` |
|
||||
|
||||
### Sample Request
|
||||
|
||||
```text
|
||||
$ curl \
|
||||
https://localhost:4646/v1/operator/scheduler/configuration
|
||||
```
|
||||
|
||||
### Sample Response
|
||||
|
||||
```json
|
||||
{
  "Index": 5,
  "KnownLeader": true,
  "LastContact": 0,
  "SchedulerConfig": {
    "CreateIndex": 5,
    "ModifyIndex": 5,
    "PreemptionConfig": {
      "SystemSchedulerEnabled": true
    }
  }
}
|
||||
```
|
||||
#### Field Reference
|
||||
|
||||
- `Index` `(int)` - The `Index` value is the Raft commit index corresponding to this
|
||||
configuration.
|
||||
|
||||
- `SchedulerConfig` `(SchedulerConfig)` - The returned `SchedulerConfig` object has configuration
  settings mentioned below.

  - `PreemptionConfig` `(PreemptionConfig)` - Options to enable preemption for various schedulers.
    - `SystemSchedulerEnabled` `(bool: true)` - Specifies whether preemption for system jobs is enabled. Note that
      this defaults to true.
  - `CreateIndex` - The Raft index at which the config was created.
  - `ModifyIndex` - The Raft index at which the config was modified.
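
As a small illustration of the fields above, the following Python sketch parses the sample response and reads the preemption setting. The helper name `system_preemption_enabled` is hypothetical, and falling back to `True` when the key is absent is an assumption based on the documented default, not confirmed API behavior.

```python
import json

# Sample response body copied from the section above.
SAMPLE_RESPONSE = """
{
  "Index": 5,
  "KnownLeader": true,
  "LastContact": 0,
  "SchedulerConfig": {
    "CreateIndex": 5,
    "ModifyIndex": 5,
    "PreemptionConfig": {
      "SystemSchedulerEnabled": true
    }
  }
}
"""


def system_preemption_enabled(response: dict) -> bool:
    """Hypothetical helper: report whether system-job preemption is enabled."""
    preemption = response.get("SchedulerConfig", {}).get("PreemptionConfig", {})
    # Assumption: fall back to the documented default (true) if the key is missing.
    return bool(preemption.get("SystemSchedulerEnabled", True))


config = json.loads(SAMPLE_RESPONSE)
# ModifyIndex is the value a later check-and-set update would compare against.
modify_index = config["SchedulerConfig"]["ModifyIndex"]
print(system_preemption_enabled(config), modify_index)
```

The `ModifyIndex` read here is the value that the `cas` parameter of the update endpoint is compared against.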
|
||||
|
||||
## Update Scheduler Configuration
|
||||
|
||||
This endpoint updates the scheduler configuration of the cluster.
|
||||
|
||||
| Method | Path | Produces |
|
||||
| ------ | ---------------------------- | -------------------------- |
|
||||
| `PUT`, `POST` | `/operator/scheduler/configuration` | `application/json` |
|
||||
|
||||
The table below shows this endpoint's support for
|
||||
[blocking queries](/api/index.html#blocking-queries) and
|
||||
[required ACLs](/api/index.html#acls).
|
||||
|
||||
| Blocking Queries | ACL Required |
|
||||
| ---------------- | ---------------- |
|
||||
| `NO` | `operator:write` |
|
||||
|
||||
### Parameters
|
||||
|
||||
- `cas` `(int: 0)` - Specifies to use a Check-And-Set operation. The update will
|
||||
only happen if the given index matches the `ModifyIndex` of the configuration
|
||||
at the time of writing.
|
||||
|
||||
### Sample Payload
|
||||
|
||||
```json
|
||||
{
  "PreemptionConfig": {
    "SystemSchedulerEnabled": false
  }
}
|
||||
```
|
||||
|
||||
- `PreemptionConfig` `(PreemptionConfig)` - Options to enable preemption for various schedulers.
  - `SystemSchedulerEnabled` `(bool: true)` - Specifies whether preemption for system jobs is enabled. Note that
    if this is set to true, then system jobs can preempt any other jobs.
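
A check-and-set update can be sketched in Python as follows. This only builds the request and does not send it. The helper name `build_cas_update`, the agent address, and the `/v1` path prefix are assumptions for illustration; the payload key follows the `SystemSchedulerEnabled` field documented above.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Assumptions: the agent address and the /v1 API prefix are illustrative.
NOMAD_ADDR = "https://localhost:4646"


def build_cas_update(modify_index: int, system_scheduler_enabled: bool) -> Request:
    """Hypothetical helper: build (but do not send) a check-and-set update."""
    payload = {"PreemptionConfig": {"SystemSchedulerEnabled": system_scheduler_enabled}}
    # The cas value should be the ModifyIndex returned by a prior read.
    query = urlencode({"cas": modify_index})
    return Request(
        f"{NOMAD_ADDR}/v1/operator/scheduler/configuration?{query}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = build_cas_update(modify_index=5, system_scheduler_enabled=False)
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request)
```

Passing the `ModifyIndex` from a prior read means the write only applies if nobody else changed the configuration in between.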
|
||||
|
||||
21
website/source/docs/internals/scheduling/index.html.md
Normal file
@@ -0,0 +1,21 @@
|
||||
---
|
||||
layout: "docs"
|
||||
page_title: "Scheduling"
|
||||
sidebar_current: "docs-internals-scheduling"
|
||||
description: |-
|
||||
Learn about how scheduling works in Nomad.
|
||||
---
|
||||
|
||||
# Scheduling
|
||||
|
||||
Scheduling is a core function of Nomad. It is the process of assigning tasks
|
||||
from jobs to client machines. The design is heavily inspired by Google's work on
|
||||
both [Omega: flexible, scalable schedulers for large compute clusters][Omega] and
|
||||
[Large-scale cluster management at Google with Borg][Borg]. See the links below
|
||||
for implementation details on scheduling in Nomad.
|
||||
|
||||
- [Scheduling Internals](/docs/internals/scheduling/scheduling.html) - An overview of how the scheduler works.
|
||||
- [Preemption](/docs/internals/scheduling/preemption.html) - Details of preemption, an advanced scheduler feature introduced in Nomad 0.9.
|
||||
|
||||
[Omega]: https://research.google.com/pubs/pub41684.html
|
||||
[Borg]: https://research.google.com/pubs/pub43438.html
|
||||
100
website/source/docs/internals/scheduling/preemption.html.md
Normal file
@@ -0,0 +1,100 @@
|
||||
---
|
||||
layout: "docs"
|
||||
page_title: "Preemption"
|
||||
sidebar_current: "docs-internals-scheduling-preemption"
|
||||
description: |-
|
||||
Learn about how preemption works in Nomad.
|
||||
---
|
||||
|
||||
# Preemption
|
||||
|
||||
Preemption allows Nomad to kill existing allocations in order to place allocations for a higher priority job.
|
||||
The evicted allocation is temporarily displaced until the cluster has capacity to run it. This allows operators to
|
||||
run high priority jobs even under resource contention across the cluster.
|
||||
|
||||
|
||||
~> **Advanced Topic!** This page covers technical details of Nomad. You do not
|
||||
~> need to understand these details to effectively use Nomad. The details are
|
||||
~> documented here for those who wish to learn about them without having to
|
||||
~> go spelunking through the source code.
|
||||
|
||||
# Preemption in Nomad
|
||||
|
||||
Every job in Nomad has a priority associated with it. Priorities impact scheduling at the evaluation and planning
|
||||
stages by sorting the respective queues accordingly (higher priority jobs get moved ahead in the queues).
|
||||
|
||||
Prior to Nomad 0.9, when a cluster was at capacity, any allocations that resulted from a newly scheduled or updated
job remained in the pending state until sufficient resources became available, regardless of the defined priority.
This led to priority inversion, where a low priority task could prevent high priority tasks from completing.
|
||||
|
||||
Nomad 0.9 brings preemption capabilities to system jobs. The Nomad scheduler will evict lower priority running allocations
|
||||
to free up capacity for new allocations resulting from relatively higher priority jobs, sending evicted allocations back
|
||||
into the plan queue.
|
||||
|
||||
# Details
|
||||
|
||||
Preemption is enabled by default in Nomad 0.9. Operators can use the [scheduler config](/api/operator.html#update-scheduler-configuration) API endpoint to disable preemption.
|
||||
|
||||
Nomad uses the [job priority](/docs/job-specification/job.html#priority) field to determine what running allocations can be preempted.
|
||||
In order to prevent a cascade of preemptions due to jobs close in priority being preempted, only allocations from jobs with a priority
|
||||
delta of more than 10 from the job needing placement are eligible for preemption.
|
||||
|
||||
For example, consider a node with the following distribution of allocations:
|
||||
|
||||
| Job             | Priority | Allocations | Total Used Capacity |
| --------------- | -------- | ----------- | ------------------- |
| cache           | 70       | a6          | 2 GB Memory, 0.5 GB Disk, 1 CPU |
| batch-analytics | 50       | a4, a5      | <1 GB Memory, 0.5 GB Disk, 0.5 CPU>, <1 GB Memory, 0.5 GB Disk, 0.5 CPU> |
| email-marketing | 20       | a1, a2      | <0.5 GB Memory, 0.8 GB Disk>, <0.5 GB Memory, 0.2 GB Disk> |
|
||||
|
||||
If a job `webapp` with priority `75` needs placement on the above node, only allocations from `batch-analytics` and `email-marketing` are considered
eligible to be preempted because their priorities are more than `10` below that of `webapp`. Allocations from the `cache` job will not be preempted by `webapp` because the difference
between its priority `70` and the `webapp` priority `75` is less than the required delta of `10`.
|
||||
|
||||
Allocations are selected starting from the lowest priority, and scored according
|
||||
to how closely they fit the job's required capacity. For example, if the `75` priority job needs 1GB disk and 2GB memory, Nomad will preempt
|
||||
allocations `a1`, `a2` and `a4` to satisfy those requirements.
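
The example above can be sketched in Python. This is a simplified illustration, not Nomad's implementation: the names `Alloc`, `eligible`, and `pick_preemptions` are hypothetical, the greedy loop stands in for the scheduler's real scoring, the node is assumed to have no free capacity, and 1 GB is written as 1000 MB to keep the arithmetic in integers.

```python
from dataclasses import dataclass

PRIORITY_DELTA = 10  # from the text: the delta must be more than 10


@dataclass
class Alloc:
    name: str
    job: str
    priority: int
    memory_mb: int
    disk_mb: int


# Allocations from the example table (1 GB written as 1000 MB for simplicity).
NODE_ALLOCS = [
    Alloc("a6", "cache", 70, 2000, 500),
    Alloc("a4", "batch-analytics", 50, 1000, 500),
    Alloc("a5", "batch-analytics", 50, 1000, 500),
    Alloc("a1", "email-marketing", 20, 500, 800),
    Alloc("a2", "email-marketing", 20, 500, 200),
]


def eligible(allocs, job_priority):
    """Keep allocations whose priority is more than PRIORITY_DELTA below the job's."""
    return [a for a in allocs if job_priority - a.priority > PRIORITY_DELTA]


def pick_preemptions(allocs, job_priority, need_memory_mb, need_disk_mb):
    """Simplified greedy stand-in for Nomad's scoring: lowest priority first."""
    picked, memory, disk = [], 0, 0
    candidates = sorted(eligible(allocs, job_priority), key=lambda a: (a.priority, a.name))
    for alloc in candidates:
        # Stop as soon as the freed capacity covers the request.
        if memory >= need_memory_mb and disk >= need_disk_mb:
            break
        picked.append(alloc.name)
        memory += alloc.memory_mb
        disk += alloc.disk_mb
    return picked


# The priority 75 job needs 2 GB memory and 1 GB disk.
print(pick_preemptions(NODE_ALLOCS, 75, need_memory_mb=2000, need_disk_mb=1000))
# prints ['a1', 'a2', 'a4']
```

Note that `a6` from the `cache` job never becomes a candidate because its priority is only 5 below that of the new job.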
|
||||
|
||||
# Preemption Visibility
|
||||
|
||||
Operators can use the [allocation API](/api/allocations.html#read-allocation) or the `alloc status` command to get visibility into
|
||||
whether an allocation has been preempted. Preempted allocations will have their `DesiredStatus` set to `evict`. The `Allocation` object
|
||||
in the API also has two additional fields related to preemption.
|
||||
|
||||
- `PreemptedAllocs` - This field is set on an allocation that caused preemption. It contains the allocation IDs of allocations
  that were preempted to place this allocation. In the above example, allocations created for the job `webapp` will have this field set to
  `a1`, `a2` and `a4`.
|
||||
- `PreemptedByAllocID` - This field is set on allocations that were preempted by the scheduler. It contains the allocation ID of the allocation
|
||||
that preempted it. In the above example, allocations `a1`, `a2` and `a4` will have this field set to the ID of the allocation from the job `webapp`.
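
A minimal Python sketch of how these fields might be read from allocation objects is shown below. The helper `describe_preemption` is hypothetical, the allocation IDs are placeholders, and the use of an `ID` key plus a `run` status for the placed allocation are assumptions about the shape of the `Allocation` object.

```python
def describe_preemption(alloc: dict) -> str:
    """Hypothetical helper: summarize the preemption fields of one allocation."""
    if alloc.get("PreemptedByAllocID"):
        return (
            f"{alloc['ID']} was preempted by {alloc['PreemptedByAllocID']} "
            f"(DesiredStatus={alloc['DesiredStatus']})"
        )
    if alloc.get("PreemptedAllocs"):
        return f"{alloc['ID']} preempted {', '.join(alloc['PreemptedAllocs'])}"
    return f"{alloc['ID']} was not involved in preemption"


# Illustrative data shaped after the example above; the IDs are placeholders.
webapp = {"ID": "w1", "DesiredStatus": "run", "PreemptedAllocs": ["a1", "a2", "a4"]}
evicted = {"ID": "a1", "DesiredStatus": "evict", "PreemptedByAllocID": "w1"}

for alloc in (webapp, evicted):
    print(describe_preemption(alloc))
```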
|
||||
|
||||
# Integration with Nomad plan
|
||||
|
||||
`nomad plan` allows operators to dry run the scheduler. If the scheduler determines that
|
||||
preemption is necessary to place the job, it shows additional information in the CLI output for
|
||||
`nomad plan` as seen below.
|
||||
|
||||
```sh
|
||||
$ nomad plan example.nomad

+ Job: "test"
+ Task Group: "test" (1 create)
  + Task: "test" (forces create)

Scheduler dry-run:
- All tasks successfully allocated.

Preemptions:

Alloc ID  Job ID    Task Group
ddef9521  my-batch  analytics
ae59fe45  my-batch  analytics
|
||||
```
|
||||
|
||||
Note that the allocations shown in the `nomad plan` output above
|
||||
are not guaranteed to be the same ones picked when running the job later.
|
||||
They provide the operator a sample of the type of allocations that could be preempted.
|
||||
|
||||
[Omega]: https://research.google.com/pubs/pub41684.html
|
||||
[Borg]: https://research.google.com/pubs/pub43438.html
|
||||
[img-data-model]: /assets/images/nomad-data-model.png
|
||||
[img-eval-flow]: /assets/images/nomad-evaluation-flow.png
|
||||
@@ -1,26 +1,11 @@
|
||||
---
|
||||
layout: "docs"
|
||||
page_title: "Scheduling"
|
||||
sidebar_current: "docs-internals-scheduling"
|
||||
sidebar_current: "docs-internals-scheduling-internals"
|
||||
description: |-
|
||||
Learn about how scheduling works in Nomad.
|
||||
---
|
||||
|
||||
# Scheduling
|
||||
|
||||
Scheduling is a core function of Nomad. It is the process of assigning tasks
|
||||
from jobs to client machines. This process must respect the constraints as
|
||||
declared in the job, and optimize for resource utilization. This page documents
|
||||
the details of how scheduling works in Nomad to help both users and developers
|
||||
build a mental model. The design is heavily inspired by Google's work on both
|
||||
[Omega: flexible, scalable schedulers for large compute clusters][Omega] and
|
||||
[Large-scale cluster management at Google with Borg][Borg].
|
||||
|
||||
~> **Advanced Topic!** This page covers technical details of Nomad. You do not
|
||||
~> need to understand these details to effectively use Nomad. The details are
|
||||
~> documented here for those who wish to learn about them without having to
|
||||
~> go spelunking through the source code.
|
||||
|
||||
# Scheduling in Nomad
|
||||
|
||||
[![Nomad Data Model][img-data-model]][img-data-model]
|
||||
@@ -10,7 +10,15 @@
|
||||
</li>
|
||||
|
||||
<li<%= sidebar_current("docs-internals-scheduling") %>>
|
||||
<a href="/docs/internals/scheduling.html">Scheduling</a>
|
||||
<a href="/docs/internals/scheduling/index.html">Scheduling</a>
|
||||
<ul class="nav">
|
||||
<li <%= sidebar_current("docs-internals-scheduling-internals") %>>
|
||||
<a href="/docs/internals/scheduling/scheduling.html">Internals</a>
|
||||
</li>
|
||||
<li <%= sidebar_current("docs-internals-scheduling-preemption") %>>
|
||||
<a href="/docs/internals/scheduling/preemption.html">Preemption</a>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
|
||||
<li<%= sidebar_current("docs-internals-consensus") %>>
|
||||
|
||||