diff --git a/website/source/api/operator.html.md b/website/source/api/operator.html.md
index 2f3f69dc9..5b032cc8c 100644
--- a/website/source/api/operator.html.md
+++ b/website/source/api/operator.html.md
@@ -132,13 +132,12 @@ This endpoint retrieves its latest Autopilot configuration.
 | `GET` | `/operator/autopilot/configuration` | `application/json` |
 
 The table below shows this endpoint's support for
-[blocking queries](/api/index.html#blocking-queries),
-[consistency modes](/api/index.html#consistency-modes), and
+[blocking queries](/api/index.html#blocking-queries) and
 [required ACLs](/api/index.html#acls).
 
-| Blocking Queries | Consistency Modes | ACL Required |
-| ---------------- | ----------------- | --------------- |
-| `NO` | `none` | `operator:read` |
+| Blocking Queries | ACL Required |
+| ---------------- | --------------- |
+| `NO` | `operator:read` |
 
 ### Sample Request
 
@@ -175,13 +174,12 @@ This endpoint updates the Autopilot configuration of the cluster.
 | `PUT` | `/operator/autopilot/configuration` | `application/json` |
 
 The table below shows this endpoint's support for
-[blocking queries](/api/index.html#blocking-queries),
-[consistency modes](/api/index.html#consistency-modes), and
+[blocking queries](/api/index.html#blocking-queries) and
 [required ACLs](/api/index.html#acls).
 
-| Blocking Queries | Consistency Modes | ACL Required |
-| ---------------- | ----------------- | ---------------- |
-| `NO` | `none` | `operator:write` |
+| Blocking Queries | ACL Required |
+| ---------------- | ---------------- |
+| `NO` | `operator:write` |
 
 ### Parameters
 
@@ -240,13 +238,12 @@ This endpoint queries the health of the autopilot status.
 | `GET` | `/operator/autopilot/health` | `application/json` |
 
 The table below shows this endpoint's support for
-[blocking queries](/api/index.html#blocking-queries),
-[consistency modes](/api/index.html#consistency-modes), and
+[blocking queries](/api/index.html#blocking-queries) and
 [required ACLs](/api/index.html#acls).
 
-| Blocking Queries | Consistency Modes | ACL Required |
-| ---------------- | ----------------- | --------------- |
-| `NO` | `none` | `operator:read` |
+| Blocking Queries | ACL Required |
+| ---------------- | --------------- |
+| `NO` | `operator:read` |
 
 ### Sample Request
 
@@ -328,3 +325,95 @@ $ curl \
 The HTTP status code will indicate the health of the cluster. If `Healthy` is true, then
 a status of 200 will be returned. If `Healthy` is false, then a status of 429 will be returned.
+
+
+## Read Scheduler Configuration
+
+This endpoint retrieves the latest Scheduler configuration. This API was introduced in
+Nomad 0.9 and currently supports enabling/disabling preemption. More options may be added in
+the future.
+
+| Method | Path | Produces |
+| ------ | ---------------------------- | -------------------------- |
+| `GET` | `/operator/scheduler/configuration` | `application/json` |
+
+The table below shows this endpoint's support for
+[blocking queries](/api/index.html#blocking-queries) and
+[required ACLs](/api/index.html#acls).
+
+| Blocking Queries | ACL Required |
+| ---------------- | --------------- |
+| `NO` | `operator:read` |
+
+### Sample Request
+
+```text
+$ curl \
+    https://localhost:4646/v1/operator/scheduler/configuration
+```
+
+### Sample Response
+
+```json
+{
+  "Index": 5,
+  "KnownLeader": true,
+  "LastContact": 0,
+  "SchedulerConfig": {
+    "CreateIndex": 5,
+    "ModifyIndex": 5,
+    "PreemptionConfig": {
+      "SystemSchedulerEnabled": true
+    }
+  }
+}
+```
+#### Field Reference
+
+- `Index` `(int)` - The `Index` value is the Raft commit index corresponding to this
+  configuration.
+
+- `SchedulerConfig` `(SchedulerConfig)` - The returned `SchedulerConfig` object has configuration
+  settings mentioned below.
+
+  - `PreemptionConfig` `(PreemptionConfig)` - Options to enable preemption for various schedulers.
+    - `SystemSchedulerEnabled` `(bool: true)` - Specifies whether preemption for system jobs is enabled. Note that
+      this defaults to true.
+  - `CreateIndex` - The Raft index at which the config was created.
+  - `ModifyIndex` - The Raft index at which the config was modified.
+
+## Update Scheduler Configuration
+
+This endpoint updates the scheduler configuration of the cluster.
+
+| Method | Path | Produces |
+| ------ | ---------------------------- | -------------------------- |
+| `PUT`, `POST` | `/operator/scheduler/configuration` | `application/json` |
+
+The table below shows this endpoint's support for
+[blocking queries](/api/index.html#blocking-queries) and
+[required ACLs](/api/index.html#acls).
+
+| Blocking Queries | ACL Required |
+| ---------------- | ---------------- |
+| `NO` | `operator:write` |
+
+### Parameters
+
+- `cas` `(int: 0)` - Specifies to use a Check-And-Set operation. The update will
+  only happen if the given index matches the `ModifyIndex` of the configuration
+  at the time of writing.
+
+### Sample Payload
+
+```json
+{
+  "PreemptionConfig": {
+    "SystemSchedulerEnabled": false
+  }
+}
+```
+
+- `PreemptionConfig` `(PreemptionConfig)` - Options to enable preemption for various schedulers.
+  - `SystemSchedulerEnabled` `(bool: true)` - Specifies whether preemption for system jobs is enabled. Note that
+    if this is set to true, then system jobs can preempt any other jobs.
diff --git a/website/source/docs/internals/scheduling/index.html.md b/website/source/docs/internals/scheduling/index.html.md
new file mode 100644
index 000000000..35dbb30cf
--- /dev/null
+++ b/website/source/docs/internals/scheduling/index.html.md
@@ -0,0 +1,21 @@
+---
+layout: "docs"
+page_title: "Scheduling"
+sidebar_current: "docs-internals-scheduling"
+description: |-
+  Learn about how scheduling works in Nomad.
+---
+
+# Scheduling
+
+Scheduling is a core function of Nomad. It is the process of assigning tasks
+from jobs to client machines.
The design is heavily inspired by Google's work on
+both [Omega: flexible, scalable schedulers for large compute clusters][Omega] and
+[Large-scale cluster management at Google with Borg][Borg]. See the links below
+for implementation details on scheduling in Nomad.
+
+- [Scheduling Internals](/docs/internals/scheduling/scheduling.html) - An overview of how the scheduler works.
+- [Preemption](/docs/internals/scheduling/preemption.html) - Details of preemption, an advanced scheduler feature introduced in Nomad 0.9.
+
+[Omega]: https://research.google.com/pubs/pub41684.html
+[Borg]: https://research.google.com/pubs/pub43438.html
\ No newline at end of file
diff --git a/website/source/docs/internals/scheduling/preemption.html.md b/website/source/docs/internals/scheduling/preemption.html.md
new file mode 100644
index 000000000..863262153
--- /dev/null
+++ b/website/source/docs/internals/scheduling/preemption.html.md
@@ -0,0 +1,100 @@
+---
+layout: "docs"
+page_title: "Preemption"
+sidebar_current: "docs-internals-scheduling-preemption"
+description: |-
+  Learn about how preemption works in Nomad.
+---
+
+# Preemption
+
+Preemption allows Nomad to kill existing allocations in order to place allocations for a higher priority job.
+The evicted allocation is temporarily displaced until the cluster has capacity to run it. This allows operators to
+run high priority jobs even under resource contention across the cluster.
+
+
+~> **Advanced Topic!** This page covers technical details of Nomad. You do not
+~> need to understand these details to effectively use Nomad. The details are
+~> documented here for those who wish to learn about them without having to
+~> go spelunking through the source code.
+
+# Preemption in Nomad
+
+Every job in Nomad has a priority associated with it. Priorities impact scheduling at the evaluation and planning
+stages by sorting the respective queues accordingly (higher priority jobs get moved ahead in the queues).
+
+Prior to Nomad 0.9, when a cluster was at capacity, any allocations that resulted from a newly scheduled or updated
+job remained in the pending state until sufficient resources became available, regardless of the defined priority.
+This led to priority inversion, where a low priority task can prevent high priority tasks from completing.
+
+Nomad 0.9 brings preemption capabilities to system jobs. The Nomad scheduler will evict lower priority running allocations
+to free up capacity for new allocations resulting from relatively higher priority jobs, sending evicted allocations back
+into the plan queue.
+
+# Details
+
+Preemption is enabled by default in Nomad 0.9. Operators can use the [scheduler config](/api/operator.html#update-scheduler-configuration) API endpoint to disable preemption.
+
+Nomad uses the [job priority](/docs/job-specification/job.html#priority) field to determine what running allocations can be preempted.
+In order to prevent a cascade of preemptions due to jobs close in priority being preempted, only allocations from jobs with a priority
+delta of more than 10 from the job needing placement are eligible for preemption.
+
+For example, consider a node with the following distribution of allocations:
+
+| Job             | Priority | Allocations | Total Used Capacity |
+| --------------- | -------- | ----------- | ------------------- |
+| cache           | 70       | a6          | 2 GB Memory, 0.5 GB Disk, 1 CPU |
+| batch-analytics | 50       | a4, a5      | <1 GB Memory, 0.5 GB Disk, 0.5 CPU>, <1 GB Memory, 0.5 GB Disk, 0.5 CPU> |
+| email-marketing | 20       | a1, a2      | <0.5 GB Memory, 0.8 GB Disk>, <0.5 GB Memory, 0.2 GB Disk> |
+
+If a job `webapp` with priority `75` needs placement on the above node, only allocations from `batch-analytics` and `email-marketing` are considered
+eligible to be preempted because their priorities are more than `10` below `75`. Allocations from the `cache` job will not be preempted because its priority `70`
+is only `5` below `75`, which is less than the required delta of `10`.
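
The eligibility rule in the example above can be sketched in a few lines of Python. This is an illustration only, not Nomad's scheduler code: the names `PRIORITY_DELTA` and `eligible_jobs` are hypothetical, and the comparison follows the wording "a priority delta of more than 10" literally, so treating a gap of exactly 10 as ineligible is an assumption.

```python
# Illustrative sketch only; not Nomad's scheduler code.
# PRIORITY_DELTA and eligible_jobs are hypothetical names for this example.
PRIORITY_DELTA = 10


def eligible_jobs(placing_priority, job_priorities, delta=PRIORITY_DELTA):
    """Return the names of jobs whose allocations may be preempted.

    A job qualifies when the placing job's priority exceeds its own by more
    than `delta`. A gap of exactly `delta` is treated as not eligible, which
    is an assumption based on the phrase "more than 10".
    """
    return [
        name
        for name, priority in job_priorities.items()
        if placing_priority - priority > delta
    ]


# Priorities taken from the example table above.
node_jobs = {"cache": 70, "batch-analytics": 50, "email-marketing": 20}

# webapp has priority 75, so only the two lower priority jobs qualify.
print(eligible_jobs(75, node_jobs))  # ['batch-analytics', 'email-marketing']
```

The sketch covers only the priority filter. Choosing which eligible allocations to evict depends on how well they fit the required capacity, which is not modelled here.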
+
+Allocations are selected starting from the lowest priority, and scored according
+to how closely they fit the job's required capacity. For example, if the `75` priority job needs 1 GB disk and 2 GB memory, Nomad will preempt
+allocations `a1`, `a2` and `a4` to satisfy those requirements.
+
+# Preemption Visibility
+
+Operators can use the [allocation API](/api/allocations.html#read-allocation) or the `alloc status` command to get visibility into
+whether an allocation has been preempted. Preempted allocations will have their `DesiredStatus` set to `evict`. The `Allocation` object
+in the API also has two additional fields related to preemption.
+
+- `PreemptedAllocs` - This field is set on an allocation that caused preemption. It contains the allocation IDs of allocations
+  that were preempted to place this allocation. In the above example, allocations created for the job `webapp` will have the values
+  `a1`, `a2` and `a4` set.
+- `PreemptedByAllocID` - This field is set on allocations that were preempted by the scheduler. It contains the allocation ID of the allocation
+  that preempted it. In the above example, allocations `a1`, `a2` and `a4` will have this field set to the ID of the allocation from the job `webapp`.
+
+# Integration with Nomad plan
+
+`nomad plan` allows operators to dry run the scheduler. If the scheduler determines that
+preemption is necessary to place the job, it shows additional information in the CLI output for
+`nomad plan` as seen below.
+
+```sh
+$ nomad plan example.nomad
+
++ Job: "test"
++ Task Group: "test" (1 create)
+  + Task: "test" (forces create)
+
+Scheduler dry-run:
+- All tasks successfully allocated.
+
+Preemptions:
+
+Alloc ID  Job ID    Task Group
+ddef9521  my-batch  analytics
+ae59fe45  my-batch  analytics
+```
+
+Note that the allocations shown in the `nomad plan` output above
+are not guaranteed to be the same ones picked when running the job later.
+They provide the operator with a sample of the type of allocations that could be preempted.
+
+[Omega]: https://research.google.com/pubs/pub41684.html
+[Borg]: https://research.google.com/pubs/pub43438.html
+[img-data-model]: /assets/images/nomad-data-model.png
+[img-eval-flow]: /assets/images/nomad-evaluation-flow.png
diff --git a/website/source/docs/internals/scheduling.html.md b/website/source/docs/internals/scheduling/scheduling.html.md
similarity index 85%
rename from website/source/docs/internals/scheduling.html.md
rename to website/source/docs/internals/scheduling/scheduling.html.md
index d26085bfc..b163270de 100644
--- a/website/source/docs/internals/scheduling.html.md
+++ b/website/source/docs/internals/scheduling/scheduling.html.md
@@ -1,26 +1,11 @@
 ---
 layout: "docs"
 page_title: "Scheduling"
-sidebar_current: "docs-internals-scheduling"
+sidebar_current: "docs-internals-scheduling-internals"
 description: |-
   Learn about how scheduling works in Nomad.
 ---
 
-# Scheduling
-
-Scheduling is a core function of Nomad. It is the process of assigning tasks
-from jobs to client machines. This process must respect the constraints as
-declared in the job, and optimize for resource utilization. This page documents
-the details of how scheduling works in Nomad to help both users and developers
-build a mental model. The design is heavily inspired by Google's work on both
-[Omega: flexible, scalable schedulers for large compute clusters][Omega] and
-[Large-scale cluster management at Google with Borg][Borg].
-
-~> **Advanced Topic!** This page covers technical details of Nomad. You do not
-~> need to understand these details to effectively use Nomad. The details are
-~> documented here for those who wish to learn about them without having to
-~> go spelunking through the source code.
-
 # Scheduling in Nomad
 
 [![Nomad Data Model][img-data-model]][img-data-model]
diff --git a/website/source/layouts/docs.erb b/website/source/layouts/docs.erb
index b4abe7285..1771f2ef4 100644
--- a/website/source/layouts/docs.erb
+++ b/website/source/layouts/docs.erb
@@ -10,7 +10,15 @@
 >
- Scheduling
+ Scheduling
+ >