diff --git a/website/source/guides/operating-a-job/advanced-scheduling/preemption-service-batch.html.md b/website/source/guides/operating-a-job/advanced-scheduling/preemption-service-batch.html.md
new file mode 100644
index 000000000..5cbdd34e4
--- /dev/null
+++ b/website/source/guides/operating-a-job/advanced-scheduling/preemption-service-batch.html.md
@@ -0,0 +1,438 @@
+---
+layout: "guides"
+page_title: "Preemption (Service and Batch Jobs)"
+sidebar_current: "guides-operating-a-job-preemption-service-batch"
+description: |-
+  The following guide walks the user through enabling and using preemption on
+  service and batch jobs in Nomad Enterprise (0.9.3 and above).
+---
+
+# Preemption for Service and Batch Jobs
+
+~> **Enterprise Only!** This functionality only exists in Nomad Enterprise and
+is not present in the open source version of Nomad.
+
+Prior to Nomad 0.9, job [priority][priority] was used only to process
+scheduling requests in priority order. Preemption, introduced in Nomad 0.9,
+allows Nomad to evict running allocations in order to place allocations of a
+higher priority. Allocations that are displaced this way temporarily go into
+"pending" status until the cluster has additional capacity to run them. This is
+useful when operators need relatively higher priority tasks to run sooner, even
+under resource contention across the cluster.
+
+While Nomad 0.9 introduced preemption for [system][system-job] jobs, Nomad 0.9.3
+[Enterprise][enterprise] additionally allows preemption for
+[service][service-job] and [batch][batch-job] jobs. This functionality can
+easily be enabled by sending a [payload][payload-preemption-config] with the
+appropriate options specified to the [scheduler
+configuration][update-scheduler] API endpoint.
+
+## Reference Material
+
+- [Preemption][preemption]
+- [Nomad Enterprise Preemption][enterprise-preemption]
+
+## Estimated Time to Complete
+
+20 minutes
+
+## Prerequisites
+
+To perform the tasks described in this guide, you need to have a Nomad
+environment with Consul installed. You can use this
+[repo](https://github.com/hashicorp/nomad/tree/master/terraform#provision-a-nomad-cluster-in-the-cloud)
+to easily provision a sandbox environment. This guide assumes a cluster with
+one server node and three client nodes. To simulate resource contention, each
+node in this environment has 1 GB of RAM (for AWS, you can choose the
+[t2.micro][t2-micro] instance type). Remember that service and batch job
+preemption requires Nomad 0.9.3 [Enterprise][enterprise].
+
+-> **Please Note:** This guide is for demo purposes and uses only a single
+server node. In a production cluster, 3 or 5 server nodes are recommended.
+
+## Steps
+
+### Step 1: Create a Job with Low Priority
+
+Start by registering a job with a relatively low priority in your Nomad
+cluster. One of its allocations will be preempted in a subsequent deployment
+when there is resource contention in the cluster. Copy the following job into a
+file and name it `webserver.nomad`.
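+
+Optionally, before registering the job, you can confirm how much memory Nomad
+sees on each client node. The sketch below is illustrative only: it assumes the
+default API address, requires `jq`, and uses a placeholder node ID that you
+should replace with a full node ID from your own cluster:
+
+```shell
+# List the client nodes (-verbose prints full node IDs).
+$ nomad node status -verbose
+
+# MemoryMB is the total memory Nomad schedules against on that node.
+$ curl -s localhost:4646/v1/node/<your-node-id> | jq '.Resources.MemoryMB'
+```
+
+The `webserver.nomad` job file follows.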
+
+```hcl
+job "webserver" {
+  datacenters = ["dc1"]
+  type = "service"
+  priority = 40
+
+  group "webserver" {
+    count = 3
+
+    task "apache" {
+      driver = "docker"
+
+      config {
+        image = "httpd:latest"
+
+        port_map {
+          http = 80
+        }
+      }
+
+      resources {
+        network {
+          mbits = 10
+          port "http" {}
+        }
+
+        memory = 600
+      }
+
+      service {
+        name = "apache-webserver"
+        port = "http"
+
+        check {
+          name     = "alive"
+          type     = "http"
+          path     = "/"
+          interval = "10s"
+          timeout  = "2s"
+        }
+      }
+    }
+  }
+}
+```
+
+Note that the [count][count] is 3 and that each allocation specifies 600 MB of
+[memory][memory]. Remember that each node has only 1 GB of RAM.
+
+### Step 2: Run the Low Priority Job
+
+Register `webserver.nomad`:
+
+```shell
+$ nomad run webserver.nomad
+==> Monitoring evaluation "1596bfc8"
+    Evaluation triggered by job "webserver"
+    Allocation "725d3b49" created: node "16653ac1", group "webserver"
+    Allocation "e2f9cb3d" created: node "f765c6e8", group "webserver"
+    Allocation "e9d8df1b" created: node "b0700ec0", group "webserver"
+    Evaluation status changed: "pending" -> "complete"
+==> Evaluation "1596bfc8" finished with status "complete"
+```
+
+You should be able to check the status of the `webserver` job at this point and
+see that an allocation has been placed on each client node in the cluster:
+
+```shell
+$ nomad status webserver
+ID            = webserver
+Name          = webserver
+Submit Date   = 2019-06-19T04:20:32Z
+Type          = service
+Priority      = 40
+...
+Allocations
+ID        Node ID   Task Group  Version  Desired  Status   Created    Modified
+725d3b49  16653ac1  webserver   0        run      running  1m18s ago  59s ago
+e2f9cb3d  f765c6e8  webserver   0        run      running  1m18s ago  1m2s ago
+e9d8df1b  b0700ec0  webserver   0        run      running  1m18s ago  59s ago
+```
+
+### Step 3: Create a Job with High Priority
+
+Create another job with a [priority][priority] greater than that of the job you
+just deployed. Copy the following into a file named `redis.nomad`:
+
+```hcl
+job "redis" {
+  datacenters = ["dc1"]
+  type = "service"
+  priority = 80
+
+  group "cache1" {
+    count = 1
+
+    task "redis" {
+      driver = "docker"
+
+      config {
+        image = "redis:latest"
+
+        port_map {
+          db = 6379
+        }
+      }
+
+      resources {
+        network {
+          port "db" {}
+        }
+
+        memory = 700
+      }
+
+      service {
+        name = "redis-cache"
+        port = "db"
+
+        check {
+          name     = "alive"
+          type     = "tcp"
+          interval = "10s"
+          timeout  = "2s"
+        }
+      }
+    }
+  }
+}
+```
+
+Note that this job has a priority of 80 (greater than the priority of the job
+from [Step 1][step-1]) and requires 700 MB of memory. It will create resource
+contention in the cluster: each node has only 1 GB of memory and already hosts
+a 600 MB allocation, so no node can fit an additional 700 MB.
+
+### Step 4: Try to Run `redis.nomad`
+
+Remember that preemption for service and batch jobs is [disabled by
+default][preemption-config]. This means that the `redis` job will be queued due
+to the resource contention in the cluster. You can verify the resource
+contention before actually registering your job by running the [`plan`][plan]
+command:
+
+```shell
+$ nomad plan redis.nomad
++ Job: "redis"
++ Task Group: "cache1" (1 create)
+  + Task: "redis" (forces create)
+
+Scheduler dry-run:
+- WARNING: Failed to place all allocations.
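+# Every node already hosts a 600 MB webserver allocation, so none has the
+# 700 MB of free memory this job requests: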
+  Task Group "cache1" (failed to place 1 allocation):
+    * Resources exhausted on 3 nodes
+    * Dimension "memory" exhausted on 3 nodes
+```
+
+Run the job to see that the allocation will be queued:
+
+```shell
+$ nomad run redis.nomad
+==> Monitoring evaluation "1e54e283"
+    Evaluation triggered by job "redis"
+    Evaluation status changed: "pending" -> "complete"
+==> Evaluation "1e54e283" finished with status "complete" but failed to place all allocations:
+    Task Group "cache1" (failed to place 1 allocation):
+      * Resources exhausted on 3 nodes
+      * Dimension "memory" exhausted on 3 nodes
+    Evaluation "1512251a" waiting for additional capacity to place remainder
+```
+
+You can also verify that the allocation has been queued by checking the status
+of the job:
+
+```shell
+$ nomad status redis
+ID            = redis
+Name          = redis
+Submit Date   = 2019-06-19T03:33:17Z
+Type          = service
+Priority      = 80
+...
+Placement Failure
+Task Group "cache1":
+  * Resources exhausted on 3 nodes
+  * Dimension "memory" exhausted on 3 nodes
+
+Allocations
+No allocations placed
+```
+
+You may remove this job now. In the next steps, we will enable service job
+preemption and redeploy:
+
+```shell
+$ nomad stop -purge redis
+==> Monitoring evaluation "153db6c0"
+    Evaluation triggered by job "redis"
+    Evaluation status changed: "pending" -> "complete"
+==> Evaluation "153db6c0" finished with status "complete"
+```
+
+### Step 5: Enable Service Job Preemption
+
+Verify the [scheduler configuration][scheduler-configuration] with the
+following command:
+
+```shell
+$ curl -s localhost:4646/v1/operator/scheduler/configuration | jq
+{
+  "SchedulerConfig": {
+    "PreemptionConfig": {
+      "SystemSchedulerEnabled": true,
+      "BatchSchedulerEnabled": false,
+      "ServiceSchedulerEnabled": false
+    },
+    "CreateIndex": 5,
+    "ModifyIndex": 506
+  },
+  "Index": 506,
+  "LastContact": 0,
+  "KnownLeader": true
+}
+```
+
+Note that [BatchSchedulerEnabled][batch-enabled] and
+[ServiceSchedulerEnabled][service-enabled] are both set to `false` by default.
+Since we are preempting service jobs in this guide, we need to set
+`ServiceSchedulerEnabled` to `true`. We will do this by directly interacting
+with the [API][update-scheduler].
+
+Create the following JSON payload and place it in a file named `scheduler.json`:
+
+```json
+{
+  "PreemptionConfig": {
+    "SystemSchedulerEnabled": true,
+    "BatchSchedulerEnabled": false,
+    "ServiceSchedulerEnabled": true
+  }
+}
+```
+
+Note that [ServiceSchedulerEnabled][service-enabled] has been set to `true`.
+
+Run the following command to update the scheduler configuration:
+
+```shell
+$ curl -XPOST localhost:4646/v1/operator/scheduler/configuration -d @scheduler.json
+```
+
+You should now be able to check the scheduler configuration again and see that
+preemption has been enabled for service jobs (output below is abbreviated):
+
+```shell
+$ curl -s localhost:4646/v1/operator/scheduler/configuration | jq
+{
+  "SchedulerConfig": {
+    "PreemptionConfig": {
+      "SystemSchedulerEnabled": true,
+      "BatchSchedulerEnabled": false,
+      "ServiceSchedulerEnabled": true
+    },
+...
+}
+```
+
+### Step 6: Try Running `redis.nomad` Again
+
+Now that you have enabled preemption for service jobs, deploying your `redis`
+job will evict one of the lower priority `webserver` allocations; the evicted
+allocation will be queued until the cluster has capacity to run it again. You
+can run `nomad plan` to see a preview of what will happen:
+
+```shell
+$ nomad plan redis.nomad
++ Job: "redis"
++ Task Group: "cache1" (1 create)
+  + Task: "redis" (forces create)
+
+Scheduler dry-run:
+- All tasks successfully allocated.
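+# With ServiceSchedulerEnabled set to true, the plan succeeds and lists the
+# lower priority allocation that would be evicted to make room: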
+
+Preemptions:
+
+Alloc ID                              Job ID     Task Group
+725d3b49-d5cf-6ba2-be3d-cb441c10a8b3  webserver  webserver
+...
+```
+
+Note that Nomad is indicating that one of the `webserver` allocations will be
+evicted.
+
+Now run the `redis` job:
+
+```shell
+$ nomad run redis.nomad
+==> Monitoring evaluation "7ada9d9f"
+    Evaluation triggered by job "redis"
+    Allocation "8bfcdda3" created: node "16653ac1", group "cache1"
+    Evaluation status changed: "pending" -> "complete"
+==> Evaluation "7ada9d9f" finished with status "complete"
+```
+
+You can check the status of the `webserver` job and verify that one of its
+allocations has been evicted:
+
+```shell
+$ nomad status webserver
+ID            = webserver
+Name          = webserver
+Submit Date   = 2019-06-19T04:20:32Z
+Type          = service
+Priority      = 40
+...
+Summary
+Task Group  Queued  Starting  Running  Failed  Complete  Lost
+webserver   1       0         2        0       1         0
+
+Placement Failure
+Task Group "webserver":
+  * Resources exhausted on 3 nodes
+  * Dimension "memory" exhausted on 3 nodes
+
+Allocations
+ID        Node ID   Task Group  Version  Desired  Status    Created    Modified
+725d3b49  16653ac1  webserver   0        evict    complete  4m10s ago  33s ago
+e2f9cb3d  f765c6e8  webserver   0        run      running   4m10s ago  3m54s ago
+e9d8df1b  b0700ec0  webserver   0        run      running   4m10s ago  3m51s ago
+```
+
+### Step 7: Stop the Redis Job
+
+Stop the `redis` job and verify that the previously evicted `webserver`
+allocation starts running again:
+
+```shell
+$ nomad stop redis
+==> Monitoring evaluation "670922e9"
+    Evaluation triggered by job "redis"
+    Evaluation status changed: "pending" -> "complete"
+==> Evaluation "670922e9" finished with status "complete"
+```
+
+You should now be able to see from the `webserver` status that a replacement
+for the preempted allocation has been placed and all three allocations are
+running again:
+
+```shell
+$ nomad status webserver
+ID            = webserver
+Name          = webserver
+Submit Date   = 2019-06-19T04:20:32Z
+Type          = service
+Priority      = 40
+Datacenters   = dc1
+Status        = running
+Periodic      = false
+Parameterized = false
+
+Summary
+Task Group  Queued  Starting  Running  Failed  Complete  Lost
+webserver   0       0         3        0       1         0
+
+Allocations
+ID        Node ID   Task Group  Version  Desired  Status    Created    Modified
+f623eb81  16653ac1  webserver   0        run      running   13s ago    7s ago
+725d3b49  16653ac1  webserver   0        evict    complete  6m44s ago  3m7s ago
+e2f9cb3d  f765c6e8  webserver   0        run      running   6m44s ago  6m28s ago
+e9d8df1b  b0700ec0  webserver   0        run      running   6m44s ago  6m25s ago
+```
+
+## Next Steps
+
+The process you learned in this guide can also be applied to
+[batch][batch-enabled] jobs, as the sketch below shows.
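+
+To extend preemption to batch jobs, update `scheduler.json` from
+[Step 5](#step-5-enable-service-job-preemption) so that `BatchSchedulerEnabled`
+is also `true` (the payload below assumes the other two fields keep the values
+used in this guide):
+
+```json
+{
+  "PreemptionConfig": {
+    "SystemSchedulerEnabled": true,
+    "BatchSchedulerEnabled": true,
+    "ServiceSchedulerEnabled": true
+  }
+}
+```
+
+Then POST it to the same endpoint as before:
+
+```shell
+$ curl -XPOST localhost:4646/v1/operator/scheduler/configuration -d @scheduler.json
+```
+
+Read more about preemption in Nomad Enterprise
+[here][enterprise-preemption].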
+
+[batch-enabled]: /api/operator.html#batchschedulerenabled-1
+[batch-job]: /docs/schedulers.html#batch
+[count]: /docs/job-specification/group.html#count
+[enterprise]: /docs/enterprise/index.html
+[enterprise-preemption]: /docs/enterprise/preemption/index.html
+[memory]: /docs/job-specification/resources.html#memory
+[payload-preemption-config]: /api/operator.html#sample-payload-1
+[plan]: /docs/commands/job/plan.html
+[preemption]: /docs/internals/scheduling/preemption.html
+[preemption-config]: /api/operator.html#preemptionconfig-1
+[priority]: /docs/job-specification/job.html#priority
+[scheduler-configuration]: /api/operator.html#read-scheduler-configuration
+[service-enabled]: /api/operator.html#serviceschedulerenabled-1
+[service-job]: /docs/schedulers.html#service
+[step-1]: #step-1-create-a-job-with-low-priority
+[system-job]: /docs/schedulers.html#system
+[t2-micro]: https://aws.amazon.com/ec2/instance-types/
+[update-scheduler]: /api/operator.html#update-scheduler-configuration
diff --git a/website/source/guides/operating-a-job/advanced-scheduling/spread.html.md b/website/source/guides/operating-a-job/advanced-scheduling/spread.html.md
index 427fbe422..7e20cb323 100644
--- a/website/source/guides/operating-a-job/advanced-scheduling/spread.html.md
+++ b/website/source/guides/operating-a-job/advanced-scheduling/spread.html.md
@@ -1,7 +1,7 @@
 ---
 layout: "guides"
 page_title: "Spread"
-sidebar_current: "guides-advanced-scheduling"
+sidebar_current: "guides-operating-a-job-spread"
 description: |-
   The following guide walks the user through using the spread stanza in Nomad.
 ---
diff --git a/website/source/layouts/guides.erb b/website/source/layouts/guides.erb
index 2a851d72d..ef12a0bc9 100644
--- a/website/source/layouts/guides.erb
+++ b/website/source/layouts/guides.erb
@@ -119,10 +119,14 @@ Placement Preferences with Affinities
-