From 02e585a611a369ef03fb91c8a3ffb65d49350f7a Mon Sep 17 00:00:00 2001
From: Mahmood Ali
Date: Thu, 13 May 2021 13:35:51 -0400
Subject: [PATCH] add a section about memory oversubscription (#10573)

add a section about memory oversubscription

Co-authored-by: Tim Gross
---
 .../docs/job-specification/resources.mdx | 48 ++++++++++++++++++-
 1 file changed, 47 insertions(+), 1 deletion(-)

diff --git a/website/content/docs/job-specification/resources.mdx b/website/content/docs/job-specification/resources.mdx
index 8ab13abc6..d006a5407 100644
--- a/website/content/docs/job-specification/resources.mdx
+++ b/website/content/docs/job-specification/resources.mdx
@@ -39,7 +39,7 @@ job "docs" {
 
 - `memory` `(int: 300)` - Specifies the memory required in MB.
 
-- `memory_max` (`int`: <optional>) 1.1 Beta - Optionally, specifies the maximum memory the task may use, if the client has excess memory capacity, in MB.
+- `memory_max` (`int`: <optional>) 1.1 Beta - Optionally, specifies the maximum memory the task may use, if the client has excess memory capacity, in MB. See [Memory Oversubscription](#memory-oversubscription) for more details.
 
 - `device` ([Device][]: <optional>) - Specifies the device requirements. This may be repeated to request multiple device types.
 
@@ -86,5 +86,51 @@ resources {
   }
 }
 ```
+## Memory Oversubscription
+
+Setting task memory limits requires balancing the risk of interrupting tasks
+against the risk of wasting resources. If a task memory limit is set too low,
+the task may exceed the limit and be interrupted; if the limit is set too
+high, the cluster is left underutilized.
+
+To help maximize cluster memory utilization while allowing a safety margin for
+unexpected load spikes, Nomad 1.1 lets job authors set two separate memory
+limits:
+
+* `memory`: the reserve limit representing the task's typical memory usage;
+  the Nomad scheduler uses this number to reserve memory and place the task.
+
+* `memory_max`: the maximum memory the task may use if the client has excess
+  available memory; the task may be terminated if it exceeds this limit.
+
+If a client's memory becomes contended or low, the operating system will
+pressure the running tasks to free up memory. If the contention persists, Nomad
+may kill oversubscribed tasks and reschedule them to other clients. The exact
+mechanism for memory pressure is specific to the task driver, operating system,
+and application runtime.
+
+The `memory_max` attribute is currently supported by the official `docker`,
+`exec`, and `java` task drivers. Consult the documentation of
+community-supported task drivers for their memory oversubscription support.
+
+Memory oversubscription is opt-in. Nomad operators can enable [Memory Oversubscription in the scheduler
+configuration](/api-docs/operator/scheduler#update-scheduler-configuration). Enterprise customers can use [Resource
+Quotas](https://learn.hashicorp.com/tutorials/nomad/quotas) to limit memory
+oversubscription.
+
+To avoid degrading the cluster experience, we recommend examining and monitoring
+resource utilization and considering the following suggestions:
+
+* Set `oom_score_adj` for Linux host services that aren't managed by Nomad, e.g.,
+  Docker, logging services, and the Nomad agent itself. For systemd services, you can use the [`OOMScoreAdj` field](https://github.com/hashicorp/nomad/blob/v1.0.0/dist/systemd/nomad.service#L25).
+
+* Monitor hosts for memory utilization and set alerts on out-of-memory (OOM) errors.
+
+* Set the [client `reserved`](/docs/configuration/client#reserved) memory high
+  enough for host services that aren't managed by Nomad, plus a buffer for
+  oversubscribed memory. For example, if the client reserved memory is 1GB,
+  the allocations on the host may exceed their `memory` reservations by almost
+  1GB in aggregate before memory becomes contended and allocations get
+  killed.
 
 [device]: /docs/job-specification/device 'Nomad device Job Specification'