From a0a52f61981cfa33883b2523972c751eedd39398 Mon Sep 17 00:00:00 2001
From: Alex Dadgar
Date: Tue, 22 Jan 2019 16:30:57 -0800
Subject: [PATCH] Device job stanza

---
 website/source/docs/devices/community.html.md |  16 ++
 website/source/docs/devices/index.html.md     |  24 ++
 website/source/docs/devices/nvidia.html.md    | 118 ++++++++
 .../docs/job-specification/device.html.md     | 255 ++++++++++++++++++
 website/source/layouts/docs.erb               |  16 ++
 5 files changed, 429 insertions(+)
 create mode 100644 website/source/docs/devices/community.html.md
 create mode 100644 website/source/docs/devices/index.html.md
 create mode 100644 website/source/docs/devices/nvidia.html.md
 create mode 100644 website/source/docs/job-specification/device.html.md

diff --git a/website/source/docs/devices/community.html.md b/website/source/docs/devices/community.html.md
new file mode 100644
index 000000000..514cc2878
--- /dev/null
+++ b/website/source/docs/devices/community.html.md
@@ -0,0 +1,16 @@
+---
+layout: "docs"
+page_title: "Devices: Custom"
+sidebar_current: "docs-devices-community"
+description: |-
+  Create custom device plugins for Nomad.
+---
+
+# Custom Device Plugins
+
+Device plugins expose hardware devices to Nomad, and the interface that a
+device plugin must implement is minimal. Custom device plugins can be
+implemented in Go and registered with Nomad through its plugin system, so
+devices can be supported without recompiling the Nomad binary.
+
diff --git a/website/source/docs/devices/index.html.md b/website/source/docs/devices/index.html.md
new file mode 100644
index 000000000..9d2831c33
--- /dev/null
+++ b/website/source/docs/devices/index.html.md
@@ -0,0 +1,24 @@
+---
+layout: "docs"
+page_title: "Device Plugins"
+sidebar_current: "docs-devices"
+description: |-
+  Device Plugins are used to expose devices to tasks in Nomad.
+---
+
+# Device Plugins
+
+Device plugins are used by Nomad clients to detect hardware devices attached
+to a node and make them available to tasks. By having extensible device
+plugins, Nomad has the flexibility to support a broad set of devices, such as
+GPUs, FPGAs, and TPUs.
+
+The list of supported device plugins is provided on the left of this page.
+Each device plugin documents the devices and attributes it fingerprints and
+how those devices are requested in a
+[job specification](/docs/job-specification/device.html).
+
+Nomad strives to mask the details of attaching a device from users and instead
+provides a clean abstraction: once an allocation is placed, the Nomad client
+invokes the device plugin to retrieve information on how to mount the device
+and what environment variables to expose to the task.
diff --git a/website/source/docs/devices/nvidia.html.md b/website/source/docs/devices/nvidia.html.md
new file mode 100644
index 000000000..b8187c9eb
--- /dev/null
+++ b/website/source/docs/devices/nvidia.html.md
@@ -0,0 +1,118 @@
+---
+layout: "docs"
+page_title: "Devices: Nvidia"
+sidebar_current: "docs-devices-nvidia"
+description: |-
+  The Nvidia device plugin detects Nvidia GPUs and makes them available to
+  tasks.
+---
+
+# Nvidia GPU Device Plugin
+
+The Nvidia device plugin detects Nvidia GPUs attached to a client node and
+makes them available to tasks that request them as `nvidia/gpu` devices in
+their [job specification](/docs/job-specification/device.html).
+
+## Job Configuration
+
+```hcl
+task "example" {
+  resources {
+    device "nvidia/gpu" {
+      count = 1
+    }
+  }
+}
+```
+
+The name `nvidia/gpu` matches any GPU from the Nvidia vendor. A specific model
+can be requested by extending the name, such as `nvidia/gpu/1080ti`. Arbitrary
+constraints and affinities may also be placed on the fingerprinted attributes.
+See the [`device` stanza documentation](/docs/job-specification/device.html)
+for the full set of options.
+
+## Fingerprinted Attributes
+
+The plugin fingerprints each GPU and exposes the following values for use in
+`constraint` and `affinity` stanzas:
+
+* `${device.type}` - Set to `gpu`.
+
+* `${device.vendor}` - Set to `nvidia`.
+
+* `${device.model}` - The model of the GPU, such as `1080ti`.
+
+* `${device.attr.memory}` - The memory of the GPU, such as `4096 MiB`.
+
+## Client Requirements
+
+Device support is currently limited to Linux and to container based task
+drivers, since these allow devices to be isolated to specific tasks.
+
+## Runtime Environment
+
+Once the scheduler places an allocation that requests a GPU, the Nomad client
+invokes the device plugin to retrieve information on how to mount the device
+and what environment variables to expose to the task.
diff --git a/website/source/docs/job-specification/device.html.md b/website/source/docs/job-specification/device.html.md
new file mode 100644
index 000000000..783edc6fd
--- /dev/null
+++ b/website/source/docs/job-specification/device.html.md
@@ -0,0 +1,255 @@
+---
+layout: "docs"
+page_title: "device Stanza - Job Specification"
+sidebar_current: "docs-job-specification-device"
+description: |-
+  The "device" stanza is used to require a certain device be made available
+  to the task.
+---
+
+# `device` Stanza
+
+<table class="table table-bordered table-striped">
+  <tr>
+    <th width="120">Placement</th>
+    <td><code>job -> group -> task -> resources -> **device**</code></td>
+  </tr>
+</table>
+
+The `device` stanza is used to create both a scheduling and runtime requirement
+that the given task has access to the specified devices. A device is a hardware
+device that is attached to the node and may be made available to the task.
+Examples are GPUs, FPGAs, and TPUs.
+
+When a `device` stanza is added, Nomad will schedule the task onto a node that
+contains the set of devices meeting the specified requirements. The `device`
+stanza allows the operator to specify as little as the type of device required,
+such as `gpu`, or as much as arbitrary constraints and affinities. Once the
+scheduler has placed the allocation on a suitable node, the Nomad client will
+invoke the device plugin to retrieve information on how to mount the device and
+what environment variables to expose. For more information on the runtime
+environment, please consult the individual device plugin's documentation.
+
+See the [device plugin's documentation][devices] for a list of supported
+devices.
+
+```hcl
+job "docs" {
+  group "example" {
+    task "server" {
+      resources {
+        device "nvidia/gpu" {
+          count = 2
+
+          constraint {
+            attribute = "${device.attr.memory}"
+            operator  = ">="
+            value     = "2 GiB"
+          }
+
+          affinity {
+            attribute = "${device.attr.memory}"
+            operator  = ">="
+            value     = "4 GiB"
+            weight    = 75
+          }
+        }
+      }
+    }
+  }
+}
+```
+
+In the above example, the task requests two GPUs from the Nvidia vendor without
+specifying a particular model. Instead, it places a hard constraint that each
+device has at least 2 GiB of memory and expresses a preference for GPUs with at
+least 4 GiB. This example shows how expressive the `device` stanza can be.
+
+~> Device support is currently limited to Linux and to container based task
+drivers, since these allow devices to be isolated to specific tasks.
+
+## `device` Parameters
+
+- `name` `(string: "")` - Specifies the device required. The following inputs
+  are valid (see the sketch after this list for an example of each):
+
+  * `<type>`: If a single value is given, it is assumed to be the device
+    type, such as "gpu" or "fpga".
+
+  * `<vendor>/<type>`: If two values are given separated by a `/`, the
+    given device type will be selected, constraining on the provided vendor.
+    Examples include "nvidia/gpu" or "amd/gpu".
+
+  * `<vendor>/<type>/<model>`: If three values are given separated by a `/`,
+    the given device type will be selected, constraining on the provided
+    vendor and model name. Examples include "nvidia/gpu/1080ti" or
+    "nvidia/gpu/2080ti".
+
+- `count` `(int: 1)` - Specifies the number of instances of the given device
+  that are required.
+
+- `constraint` ([Constraint][]: nil) - Constraints to restrict
+  which devices are eligible. This can be provided multiple times to define
+  additional constraints. See below for the available attributes.
+
+- `affinity` ([Affinity][]: nil) - Affinity to specify a preference
+  for which devices get selected. This can be provided multiple times to define
+  additional affinities. See below for the available attributes.
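+
+The three name forms map to increasingly specific requests. The following
+sketch is illustrative; the device names are taken from the examples on this
+page:
+
+```hcl
+# Any GPU, from any vendor
+device "gpu" {}
+
+# Any GPU from the Nvidia vendor
+device "nvidia/gpu" {}
+
+# Only an Nvidia GPU of the 1080ti model
+device "nvidia/gpu/1080ti" {}
+```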
+
+## `device` Constraint and Affinity Attributes
+
+The set of attributes available for use in a `constraint` or `affinity` are as
+follows:
+
+<table class="table table-bordered table-striped">
+  <tr>
+    <th>Variable</th>
+    <th>Description</th>
+    <th>Example Value</th>
+  </tr>
+  <tr>
+    <td><code>${device.type}</code></td>
+    <td>The type of device</td>
+    <td><code>"gpu"</code>, <code>"tpu"</code>, <code>"fpga"</code></td>
+  </tr>
+  <tr>
+    <td><code>${device.vendor}</code></td>
+    <td>The device's vendor</td>
+    <td><code>"amd"</code>, <code>"nvidia"</code>, <code>"intel"</code></td>
+  </tr>
+  <tr>
+    <td><code>${device.model}</code></td>
+    <td>The device's model</td>
+    <td><code>"1080ti"</code></td>
+  </tr>
+  <tr>
+    <td><code>${device.attr.&lt;property&gt;}</code></td>
+    <td>Property of the device</td>
+    <td><code>${device.attr.memory} => 8 GiB</code></td>
+  </tr>
+</table>
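+
+As an illustration using the variables above, multiple `constraint` stanzas
+can be combined, and all of them must hold for a device to be selected:
+
+```hcl
+device "gpu" {
+  # Both constraints must be satisfied by the same device.
+  constraint {
+    attribute = "${device.vendor}"
+    value     = "nvidia"
+  }
+
+  constraint {
+    attribute = "${device.attr.memory}"
+    operator  = ">="
+    value     = "2 GiB"
+  }
+}
+```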
+
+For the full set of attributes available, please see the individual [device
+plugin's documentation][devices].
+
+### Attribute Units and Conversions
+
+Devices report their attributes with strict types and can also provide unit
+information. For example, when a GPU reports its memory, it can report that it
+is "4096 MiB". Since Nomad has the associated unit information, a constraint
+that requires greater than "3.5 GiB" can match, because Nomad can convert
+between these units.
+
+The units Nomad supports are as follows:
+
+<table class="table table-bordered table-striped">
+  <tr>
+    <th>Base Unit</th>
+    <th>Values</th>
+  </tr>
+  <tr>
+    <td>Byte</td>
+    <td>
+      <b>Base 2</b>: KiB, MiB, GiB, TiB, PiB, EiB<br>
+      <b>Base 10</b>: kB, KB (equivalent to kB), MB, GB, TB, PB, EB
+    </td>
+  </tr>
+  <tr>
+    <td>Byte Rates</td>
+    <td>
+      <b>Base 2</b>: KiB/s, MiB/s, GiB/s, TiB/s, PiB/s, EiB/s<br>
+      <b>Base 10</b>: kB/s, KB/s (equivalent to kB/s), MB/s, GB/s, TB/s, PB/s, EB/s
+    </td>
+  </tr>
+  <tr>
+    <td>Hertz</td>
+    <td>MHz, GHz</td>
+  </tr>
+  <tr>
+    <td>Watts</td>
+    <td>mW, W, kW, MW, GW</td>
+  </tr>
+</table>
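+
+As a worked example of the conversion described above (the attribute and
+values come from the memory example earlier in this section), the following
+constraint matches a device that fingerprints its memory as "4096 MiB", since
+3.5 GiB converts to 3584 MiB:
+
+```hcl
+device "gpu" {
+  constraint {
+    attribute = "${device.attr.memory}"
+    operator  = ">="
+    value     = "3.5 GiB" # 3.5 GiB = 3584 MiB, and 4096 MiB >= 3584 MiB
+  }
+}
+```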
+
+Conversion is only possible within the same base unit.
+
+## `device` Examples
+
+The following examples only show the `device` stanzas. Remember that the
+`device` stanza is only valid in the placements listed above.
+
+### Single Nvidia GPU
+
+This example schedules a task with a single Nvidia GPU made available.
+
+```hcl
+device "nvidia/gpu" {}
+```
+
+### Multiple Nvidia GPUs
+
+This example schedules a task with two Nvidia GPUs made available.
+
+```hcl
+device "nvidia/gpu" {
+  count = 2
+}
+```
+
+### Single Nvidia GPU with Specific Model
+
+This example schedules a task with a single Nvidia GPU made available and uses
+the name to specify the exact model to be used.
+
+```hcl
+device "nvidia/gpu/1080ti" {}
+```
+
+This is a simplification of the following:
+
+```hcl
+device "gpu" {
+  count = 1
+
+  constraint {
+    attribute = "${device.vendor}"
+    value     = "nvidia"
+  }
+
+  constraint {
+    attribute = "${device.model}"
+    value     = "1080ti"
+  }
+}
+```
+
+### Affinity with Unit Conversion
+
+This example uses an affinity to tell the scheduler it would prefer if the GPU
+had at least 1.5 GiB of memory. The following two stanzas are equivalent, since
+Nomad can convert between units and 1.5 GiB equals 1536 MiB.
+
+Specified in `GiB`:
+
+```hcl
+device "nvidia/gpu" {
+  affinity {
+    attribute = "${device.attr.memory}"
+    operator  = ">="
+    value     = "1.5 GiB"
+    weight    = 75
+  }
+}
+```
+
+Specified in `MiB`:
+
+```hcl
+device "nvidia/gpu" {
+  affinity {
+    attribute = "${device.attr.memory}"
+    operator  = ">="
+    value     = "1536 MiB"
+    weight    = 75
+  }
+}
+```
+
+[affinity]: /docs/job-specification/affinity.html "Nomad affinity Job Specification"
+[constraint]: /docs/job-specification/constraint.html "Nomad constraint Job Specification"
+[devices]: /docs/devices/index.html "Nomad Device Plugins"
diff --git a/website/source/layouts/docs.erb b/website/source/layouts/docs.erb
index 64a35cd2f..b4abe7285 100644
--- a/website/source/layouts/docs.erb
+++ b/website/source/layouts/docs.erb
@@ -333,6 +333,9 @@
             <li<%= sidebar_current("docs-job-specification-constraint") %>>
               <a href="/docs/job-specification/constraint.html">constraint</a>
             </li>
+            <li<%= sidebar_current("docs-job-specification-device") %>>
+              <a href="/docs/job-specification/device.html">device</a>
+            </li>
             <li<%= sidebar_current("docs-job-specification-dispatch-payload") %>>
               <a href="/docs/job-specification/dispatch_payload.html">dispatch_payload</a>
             </li>
@@ -434,6 +437,19 @@
+          <li<%= sidebar_current("docs-devices") %>>
+            <a href="/docs/devices/index.html">Device Plugins</a>
+            <ul class="nav">
+              <li<%= sidebar_current("docs-devices-nvidia") %>>
+                <a href="/docs/devices/nvidia.html">Nvidia</a>
+              </li>
+              <li<%= sidebar_current("docs-devices-community") %>>
+                <a href="/docs/devices/community.html">Community Supported</a>
+              </li>
+            </ul>
+          </li>
+
           <li<%= sidebar_current("docs-schedulers") %>>
             <a href="/docs/schedulers.html">Schedulers</a>
           </li>