From 9062380c1144b16920d6e35a98b6d44ca47dc1ce Mon Sep 17 00:00:00 2001 From: Chris Baker <1675087+cgbaker@users.noreply.github.com> Date: Thu, 21 Nov 2019 19:50:50 +0000 Subject: [PATCH] added the device plugin authoring guide, made minor formatting changes to task driver plugin authoring guide. --- .../docs/internals/plugins/devices.html.md | 77 ++++++++++++++++++- .../internals/plugins/task-drivers.html.md | 16 ++-- 2 files changed, 82 insertions(+), 11 deletions(-) diff --git a/website/source/docs/internals/plugins/devices.html.md b/website/source/docs/internals/plugins/devices.html.md index a7a7e205f..6b137aa10 100644 --- a/website/source/docs/internals/plugins/devices.html.md +++ b/website/source/docs/internals/plugins/devices.html.md @@ -3,10 +3,81 @@ layout: "docs" page_title: "Device Plugins" sidebar_current: "docs-internals-plugins-devices" description: |- - Learn about how to author a Nomad device plugin. + Learn how to author a Nomad device plugin. --- # Devices -Device plugin documentation is currently a work in progress. Until there is -documentation, the [Nvidia GPU plugin](https://github.com/hashicorp/nomad/tree/master/devices/gpu/nvidia) is a useful example. +Nomad has built-in support for scheduling compute resources such as CPU, memory, +and networking. Nomad device plugins are used to support scheduling tasks with +other devices, such as GPUs. They are responsible for fingerprinting these +devices and working with the Nomad client to make them available to assigned +tasks. + +For a real world example of a Nomad device plugin implementation, see the [Nvidia +GPU plugin](https://github.com/hashicorp/nomad/tree/master/devices/gpu/nvidia). + +## Authoring Device Plugins + +Authoring a device plugin in Nomad consists of implementing the +[DevicePlugin][devicePlugin] interface alongside +a main package to launch the plugin. + +The [device plugin skeleton project][skeletonProject] exists to help bootstrap +the development of new device plugins. 
It provides most of the boilerplate +necessary for a device plugin, along with detailed comments. + +### Lifecycle and State + +A device plugin is long-lived. Nomad will ensure that one instance of the plugin is +running. If the plugin crashes or otherwise terminates, Nomad will launch another +instance of it. + +However, unlike [task drivers](task-drivers.html), device plugins do not currently +have an interface for persisting state to the Nomad client. Instead, the device +plugin API emphasizes fingerprinting devices and reporting their status. After +helping to provision a task with a scheduled device, a device plugin does not +have any responsibility (or ability) to monitor the task. + +## Device Plugin API + +The [base plugin][baseplugin] must be implemented in addition to the following +functions. + +### `Fingerprint(context.Context) (<-chan *FingerprintResponse, error)` + +The `Fingerprint` [function][fingerprintFn] is called by the client when the plugin is started. +It allows the plugin to provide Nomad with a list of discovered devices, along with their +attributes, for the purpose of scheduling workloads using devices. +The channel returned should immediately send an initial +[`FingerprintResponse`][fingerprintResponse], then send periodic updates at +an appropriate interval until the context is canceled. + +Each fingerprint response consists of either an error or a list of device groups. +A device group is a list of detected devices that are identical for the purpose of +scheduling; that is, they will have identical attributes. + +### `Stats(context.Context, time.Duration) (<-chan *StatsResponse, error)` + +The `Stats` [function][statsFn] returns a channel on which the plugin should +emit device statistics, at the specified interval, until either an error is +encountered or the specified context is cancelled. The `StatsResponse` object +allows [dimensioned][dimensioned] statistics to be returned for each device in a device group.
+ +### `Reserve(deviceIDs []string) (*ContainerReservation, error)` + +The `Reserve` [function][reserveFn] accepts a list of device IDs and returns the information +necessary for the client to make those devices available to a task. Currently, +the `ContainerReservation` object allows the plugin to specify environment +variables for the task, as well as a list of host devices and files to be mounted +into the task's filesystem. Any orchestration required to prepare the device for +use should also be performed in this function. + +[DevicePlugin]: https://github.com/hashicorp/nomad/blob/v0.9.0/plugins/device/device.go#L20-L33 +[baseplugin]: /docs/internals/plugins/base.html +[skeletonProject]: https://github.com/hashicorp/nomad-skeleton-device-plugin +[fingerprintResponse]: https://github.com/hashicorp/nomad/blob/v0.9.0/plugins/device/device.go#L37-L43 +[fingerprintFn]: https://github.com/hashicorp/nomad-skeleton-device-plugin/blob/v0.1.0/device/device.go#L159-L165 +[statsFn]: https://github.com/hashicorp/nomad-skeleton-device-plugin/blob/v0.1.0/device/device.go#L169-L176 +[reserveFn]: https://github.com/hashicorp/nomad-skeleton-device-plugin/blob/v0.1.0/device/device.go#L189-L245 +[dimensioned]: https://github.com/hashicorp/nomad/blob/v0.9.0/plugins/shared/structs/stats.go#L33-L34 diff --git a/website/source/docs/internals/plugins/task-drivers.html.md b/website/source/docs/internals/plugins/task-drivers.html.md index a6dd06bce..8b0d72724 100644 --- a/website/source/docs/internals/plugins/task-drivers.html.md +++ b/website/source/docs/internals/plugins/task-drivers.html.md @@ -3,22 +3,22 @@ layout: "docs" page_title: "Task Driver Plugins" sidebar_current: "docs-internals-plugins-task-drivers" description: |- - Learn about how to author a Nomad plugin. + Learn how to author a Nomad task driver plugin. --- # Task Drivers Task drivers in Nomad are the runtime components that execute workloads. 
For -a real world example of a Nomad task driver plugin implementation see the [LXC +a real world example of a Nomad task driver plugin implementation, see the [LXC driver source][lxcdriver]. ## Authoring Task Driver Plugins Authoring a task driver (shortened to driver in this documentation) in Nomad consists of implementing the [DriverPlugin][driverplugin] interface and adding -a main package to launch the plugin. A driver plugin is long lived and its +a main package to launch the plugin. A driver plugin is long-lived and its lifetime is not bound to the Nomad client. This means that the Nomad client can -be restarted without the restarting the driver. Nomad will ensure that one +be restarted without restarting the driver. Nomad will ensure that one instance of the driver is running, meaning if the driver crashes or otherwise terminates, Nomad will launch another instance of it. @@ -29,7 +29,7 @@ Nomad client can recover tasks into the driver state. ## Task Driver Plugin API -The [base plugin][baseplugin] must be implement in addition to the following +The [base plugin][baseplugin] must be implemented in addition to the following functions. ### `TaskConfigSchema() (*hclspec.Spec, error)` @@ -123,7 +123,7 @@ returned by the `StartTask` function. If no error was returned, it is expected that the driver can now operate on the task by referencing the task ID. If an error occurs, the Nomad client will mark the task as `lost`. -### `WaitTask(ctx context.Context, id string) (<-chan *ExitResult, error)` +### `WaitTask(context.Context, id string) (<-chan *ExitResult, error)` The `WaitTask` function is expected to return a channel that will send an `*ExitResult` when the task exits or close the channel when the context is @@ -153,7 +153,7 @@ called. The `InspectTask` function returns detailed status information for the referenced `taskID`. 
-### `TaskStats(ctx context.Context, id string, i time.Duration) (<-chan *cstructs.TaskResourceUsage, error)` +### `TaskStats(context.Context, id string, time.Duration) (<-chan *cstructs.TaskResourceUsage, error)` The `TaskStats` function returns a channel which the driver should send stats to at the given interval. The driver must send stats at the given interval @@ -188,7 +188,7 @@ inside the running container. `ExecTask` is called for Consul script checks. [lxcdriver]: https://github.com/hashicorp/nomad-driver-lxc -[DriverPlugin]: https://github.com/hashicorp/nomad/blob/v0.9.0-beta2/plugins/drivers/driver.go#L39-L57 +[DriverPlugin]: https://github.com/hashicorp/nomad/blob/v0.9.0/plugins/drivers/driver.go#L39-L57 [baseplugin]: /docs/internals/plugins/base.html [taskconfig]: https://godoc.org/github.com/hashicorp/nomad/plugins/drivers#TaskConfig [taskhandle]: https://godoc.org/github.com/hashicorp/nomad/plugins/drivers#TaskHandle