Merge pull request #6755 from hashicorp/docs-device-plugin-guide
cherry-pick: device plugin authoring guide
@@ -3,10 +3,81 @@ layout: "docs"
page_title: "Device Plugins"
sidebar_current: "docs-internals-plugins-devices"
description: |-
  Learn about how to author a Nomad device plugin.
  Learn how to author a Nomad device plugin.
---

# Devices

Device plugin documentation is currently a work in progress. Until there is
documentation, the [Nvidia GPU plugin](https://github.com/hashicorp/nomad/tree/master/devices/gpu/nvidia) is a useful example.
Nomad has built-in support for scheduling compute resources such as CPU, memory,
and networking. Nomad device plugins are used to support scheduling tasks with
other devices, such as GPUs. They are responsible for fingerprinting these
devices and working with the Nomad client to make them available to assigned
tasks.

For a real world example of a Nomad device plugin implementation, see the [Nvidia
GPU plugin](https://github.com/hashicorp/nomad/tree/master/devices/gpu/nvidia).

## Authoring Device Plugins

Authoring a device plugin in Nomad consists of implementing the
[DevicePlugin][devicePlugin] interface alongside
a main package to launch the plugin.

The [device plugin skeleton project][skeletonProject] exists to help bootstrap
the development of new device plugins. It provides most of the boilerplate
necessary for a device plugin, along with detailed comments.

### Lifecycle and State

A device plugin is long-lived. Nomad will ensure that one instance of the plugin is
running. If the plugin crashes or otherwise terminates, Nomad will launch another
instance of it.

However, unlike [task drivers](task-drivers.html), device plugins do not currently
have an interface for persisting state to the Nomad client. Instead, the device
plugin API emphasizes fingerprinting devices and reporting their status. After
helping to provision a task with a scheduled device, a device plugin does not
have any responsibility (or ability) to monitor the task.

## Device Plugin API

The [base plugin][baseplugin] must be implemented in addition to the following
functions.

### `Fingerprint(context.Context) (<-chan *FingerprintResponse, error)`

The `Fingerprint` [function][fingerprintFn] is called by the client when the plugin is started.
It allows the plugin to provide Nomad with a list of discovered devices, along with their
attributes, for the purpose of scheduling workloads using devices.
The channel returned should immediately send an initial
[`FingerprintResponse`][fingerprintResponse], then send periodic updates at
an appropriate interval until the context is canceled.

Each fingerprint response consists of either an error or a list of device groups.
A device group is a list of detected devices that are identical for the purpose of
scheduling; that is, they will have identical attributes.
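
The channel behavior described above can be sketched as follows. This is a minimal, self-contained illustration rather than the skeleton project's code: the `Device`, `DeviceGroup`, and `FingerprintResponse` types are simplified stand-ins for the ones in Nomad's `plugins/device` package, `detect` and `collect` are hypothetical helpers, and the extra `interval` parameter is an assumption made so the example can run quickly (the real `Fingerprint` takes only a context).

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// Device and DeviceGroup are simplified stand-ins for the Nomad types.
type Device struct {
	ID      string
	Healthy bool
}

// DeviceGroup holds devices that are identical for scheduling purposes.
type DeviceGroup struct {
	Vendor, Type, Name string
	Devices            []*Device
}

// FingerprintResponse carries either an error or the detected device groups.
type FingerprintResponse struct {
	Devices []*DeviceGroup
	Error   error
}

// detect is a hypothetical discovery routine; a real plugin would probe hardware.
func detect() ([]*DeviceGroup, error) {
	return []*DeviceGroup{{
		Vendor: "example", Type: "gpu", Name: "model-x",
		Devices: []*Device{{ID: "gpu-0", Healthy: true}, {ID: "gpu-1", Healthy: true}},
	}}, nil
}

// Fingerprint sends one response immediately, then one per interval, and
// closes the channel once ctx is canceled.
func Fingerprint(ctx context.Context, interval time.Duration) (<-chan *FingerprintResponse, error) {
	out := make(chan *FingerprintResponse)
	go func() {
		defer close(out)
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			groups, err := detect()
			select {
			case out <- &FingerprintResponse{Devices: groups, Error: err}:
			case <-ctx.Done():
				return
			}
			// Wait for the next tick or for cancellation.
			select {
			case <-ticker.C:
			case <-ctx.Done():
				return
			}
		}
	}()
	return out, nil
}

// collect reads n responses and then cancels the context.
func collect(n int) []*FingerprintResponse {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	ch, _ := Fingerprint(ctx, 10*time.Millisecond)
	var got []*FingerprintResponse
	for len(got) < n {
		got = append(got, <-ch)
	}
	return got
}

func main() {
	for _, r := range collect(2) {
		fmt.Println(len(r.Devices), r.Devices[0].Name, len(r.Devices[0].Devices))
	}
}
```

Both devices share one group here because they would expose identical attributes to the scheduler; devices that differ would be reported in separate groups.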

### `Stats(context.Context, time.Duration) (<-chan *StatsResponse, error)`

The `Stats` [function][statsFn] returns a channel on which the plugin should
emit device statistics, at the specified interval, until either an error is
encountered or the specified context is canceled. The `StatsResponse` object
allows [dimensioned][dimensioned] statistics to be returned for each device in a device group.
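
A rough sketch of that emit loop is shown below. It is an illustration under stated assumptions, not Nomad's implementation: `StatValue` and `StatsResponse` are simplified stand-ins for the real types, and the `Sampler` argument, `healthySampler`, `failingSampler`, and `drain` are hypothetical additions so that both the normal path and the error path can be exercised.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// StatValue is a simplified stand-in for a dimensioned statistic.
type StatValue struct {
	Value float64
	Unit  string
}

// StatsResponse maps device ID -> statistic name -> value, or carries an error.
type StatsResponse struct {
	Stats map[string]map[string]StatValue
	Error error
}

// Sampler is a hypothetical hook that reads the current device statistics.
type Sampler func() (map[string]map[string]StatValue, error)

func healthySampler() (map[string]map[string]StatValue, error) {
	return map[string]map[string]StatValue{
		"gpu-0": {"temperature": {Value: 61, Unit: "C"}},
	}, nil
}

func failingSampler() (map[string]map[string]StatValue, error) {
	return nil, fmt.Errorf("device unavailable")
}

// Stats emits one response per interval until sampling fails or ctx is canceled.
func Stats(ctx context.Context, interval time.Duration, sample Sampler) (<-chan *StatsResponse, error) {
	out := make(chan *StatsResponse)
	go func() {
		defer close(out)
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			select {
			case <-ctx.Done():
				return
			case <-ticker.C:
			}
			stats, err := sample()
			select {
			case out <- &StatsResponse{Stats: stats, Error: err}:
			case <-ctx.Done():
				return
			}
			if err != nil {
				return // report the error once, then stop emitting
			}
		}
	}()
	return out, nil
}

// drain collects up to max responses, stopping early if the channel closes.
func drain(sample Sampler, max int) []*StatsResponse {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	ch, _ := Stats(ctx, 5*time.Millisecond, sample)
	var got []*StatsResponse
	for r := range ch {
		got = append(got, r)
		if len(got) == max {
			break
		}
	}
	return got
}

func main() {
	for _, r := range drain(healthySampler, 2) {
		fmt.Println(r.Stats["gpu-0"]["temperature"].Value, r.Stats["gpu-0"]["temperature"].Unit)
	}
	fmt.Println(len(drain(failingSampler, 5)))
}
```

Sending the error as a final response before closing the channel is one possible design; it lets the consumer distinguish a failure from a normal cancellation.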

### `Reserve(deviceIDs []string) (*ContainerReservation, error)`

The `Reserve` [function][reserveFn] accepts a list of device IDs and returns the information
necessary for the client to make those devices available to a task. Currently,
the `ContainerReservation` object allows the plugin to specify environment
variables for the task, as well as a list of host devices and files to be mounted
into the task's filesystem. Any orchestration required to prepare the device for
use should also be performed in this function.
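
As an illustration of the three kinds of information a reservation can carry, here is a small sketch. The `Mount`, `DeviceSpec`, and `ContainerReservation` types are local stand-ins modeled loosely on the description above, and the device table, the `EXAMPLE_VISIBLE_DEVICES` variable, and all paths are invented for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// Mount describes a host file or directory to mount into the task.
type Mount struct {
	TaskPath, HostPath string
	ReadOnly           bool
}

// DeviceSpec describes a host device to expose to the task.
type DeviceSpec struct {
	TaskPath, HostPath, CgroupPerms string
}

// ContainerReservation is a simplified stand-in for the Nomad type.
type ContainerReservation struct {
	Envs    map[string]string
	Mounts  []*Mount
	Devices []*DeviceSpec
}

// known maps fingerprinted device IDs to host device paths (hypothetical).
var known = map[string]string{
	"gpu-0": "/dev/example0",
	"gpu-1": "/dev/example1",
}

// Reserve validates the requested IDs and describes how to expose them.
func Reserve(deviceIDs []string) (*ContainerReservation, error) {
	res := &ContainerReservation{Envs: map[string]string{}}
	for _, id := range deviceIDs {
		path, ok := known[id]
		if !ok {
			return nil, fmt.Errorf("unknown device ID %q", id)
		}
		res.Devices = append(res.Devices, &DeviceSpec{
			TaskPath: path, HostPath: path, CgroupPerms: "rw",
		})
	}
	// Tell the workload which devices it was given.
	res.Envs["EXAMPLE_VISIBLE_DEVICES"] = strings.Join(deviceIDs, ",")
	// Share a (hypothetical) driver library directory read-only.
	res.Mounts = append(res.Mounts, &Mount{
		TaskPath: "/usr/lib/example", HostPath: "/usr/lib/example", ReadOnly: true,
	})
	return res, nil
}

func main() {
	res, err := Reserve([]string{"gpu-0", "gpu-1"})
	if err != nil {
		fmt.Println("reserve failed:", err)
		return
	}
	fmt.Println(res.Envs["EXAMPLE_VISIBLE_DEVICES"], len(res.Devices), len(res.Mounts))
}
```

Rejecting unknown IDs up front keeps a task from starting with a partially provisioned set of devices.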

[DevicePlugin]: https://github.com/hashicorp/nomad/blob/v0.9.0/plugins/device/device.go#L20-L33
[baseplugin]: /docs/internals/plugins/base.html
[skeletonProject]: https://github.com/hashicorp/nomad-skeleton-device-plugin
[fingerprintResponse]: https://github.com/hashicorp/nomad/blob/v0.9.0/plugins/device/device.go#L37-L43
[fingerprintFn]: https://github.com/hashicorp/nomad-skeleton-device-plugin/blob/v0.1.0/device/device.go#L159-L165
[statsFn]: https://github.com/hashicorp/nomad-skeleton-device-plugin/blob/v0.1.0/device/device.go#L169-L176
[reserveFn]: https://github.com/hashicorp/nomad-skeleton-device-plugin/blob/v0.1.0/device/device.go#L189-L245
[dimensioned]: https://github.com/hashicorp/nomad/blob/v0.9.0/plugins/shared/structs/stats.go#L33-L34

@@ -3,22 +3,22 @@ layout: "docs"
page_title: "Task Driver Plugins"
sidebar_current: "docs-internals-plugins-task-drivers"
description: |-
  Learn about how to author a Nomad plugin.
  Learn how to author a Nomad task driver plugin.
---

# Task Drivers

Task drivers in Nomad are the runtime components that execute workloads. For
a real world example of a Nomad task driver plugin implementation see the [LXC
a real world example of a Nomad task driver plugin implementation, see the [LXC
driver source][lxcdriver].

## Authoring Task Driver Plugins

Authoring a task driver (shortened to driver in this documentation) in Nomad
consists of implementing the [DriverPlugin][driverplugin] interface and adding
a main package to launch the plugin. A driver plugin is long lived and its
a main package to launch the plugin. A driver plugin is long-lived and its
lifetime is not bound to the Nomad client. This means that the Nomad client can
be restarted without the restarting the driver. Nomad will ensure that one
be restarted without restarting the driver. Nomad will ensure that one
instance of the driver is running, meaning if the driver crashes or otherwise
terminates, Nomad will launch another instance of it.

@@ -29,7 +29,7 @@ Nomad client can recover tasks into the driver state.

## Task Driver Plugin API

The [base plugin][baseplugin] must be implement in addition to the following
The [base plugin][baseplugin] must be implemented in addition to the following
functions.

### `TaskConfigSchema() (*hclspec.Spec, error)`
@@ -123,7 +123,7 @@ returned by the `StartTask` function. If no error was returned, it is
expected that the driver can now operate on the task by referencing the task
ID. If an error occurs, the Nomad client will mark the task as `lost`.

### `WaitTask(ctx context.Context, id string) (<-chan *ExitResult, error)`
### `WaitTask(context.Context, id string) (<-chan *ExitResult, error)`

The `WaitTask` function is expected to return a channel that will send an
`*ExitResult` when the task exits or close the channel when the context is
@@ -153,7 +153,7 @@ called.
The `InspectTask` function returns detailed status information for the
referenced `taskID`.

### `TaskStats(ctx context.Context, id string, i time.Duration) (<-chan *cstructs.TaskResourceUsage, error)`
### `TaskStats(context.Context, id string, time.Duration) (<-chan *cstructs.TaskResourceUsage, error)`

The `TaskStats` function returns a channel which the driver should send stats
to at the given interval. The driver must send stats at the given interval
@@ -188,7 +188,7 @@ inside the running container. `ExecTask` is called for Consul script checks.


[lxcdriver]: https://github.com/hashicorp/nomad-driver-lxc
[DriverPlugin]: https://github.com/hashicorp/nomad/blob/v0.9.0-beta2/plugins/drivers/driver.go#L39-L57
[DriverPlugin]: https://github.com/hashicorp/nomad/blob/v0.9.0/plugins/drivers/driver.go#L39-L57
[baseplugin]: /docs/internals/plugins/base.html
[taskconfig]: https://godoc.org/github.com/hashicorp/nomad/plugins/drivers#TaskConfig
[taskhandle]: https://godoc.org/github.com/hashicorp/nomad/plugins/drivers#TaskHandle