mirror of
https://github.com/kemko/nomad.git
synced 2026-01-03 08:55:43 +03:00
* create plugin author guide; remove concepts/plugins * style guide; update links * update cni redirect * move host-volume plugin to /plugins/. Add arch host volume content. * Apply Jeff's style guide updates Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com> * Create Base plugin API section, link to BasePlugin interface --------- Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>
89 lines
4.4 KiB
Plaintext
---
layout: docs
page_title: Create a device plugin for Nomad
description: |-
  Learn how to create a Nomad device plugin so you can schedule workload tasks with other devices, such as GPUs. Review the device plugin lifecycle and the device plugin API functions that you must implement in your device plugin.
---

# Create a device plugin

This page provides conceptual information for creating a device driver plugin to extend Nomad's workload execution functionality.

Nomad has built-in support for scheduling compute resources such as CPU, memory,
and networking. Use Nomad device driver plugins to support scheduling tasks with
other devices, such as GPUs. Device driver plugins are responsible for
fingerprinting these devices and working with the Nomad client to make them
available to assigned tasks.

For a real-world example of a Nomad device plugin implementation, refer to the
[NVIDIA GPU plugin](https://github.com/hashicorp/nomad-device-nvidia).

## Authoring a device plugin

Authoring a device plugin in Nomad consists of implementing the
[BasePlugin][base-plugin] and [DevicePlugin][deviceplugin] interfaces alongside
a main package to launch the plugin.

The [nomad-skeleton-device-plugin][skeletonproject] exists to help bootstrap
the development of new device plugins. It provides most of the boilerplate
necessary for a device plugin, along with detailed comments.

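As a rough illustration of the shape of that work, the following sketch declares reduced, local stand-ins for the two interfaces and checks at compile time that a type satisfies them. Everything here is an assumption made for illustration: the `BasePlugin` stand-in is cut down to a single method, the response types are empty placeholders, and `ExamplePlugin` is a hypothetical name. A real plugin imports the interfaces from the Nomad source tree instead of redeclaring them.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// Placeholder types standing in for the ones Nomad defines. They are
// intentionally minimal and do not match the real field layouts.
type (
	PluginInfoResponse   struct{ Name, Type string }
	FingerprintResponse  struct{}
	StatsResponse        struct{}
	ContainerReservation struct{}
)

// BasePlugin is a reduced stand-in: only the plugin info method is shown.
type BasePlugin interface {
	PluginInfo() (*PluginInfoResponse, error)
}

// DevicePlugin mirrors the three function signatures documented on this page.
type DevicePlugin interface {
	BasePlugin
	Fingerprint(context.Context) (<-chan *FingerprintResponse, error)
	Stats(context.Context, time.Duration) (<-chan *StatsResponse, error)
	Reserve(deviceIDs []string) (*ContainerReservation, error)
}

// ExamplePlugin is a hypothetical plugin with stub method bodies.
type ExamplePlugin struct{}

func (ExamplePlugin) PluginInfo() (*PluginInfoResponse, error) {
	return &PluginInfoResponse{Name: "example-device", Type: "device"}, nil
}

func (ExamplePlugin) Fingerprint(context.Context) (<-chan *FingerprintResponse, error) {
	ch := make(chan *FingerprintResponse)
	close(ch) // stub: no devices to report
	return ch, nil
}

func (ExamplePlugin) Stats(context.Context, time.Duration) (<-chan *StatsResponse, error) {
	ch := make(chan *StatsResponse)
	close(ch) // stub: no statistics to report
	return ch, nil
}

func (ExamplePlugin) Reserve(deviceIDs []string) (*ContainerReservation, error) {
	return &ContainerReservation{}, nil
}

// Compile-time check that ExamplePlugin implements the full interface.
var _ DevicePlugin = ExamplePlugin{}

func main() {
	var p DevicePlugin = ExamplePlugin{}
	info, _ := p.PluginInfo()
	fmt.Println(info.Name, info.Type)
}
```

The `var _ DevicePlugin = ExamplePlugin{}` line is a common Go idiom: the build fails as soon as a method is missing or has the wrong signature.
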
### Lifecycle and state

A device plugin is long-lived. Nomad ensures that one instance of the plugin
is running. If the plugin crashes or otherwise terminates, Nomad launches
another instance of it.

However, unlike [task driver plugins](/nomad/plugins/author/task-driver), device
plugins do not currently have an interface for persisting state to the Nomad
client. Instead, the device plugin API emphasizes fingerprinting devices and
reporting their status. After helping to provision a task with a scheduled
device, a device plugin does not have any responsibility, or ability, to monitor
the task.

## Base plugin API

@include 'plugins/base.mdx'

## Device driver plugin API

### `Fingerprint(context.Context) (<-chan *FingerprintResponse, error)`

The client calls the `Fingerprint` [function][fingerprintfn] when the plugin is
started. This function allows the plugin to provide Nomad with a list of
discovered devices, along with their attributes, for the purpose of scheduling
workloads using devices. The channel returned should immediately send an initial
[`FingerprintResponse`][fingerprintresponse], then send periodic updates at an
appropriate interval until the context is canceled.

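The send-immediately-then-periodically pattern can be sketched as follows. This is a minimal sketch under stated assumptions, not the skeleton project's implementation: `FingerprintResponse` is a simplified local stand-in for the Nomad type, and `ExamplePlugin`, its `detect` and `period` fields, and the `firstN` helper are hypothetical names introduced for illustration.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// FingerprintResponse is a simplified stand-in for the Nomad type of the
// same name. The real type carries device groups rather than bare IDs.
type FingerprintResponse struct {
	DeviceIDs []string // devices found in this detection pass
	Error     error    // set when detection fails
}

// ExamplePlugin is hypothetical. detect performs one detection pass and
// period is the delay between passes.
type ExamplePlugin struct {
	detect func() ([]string, error)
	period time.Duration
}

// Fingerprint sends one response right away, then one per period, and
// closes the channel once ctx is canceled.
func (p *ExamplePlugin) Fingerprint(ctx context.Context) (<-chan *FingerprintResponse, error) {
	out := make(chan *FingerprintResponse)
	go func() {
		defer close(out)
		ticker := time.NewTicker(p.period)
		defer ticker.Stop()
		for {
			ids, err := p.detect()
			select {
			case out <- &FingerprintResponse{DeviceIDs: ids, Error: err}:
			case <-ctx.Done():
				return
			}
			// Wait for the next tick, or stop if the client cancels.
			select {
			case <-ticker.C:
			case <-ctx.Done():
				return
			}
		}
	}()
	return out, nil
}

// firstN reads n responses from the stream and then cancels it.
func firstN(p *ExamplePlugin, n int) []*FingerprintResponse {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	ch, _ := p.Fingerprint(ctx)
	var got []*FingerprintResponse
	for len(got) < n {
		got = append(got, <-ch)
	}
	return got
}

func main() {
	p := &ExamplePlugin{
		detect: func() ([]string, error) { return []string{"gpu-0"}, nil },
		period: 10 * time.Millisecond,
	}
	for _, r := range firstN(p, 2) {
		fmt.Println(r.DeviceIDs, r.Error)
	}
}
```

Every send is paired with `ctx.Done()` in a `select` so the goroutine cannot block forever if the client stops reading after cancellation.
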
Each fingerprint response consists of either an error or a list of device
groups. A _device group_ is a list of detected devices that are identical for the
purpose of scheduling, which means they have identical attributes.

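One way to build device groups is to bucket detected devices by their attribute values, as in the sketch below. The `Device` and `DeviceGroup` types and their fields (`Vendor`, `Model`, `MemMiB`) are hypothetical and chosen only for illustration. Nomad's own types use a different layout.

```go
package main

import (
	"fmt"
	"sort"
)

// Device is a hypothetical detected device with illustrative attributes.
type Device struct {
	ID     string
	Vendor string
	Model  string
	MemMiB int
}

// DeviceGroup holds the IDs of devices whose attributes are all identical.
type DeviceGroup struct {
	Vendor, Model string
	MemMiB        int
	DeviceIDs     []string
}

// groupKey is the set of attributes that must match for two devices to be
// interchangeable for scheduling.
type groupKey struct {
	vendor, model string
	memMiB        int
}

// groupDevices buckets devices by attributes. Groups and the IDs inside
// them are sorted so that the output is deterministic.
func groupDevices(devs []Device) []DeviceGroup {
	byKey := map[groupKey]*DeviceGroup{}
	for _, d := range devs {
		k := groupKey{d.Vendor, d.Model, d.MemMiB}
		g, ok := byKey[k]
		if !ok {
			g = &DeviceGroup{Vendor: d.Vendor, Model: d.Model, MemMiB: d.MemMiB}
			byKey[k] = g
		}
		g.DeviceIDs = append(g.DeviceIDs, d.ID)
	}
	groups := make([]DeviceGroup, 0, len(byKey))
	for _, g := range byKey {
		sort.Strings(g.DeviceIDs)
		groups = append(groups, *g)
	}
	sort.Slice(groups, func(i, j int) bool {
		return groups[i].DeviceIDs[0] < groups[j].DeviceIDs[0]
	})
	return groups
}

func main() {
	groups := groupDevices([]Device{
		{ID: "gpu-0", Vendor: "nvidia", Model: "A100", MemMiB: 40960},
		{ID: "gpu-1", Vendor: "nvidia", Model: "A100", MemMiB: 40960},
		{ID: "gpu-2", Vendor: "nvidia", Model: "T4", MemMiB: 16384},
	})
	for _, g := range groups {
		fmt.Println(g.Model, g.DeviceIDs)
	}
}
```
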
### `Stats(context.Context, time.Duration) (<-chan *StatsResponse, error)`

The `Stats` [function][statsfn] returns a channel on which the plugin should
emit device statistics, at the specified interval, until either an error is
encountered or the specified context is canceled. The `StatsResponse` object
allows [dimensioned][dimensioned] statistics to be returned for each device in
a device group.

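The following sketch shows one possible reading of that contract: emit a response on every tick, report an error once and then stop, and stop when the context is canceled. `StatValue` and `StatsResponse` are simplified local stand-ins for the Nomad types, and `ExamplePlugin`, its `read` field, `errDeviceLost`, and `collectUntilClosed` are hypothetical names introduced for illustration.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// StatValue is a simplified stand-in for a dimensioned statistic: a value
// together with its unit and a description.
type StatValue struct {
	Value float64
	Unit  string
	Desc  string
}

// StatsResponse is a simplified stand-in that maps a device ID to its
// named statistics.
type StatsResponse struct {
	Devices map[string]map[string]StatValue
	Error   error
}

// errDeviceLost is a hypothetical error used in the usage example.
var errDeviceLost = errors.New("device lost")

// ExamplePlugin is hypothetical. read samples all devices once.
type ExamplePlugin struct {
	read func() (map[string]map[string]StatValue, error)
}

// Stats emits one response per interval. It closes the channel after
// reporting an error or when ctx is canceled.
func (p *ExamplePlugin) Stats(ctx context.Context, interval time.Duration) (<-chan *StatsResponse, error) {
	out := make(chan *StatsResponse)
	go func() {
		defer close(out)
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			select {
			case <-ctx.Done():
				return
			case <-ticker.C:
			}
			stats, err := p.read()
			select {
			case out <- &StatsResponse{Devices: stats, Error: err}:
			case <-ctx.Done():
				return
			}
			if err != nil {
				return // the error was reported, so end the stream
			}
		}
	}()
	return out, nil
}

// collectUntilClosed reads responses until the plugin closes the channel.
func collectUntilClosed(p *ExamplePlugin, interval time.Duration) []*StatsResponse {
	ch, _ := p.Stats(context.Background(), interval)
	var got []*StatsResponse
	for r := range ch {
		got = append(got, r)
	}
	return got
}

func main() {
	calls := 0
	p := &ExamplePlugin{read: func() (map[string]map[string]StatValue, error) {
		calls++
		if calls == 2 {
			return nil, errDeviceLost
		}
		return map[string]map[string]StatValue{
			"gpu-0": {"temperature": {Value: 61, Unit: "C", Desc: "GPU temperature"}},
		}, nil
	}}
	for _, r := range collectUntilClosed(p, 5*time.Millisecond) {
		fmt.Println(r.Devices, r.Error)
	}
}
```
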
### `Reserve(deviceIDs []string) (*ContainerReservation, error)`

The `Reserve` [function][reservefn] accepts a list of device IDs and returns the
information necessary for the client to make those devices available to a task.
Currently, the `ContainerReservation` object allows the plugin to specify
environment variables for the task, as well as a list of host devices and files
to be mounted into the task's filesystem. Any orchestration required to prepare
the device for use should also be performed in this function.

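A `Reserve` implementation might look roughly like the sketch below, which validates the requested IDs, exposes each host device to the task, and sets an environment variable listing the reserved devices. The `ContainerReservation`, `Mount`, and `DeviceSpec` types are simplified local stand-ins modeled on the description above, and `ExamplePlugin`, its `known` map, the `/dev/example*` paths, and the `EXAMPLE_VISIBLE_DEVICES` variable are all hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// Mount is a stand-in describing a host file mounted into the task.
type Mount struct {
	TaskPath string
	HostPath string
	ReadOnly bool
}

// DeviceSpec is a stand-in describing a host device exposed to the task.
type DeviceSpec struct {
	TaskPath    string
	HostPath    string
	CgroupPerms string
}

// ContainerReservation is a stand-in holding everything the client needs
// to make the reserved devices available to a task.
type ContainerReservation struct {
	Envs    map[string]string
	Mounts  []*Mount
	Devices []*DeviceSpec
}

// ExamplePlugin is hypothetical. known maps a fingerprinted device ID to
// its host device path.
type ExamplePlugin struct {
	known map[string]string
}

// Reserve rejects unknown IDs, then returns the host devices and the
// environment variable the task needs in order to use them.
func (p *ExamplePlugin) Reserve(deviceIDs []string) (*ContainerReservation, error) {
	res := &ContainerReservation{Envs: map[string]string{}}
	for _, id := range deviceIDs {
		hostPath, ok := p.known[id]
		if !ok {
			return nil, fmt.Errorf("unknown device ID %q", id)
		}
		res.Devices = append(res.Devices, &DeviceSpec{
			TaskPath:    hostPath,
			HostPath:    hostPath,
			CgroupPerms: "rw",
		})
	}
	// Hypothetical variable name telling the workload which devices it owns.
	res.Envs["EXAMPLE_VISIBLE_DEVICES"] = strings.Join(deviceIDs, ",")
	return res, nil
}

func main() {
	p := &ExamplePlugin{known: map[string]string{
		"gpu-0": "/dev/example0",
		"gpu-1": "/dev/example1",
	}}
	res, err := p.Reserve([]string{"gpu-0", "gpu-1"})
	if err != nil {
		fmt.Println("reserve failed:", err)
		return
	}
	fmt.Println(res.Envs, len(res.Devices))
}
```

Returning an error for an unknown ID, rather than silently skipping it, keeps a task from starting with fewer devices than it was scheduled with.
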
@include 'plugins/hcl-specifications.mdx'

[base-plugin]: https://github.com/hashicorp/nomad/blob/main/plugins/base/base.go#L17
[deviceplugin]: https://github.com/hashicorp/nomad/blob/main/plugins/device/device.go#L28
[skeletonproject]: https://github.com/hashicorp/nomad-skeleton-device-plugin
[fingerprintresponse]: https://github.com/hashicorp/nomad/blob/main/plugins/device/device.go#L45
[fingerprintfn]: https://github.com/hashicorp/nomad-skeleton-device-plugin/blob/main/device/device.go#L162
[statsfn]: https://github.com/hashicorp/nomad-skeleton-device-plugin/blob/main/device/device.go#L172
[reservefn]: https://github.com/hashicorp/nomad-skeleton-device-plugin/blob/main/device/device.go#L192
[dimensioned]: https://github.com/hashicorp/nomad/blob/main/plugins/shared/structs/stats.go#L37