docs: update nvidia driver documentation

notably:
- name of the compiled binary is 'nomad-device-nvidia', not 'nvidia-gpu'
- link to Nvidia docs for installing the container runtime toolkit
- list docker v19.03 as minimum version, to track with nvidia's new container runtime toolkit
Seth Hoenig
2022-05-02 09:11:05 -05:00
parent dfda28daab
commit d352ab25c4


@@ -6,7 +6,7 @@ description: The Nvidia Device Plugin detects and makes Nvidia devices available
# Nvidia GPU Device Plugin
-Name: `nvidia-gpu`
+Name: `nomad-device-nvidia`
The Nvidia device plugin is used to expose Nvidia GPUs to Nomad.
@@ -97,23 +97,29 @@ documentation](https://github.com/NVIDIA/nvidia-container-runtime#environment-va
## Installation Requirements
-In order to use the `nvidia-gpu` the following prerequisites must be met:
+In order to use the `nomad-device-nvidia` device driver the following prerequisites must be met:
1. GNU/Linux x86_64 with kernel version > 3.10
2. NVIDIA GPU with Architecture > Fermi (2.1)
3. NVIDIA drivers >= 340.29 with binary `nvidia-smi`
+4. Docker v19.03+
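Before involving Docker at all, it can be worth confirming that the host driver
and `nvidia-smi` binary from prerequisite 3 are working. A minimal check, as a
sketch (the query flags are standard `nvidia-smi` options):

```shell-session
$ # Confirm the kernel driver and nvidia-smi binary are installed and working
$ nvidia-smi --query-gpu=name,driver_version --format=csv
```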
-### Docker Driver Requirements
+### Container Toolkit Installation
+Follow the [NVIDIA Container Toolkit installation instructions][nvidia_container_toolkit]
+from Nvidia to prepare a machine to use docker containers with Nvidia GPUs. You should
+be able to run this simple command to test your environment and produce meaningful
+output.
+
+```shell
+docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
+```
-The Nvidia driver plugin currently only supports the older v1.0 version of the
-Docker driver provided by Nvidia. In order to use the Nvidia driver plugin with
-the Docker driver, please follow the installation instructions for
-[`nvidia-container-runtime`](https://github.com/nvidia/nvidia-container-runtime#installation).
## Plugin Configuration
```hcl
plugin "nvidia-gpu" {
plugin "nomad-device-nvidia" {
config {
enabled = true
ignored_gpu_ids = ["GPU-fef8089b", "GPU-ac81e44d"]
@@ -122,7 +128,7 @@ plugin "nvidia-gpu" {
}
```
-The `nvidia-gpu` device plugin supports the following configuration in the agent
+The `nomad-device-nvidia` device plugin supports the following configuration in the agent
config:
- `enabled` `(bool: true)` - Control whether the plugin should be enabled and running.
@@ -133,17 +139,20 @@ config:
- `fingerprint_period` `(string: "1m")` - The period in which to fingerprint for
device changes.
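Putting the pieces together: the plugin binary is loaded from the agent's
[`plugin_dir`] and configured with a [`plugin`] block. A minimal agent
configuration sketch (the directory path is an illustrative assumption):

```hcl
# agent.hcl -- assumes the nomad-device-nvidia binary has been copied
# into the plugin directory below
plugin_dir = "/opt/nomad/plugins"

plugin "nomad-device-nvidia" {
  config {
    enabled            = true
    fingerprint_period = "1m"
  }
}
```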
-## Restrictions
+## Limitations
The Nvidia integration only works with task drivers that natively integrate with
Nvidia's [container runtime
library](https://github.com/NVIDIA/libnvidia-container).
-Nomad has tested support with the [`docker` driver][docker-driver] and plans to
-bring support to the built-in [`exec`][exec-driver] and [`java`][java-driver]
-drivers. Support for [`lxc`][lxc-driver] should be possible by installing the
-[Nvidia hook](https://github.com/lxc/lxc/blob/master/hooks/nvidia) but is not
-tested or documented by Nomad.
+Nomad has tested support with the [`docker` driver][docker-driver]. Support for
+[`lxc`][lxc-driver] should be possible by installing the [Nvidia hook][nvidia_hook]
+but is not tested or documented by Nomad.
+
+## Source Code & Compiled Binaries
+
+The source code for this plugin can be found at [hashicorp/nomad-device-nvidia][source]. You
+can also find pre-built binaries on the [releases page][nvidia_plugin_download].
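For example, to fetch a release and install it into the agent's plugin
directory (the version below is a placeholder; the URL follows the usual
releases.hashicorp.com layout, so check the releases page for current
versions):

```shell-session
$ # 1.0.0 is a placeholder; substitute the latest release for your platform
$ curl -fsSL -o nomad-device-nvidia.zip \
    https://releases.hashicorp.com/nomad-device-nvidia/1.0.0/nomad-device-nvidia_1.0.0_linux_amd64.zip
$ unzip nomad-device-nvidia.zip -d /opt/nomad/plugins
```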
## Examples
@@ -151,68 +160,19 @@ Inspect a node with a GPU:
```shell-session
$ nomad node status 4d46e59f
ID             = 4d46e59f
Name           = nomad
Class          = <none>
DC             = dc1
Drain          = false
Eligibility    = eligible
Status         = ready
Uptime         = 19m43s
Driver Status  = docker,mock_driver,raw_exec

Node Events
Time                  Subsystem  Message
2019-01-23T18:25:18Z  Cluster    Node registered

Allocated Resources
CPU           Memory      Disk
0/15576 MHz   0 B/55 GiB  0 B/28 GiB

Allocation Resource Utilization
CPU           Memory
0/15576 MHz   0 B/55 GiB

Host Resource Utilization
CPU              Memory          Disk
2674/15576 MHz   1.5 GiB/55 GiB  3.0 GiB/31 GiB

// ...TRUNCATED...

Device Resource Utilization
nvidia/gpu/Tesla K80[GPU-e1f6f4f1-1ea5-7b9d-5f03-338a9dc32416]  0 / 11441 MiB

Allocations
No allocations placed
```
Display detailed statistics on a node with a GPU:
```shell-session
$ nomad node status -stats 4d46e59f
ID             = 4d46e59f
Name           = nomad
Class          = <none>
DC             = dc1
Drain          = false
Eligibility    = eligible
Status         = ready
Uptime         = 19m59s
Driver Status  = docker,mock_driver,raw_exec

Node Events
Time                  Subsystem  Message
2019-01-23T18:25:18Z  Cluster    Node registered

Allocated Resources
CPU           Memory      Disk
0/15576 MHz   0 B/55 GiB  0 B/28 GiB

Allocation Resource Utilization
CPU           Memory
0/15576 MHz   0 B/55 GiB

Host Resource Utilization
CPU              Memory          Disk
2673/15576 MHz   1.5 GiB/55 GiB  3.0 GiB/31 GiB

// ...TRUNCATED...

Device Resource Utilization
nvidia/gpu/Tesla K80[GPU-e1f6f4f1-1ea5-7b9d-5f03-338a9dc32416]  0 / 11441 MiB
@@ -232,9 +192,6 @@ Memory state = 0 / 11441 MiB
Memory utilization  = 0 %
Power usage         = 37 / 149 W
Temperature         = 34 C

Allocations
No allocations placed
```
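The same numbers are available programmatically: the Nomad client exposes host
and device statistics over the HTTP API. A sketch (the `/v1/client/stats`
endpoint and `node_id` parameter are part of the standard client API; the
address is an assumption):

```shell-session
$ # Query client stats, including device stats, for a specific node
$ curl -s "http://localhost:4646/v1/client/stats?node_id=4d46e59f"
```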
Run the following example job to see that the GPU was mounted in the container:
@@ -250,7 +207,7 @@ job "gpu-test" {
driver = "docker"
config {
image = "nvidia/cuda:9.0-base"
image = "nvidia/cuda:11.0-base"
command = "nvidia-smi"
}
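Since the hunk above shows only the changed fragment, here is a complete job
file consistent with it, as a sketch (the `device "nvidia/gpu"` stanza is the
standard way a Nomad job requests a device; the group layout and count are
inferred from the output below):

```hcl
job "gpu-test" {
  datacenters = ["dc1"]
  type        = "batch"

  group "smi" {
    task "smi" {
      driver = "docker"

      config {
        image   = "nvidia/cuda:11.0-base"
        command = "nvidia-smi"
      }

      resources {
        # Ask the scheduler for one Nvidia GPU
        device "nvidia/gpu" {
          count = 1
        }
      }
    }
  }
}
```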
@@ -280,18 +237,8 @@ $ nomad run example.nomad
==> Evaluation "21bd7584" finished with status "complete"
$ nomad alloc status d250baed
ID                  = d250baed
Eval ID             = 21bd7584
Name                = gpu-test.smi[0]
Node ID             = 4d46e59f
Job ID              = example
Job Version         = 0
Client Status       = complete
Client Description  = All tasks have completed
Desired Status      = run
Desired Description = <none>
Created             = 7s ago
Modified            = 2s ago
// ...TRUNCATED...
Task "smi" is "dead"
Task Resources
@@ -334,10 +281,14 @@ Wed Jan 23 18:25:32 2019
+-----------------------------------------------------------------------------+
```
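Once the allocation completes, the nvidia-smi output shown above can also be
read back from the task logs:

```shell-session
$ nomad alloc logs d250baed smi
```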
[docker-driver]: /docs/drivers/docker 'Nomad docker Driver'
-[exec-driver]: /docs/drivers/exec 'Nomad exec Driver'
-[java-driver]: /docs/drivers/java 'Nomad java Driver'
[lxc-driver]: /plugins/drivers/community/lxc 'Nomad lxc Driver'
[`plugin`]: /docs/configuration/plugin
[`plugin_dir`]: /docs/configuration#plugin_dir
+[nvidia_hook]: https://github.com/lxc/lxc/blob/master/hooks/nvidia
+[nvidia_plugin_download]: https://releases.hashicorp.com/nomad-device-nvidia/
+[nvidia_container_toolkit]: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html
+[source]: https://github.com/hashicorp/nomad-device-nvidia