From dfa096127256e85075ffc6f8d5cd6456ff0393de Mon Sep 17 00:00:00 2001
From: Tim Gross
Date: Wed, 4 Nov 2020 09:59:19 -0500
Subject: [PATCH] docs: internals documentation for alloc filesystem (#9195)

We recently added documentation disambiguating the terminology of the
allocation/task working directories. This changeset adds an internals document
that describes in more detail exactly what goes into the allocation working
directory, how this interacts with the filesystem isolation provided by task
drivers, and how this interacts with features like `artifact` and `template`.

Co-authored-by: Charlie Voiselle <464492+angrycub@users.noreply.github.com>
---
 website/data/docs-navigation.js               |   3 +-
 website/pages/docs/internals/filesystem.mdx   | 465 ++++++++++++++++++
 .../pages/docs/job-specification/artifact.mdx |   5 +-
 .../pages/docs/job-specification/template.mdx |   6 +-
 website/pages/docs/runtime/environment.mdx    |   9 +-
 5 files changed, 481 insertions(+), 7 deletions(-)
 create mode 100644 website/pages/docs/internals/filesystem.mdx

diff --git a/website/data/docs-navigation.js b/website/data/docs-navigation.js
index d0b3e5480..024801488 100644
--- a/website/data/docs-navigation.js
+++ b/website/data/docs-navigation.js
@@ -49,8 +49,9 @@ export default [
         content: ['scheduling', 'preemption'],
       },
       'consensus',
+      'filesystem',
       'gossip',
-      'security',
+      'security'
     ],
   },
   {
diff --git a/website/pages/docs/internals/filesystem.mdx b/website/pages/docs/internals/filesystem.mdx
new file mode 100644
index 000000000..d9975802b
--- /dev/null
+++ b/website/pages/docs/internals/filesystem.mdx
@@ -0,0 +1,465 @@
+---
+layout: docs
+page_title: Filesystem
+sidebar_title: Filesystem
+description: |-
+  Nomad creates an allocation working directory for every allocation. Learn what
+  goes into the working directory and how it interacts with Nomad task drivers.
+---
+
+# Filesystem
+
+Nomad creates a working directory for each allocation on a client.
+This directory can be found in the Nomad [`data_dir`] at
+`./alloc/«alloc_id»`. The allocation working directory is where Nomad
+creates task directories and directories shared between tasks, writes logs
+for tasks, downloads artifacts, and renders templates.
+
+An allocation with two tasks (named `task1` and `task2`) will have an
+allocation directory like the one below.
+
+```shell-session
+.
+├── alloc
+│   ├── data
+│   ├── logs
+│   │   ├── task1.stderr.0
+│   │   ├── task1.stdout.0
+│   │   ├── task2.stderr.0
+│   │   └── task2.stdout.0
+│   └── tmp
+├── task1
+│   ├── local
+│   ├── secrets
+│   └── tmp
+└── task2
+    ├── local
+    ├── secrets
+    └── tmp
+```
+
+- **alloc/**: This directory is shared across all tasks in an allocation and
+  can be used to store data that needs to be used by multiple tasks, such as
+  files read by a log shipper. This is the directory that's provided to the
+  task as the `NOMAD_ALLOC_DIR`. Note that this `alloc/` directory is not the
+  same as the "allocation working directory", which is the top-level
+  directory. All tasks in a task group can read and write to the `alloc/`
+  directory. Within the `alloc/` directory are three standard directories:
+
+  - **alloc/data/**: This directory is the location used by the
+    [`ephemeral_disk`] stanza for shared data.
+
+  - **alloc/logs/**: This directory is the location of the log files for every
+    task within an allocation. The `nomad alloc logs` command streams these
+    files to your terminal.
+
+  - **alloc/tmp/**: A temporary directory used as scratch space by task drivers.
+
+- **«taskname»/**: Each task has a **task working directory** with the same name as
+  the task. Tasks in a task group can't read each other's task working
+  directory. Depending on the task driver's [filesystem isolation mode], a
+  task may not be able to access the task working directory. Within the
+  `«taskname»/` directory are three standard directories:
+
+  - **«taskname»/local/**: This directory is the location provided to the task as the
+    `NOMAD_TASK_DIR`.
+    Note this is not the same as the "task working
+    directory". This directory is private to the task.
+
+  - **«taskname»/secrets/**: This directory is the location provided to the task as
+    `NOMAD_SECRETS_DIR`. The contents of files in this directory cannot be read
+    by the `nomad alloc fs` command. It can be used to store secret data that
+    should not be visible outside the task.
+
+  - **«taskname»/tmp/**: A temporary directory used as scratch space by task drivers.
+
+The allocation working directory is the directory you see when using the
+`nomad alloc fs` command. If you were to run `nomad alloc fs` against the
+allocation that made the working directory shown above, you'd see the
+following:
+
+```shell-session
+$ nomad alloc fs c0b2245f
+Mode        Size     Modified Time         Name
+drwxrwxrwx  4.0 KiB  2020-10-27T18:00:39Z  alloc/
+drwxrwxrwx  4.0 KiB  2020-10-27T18:00:32Z  task1/
+drwxrwxrwx  4.0 KiB  2020-10-27T18:00:39Z  task2/
+
+$ nomad alloc fs c0b2245f alloc/
+Mode        Size     Modified Time         Name
+drwxrwxrwx  4.0 KiB  2020-10-27T18:00:32Z  data/
+drwxrwxrwx  4.0 KiB  2020-10-27T18:00:39Z  logs/
+drwxrwxrwx  4.0 KiB  2020-10-27T18:00:32Z  tmp/
+
+$ nomad alloc fs c0b2245f task1/
+Mode         Size     Modified Time         Name
+drwxrwxrwx   4.0 KiB  2020-10-27T18:00:33Z  local/
+drwxrwxrwx   60 B     2020-10-27T18:00:32Z  secrets/
+dtrwxrwxrwx  4.0 KiB  2020-10-27T18:00:32Z  tmp/
+```
+
+## Task Drivers and Filesystem Isolation Modes
+
+Depending on the task driver, the task's working directory may also be the
+root directory for the running task. This is determined by the task driver's
+[filesystem isolation capability].
+
+### `image` isolation
+
+Task drivers like `docker` or `qemu` use `image` isolation, where the task
+driver isolates task filesystems as machine images. These filesystems are
+owned by the task driver's external process and not by Nomad itself. These
+filesystems will not typically be found anywhere in the allocation working
+directory.
+For example, Docker containers will have their overlay filesystem
+unpacked to `/var/run/docker/containerd/«container_id»` by default.
+
+Nomad will provide the `NOMAD_ALLOC_DIR`, `NOMAD_TASK_DIR`, and
+`NOMAD_SECRETS_DIR` to tasks with `image` isolation, typically by
+bind-mounting them to the task driver's filesystem.
+
+You can see an example of `image` isolation by running the following minimal
+job:
+
+```hcl
+job "example" {
+  datacenters = ["dc1"]
+
+  task "task1" {
+    driver = "docker"
+
+    config {
+      image = "redis:6.0"
+    }
+  }
+}
+```
+
+If you look at the allocation working directory from the host, you'll see a
+minimal filesystem tree:
+
+```shell-session
+.
+├── alloc
+│   ├── data
+│   ├── logs
+│   │   ├── task1.stderr.0
+│   │   └── task1.stdout.0
+│   └── tmp
+└── task1
+    ├── local
+    ├── secrets
+    └── tmp
+```
+
+The `nomad alloc fs` command shows the same bare directory tree:
+
+```shell-session
+$ nomad alloc fs b0686b27
+Mode        Size     Modified Time         Name
+drwxrwxrwx  4.0 KiB  2020-10-27T18:51:54Z  alloc/
+drwxrwxrwx  4.0 KiB  2020-10-27T18:51:54Z  task1/
+
+$ nomad alloc fs b0686b27 task1
+Mode         Size     Modified Time         Name
+drwxrwxrwx   4.0 KiB  2020-10-27T18:51:54Z  local/
+drwxrwxrwx   60 B     2020-10-27T18:51:54Z  secrets/
+dtrwxrwxrwx  4.0 KiB  2020-10-27T18:51:54Z  tmp/
+
+$ nomad alloc fs b0686b27 task1/local
+Mode        Size     Modified Time         Name
+```
+
+If you inspect the Docker container that's created, you'll see three
+directories bind-mounted into the container:
+
+```shell-session
+$ docker inspect 32e | jq '.[0].HostConfig.Binds'
+[
+  "/var/nomad/alloc/b0686b27-8af3-8252-028f-af485c81a8b3/alloc:/alloc",
+  "/var/nomad/alloc/b0686b27-8af3-8252-028f-af485c81a8b3/task1/local:/local",
+  "/var/nomad/alloc/b0686b27-8af3-8252-028f-af485c81a8b3/task1/secrets:/secrets"
+]
+```
+
+The root filesystem inside the container can see these three mounts, along
+with the rest of the container filesystem:
+
+```shell-session
+$ docker exec -it 32e /bin/sh
+# ls /
+alloc  boot  dev  home  lib64  media  opt   root  sbin     srv  tmp  var
+bin    data  etc  lib   local  mnt    proc  run   secrets  sys  usr
+```
+
+Note that because only these three directories are bind-mounted into the
+container filesystem, nothing written elsewhere in the allocation working
+directory will be accessible inside the container. This means templates,
+artifacts, and dispatch payloads for tasks with `image` isolation must be
+written into the `NOMAD_ALLOC_DIR`, `NOMAD_TASK_DIR`, or `NOMAD_SECRETS_DIR`.
+
+To work around this limitation, you can use the task driver's mounting
+capabilities to mount one of the three directories to another location in the
+task. For example, with the Docker driver you can use the driver's `mounts`
+block to bind a secret written by a `template` block to the
+`NOMAD_SECRETS_DIR` into a configuration directory elsewhere in the task:
+
+```hcl
+job "example" {
+  datacenters = ["dc1"]
+
+  task "task1" {
+    driver = "docker"
+
+    config {
+      image = "redis:6.0"
+      mounts = [{
+        type     = "bind"
+        source   = "secrets"
+        target   = "/etc/redis.d"
+        readonly = true
+      }]
+    }
+
+    template {
+      destination = "${NOMAD_SECRETS_DIR}/redis.conf"
+      data = <)` - Specifies the location where the resulting template should be rendered, relative to the [task working directory]. Only drivers without filesystem isolation (ex. `raw_exec`) or
-  that buiold a chroot in the task working directory (ex. `exec`) can render
+  that build a chroot in the task working directory (ex. `exec`) can render
   templates outside of the `NOMAD_ALLOC_DIR`, `NOMAD_TASK_DIR`, or
-  `NOMAD_SECRETS_DIR`.
+  `NOMAD_SECRETS_DIR`. For more details on how `destination` interacts with
+  task drivers, see the [Filesystem internals] documentation.
 - `env` `(bool: false)` - Specifies the template should be read back in as environment variables for the task.
   ([See below](#environment-variables))
@@ -385,3 +386,4 @@ options](/docs/configuration/client#options):
 [nodevars]: /docs/runtime/interpolation#interpreted_node_vars 'Nomad Node Variables'
 [go-envparse]: https://github.com/hashicorp/go-envparse#readme 'The go-envparse Readme'
 [task working directory]: /docs/runtime/environment#task-directories 'Task Directories'
+[Filesystem internals]: /docs/internals/filesystem#templates-artifacts-and-dispatch-payloads
diff --git a/website/pages/docs/runtime/environment.mdx b/website/pages/docs/runtime/environment.mdx
index a328d67f9..e74269dba 100644
--- a/website/pages/docs/runtime/environment.mdx
+++ b/website/pages/docs/runtime/environment.mdx
@@ -25,9 +25,9 @@ environment variable names such as `NOMAD_ADDR__