When we implemented CSI, the types of the fields for access mode and attachment
mode on volume requests were defined with a prefix "CSI". This gets confusing
now that we have dynamic host volumes using the same fields. Fortunately the
original was a typedef on string, and the Go API in the `api` package just uses
strings directly, so we can change the name of the type without breaking
backwards compatibility for the msgpack wire format.
Update the names to `VolumeAccessMode` and `VolumeAttachmentMode`. Keep the CSI
and DHV specific value constant names for these fields (they aren't currently
1:1), so that we can easily differentiate in a given bit of code which values
are valid.
Ref: https://github.com/hashicorp/nomad/pull/24881#discussion_r1920702890
The default node configuration in the client should always set an empty
HostVolumes map. Otherwise callers can panic, e.g.,:
goroutine 179 [running]:
github.com/hashicorp/nomad/client/hostvolumemanager.UpdateVolumeMap({0x36042b0, 0xc000c62a80}, 0x0, {0xc000a802a0, 0xd}, 0xc000691940)
github.com/hashicorp/nomad/client/hostvolumemanager/volume_fingerprint.go:43 +0x1b2
github.com/hashicorp/nomad/client.(*Client).batchFirstFingerprints.func1({0xc000a802a0, 0xd}, 0xc000691940)
github.com/hashicorp/nomad/client/node_updater.go:54 +0xd7
github.com/hashicorp/nomad/client.(*batchNodeUpdates).batchHostVolumeUpdates(0xc000912608?, 0xc0009f2f88)
github.com/hashicorp/nomad/client/node_updater.go:417 +0x152
github.com/hashicorp/nomad/client.(*Client).batchFirstFingerprints(0xc000c2d188)
github.com/hashicorp/nomad/client/node_updater.go:53 +0x1c5
created by github.com/hashicorp/nomad/client.NewClient in goroutine 1
github.com/hashicorp/nomad/client/client.go:557 +0x2069
is a panic of the HVM when restarting a client that doesn't have any static
host volumes, but does have a dynamic host volume.
The output of `GetDynamicHostVolumes` is a slice but that slice is constructed
from iterating over a map and isn't sorted. Sort the output in the test to
eliminate a test flake.
When we register a volume without a plugin, we need to send a client RPC so that
the node fingerprint can be updated. The registered volume also needs to be
written to client state so that we can restore the fingerprint after a restart.
Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>
When the Nomad client restarts and restores allocations, the network namespace
for an allocation may exist but no longer be correctly configured. For example,
if the host is rebooted and the task was a Docker task using a pause container,
the network namespace may be recreated by the docker daemon.
When we restore an allocation, use the CNI "check" command to verify that any
existing network namespace matches the expected configuration. This requires CNI
plugins of at least version 1.2.0 to avoid a bug in older plugin versions that
would cause the check to fail.
If the check fails, destroy the network namespace and try to recreate it from
scratch once. If that fails in the second pass, fail the restore so that the
allocation can be recreated (rather than silently having networking fail).
This should fix the gap left #24650 for Docker task drivers and any other
drivers with the `MustInitiateNetwork` capability.
Fixes: https://github.com/hashicorp/nomad/issues/24292
Ref: https://github.com/hashicorp/nomad/pull/24650
a node can have only one volume with a given name.
the scheduler prevents duplicates, but can only
do so after the server knows about the volume.
this prevents multiple concurrent creates being
called faster than the fingerprint/heartbeat interval.
users may still modify an existing volume only
if they set the `id` in the volume spec and
re-issue `nomad volume create`
if a *static* vol is added to config with a name
already being used by a dynamic volume, the
dynamic takes precedence, but log a warning.
Instead of a plugin `version` subcommand that responds with a string
(established in #24497), respond to a `fingerprint` command with a data
structure that we may extend in the future (such as plugin capabilities,
like size constraint support?). In the immediate term, it's still just the
version: `{"version": "0.0.1"}`
In addition to leaving the door open for future expansion, I think it will
also avoid false positives detecting executables that just happen to
respond to a `version` command.
This also reverses the ordering of the fingerprint string parts
from `plugins.host_volume.version.mkdir` (which aligned with CNI)
to `plugins.host_volume.mkdir.version` (makes more sense to me)
store dynamic host volume creations in client state,
so they can be "restored" on agent restart. restore works
by repeating the same Create operation as initial creation,
and expecting the plugin to be idempotent.
this is (potentially) especially important after host restarts,
which may have dropped mount points or such.
also ensure that volume ID is uuid-shaped so user-provided input
like `id = "../../../"` which is used as part of the target directory
can not find its way very far into the volume submission process
* mkdir: HostVolumePluginMkdir: just creates a directory
* example-host-volume: HostVolumePluginExternal:
plugin script that does mkfs and mount loopback
Co-authored-by: Tim Gross <tgross@hashicorp.com>
This changeset implements the RPC handlers for Dynamic Host Volumes, including
the plumbing needed to forward requests to clients. The client-side
implementation is stubbed and will be done under a separate PR.
Ref: https://hashicorp.atlassian.net/browse/NET-11549
Nomad sets a default port when resolving server addresses that don't have
one. When we get a "bare" IPv6 address without a port, we end up with an
unexpected error "too many colons in address" when we try to split the address
and host, because the standard library function expects IPv6 addresses to be
wrapped in brackets as recommended by RFC5952. User-configured addresses avoid
this problem by accepting IP address and port as separate configuration values,
but go-discover emits "bare" IPv6 addresses without a port in IPv6 environments.
Fix this by adding brackets to IPv6 addresses when we get the "too many colons"
error from the stdlib. This will still give erroneous results if the address
includes the port but is missing brackets, but there's no way to unambiguously
parse that address.
Ref: https://www.rfc-editor.org/rfc/rfc5952
Fixes: https://github.com/hashicorp/nomad/issues/24608
When a Nomad host reboots, the network namespace files in the tmpfs in
`/var/run` are wiped out. So when we restore allocations after a host reboot, we
need to be able to restore both the network namespace and the network
configuration. But because the netns is newly created and we need to run the CNI
plugins again, this create potential conflicts with the IPAM plugin which has
written state to persistent disk at `/var/lib/cni`. These IPs aren't the ones
advertised to Consul, so there's no particular reason to keep them around after
a host reboot because all virtual interfaces need to be recreated too.
Reconfigure the CNI bridge configuration to use `/var/run/cni` as its state
directory. We already expect this location to be created by CNI because the
netns files are hard-coded to be created there too in `libcni`.
Note this does not fix the problem described for Docker in #24292 because that
appears to be related to the netns itself being restored unexpectedly from
Docker's state.
Ref: https://github.com/hashicorp/nomad/issues/24292#issuecomment-2537078584
Ref: https://www.cni.dev/plugins/current/ipam/host-local/#files
The Nomad client can now optionally emit telemetry data from the
prerun and prestart hooks. This allows operators to monitor and
alert on failures and time taken to complete.
The new datapoints are:
- nomad.client.alloc_hook.prerun.success (counter)
- nomad.client.alloc_hook.prerun.failed (counter)
- nomad.client.alloc_hook.prerun.elapsed (sample)
- nomad.client.task_hook.prestart.success (counter)
- nomad.client.task_hook.prestart.failed (counter)
- nomad.client.task_hook.prestart.elapsed (sample)
The hook execution time is useful to Nomad engineering and will
help optimize code where possible and understand job specification
impacts on hook performance.
Currently only the PreRun and PreStart hooks have telemetry
enabled, so we limit the number of new metrics being produced.
This PR adds Consul Template's executeTemplate function to the denylist by
default, in order to prevent accidental or malicious infinitely recursive
execution.
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Fixes a bug in the AllocatedResources.Comparable method, where the scheduler
would only take into account the cpusets of the tasks in the largest lifecycle.
This could result in overlapping cgroup cpusets. Now we make the distinction
between reserved and fungible resources throughout the lifespan of the alloc.
In addition, added logging in case of future regressions thus not requiring
manual inspection of cgroup files.
When a task restarts, the Nomad client may need to rewrite the Consul token, but
it's created with permissions that prevent a non-root agent from writing to
it. While Nomad clients should be run as root (currently), it's harmless to
allow whatever user the Nomad agent is running as to be able to write to it, and
that's one less barrier to rootless Nomad.
Ref: https://github.com/hashicorp/nomad/issues/23859#issuecomment-2465757392
User of `nsutil` library should be able to do the following and for it
to work:
```
var errno syscall.Errno
if errors.As(err, &errno) {
if errno == unix.EBUSY { ... }
}
```
This commit fixes that issue.
When a Vault lease expires, it's revoked on the server and cannot be removed, so
this error should be treated as fatal.
The errors we get aren't wrapped by the Vault SDK, so unfortunately we have to
read the error messages and can't easily enumerate non-fatal error
messages (which might be bubbling up from the stdlib). I've audited the errors
currently used and have documented their source.
Ref 52ba156d47/vault/expiration.go (L1327)
Fixes: https://github.com/hashicorp/nomad/issues/23859
When multiple templates with api functions are included in a task, it's
possible for consul-template to re-render templates as it creates
watchers, overwriting render event data. This change uses event fields
that do not get overwritten, and only executes the change mode for
templates that were actually written to disk.
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
* connect: handle grpc_address as gosockaddr/template string
This PR fixes a bug where the consul.grpc_address could not be set using
a go-sockaddr/template string. This was inconsistent with how we do accept
such strings for consul.address values.
* add changelog
In #10193 we introduced a testing helper that spins up a client RPC server
without the rest of the client operations so that we can make server-side client
RPC tests lighter. But this wasn't actually ever wired up to the intended
target. While working on Dynamic Host Volumes I noticed that this would be
useful for RPC tests.
This changeset fixes some bugs in the helper that arose from client code drift,
and makes it used by the client RPC tests for CSI. This will also get used for
the DHV RPC tests.
Ref: https://github.com/hashicorp/nomad/pull/10193
* jobspec: add a chown option to artifact block
This PR adds a boolean 'chown' field to the artifact block.
It indicates whether the Nomad client should chown the downloaded files
and directories to be owned by the task.user. This is useful for drivers
like raw_exec and exec2 which are subject to the host filesystem user
permissions structure. Before, these drivers might not be able to use or
manage the downloaded artifacts since they would be owned by the root
user on a typical Nomad client configuration.
* api: no need for pointer of chown field
When using the Client FS APIs, we check to ensure that reads don't traverse into
the allocation's secret dir and private dir. But this check can be bypassed on
case-insensitive file systems (ex. Windows, macOS, and Linux with obscure ext4
options enabled). This allows a user with `read-fs` permissions but not
`alloc-exec` permissions to read from the secrets dir.
This changeset updates the check so that it's case-insensitive. This risks false
positives for escape (see linked Go issue), but only if a task without
filesystem isolation deliberately writes into the task working directory to do
so, which is a fail-safe failure mode.
Ref: https://github.com/golang/go/issues/18358
Co-authored-by: dduzgun-security <deniz.duzgun@hashicorp.com>
This PR replaces fsouza/go-dockerclient 3rd party docker client library with
docker's official SDK.
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Co-authored-by: Seth Hoenig <shoenig@duck.com>
When we introduced change_mode=script to templates, we passed the driver handle
down into the template manager so we could call its `Exec` method directly. But
the lifecycle of the driver handle is managed by the taskrunner and isn't
available when the template manager is first created. This has led to a series
of patches trying to fixup the behavior (#15915, #15192, #23663, #23917). Part
of the challenge in getting this right is using an interface to avoid the
circular import of the driver handle.
But the taskrunner already has a way to deal with this problem using a "lazy
handle". The other template change modes already use this indirectly through the
`Lifecycle` interface. Change the driver handle `Exec` call in the template
manager to a new `Lifecycle.Exec` call that reuses the existing behavior. This
eliminates the need for the template manager to know anything at all about the
handle state.
Fixes: https://github.com/hashicorp/nomad/issues/24051
The landlock fingerprint test assumes there's no version of the landlock API
>3. Update the test assertion to allow for the current v4 and any future
versions.
when a CNI result includes an IPv6 address,
set it on the alloc's NetworkStatus for reference.
e.g.:
$ nomad alloc status -json 3dca | jq '.NetworkStatus'
{
"Address": "172.26.64.14",
"AddressIPv6": "fd00:a110:c8::b",
"DNS": null,
"InterfaceName": "eth0"
}