Adds a `-type` flag to the `volume init` command that generates an example
volume specification with only those fields relevant to dynamic host
volumes. This changeset also moves the string literals into uses of `go:embed`
Ref: https://github.com/hashicorp/nomad/pull/24479
The List Volumes API was originally written for CSI but assumed we'd have future
volume types, dispatched on a query parameter. Dynamic host volumes uses this,
but the resulting code has host volumes concerns comingled in the CSI volumes
endpoint. Refactor this so that we have a top-level `GET /v1/volumes` route that's
shared between CSI and DHV, and have it dispatch to the appropriate handler in
the type-specific endpoints.
Ref: https://github.com/hashicorp/nomad/pull/24479
When we add a Sentinel scope for dynamic host volumes, having a default `-scope`
value for `sentinel apply` risks accidentally adding policies for volumes to the
job scope. This would immediately prevent any job from being submitted. Forcing
the administrator to pass a `-scope` will prevent accidental misuse.
Ref: https://github.com/hashicorp/nomad-enterprise/pull/2087
Ref: https://github.com/hashicorp/nomad/pull/24479
The create/register volume RPCs support a policy override flag for
soft-mandatory Sentinel policies, but the CLI and Go API were missing support
for it.
Also add support for Sentinel warnings to the Go API and CLI.
Ref: https://github.com/hashicorp/nomad/pull/24479
In #24528 we added monitoring to the CLI for dynamic host volume creation. But
when the volume's namespace is set by the volume specification instead of the
`-namespace` flag, the API client doesn't have the right namespace and gets a
404 when setting up the monitoring. The specification always overrides the
`-namespace` flag, so use that when available for all subsequent API calls.
Ref: https://github.com/hashicorp/nomad/pull/24479
Adds dynamic host volumes to argument autocomplete for the `volume status` and
`volume delete` commands. Adds flag autocompletion for those commands plus
`volume create`.
Ref: https://github.com/hashicorp/nomad/pull/24479
Most Nomad upsert RPCs accept a single object with the notable exception of
CSI. But in CSI we don't actually expose this to users except through the Go
API. It deeply complicates how we present errors to users, especially once
Sentinel policy enforcement enters the mix.
Refactor the `HostVolume.Create` and `HostVolume.Register` RPCs to take a single
volume instead of a slice of volumes.
Add a stub function for Enterprise policy enforcement. This requires splitting
out placement from the `createVolume` function so that we can ensure we've
completed placement before trying to enforce policy.
Ref: https://github.com/hashicorp/nomad/pull/24479
When making a request to create a dynamic host volumes, users can pass a node
pool and constraints instead of a specific node ID.
This changeset implements a node scheduling logic by instantiating a filter by
node pool and constraint checker borrowed from the scheduler package. Because
host volumes with the same name can't land on the same host, we don't need to
support `distinct_hosts`/`distinct_property`; this would be challenging anyways
without building out a much larger node iteration mechanism to keep track of
usage across multiple hosts.
Ref: https://github.com/hashicorp/nomad/pull/24479
* mkdir: HostVolumePluginMkdir: just creates a directory
* example-host-volume: HostVolumePluginExternal:
plugin script that does mkfs and mount loopback
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Add several validation steps in the create/register RPCs for dynamic host
volumes. We first check that submitted volumes are self-consistent (ex. max
capacity is more than min capacity), then that any updates we've made are
valid. And we validate against state: preventing claimed volumes from being
updated and preventing placement requests for nodes that don't exist.
Ref: https://github.com/hashicorp/nomad/issues/15489
This changeset implements the HTTP API endpoints for Dynamic Host Volumes.
The `GET /v1/volumes` endpoint is shared between CSI and DHV with a query
parameter for the type. In the interest of getting some working handlers
available for use in development (and minimizing the size of the diff to
review), this changeset doesn't do any sort of refactoring of how the existing
List Volumes CSI endpoint works. That will come in a later PR, as will the
corresponding `api` package updates we need to support the CLI.
Ref: https://hashicorp.atlassian.net/browse/NET-11549
The Nomad client can now optionally emit telemetry data from the
prerun and prestart hooks. This allows operators to monitor and
alert on failures and time taken to complete.
The new datapoints are:
- nomad.client.alloc_hook.prerun.success (counter)
- nomad.client.alloc_hook.prerun.failed (counter)
- nomad.client.alloc_hook.prerun.elapsed (sample)
- nomad.client.task_hook.prestart.success (counter)
- nomad.client.task_hook.prestart.failed (counter)
- nomad.client.task_hook.prestart.elapsed (sample)
The hook execution time is useful to Nomad engineering and will
help optimize code where possible and understand job specification
impacts on hook performance.
Currently only the PreRun and PreStart hooks have telemetry
enabled, so we limit the number of new metrics being produced.
The retry_join logic was not allowing for retries to happen and
was exiting after the first failed discovery attempt. This change
fixes that behaviour and adds a test to ensure no further
regressions.
This PR adds Consul Template's executeTemplate function to the denylist by
default, in order to prevent accidental or malicious infinitely recursive
execution.
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
A more comprehensive env.denylist that now includes more token, token file and
license variables.
---------
Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>
* func: User url rules to scape non alphanumeric values in hcl variables
* docs: add changelog
* func: unscape flags before returning
* use JSON.stringify instead of bespoke value quoting to handle in-value-multi-line cases
---------
Co-authored-by: Phil Renaud <phil@riotindustries.com>
When the service client syncs to Consul, we accumulate service sync errors in a
multierror before reading all the local checks. If the API call to the local
checks fails, we either return that error or append it to the multierror and
return the set of errors. But `multierror.Error.Len()` doesn't nil-check, so we
need to do this ourselves.
I've also made a quick pass through the rest of the code base looking for
multierror `Len` method calls to see if we have this pattern elsewhere.
Fixes: https://github.com/hashicorp/nomad/issues/24512
this opens up dispatching parameterized jobs by systems
that do not allow modifying what http request body they send
e.g. these two things are equal:
POST '{"Payload": "'"$(base64 <<< "hello")"'"}' /v1/job/my-job/dispatch
POST 'hello' /v1/job/my-job/dispatch/payload
* cli: trim job init example jobspec
* cli: trim job init -connect example jobspec
---------
Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>
* detect ipv6 on "bridge" network and set
service.connect.sidecar_proxy.config.bind_address
for envoy to "::" instead of "0.0.0.0"
* allow users to set bind_address in jobspec
e.g. "" would defer to consul proxy-defaults
* caveat: tproxy still does not work, because
the CNI plugin does not configure ip6tables
When the local Consul agent receives a deregister request, it performs a
pre-flight check using the locally cached ACL token. The agent then sends the
request upstream to the Consul servers as part of anti-entropy, using its own
token. This requires that the token we use for deregistration is valid even
though that's not the token used to write to the Consul server.
There are several cases where the service identity token might no longer exist
at the time of deregistration:
* A race condition between the sync and destroying the allocation.
* Misconfiguration of the Consul auth method with a TTL.
* Out-of-band destruction of the token.
Additionally, Nomad's sync with Consul returns early if there are any errors,
which means that a single broken token can prevent any other service on the
Nomad agent from being registered or deregistered.
Update Nomad's sync with Consul to use the Nomad agent's own Consul token for
deregistration, regardless of which token the service was registered
with. Accumulate errors from the sync so that they no longer block
deregistration of other services.
Fixes: https://github.com/hashicorp/nomad/issues/20159
* jobspec: add a chown option to artifact block
This PR adds a boolean 'chown' field to the artifact block.
It indicates whether the Nomad client should chown the downloaded files
and directories to be owned by the task.user. This is useful for drivers
like raw_exec and exec2 which are subject to the host filesystem user
permissions structure. Before, these drivers might not be able to use or
manage the downloaded artifacts since they would be owned by the root
user on a typical Nomad client configuration.
* api: no need for pointer of chown field
We initialize this slice with a zeroed array and then append to it, which means
we then have to clean out the empty strings later. Initialize to the correct
capacity up front so there are no empty values.
Ref: https://github.com/hashicorp/nomad/pull/24104
In #24007 we merged new HCL files but they were missing copywrite headers
because the scan didn't run on this PR for some reason. I've already backported
this to the Enterprise branches.