
3882 Commits

Author SHA1 Message Date
Tim Gross
09eb473189 dynamic host volumes: set status unavailable on failed restore (#24962)
When a client restarts but can't restore a volume (e.g., the plugin is now
missing), it's removed from the node fingerprint. So we won't allow future
scheduling of the volume, but we were not updating the volume state field to
report this reasoning to operators. Make debugging easier and the state field
more meaningful by setting the value to "unavailable".

Also, remove the unused "deleted" field. We did not implement soft deletes and
aren't planning on it for Nomad 1.10.0.

Ref: https://hashicorp.atlassian.net/browse/NET-11551
2025-01-27 16:35:53 -05:00
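
The restore behavior described in 09eb473189 can be sketched as follows. This is a minimal illustration under stated assumptions, not Nomad's code: `Volume`, `VolumeState`, and `restore` are hypothetical names, and the sketch folds the client-side restore and the state update into a single function.

```go
package main

import (
	"errors"
	"fmt"
)

// VolumeState is a hypothetical stand-in for the host volume state field.
type VolumeState string

const (
	VolumeStateReady       VolumeState = "ready"
	VolumeStateUnavailable VolumeState = "unavailable"
)

// Volume is a hypothetical, trimmed-down host volume record.
type Volume struct {
	ID    string
	State VolumeState
}

// errPluginMissing is an illustrative restore failure.
var errPluginMissing = errors.New("plugin not found")

// restore tries to restore each volume. A volume that fails is left out of the
// returned fingerprint (so it can't be scheduled) and marked "unavailable" so
// operators can see why.
func restore(vols []*Volume, restoreOne func(*Volume) error) map[string]bool {
	fingerprint := make(map[string]bool)
	for _, v := range vols {
		if err := restoreOne(v); err != nil {
			v.State = VolumeStateUnavailable
			continue
		}
		v.State = VolumeStateReady
		fingerprint[v.ID] = true
	}
	return fingerprint
}

func main() {
	vols := []*Volume{{ID: "data"}, {ID: "scratch"}}
	fp := restore(vols, func(v *Volume) error {
		if v.ID == "scratch" {
			return errPluginMissing // e.g. the plugin was removed from the host
		}
		return nil
	})
	for _, v := range vols {
		fmt.Println(v.ID, v.State, fp[v.ID])
	}
}
```
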
Daniel Bennett
49c147bcd7 dynamic host volumes: change env vars, fixup auto-delete (#24943)
* plugin env: DHV_HOST_PATH->DHV_VOLUMES_DIR
* client config: host_volumes_dir
* plugin env: add namespace+nodepool
* only auto-delete after error saving client state
  on *initial* create
2025-01-27 10:36:53 -06:00
Tim Gross
7add04eb0f refactor: volume request modes to be generic between DHV/CSI (#24896)
When we implemented CSI, the types of the fields for access mode and attachment
mode on volume requests were defined with a prefix "CSI". This gets confusing
now that we have dynamic host volumes using the same fields. Fortunately the
original was a typedef on string, and the Go API in the `api` package just uses
strings directly, so we can change the name of the type without breaking
backwards compatibility for the msgpack wire format.

Update the names to `VolumeAccessMode` and `VolumeAttachmentMode`. Keep the CSI
and DHV specific value constant names for these fields (they aren't currently
1:1), so that we can easily differentiate in a given bit of code which values
are valid.

Ref: https://github.com/hashicorp/nomad/pull/24881#discussion_r1920702890
2025-01-24 10:37:48 -05:00
Tim Gross
3e7adba8f0 volume spec: fix access_mode field in examples (#24911)
The `volume init` command creates example volume specifications. But one of the
values for `capability.access_mode` is not a valid value. Correct the example to
match the validation logic.
2025-01-22 09:30:49 -05:00
Michael Schurter
63dacd2d6e update vault token warning from 1.9->1.10 (#24884)
Fixes #24847
2025-01-17 10:56:06 -08:00
Tim Gross
96e539ee87 dynamic host volumes quotas (#24871)
Allow users to configure a host volumes quota in MB. This will be enforced at
the time of provisioning via create/register RPCs. This changeset is the CE
version of ENT/2114.

Ref: https://github.com/hashicorp/nomad-enterprise/pull/2114
Ref: https://hashicorp.atlassian.net/browse/NET-11549
2025-01-17 11:41:56 -05:00
James Rasell
63ea13be77 agent: Ensure logger set up method is public. (#24886)
This is needed by a Nomad Enterprise code path.
2025-01-17 13:47:06 +00:00
James Rasell
753f752cdd agent: remove unused log filter and unrequired library. (#24873)
The Nomad agent used a log filter to ensure logs were written at
the expected level. Since adopting hclog, this is no longer required:
hclog acts as the gatekeeper and filter for logging, and all log
writers receive messages that hclog has already filtered.
2025-01-17 07:51:27 +00:00
James Rasell
1ae9785f9b agent: Fix a bug where all syslog lines are notice when using JSON (#24865)
The agent syslog write handler could not handle JSON log lines
correctly, so when the JSON log format was in use, every syslog
entry showed as NOTICE level.

This change adds a new handler to the Nomad agent that parses
JSON log lines and correctly determines each entry's log
level.

The change also removes the use of a filter from the default log
format handler. This is not needed as the logs are fed into the
syslog handler via hclog, which is responsible for level
filtering.
2025-01-16 07:23:08 +00:00
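
Parsing the level out of a JSON log line, as 1ae9785f9b describes, can be sketched as below. This assumes hclog's JSON layout, which stores the level under the `@level` key; `levelFromJSONLine` is a hypothetical helper, not the handler Nomad added.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// levelFromJSONLine extracts the level from a JSON-formatted log line. It
// assumes hclog's JSON layout, which stores the level under "@level". The
// second return value is false for lines that aren't JSON or lack a level,
// leaving the caller to choose a fallback priority.
func levelFromJSONLine(line []byte) (string, bool) {
	var entry struct {
		Level string `json:"@level"`
	}
	if err := json.Unmarshal(line, &entry); err != nil || entry.Level == "" {
		return "", false
	}
	return strings.ToLower(entry.Level), true
}

func main() {
	line := []byte(`{"@level":"error","@message":"failed to restore volume","@timestamp":"2025-01-16T07:23:08Z"}`)
	level, ok := levelFromJSONLine(line)
	fmt.Println(level, ok) // error true
}
```
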
Tim Gross
46bd0b1716 dynamic host volume: set default capability (#24857)
We can reduce the amount of volume specification configuration many users will
need by setting a default capability on a dynamic host volume if none is
set. The default capability will allow using the volume in read/write mode on
its node, with no further restrictions except those that might be set in the
jobspec.
2025-01-15 14:07:07 -05:00
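
Defaulting the capability as 46bd0b1716 describes amounts to filling in a single read/write capability when the specification sets none. A minimal sketch follows; the `Capability` type, the helper name, and the exact mode strings are assumptions, not confirmed Nomad values.

```go
package main

import "fmt"

// Capability is a hypothetical, trimmed-down volume capability.
type Capability struct {
	AccessMode     string
	AttachmentMode string
}

// Illustrative mode values. The exact strings Nomad uses for dynamic host
// volumes are an assumption of this sketch.
const (
	accessModeSingleNodeWriter = "single-node-writer"
	attachmentModeFileSystem   = "file-system"
)

// withDefaultCapability returns caps unchanged when the spec sets any;
// otherwise it returns a single read/write, file-system capability.
func withDefaultCapability(caps []Capability) []Capability {
	if len(caps) > 0 {
		return caps
	}
	return []Capability{{
		AccessMode:     accessModeSingleNodeWriter,
		AttachmentMode: attachmentModeFileSystem,
	}}
}

func main() {
	fmt.Printf("%+v\n", withDefaultCapability(nil))
}
```
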
James Rasell
8d201a82fd agent: Fixed a bug where syslog error messages were marked as notice. (#24820)
The mapping between Nomad log level identifiers and syslog
priorities did not handle the error level string correctly.
2025-01-15 08:02:53 +00:00
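
The level-to-priority mapping that 8d201a82fd corrects can be sketched with the RFC 5424 severity codes. This is an illustrative table, not Nomad's: the function name is hypothetical, mapping TRACE to debug is an assumption, and the sketch accepts both "ERR" and "ERROR" so neither spelling falls through to the NOTICE default.

```go
package main

import (
	"fmt"
	"strings"
)

// Severity holds RFC 5424 syslog severity codes (0 = emergency ... 7 = debug).
type Severity int

const (
	SevCrit    Severity = 2
	SevErr     Severity = 3
	SevWarning Severity = 4
	SevNotice  Severity = 5
	SevInfo    Severity = 6
	SevDebug   Severity = 7
)

// levelToSeverity maps a log level string to a syslog severity. Both "ERR"
// and "ERROR" map to SevErr; unknown levels fall back to NOTICE.
func levelToSeverity(level string) Severity {
	switch strings.ToUpper(strings.TrimSpace(level)) {
	case "TRACE", "DEBUG":
		return SevDebug
	case "INFO":
		return SevInfo
	case "WARN", "WARNING":
		return SevWarning
	case "ERR", "ERROR":
		return SevErr
	case "CRIT":
		return SevCrit
	default:
		return SevNotice
	}
}

func main() {
	for _, l := range []string{"error", "ERR", "warn", "unknown"} {
		fmt.Println(l, levelToSeverity(l))
	}
}
```
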
hc-github-team-nomad-core
b40200cefd Generate files for 1.9.5 release 2025-01-14 12:31:18 -08:00
Tim Gross
3a11a0b1e1 quotas: refactor storage limit specification (#24785)
In anticipation of having quotas for dynamic host volumes, we want the user
experience of the storage limits to feel integrated with the other resource
limits. This is currently prevented by reusing the `Resources` type instead of
having a specific type for `QuotaResources`.

Update the quota limit/usage types to use a `QuotaResources` that includes a new
storage resources quota block. The wire formats of the two types are compatible,
so we can migrate the existing variables limit in the FSM.

Also fixes improper parallelism in the quota init test: it changes the working
directory to avoid file write conflicts, but this breaks when multiple tests are
executed in parallel in the same process.

Ref: https://github.com/hashicorp/nomad-enterprise/pull/2096
2025-01-13 09:25:00 -05:00
Tim Gross
4a65b21aab dynamic host volumes: send register to client for fingerprint (#24802)
When we register a volume without a plugin, we need to send a client RPC so that
the node fingerprint can be updated. The registered volume also needs to be
written to client state so that we can restore the fingerprint after a restart.

Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>
2025-01-08 16:58:58 -05:00
Seth Hoenig
2bfe817721 Post 1.9.4 release (#24811)
* Generate files for 1.9.4 release

* Prepare for next release

* Merge release 1.9.4 files

---------

Co-authored-by: hc-github-team-nomad-core <github-team-nomad-core@hashicorp.com>
2025-01-08 09:36:22 -06:00
Tim Gross
024c504a1e dynamic host volumes: require node ID on register (#24795)
When registering a host volume created out-of-band, the volume will have been
created on a specific node. Require the node ID field to be set.

Ref: https://github.com/hashicorp/nomad/pull/24789#discussion_r1904690799
2025-01-07 11:24:45 -05:00
Piotr Kazmierczak
0906f788f0 keyring: warn if removing a key that was used for encrypting variables (#24766)
Adds an additional check in the Keyring.Delete RPC to make sure we're not
trying to delete a key that's been used to encrypt a variable. It also adds a
-force flag for the CLI/API to sidestep that check.
2025-01-07 10:15:02 +01:00
Daniel Bennett
459453917e dynamic host volumes: client-side tests, comments, tidying (#24747) 2025-01-06 13:20:07 -06:00
Charlie Voiselle
30ab8897d2 deps: Switch from mitchellh/cli to hashicorp/cli (#19321)
Co-authored-by: James Rasell <jrasell@hashicorp.com>
2024-12-19 15:41:11 +00:00
Piotr Kazmierczak
967addec48 stateful deployments: add corrections to API structs and methods (#24700)
This changeset includes changes accidentally left out of #24641.
2024-12-19 09:25:54 -05:00
Tim Gross
fd05e461dd dynamic host volumes: add -type flag to volume init (#24667)
Adds a `-type` flag to the `volume init` command that generates an example
volume specification with only those fields relevant to dynamic host
volumes. This changeset also moves the string literals into files loaded via `go:embed`.

Ref: https://github.com/hashicorp/nomad/pull/24479
2024-12-19 09:25:54 -05:00
Tim Gross
76641c8081 dynamic host volumes: refactor HTTP routes for volumes list dispatch (#24612)
The List Volumes API was originally written for CSI but assumed we'd have future
volume types, dispatched on a query parameter. Dynamic host volumes use this,
but the resulting code has host volume concerns commingled in the CSI volumes
endpoint. Refactor this so that we have a top-level `GET /v1/volumes` route that's
shared between CSI and DHV, and have it dispatch to the appropriate handler in
the type-specific endpoints.

Ref: https://github.com/hashicorp/nomad/pull/24479
2024-12-19 09:25:54 -05:00
Daniel Bennett
5826e92671 dynamic host volumes: delete by single volume ID (#24606)
Takes a single volume ID (`string`) instead of a slice of IDs (`[]string`).
2024-12-19 09:25:54 -05:00
Tim Gross
787fbbe671 sentinel: remove default scope for Sentinel apply command (#24601)
When we add a Sentinel scope for dynamic host volumes, having a default `-scope`
value for `sentinel apply` risks accidentally adding policies for volumes to the
job scope. This would immediately prevent any job from being submitted. Forcing
the administrator to pass a `-scope` will prevent accidental misuse.

Ref: https://github.com/hashicorp/nomad-enterprise/pull/2087
Ref: https://github.com/hashicorp/nomad/pull/24479
2024-12-19 09:25:54 -05:00
Tim Gross
d700538921 dynamic host volumes: Sentinel improvements for CLI (#24592)
The create/register volume RPCs support a policy override flag for
soft-mandatory Sentinel policies, but the CLI and Go API were missing support
for it.

Also add support for Sentinel warnings to the Go API and CLI.

Ref: https://github.com/hashicorp/nomad/pull/24479
2024-12-19 09:25:54 -05:00
Daniel Bennett
46a39560bb dynamic host volumes: fingerprint client plugins (#24589) 2024-12-19 09:25:54 -05:00
Tim Gross
df258ac02a dynamic host volumes: set namespace from volume spec when monitoring (#24586)
In #24528 we added monitoring to the CLI for dynamic host volume creation. But
when the volume's namespace is set by the volume specification instead of the
`-namespace` flag, the API client doesn't have the right namespace and gets a
404 when setting up the monitoring. The specification always overrides the
`-namespace` flag, so use that when available for all subsequent API calls.

Ref: https://github.com/hashicorp/nomad/pull/24479
2024-12-19 09:25:54 -05:00
Tim Gross
e3864a5f4a dynamic host volumes: autocomplete for CLI (#24533)
Adds dynamic host volumes to argument autocomplete for the `volume status` and
`volume delete` commands. Adds flag autocompletion for those commands plus
`volume create`.

Ref: https://github.com/hashicorp/nomad/pull/24479
2024-12-19 09:25:54 -05:00
Tim Gross
d1352b285d dynamic host volumes: Enterprise stubs and refactor API (#24545)
Most Nomad upsert RPCs accept a single object, with the notable exception of
CSI. But in CSI we don't actually expose this to users except through the Go
API. Accepting a slice deeply complicates how we present errors to users,
especially once Sentinel policy enforcement enters the mix.

Refactor the `HostVolume.Create` and `HostVolume.Register` RPCs to take a single
volume instead of a slice of volumes.

Add a stub function for Enterprise policy enforcement. This requires splitting
out placement from the `createVolume` function so that we can ensure we've
completed placement before trying to enforce policy.

Ref: https://github.com/hashicorp/nomad/pull/24479
2024-12-19 09:25:54 -05:00
Tim Gross
298460dcd9 dynamic host volumes: monitor readiness from CLI (#24528)
When creating a dynamic host volume, set up an optional monitor that waits for
the node to fingerprint the volume as healthy.

Ref: https://github.com/hashicorp/nomad/pull/24479
2024-12-19 09:25:54 -05:00
Tim Gross
bbf49a9050 dynamic host volumes: node selection via constraints (#24518)
When making a request to create a dynamic host volume, users can pass a node
pool and constraints instead of a specific node ID.

This changeset implements node scheduling logic by instantiating a filter by
node pool and a constraint checker borrowed from the scheduler package. Because
host volumes with the same name can't land on the same host, we don't need to
support `distinct_hosts`/`distinct_property`; this would be challenging anyway
without building out a much larger node iteration mechanism to keep track of
usage across multiple hosts.

Ref: https://github.com/hashicorp/nomad/pull/24479
2024-12-19 09:25:54 -05:00
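
Node selection as described in bbf49a9050 reduces to filtering nodes by pool, then by constraints. The sketch below is a simplification: `Node`, `Constraint`, and `feasibleNodes` are hypothetical, constraints are equality-only (Nomad's constraint checker supports many more operators), and an empty pool matching every node is an assumption.

```go
package main

import "fmt"

// Node is a hypothetical, trimmed-down node record.
type Node struct {
	ID       string
	NodePool string
	Attrs    map[string]string
}

// Constraint is a simplified, equality-only constraint.
type Constraint struct {
	Attr  string
	Value string
}

// feasibleNodes returns nodes in the requested pool that satisfy every
// constraint. An empty pool matches all nodes in this sketch.
func feasibleNodes(nodes []Node, pool string, cons []Constraint) []Node {
	var out []Node
NODES:
	for _, n := range nodes {
		if pool != "" && n.NodePool != pool {
			continue
		}
		for _, c := range cons {
			if n.Attrs[c.Attr] != c.Value {
				continue NODES
			}
		}
		out = append(out, n)
	}
	return out
}

func main() {
	nodes := []Node{
		{ID: "n1", NodePool: "default", Attrs: map[string]string{"kernel.name": "linux"}},
		{ID: "n2", NodePool: "gpu", Attrs: map[string]string{"kernel.name": "linux"}},
	}
	for _, n := range feasibleNodes(nodes, "default", []Constraint{{Attr: "kernel.name", Value: "linux"}}) {
		fmt.Println(n.ID)
	}
}
```
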
Daniel Bennett
c2dd97dee7 HostVolumePlugin interface and two implementations (#24497)
* mkdir: HostVolumePluginMkdir: just creates a directory
* example-host-volume: HostVolumePluginExternal:
  plugin script that does mkfs and mount loopback

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2024-12-19 09:25:54 -05:00
Tim Gross
10a5f4861f dynamic host volumes: create/register RPC validation
Add several validation steps in the create/register RPCs for dynamic host
volumes. We first check that submitted volumes are self-consistent (e.g., max
capacity is more than min capacity), then that any updates we've made are
valid. And we validate against state: preventing claimed volumes from being
updated and preventing placement requests for nodes that don't exist.

Ref: https://github.com/hashicorp/nomad/issues/15489
2024-12-19 09:25:54 -05:00
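
The two validation stages in 10a5f4861f (self-consistency, then checks against state) can be sketched as follows. `VolumeRequest`, its fields, and both functions are hypothetical; the sketch uses `errors.Join` (Go 1.20+) to report several self-consistency problems at once.

```go
package main

import (
	"errors"
	"fmt"
)

// VolumeRequest is a hypothetical create/register payload.
type VolumeRequest struct {
	Name          string
	NodeID        string
	CapacityMinMB int64
	CapacityMaxMB int64
}

// validate checks that the request is self-consistent.
func (v *VolumeRequest) validate() error {
	var errs []error
	if v.Name == "" {
		errs = append(errs, errors.New("missing name"))
	}
	if v.CapacityMaxMB > 0 && v.CapacityMaxMB < v.CapacityMinMB {
		errs = append(errs, fmt.Errorf("capacity max (%d) is less than min (%d)",
			v.CapacityMaxMB, v.CapacityMinMB))
	}
	return errors.Join(errs...) // nil when errs is empty
}

// validateAgainstState rejects placement on unknown nodes and updates to
// volumes that are currently claimed.
func validateAgainstState(v *VolumeRequest, nodes map[string]bool, claimed bool) error {
	if v.NodeID != "" && !nodes[v.NodeID] {
		return fmt.Errorf("node %q does not exist", v.NodeID)
	}
	if claimed {
		return errors.New("cannot update a claimed volume")
	}
	return nil
}

func main() {
	req := &VolumeRequest{Name: "data", NodeID: "n1", CapacityMinMB: 20, CapacityMaxMB: 10}
	fmt.Println(req.validate())
	fmt.Println(validateAgainstState(req, map[string]bool{"n1": true}, false))
}
```
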
Tim Gross
7c85176059 dynamic host volumes: basic CLI CRUD operations (#24382)
This changeset implements a first pass at the CLI for Dynamic Host Volumes.

Ref: https://hashicorp.atlassian.net/browse/NET-11549
2024-12-19 09:25:54 -05:00
Tim Gross
a65358da7b dynamic host volumes: HTTP API endpoint (#24380)
This changeset implements the HTTP API endpoints for Dynamic Host Volumes.

The `GET /v1/volumes` endpoint is shared between CSI and DHV with a query
parameter for the type. In the interest of getting some working handlers
available for use in development (and minimizing the size of the diff to
review), this changeset doesn't do any sort of refactoring of how the existing
List Volumes CSI endpoint works. That will come in a later PR, as will the
corresponding `api` package updates we need to support the CLI.

Ref: https://hashicorp.atlassian.net/browse/NET-11549
2024-12-19 09:25:54 -05:00
Deniz Onur Duzgun
22b7470ccf sec: fix alloc workload identity namespace permission (#24683)
Sanitize the allocation's SignedIdentities to prevent privilege escalation within a namespace through unauthorized impersonation, by any workload within the namespace, of a [workload associated with ACL policies](https://developer.hashicorp.com/nomad/docs/concepts/workload-identity#workload-associated-acl-policies).

Ref: CVE-2024-12678.
Ref: https://github.com/hashicorp/nomad-enterprise/pull/2098
2024-12-16 16:35:10 -05:00
James Rasell
7d48aa2667 client: emit optional telemetry from prerun and prestart hooks. (#24556)
The Nomad client can now optionally emit telemetry data from the
prerun and prestart hooks. This allows operators to monitor and
alert on failures and time taken to complete.

The new datapoints are:
  - nomad.client.alloc_hook.prerun.success (counter)
  - nomad.client.alloc_hook.prerun.failed (counter)
  - nomad.client.alloc_hook.prerun.elapsed (sample)

  - nomad.client.task_hook.prestart.success (counter)
  - nomad.client.task_hook.prestart.failed (counter)
  - nomad.client.task_hook.prestart.elapsed (sample)

The hook execution time is useful to Nomad engineering and will
help optimize code where possible and understand job specification
impacts on hook performance.

Only the PreRun and PreStart hooks currently have telemetry
enabled, which limits the number of new metrics being produced.
2024-12-12 14:43:14 +00:00
James Rasell
86bc7ed224 cli: Ensure JSON flag is respected in autopilot health command. (#24655) 2024-12-12 13:43:32 +00:00
James Rasell
261359fba7 agent: Fix a bug where retry_join was not retrying. (#24561)
The retry_join logic was not allowing for retries to happen and
was exiting after the first failed discovery attempt. This change
fixes that behaviour and adds a test to ensure no further
regressions.
2024-11-29 08:29:15 +00:00
Piotr Kazmierczak
f7a4ded2c0 security: add CT executeTemplate to default function_denylist (#24541)
This PR adds Consul Template's executeTemplate function to the denylist by
default, in order to prevent accidental or malicious infinitely recursive
execution.

---------

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2024-11-22 19:33:56 +01:00
Piotr Kazmierczak
368241dbf2 security: a more comprehensive env.denylist (#24540)
Makes the default env.denylist more comprehensive: it now includes more token,
token file, and license variables.

---------

Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>
2024-11-22 18:54:18 +01:00
Juana De La Cuesta
c21dfdb17a [gh-476] Sanitise HCL variables before storing on job submission (#24423)
* func: Use URL rules to escape non-alphanumeric values in HCL variables

* docs: add changelog

* func: unescape flags before returning

* use JSON.stringify instead of bespoke value quoting to handle in-value-multi-line cases

---------

Co-authored-by: Phil Renaud <phil@riotindustries.com>
2024-11-22 11:45:02 +01:00
Tim Gross
6b9dbefb9e consul: handle nil multierror pointer correctly (#24513)
When the service client syncs to Consul, we accumulate service sync errors in a
multierror before reading all the local checks. If the API call to the local
checks fails, we either return that error or append it to the multierror and
return the set of errors. But `multierror.Error.Len()` doesn't nil-check, so we
need to do this ourselves.

I've also made a quick pass through the rest of the code base looking for
multierror `Len` method calls to see if we have this pattern elsewhere.

Fixes: https://github.com/hashicorp/nomad/issues/24512
2024-11-20 10:55:52 -05:00
Piotr Kazmierczak
9c5078f151 agent: set content type header explicitly (#24489)
This PR addresses an XSS vulnerability where Nomad agents wouldn't explicitly
set content type headers for error responses.
2024-11-20 10:18:30 +01:00
Tim Gross
189d648f95 csi: remove redundant namespace field from volume status output (#24432)
The `volume status :id` command outputs the namespace for a CSI volume
twice. Drop the second output.

Ref: https://github.com/hashicorp/nomad/pull/24382#discussion_r1837097250
2024-11-11 16:05:59 -05:00
hc-github-team-nomad-core
9f9e66fa61 Generate files for 1.9.3 release 2024-11-11 19:40:44 +01:00
hc-github-team-nomad-core
1938a7578b Generate files for 1.9.2 release 2024-11-08 15:21:39 +01:00
Daniel Bennett
a036b75aef api: new dispatch endpoint sends body as Payload (#24381)
This allows parameterized jobs to be dispatched by systems
that cannot modify the HTTP request body they send.

For example, these two requests are equivalent:

POST '{"Payload": "'"$(base64 <<< "hello")"'"}' /v1/job/my-job/dispatch
POST 'hello' /v1/job/my-job/dispatch/payload
2024-11-07 10:12:29 -06:00
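
The base64 step in the first request of a036b75aef is what Go's `encoding/json` does automatically for `[]byte` fields. The hypothetical helper below builds the standard dispatch body from a raw payload. Because a `<<<` here-string appends a trailing newline, the example encodes "hello\n" to match the shell command.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// dispatchBody builds the JSON body for the standard dispatch endpoint from a
// raw payload. encoding/json base64-encodes []byte fields.
func dispatchBody(payload []byte) ([]byte, error) {
	return json.Marshal(struct {
		Payload []byte
	}{Payload: payload})
}

func main() {
	body, err := dispatchBody([]byte("hello\n"))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body)) // {"Payload":"aGVsbG8K"}
}
```
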
Jamie Finnigan
dec1bf51c0 update ndjson links due to domain expiry/resale (#24306) 2024-10-28 09:06:50 +00:00
Martijn Vegter
6236f354a5 consul: add support for service weight (#24186) 2024-10-25 11:21:38 -04:00