Commit Graph

3921 Commits

Author SHA1 Message Date
Daniel Bennett
5c8e436de9 auth: oidc: disable pkce by default (#25600)
our goal of "enable by default, only for new auth methods"
proved to be unwieldy, so instead make it a simple bool,
disabled by default.
2025-04-07 12:36:09 -05:00
hc-github-team-nomad-core
a18faebda1 Generate files for 1.10.0-rc.1 release 2025-04-03 18:21:58 +00:00
Daniel Bennett
6a0c4f5a3d auth: oidc: enable pkce only on new auth methods (#25593)
trying not to violate the principle of least astonishment.

we want to only auto-enable PKCE on *new* auth methods,
rather than *new or updated* auth methods, to avoid a
scenario where a Nomad admin updates an auth method
sometime in the future -- something innocent like a new
client secret -- and their OIDC provider doesn't like PKCE.

the main concern is that the provider won't like PKCE
in a totally confusing way. error messages rarely
say PKCE directly, so why the user's auth method
suddenly broke would be a big mystery.

this means that to enable it on existing auth methods,
you would set `OIDCDisablePKCE = false`, and the double-
negative doesn't feel right, so instead, swap the language,
so enabling it on *existing* methods reads sensibly, and to
disable it on *new* methods reads ok-enough:
`OIDCEnablePKCE = false`
2025-04-03 10:56:17 -05:00
Nikita Eliseev
76fb3eb9a1 rpc: added configuration for yamux session (#25466)
Fixes: https://github.com/hashicorp/nomad/issues/25380
2025-04-02 10:58:23 -04:00
Allison Larson
17d191ae24 Add -group flag to alloc exec, alloc logs command (#25568)
* Add -group flag to `alloc exec`, `alloc logs` command

* fixup! Add -group flag to `alloc exec`, `alloc logs` command

* Add -group option to alloc fs

* Add changelog
2025-03-31 14:17:45 -07:00
Daniel Bennett
99c25fc635 dhv: mkdir plugin parameters: uid,guid,mode (#25533)
also remove Error logs from client rpc and promote plugin Debug logs to Error (since they have more info in them)
2025-03-28 10:13:13 -05:00
James Rasell
ea25503705 cli: Use meta response index to start monitoring volume create. (#25514) 2025-03-25 14:00:46 +00:00
James Rasell
27ad88ac17 test: Calculate agent endpoint scheduler count, not static. (#25473) 2025-03-21 13:47:53 +00:00
James Rasell
b3f28f9387 test: Use runtime CPUs for test not static number. (#25458) 2025-03-20 09:05:36 +00:00
James Rasell
5a157eb123 server: Validate config num schedulers is between 0 and num CPUs. (#25441)
The `server.num_scheduler` configuration value should be a value
between 0 and the number of CPUs on the machine. The Nomad agent
was not validating the configuration parameter which meant you
could use a negative value or a value much larger than the
available machine CPUs. This change enforces validation of the
configuration value both on server startup and when the agent is
reloaded.

The Nomad API was only performing negative value validation when
updating the scheduler number via this method. This change adds
to the validation to ensure the number is not greater than the
CPUs on the machine.
2025-03-20 07:29:57 +00:00
James Rasell
61b2b9d3d0 agent: Improve retry joiner code with small refactor. (#25422)
The agent retry joiner implementation had different parameters
to control its execution for agents running in server and client
mode. The agent would set up individual joiners depending on the
agent mode, making the object parameter overhead unrequired.

This change removes the excess configuration options for the
joiner, reducing code complexity slighly and hopefully making
future modifications in this area easier to make.
2025-03-18 15:55:52 +00:00
James Rasell
3e1f56c1c0 cli: Add volume type to delete error messages when API call fails. (#25392) 2025-03-14 14:59:41 +00:00
Daniel Bennett
3322254e5b cli: acl auth-method info: add client assertion (#25370)
and pkce
2025-03-12 12:38:03 -05:00
hc-github-team-nomad-core
e1b9bd8ab0 Generate files for 1.10.0-beta.1 release 2025-03-12 10:37:46 +00:00
Daniel Bennett
04db81951f test: fix go 1.24 test complaints (#25346)
e.g. Error: nomad/leader_test.go:382:12: non-constant format string in call to (*testing.common).Fatalf
2025-03-11 11:01:39 -05:00
Tim Gross
1ffb7ab3fb dynamic host volumes: allow plugins to return an error message (#25341)
Errors from `volume create` or `volume delete` only get logged by the client
agent, which may make it harder for volume authors to debug these tasks if they
are not also the cluster administrator with access to host logs.

Allow plugins to include an optional error message in their response. Because we
can't count on receiving this response (the error could come before the plugin
executes), we parse this message optimistically and include it only if
available.

Ref: https://hashicorp.atlassian.net/browse/NET-12087
2025-03-11 11:06:57 -04:00
hc-github-team-nomad-core
1da56b8e07 Generate files for 1.9.7 release 2025-03-11 14:09:02 +00:00
Daniel Bennett
8e56805fea oidc: support PKCE and client assertion / private key JWT (#25231)
PKCE is enabled by default for new/updated auth methods.
 * ref: https://oauth.net/2/pkce/

Client assertions are an optional, more secure replacement for client secrets
 * ref: https://oauth.net/private-key-jwt/

a change to the existing flow, even without these new options,
is that the oidc.Req is retained on the Nomad server (leader)
in between auth-url and complete-auth calls.

and some fields in auth method config are now more strictly required.
2025-03-10 13:32:53 -05:00
James Rasell
f94016816d cli: Add node_prefix read policy to Consul setup task policy. (#25310)
When Nomad registers a service within Consul it is regarded as a
node service. In order for Nomad workloads to read these services,
it must have an ACL policy which includes node_prefix read. If it
does not, the service is filtered out from the result.

This change adds the required permission to the Consul setup
command.
2025-03-10 08:06:09 +00:00
Phil Renaud
35e1ea4328 [cli] UI URL hints for common CLI commands (#24454)
* Basic implementation for server members and node status

* Commands for alloc status and job status

* -ui flag for most commands

* url hints for variables

* url hints for job dispatch, evals, and deployments

* agent config ui.cli_url_links to disable

* Fix an issue where path prefix was presumed for variables

* driver uncomment and general cleanup

* -ui flag on the generic status endpoint

* Job run command gets namespaces, and no longer gets ui hints for --output flag

* Dispatch command hints get a namespace, and bunch o tests

* Lots of tests depend on specific output, so let's not mess with them

* figured out what flagAddress is all about for testServer, oof

* Parallel outside of test instances

* Browser-opening test, sorta

* Env var for disabling/enabling CLI hints

* Addressing a few PR comments

* CLI docs available flags now all have -ui

* PR comments addressed; switched the env var to be consistent and scrunched monitor-adjacent hints a bit more

* ui.Output -> ui.Warn; moves hints from stdout to stderr

* isTerminal check and parseBool on command option

* terminal.IsTerminal check removed for test-runner-not-being-terminal reasons
2025-03-07 13:23:35 -05:00
James Rasell
2eb35a4678 build: Update Go to v1.24.1 (#25249) 2025-03-06 10:33:14 +00:00
Piotr Kazmierczak
29c7b7ca44 stateful deployments: fix missing prefix search in claim list CLI (#25297) 2025-03-06 11:23:00 +01:00
Michael Smithhisler
5c4d0e923d consul: Remove legacy token based authentication workflow (#25217) 2025-03-05 15:38:11 -05:00
Michael Smithhisler
f2b761f17c disconnected: removes deprecated disconnect fields (#25284)
The group level fields stop_after_client_disconnect,
max_client_disconnect, and prevent_reschedule_on_lost were deprecated in
Nomad 1.8 and replaced by field in the disconnect block. This change
removes any logic related to those deprecated fields.

---------

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2025-03-05 14:46:02 -05:00
James Rasell
7268053174 vault: Remove legacy token based authentication workflow. (#25155)
The legacy workflow for Vault whereby servers were configured
using a token to provide authentication to the Vault API has now
been removed. This change also removes the workflow where servers
were responsible for deriving Vault tokens for Nomad clients.

The deprecated Vault config options used byi the Nomad agent have
all been removed except for "token" which is still in use by the
Vault Transit keyring implementation.

Job specification authors can no longer use the "vault.policies"
parameter and should instead use "vault.role" when not using the
default workload identity.

---------

Co-authored-by: Tim Gross <tgross@hashicorp.com>
Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>
2025-02-28 07:40:02 +00:00
Piotr Kazmierczak
73a193f6d9 stateful deployments: task group host volume claims CLI (#25116)
CLI for interacting with task group host volume claims.
2025-02-27 17:04:48 +01:00
Piotr Kazmierczak
58c6387323 stateful deployments: task group host volume claims API (#25114)
This PR introduces API endpoints /v1/volumes/claims/ and /v1/volumes/claim/:id
for listing and deleting task group host volume claims, respectively.
2025-02-25 15:51:59 +01:00
Tim Gross
4cdfa19b1e volume status: default type to show both DHV and CSI volumes (#25185)
The `-type` option for `volume status` is a UX papercut because for many
clusters there will be only one sort of volume in use. Update the CLI so that
the default behavior is to query CSI and/or DHV.

This behavior is subtly different when the user provides an ID or not. If the
user doesn't provide an ID, we query both CSI and DHV and show both tables. If
the user provides an ID, we query DHV first and then CSI, and show only the
appropriate volume. Because DHV IDs are UUIDs, we're sure we won't have
collisions between the two. We only show errors if both queries return an error.

Fixes: https://hashicorp.atlassian.net/browse/NET-12214
2025-02-24 11:38:07 -05:00
Tim Gross
d63c1d0bad volumes: add capabilities to volume status (#25173)
Job authors need to be able to review what capabilities a dynamic host volume or
CSI volume has so that they can set the correct access mode and attachment mode
in their job. Add these to the CLI output of `volume status`.

Ref: https://hashicorp.atlassian.net/browse/NET-12063
2025-02-20 14:40:31 -05:00
James Rasell
32c25d3935 cli: Remove warning notes from Vault and Consul setup commands. (#25153) 2025-02-19 09:18:42 +00:00
Tim Gross
dc58f247ed docs: clarify reschedule, migrate, and replacement terminology (#24929)
Our vocabulary around scheduler behaviors outside of the `reschedule` and
`migrate` blocks leaves room for confusion around whether the reschedule tracker
should be propagated between allocations. There are effectively five different
behaviors we need to cover:

* restart: when the tasks of an allocation fail and we try to restart the tasks
  in place.

* reschedule: when the `restart` block runs out of attempts (or the allocation
  fails before tasks even start), and we need to move
  the allocation to another node to try again.

* migrate: when the user has asked to drain a node and we need to move the
  allocations. These are not failures, so we don't want to propagate the
  reschedule tracker.

* replacement: when a node is lost, we don't count that against the `reschedule`
  tracker for the allocations on the node (it's not the allocation's "fault",
  after all). We don't want to run the `migrate` machinery here here either, as we
  can't contact the down node. To the scheduler, this is effectively the same as
  if we bumped the `group.count`

* replacement for `disconnect.replace = true`: this is a replacement, but the
  replacement is intended to be temporary, so we propagate the reschedule tracker.

Add a section to the `reschedule`, `migrate`, and `disconnect` blocks explaining
when each item applies. Update the use of the word "reschedule" in several
places where "replacement" is correct, and vice-versa.

Fixes: https://github.com/hashicorp/nomad/issues/24918
Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>
2025-02-18 09:31:03 -05:00
Tim Gross
7b89c0ee28 template: fix client's default retry configuration (#25113)
In #20165 we fixed a bug where a partially configured `client.template` retry
block would set any unset fields to nil instead of their default values. But
this patch introduced a regression in the default values, so we were now
defaulting to unlimited retries if the retry block was unset. Restore the
correct behavior and add better test coverage at both the config parsing and
template configuration code.

Ref: https://github.com/hashicorp/nomad/pull/20165
Ref: https://github.com/hashicorp/nomad/issues/23305#issuecomment-2643731565
2025-02-14 09:25:41 -05:00
Jorge Marey
25426f0777 fingerprint: add config option to disable dmidecode (#25108) 2025-02-13 11:20:48 -05:00
hc-github-team-nomad-core
ac36990fe3 Generate files for 1.9.6 release 2025-02-11 17:03:45 -05:00
Phil Renaud
9367929d87 [cli] Adds Actions to job status command output (#24959)
* Adds Actions to job status command output

* Adds Actions to job status command output

* Status documentation updated to show actions and formatJobActions no longer cares about pipe delineation
2025-02-04 09:34:49 -05:00
Tim Gross
7929939116 volume delete: allow prefix for ID (#24997)
The `volume delete` command doesn't allow using a prefix for the volume ID for
either CSI or dynamic host volumes. Use a prefix search and wildcard namespace
as we do for other CLI commands.

Ref: https://hashicorp.atlassian.net/browse/NET-12057
2025-02-03 11:29:43 -05:00
Tim Gross
cc99e8f0a2 dynamic host volumes: add -id arg for updates of existing volumes (#24996)
If you create a volume via `volume create/register` and want to update it later,
you need to change the volume spec to add the ID that was returned. This isn't a
very nice UX, so let's add an `-id` argument that allows you to update existing
volumes that have that ID.

Ref: https://hashicorp.atlassian.net/browse/NET-12083
2025-02-03 10:26:30 -05:00
Matt Keeler
833e240597 Upgrade to using hashicorp/go-metrics@v0.5.4 (#24856)
* Upgrade to using hashicorp/go-metrics@v0.5.4

This also requires bumping the dependencies for:

* memberlist
* serf
* raft
* raft-boltdb
* (and indirectly hashicorp/mdns due to the memberlist or serf update)

Unlike some other HashiCorp products, Nomads root module is currently expected to be consumed by others. This means that it needs to be treated more like our libraries and upgrade to hashicorp/go-metrics by utilizing its compat packages. This allows those importing the root module to control the metrics module used via build tags.
2025-01-31 15:22:00 -05:00
Daniel Bennett
dcf6201d2b dynamic host volumes: CE side of quota tweaks (#24972)
* quota spec:
  if `region_limit.storage.host_volumes` is set,
  do not require that `variables` also be set,
  and vice versa.
* subtract from quota usage on volume delete
* stub CE quota subtraction method
2025-01-28 17:27:25 -06:00
Tim Gross
09eb473189 dynamic host volumes: set status unavailable on failed restore (#24962)
When a client restarts but can't restore a volume (ex. the plugin is now
missing), it's removed from the node fingerprint. So we won't allow future
scheduling of the volume, but we were not updating the volume state field to
report this reasoning to operators. Make debugging easier and the state field
more meaningful by setting the value to "unavailable".

Also, remove the unused "deleted" field. We did not implement soft deletes and
aren't planning on it for Nomad 1.10.0.

Ref: https://hashicorp.atlassian.net/browse/NET-11551
2025-01-27 16:35:53 -05:00
Daniel Bennett
49c147bcd7 dynamic host volumes: change env vars, fixup auto-delete (#24943)
* plugin env: DHV_HOST_PATH->DHV_VOLUMES_DIR
* client config: host_volumes_dir
* plugin env: add namespace+nodepool
* only auto-delete after error saving client state
  on *initial* create
2025-01-27 10:36:53 -06:00
Tim Gross
7add04eb0f refactor: volume request modes to be generic between DHV/CSI (#24896)
When we implemented CSI, the types of the fields for access mode and attachment
mode on volume requests were defined with a prefix "CSI". This gets confusing
now that we have dynamic host volumes using the same fields. Fortunately the
original was a typedef on string, and the Go API in the `api` package just uses
strings directly, so we can change the name of the type without breaking
backwards compatibility for the msgpack wire format.

Update the names to `VolumeAccessMode` and `VolumeAttachmentMode`. Keep the CSI
and DHV specific value constant names for these fields (they aren't currently
1:1), so that we can easily differentiate in a given bit of code which values
are valid.

Ref: https://github.com/hashicorp/nomad/pull/24881#discussion_r1920702890
2025-01-24 10:37:48 -05:00
Tim Gross
3e7adba8f0 volume spec: fix access_mode field in examples (#24911)
The `volume init` command creates example volume specifications. But one of the
values for `capability.access_mode` is not a valid value. Correct the example to
match the validation logic.
2025-01-22 09:30:49 -05:00
Michael Schurter
63dacd2d6e update vault token warning from 1.9->1.10 (#24884)
Fixes #24847
2025-01-17 10:56:06 -08:00
Tim Gross
96e539ee87 dynamic host volumes quotas (#24871)
Allow users to configure a host volumes quota in MB. This will be enforced at
the time of provisioning via create/register RPCs. This changeset is the CE
version of ENT/2114.

Ref: https://github.com/hashicorp/nomad-enterprise/pull/2114
Ref: https://hashicorp.atlassian.net/browse/NET-11549
2025-01-17 11:41:56 -05:00
James Rasell
63ea13be77 agent: Ensure logger set up method is public. (#24886)
This is needed by a Nomad Enterprise code path.
2025-01-17 13:47:06 +00:00
James Rasell
753f752cdd agent: remove unused log filter and unrequired library. (#24873)
The Nomad agent used a log filter to ensure logs were written at
the expected level. Since the use of hclog this is not required,
as hclog acts as the gate keeper and filter for logging. All log
writers accept messages from hclog which has already done the
filtering.
2025-01-17 07:51:27 +00:00
James Rasell
1ae9785f9b agent: Fix a bug where all syslog lines are notice when using JSON (#24865)
The agent syslog write handler was unable to handle JSON log lines
correctly, meaning all syslog entries when using JSON log format
showed as NOTICE level.

This change adds a new handler to the Nomad agent which can parse
JSON log lines and correctly understand the expected log level
entry.

The change also removes the use of a filter from the default log
format handler. This is not needed as the logs are fed into the
syslog handler via hclog, which is responsible for level
filtering.
2025-01-16 07:23:08 +00:00
Tim Gross
46bd0b1716 dynamic host volume: set default capability (#24857)
We can reduce the amount of volume specification configuration many users will
need by setting a default capability on a dynamic host volume if none is
set. The default capability will allow using the volume in read/write mode on
its node, with no further restrictions except those that might be set in the
jobspec.
2025-01-15 14:07:07 -05:00
James Rasell
8d201a82fd agent: Fixed a bug where syslog error messages marked as notice. (#24820)
The mapping between Nomad log level identifiers and syslog
priorities did not handle the error level string correctly.
2025-01-15 08:02:53 +00:00