ResolveToken RPC endpoint was only used by the /acl/token/self API. We should migrate to the WI-aware WhoAmI instead.
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
First of all, we should not send the unix time, but the monotonic time.
Second of all, RELOADING= and MONOTONIC_USEC fields should be sent in
*single* message not two separate messages.
From the man page of [systemd.service](https://www.freedesktop.org/software/systemd/man/latest/systemd.service.html#Type=)
> notification message via sd_notify(3) that contains the "RELOADING=1" field in
> combination with "MONOTONIC_USEC=" set to the current monotonic time (i.e.
> CLOCK_MONOTONIC in clock_gettime(2)) in μs, formatted as decimal string.
[sd_notify](https://www.freedesktop.org/software/systemd/man/latest/sd_notify.html)
now has code samples of the protocol to clarify.
Without these changes, if you'd set
Type=notify-reload on the agen'ts systemd unit, systemd
would kill the service due to the service not responding to reload
correctly.
The `server.num_scheduler` configuration value should be a value
between 0 and the number of CPUs on the machine. The Nomad agent
was not validating the configuration parameter which meant you
could use a negative value or a value much larger than the
available machine CPUs. This change enforces validation of the
configuration value both on server startup and when the agent is
reloaded.
The Nomad API was only performing negative value validation when
updating the scheduler number via this method. This change adds
to the validation to ensure the number is not greater than the
CPUs on the machine.
The agent retry joiner implementation had different parameters
to control its execution for agents running in server and client
mode. The agent would set up individual joiners depending on the
agent mode, making the object parameter overhead unrequired.
This change removes the excess configuration options for the
joiner, reducing code complexity slighly and hopefully making
future modifications in this area easier to make.
Errors from `volume create` or `volume delete` only get logged by the client
agent, which may make it harder for volume authors to debug these tasks if they
are not also the cluster administrator with access to host logs.
Allow plugins to include an optional error message in their response. Because we
can't count on receiving this response (the error could come before the plugin
executes), we parse this message optimistically and include it only if
available.
Ref: https://hashicorp.atlassian.net/browse/NET-12087
PKCE is enabled by default for new/updated auth methods.
* ref: https://oauth.net/2/pkce/
Client assertions are an optional, more secure replacement for client secrets
* ref: https://oauth.net/private-key-jwt/
a change to the existing flow, even without these new options,
is that the oidc.Req is retained on the Nomad server (leader)
in between auth-url and complete-auth calls.
and some fields in auth method config are now more strictly required.
The group level fields stop_after_client_disconnect,
max_client_disconnect, and prevent_reschedule_on_lost were deprecated in
Nomad 1.8 and replaced by field in the disconnect block. This change
removes any logic related to those deprecated fields.
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
The legacy workflow for Vault whereby servers were configured
using a token to provide authentication to the Vault API has now
been removed. This change also removes the workflow where servers
were responsible for deriving Vault tokens for Nomad clients.
The deprecated Vault config options used byi the Nomad agent have
all been removed except for "token" which is still in use by the
Vault Transit keyring implementation.
Job specification authors can no longer use the "vault.policies"
parameter and should instead use "vault.role" when not using the
default workload identity.
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>
In #20165 we fixed a bug where a partially configured `client.template` retry
block would set any unset fields to nil instead of their default values. But
this patch introduced a regression in the default values, so we were now
defaulting to unlimited retries if the retry block was unset. Restore the
correct behavior and add better test coverage at both the config parsing and
template configuration code.
Ref: https://github.com/hashicorp/nomad/pull/20165
Ref: https://github.com/hashicorp/nomad/issues/23305#issuecomment-2643731565
* Upgrade to using hashicorp/go-metrics@v0.5.4
This also requires bumping the dependencies for:
* memberlist
* serf
* raft
* raft-boltdb
* (and indirectly hashicorp/mdns due to the memberlist or serf update)
Unlike some other HashiCorp products, Nomads root module is currently expected to be consumed by others. This means that it needs to be treated more like our libraries and upgrade to hashicorp/go-metrics by utilizing its compat packages. This allows those importing the root module to control the metrics module used via build tags.
When we implemented CSI, the types of the fields for access mode and attachment
mode on volume requests were defined with a prefix "CSI". This gets confusing
now that we have dynamic host volumes using the same fields. Fortunately the
original was a typedef on string, and the Go API in the `api` package just uses
strings directly, so we can change the name of the type without breaking
backwards compatibility for the msgpack wire format.
Update the names to `VolumeAccessMode` and `VolumeAttachmentMode`. Keep the CSI
and DHV specific value constant names for these fields (they aren't currently
1:1), so that we can easily differentiate in a given bit of code which values
are valid.
Ref: https://github.com/hashicorp/nomad/pull/24881#discussion_r1920702890
The Nomad agent used a log filter to ensure logs were written at
the expected level. Since the use of hclog this is not required,
as hclog acts as the gate keeper and filter for logging. All log
writers accept messages from hclog which has already done the
filtering.
The agent syslog write handler was unable to handle JSON log lines
correctly, meaning all syslog entries when using JSON log format
showed as NOTICE level.
This change adds a new handler to the Nomad agent which can parse
JSON log lines and correctly understand the expected log level
entry.
The change also removes the use of a filter from the default log
format handler. This is not needed as the logs are fed into the
syslog handler via hclog, which is responsible for level
filtering.
Adds an additional check in the Keyring.Delete RPC to make sure we're not
trying to delete a key that's been used to encrypt a variable. It also adds a
-force flag for the CLI/API to sidestep that check.
The List Volumes API was originally written for CSI but assumed we'd have future
volume types, dispatched on a query parameter. Dynamic host volumes uses this,
but the resulting code has host volumes concerns comingled in the CSI volumes
endpoint. Refactor this so that we have a top-level `GET /v1/volumes` route that's
shared between CSI and DHV, and have it dispatch to the appropriate handler in
the type-specific endpoints.
Ref: https://github.com/hashicorp/nomad/pull/24479
Most Nomad upsert RPCs accept a single object with the notable exception of
CSI. But in CSI we don't actually expose this to users except through the Go
API. It deeply complicates how we present errors to users, especially once
Sentinel policy enforcement enters the mix.
Refactor the `HostVolume.Create` and `HostVolume.Register` RPCs to take a single
volume instead of a slice of volumes.
Add a stub function for Enterprise policy enforcement. This requires splitting
out placement from the `createVolume` function so that we can ensure we've
completed placement before trying to enforce policy.
Ref: https://github.com/hashicorp/nomad/pull/24479
When making a request to create a dynamic host volumes, users can pass a node
pool and constraints instead of a specific node ID.
This changeset implements a node scheduling logic by instantiating a filter by
node pool and constraint checker borrowed from the scheduler package. Because
host volumes with the same name can't land on the same host, we don't need to
support `distinct_hosts`/`distinct_property`; this would be challenging anyways
without building out a much larger node iteration mechanism to keep track of
usage across multiple hosts.
Ref: https://github.com/hashicorp/nomad/pull/24479
Add several validation steps in the create/register RPCs for dynamic host
volumes. We first check that submitted volumes are self-consistent (ex. max
capacity is more than min capacity), then that any updates we've made are
valid. And we validate against state: preventing claimed volumes from being
updated and preventing placement requests for nodes that don't exist.
Ref: https://github.com/hashicorp/nomad/issues/15489