Commit Graph

2416 Commits

Author SHA1 Message Date
hc-github-team-nomad-core
1e49d9eb44 Generate files for 1.10.2 release 2025-06-10 14:35:25 -07:00
James Rasell
e95148c10d consul: Fix data race within test by using mutex to read map. (#25977) 2025-06-04 15:09:37 +01:00
Michael Smithhisler
4c8257d0c7 client: add once mode to template block (#25922) 2025-05-28 11:45:11 -04:00
Tim Gross
3f59860254 host volumes: add configuration to GC on node GC (#25903)
When a node is garbage collected, any dynamic host volumes on the node are
orphaned in the state store. We generally don't want to automatically collect
these volumes and risk data loss, and have provided a CLI flag to `-force`
remove them in #25902. But for clusters running on ephemeral cloud
instances (ex. AWS EC2 in an autoscaling group), deleting host volumes may add
excessive friction. Add a configuration knob to the client configuration to
remove host volumes from the state store on node GC.

Ref: https://github.com/hashicorp/nomad/pull/25902
Ref: https://github.com/hashicorp/nomad/issues/25762
Ref: https://hashicorp.atlassian.net/browse/NMD-705
2025-05-27 10:22:08 -04:00
tehut
55523ecf8e Add NodeMaxAllocations to client configuration (#25785)
* Set MaxAllocations in client config
Add NodeAllocationTracker struct to Node struct
Evaluate MaxAllocations in AllocsFit function
Set up cli config parsing
Integrate maxAllocs into AllocatedResources view
Co-authored-by: Tim Gross <tgross@hashicorp.com>

---------

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2025-05-22 12:49:27 -07:00
Daniel Bennett
15c01e5a49 ipv6: normalize addrs per RFC-5942 §4 (#25921)
https://datatracker.ietf.org/doc/html/rfc5952#section-4

* copy NormalizeAddr func from vault
  * PRs hashicorp/vault#29228 & hashicorp/vault#29517
* normalize bind/advertise addrs
* normalize consul/vault addrs
2025-05-22 14:21:30 -04:00
Piotr Kazmierczak
cdc308a0eb wi: new endpoint for listing workload attached ACL policies (#25588)
This introduces a new HTTP endpoint (and an associated CLI command) for querying
ACL policies associated with a workload identity. It allows users that want
to learn about the ACL capabilities from within WI-tasks to know what sort of
policies are enabled.

---------

Co-authored-by: Tim Gross <tgross@hashicorp.com>
Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>
2025-05-19 19:54:12 +02:00
Tim Gross
8a5a057d88 offline license utilization reporting (#25844)
Nomad Enterprise users operating in air-gapped or otherwise secured environments
don't want to send license reporting metrics directly from their
servers. Implement manual/offline reporting by periodically recording usage
metrics snapshots in the state store, and providing an API and CLI by which
cluster administrators can download the snapshot for review and out-of-band
transmission to HashiCorp.

This is the CE portion of the work required for implemention in the Enterprise
product. Nomad CE does not perform utilization reporting.

Ref: https://github.com/hashicorp/nomad-enterprise/pull/2673
Ref: https://hashicorp.atlassian.net/browse/NMD-68
Ref: https://go.hashi.co/rfc/nmd-210
2025-05-14 09:51:13 -04:00
hc-github-team-nomad-core
9ef42e9807 Generate files for 1.10.1 release 2025-05-13 14:26:48 +02:00
James Rasell
0b265d2417 encrypter: Track initial tasks for is ready calculation. (#25803)
The server startup could "hang" to the view of an operator if it
had a key that could not be decrypted or replicated loaded from
the FSM at startup.

In order to prevent this happening, the server startup function
will now use a timeout to wait for the encrypter to be ready. If
the timeout is reached, the error is sent back to the caller which
fails the CLI command. This bubbling of error message will also
flush to logs which will provide addition operator feedback.

The server only cares about keys loaded from the FSM snapshot and
trailing logs before the encrypter should be classed as ready. So
that the encrypter ready function does not get blocked by keys
added outside of the initial Raft load, we take a snapshot of the
decryption tasks as we enter the blocking call, and class these as
our barrier.
2025-05-07 15:38:16 +01:00
Juanadelacuesta
9288a3141a func and docs: Use the config from the client and not from the agent that is already parsed. Add the breaking change to the release notes 2025-04-30 10:53:02 +02:00
Juanadelacuesta
949571e313 func: read the config from the agent, dont reparse 2025-04-24 05:01:53 +02:00
Juanadelacuesta
46343ee56e func: use the client's configured drain deadline to calculate the graceful timeout when terminating an agent 2025-04-23 23:59:50 +02:00
Juanadelacuesta
c91f24681d style: add changelog 2025-04-23 23:28:54 +02:00
Juana De La Cuesta
9778a31e29 Update command/agent/command.go
Co-authored-by: Tim Gross <tgross@hashicorp.com>
2025-04-23 23:18:09 +02:00
Juana De La Cuesta
39b3d63172 Update command/agent/command.go
Co-authored-by: Tim Gross <tgross@hashicorp.com>
2025-04-23 23:18:02 +02:00
Juana De La Cuesta
313f430fdd Update command/agent/command.go
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
2025-04-23 23:17:36 +02:00
Juanadelacuesta
b47b962439 fix: use a timer instead of a time.After to avoid memory leaks 2025-04-23 16:26:46 +02:00
Juanadelacuesta
38c27b7e7f fix: make teh disctintion when for os.Interrupt 2025-04-23 16:20:04 +02:00
Juanadelacuesta
adf038b495 fix: correct the logic for LeaveOnTerm or LeaveOnInt depending on the incoming signal 2025-04-23 16:03:12 +02:00
Juanadelacuesta
b375974bc3 style: add comments 2025-04-23 15:47:37 +02:00
Juanadelacuesta
c5c4272aee func: force agent return if there is an error on reload 2025-04-23 15:14:48 +02:00
Piotr Kazmierczak
df3b00bce0 acl: use WhoAmI RPC endpoint in /acl/token/self (#25547)
ResolveToken RPC endpoint was only used by the /acl/token/self API. We should migrate to the WI-aware WhoAmI instead.

---------

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2025-04-22 17:53:39 +02:00
tehut
b11619010e Add priority flag to Dispatch CLI and API (#25622)
* Add priority flag to Dispatch CLI and DispatchOpts() helper to HTTP API
2025-04-18 13:24:52 -07:00
Arian van Putten
d28af58cbb agent: implement sd-notify reload correctly (#25636)
First of all, we should not send the unix time, but the monotonic time.
Second of all, RELOADING= and MONOTONIC_USEC fields should be sent in
*single* message not two separate messages.

From the man page of [systemd.service](https://www.freedesktop.org/software/systemd/man/latest/systemd.service.html#Type=)

> notification message via sd_notify(3) that contains the "RELOADING=1" field in
> combination with "MONOTONIC_USEC=" set to the current monotonic time (i.e.
> CLOCK_MONOTONIC in clock_gettime(2)) in μs, formatted as decimal string.

[sd_notify](https://www.freedesktop.org/software/systemd/man/latest/sd_notify.html)
now has code samples of the protocol to clarify.

Without these changes, if you'd set
Type=notify-reload on the agen'ts systemd unit, systemd
would kill the service due to the service not responding to reload
correctly.
2025-04-14 11:38:56 -04:00
Michael Schurter
c5451cf300 Merge pull request #25635 from hashicorp/post-1.10.0-release
Post 1.10.0 release
2025-04-10 10:32:24 -07:00
Tim Gross
27caae2b2a api: make attempting to remove peer by address a no-op (#25599)
In Nomad 1.4.0 we removed support for Raft Protocol v2 entirely. But the
`Operator.RemoveRaftPeerByAddress` RPC handler was left in place, along with its
supporting HTTP API and command line flags. Using this API will always result in
the Raft library error "operation not supported with current protocol version".

Unfortunately it's still possible in unit tests to exercise this code path, and
these tests are quite flaky. This changeset turns the RPC handler and HTTP API
into a no-op, removes the associated command line flags, and removes the flaky
tests. I've also cleaned up the test for `RemoveRaftPeerByID` to consolidate
test servers and use `shoenig/test`.

Fixes: https://hashicorp.atlassian.net/browse/NET-12413
Ref: https://github.com/hashicorp/nomad/pull/13467
Ref: https://developer.hashicorp.com/nomad/docs/upgrade/upgrade-specific#raft-protocol-version-2-unsupported
Ref: https://github.com/hashicorp/nomad-enterprise/actions/runs/13201513025/job/36855234398?pr=2302
2025-04-10 09:19:25 -04:00
hc-github-team-nomad-core
71af41b4b1 Generate files for 1.10.0 release 2025-04-09 16:03:21 -07:00
hc-github-team-nomad-core
239c5f11ee Generate files for 1.10.0 release 2025-04-09 16:03:21 -07:00
hc-github-team-nomad-core
a18faebda1 Generate files for 1.10.0-rc.1 release 2025-04-03 18:21:58 +00:00
Nikita Eliseev
76fb3eb9a1 rpc: added configuration for yamux session (#25466)
Fixes: https://github.com/hashicorp/nomad/issues/25380
2025-04-02 10:58:23 -04:00
James Rasell
27ad88ac17 test: Calculate agent endpoint scheduler count, not static. (#25473) 2025-03-21 13:47:53 +00:00
James Rasell
b3f28f9387 test: Use runtime CPUs for test not static number. (#25458) 2025-03-20 09:05:36 +00:00
James Rasell
5a157eb123 server: Validate config num schedulers is between 0 and num CPUs. (#25441)
The `server.num_scheduler` configuration value should be a value
between 0 and the number of CPUs on the machine. The Nomad agent
was not validating the configuration parameter which meant you
could use a negative value or a value much larger than the
available machine CPUs. This change enforces validation of the
configuration value both on server startup and when the agent is
reloaded.

The Nomad API was only performing negative value validation when
updating the scheduler number via this method. This change adds
to the validation to ensure the number is not greater than the
CPUs on the machine.
2025-03-20 07:29:57 +00:00
James Rasell
61b2b9d3d0 agent: Improve retry joiner code with small refactor. (#25422)
The agent retry joiner implementation had different parameters
to control its execution for agents running in server and client
mode. The agent would set up individual joiners depending on the
agent mode, making the object parameter overhead unrequired.

This change removes the excess configuration options for the
joiner, reducing code complexity slighly and hopefully making
future modifications in this area easier to make.
2025-03-18 15:55:52 +00:00
hc-github-team-nomad-core
e1b9bd8ab0 Generate files for 1.10.0-beta.1 release 2025-03-12 10:37:46 +00:00
Daniel Bennett
04db81951f test: fix go 1.24 test complaints (#25346)
e.g. Error: nomad/leader_test.go:382:12: non-constant format string in call to (*testing.common).Fatalf
2025-03-11 11:01:39 -05:00
Tim Gross
1ffb7ab3fb dynamic host volumes: allow plugins to return an error message (#25341)
Errors from `volume create` or `volume delete` only get logged by the client
agent, which may make it harder for volume authors to debug these tasks if they
are not also the cluster administrator with access to host logs.

Allow plugins to include an optional error message in their response. Because we
can't count on receiving this response (the error could come before the plugin
executes), we parse this message optimistically and include it only if
available.

Ref: https://hashicorp.atlassian.net/browse/NET-12087
2025-03-11 11:06:57 -04:00
hc-github-team-nomad-core
1da56b8e07 Generate files for 1.9.7 release 2025-03-11 14:09:02 +00:00
Daniel Bennett
8e56805fea oidc: support PKCE and client assertion / private key JWT (#25231)
PKCE is enabled by default for new/updated auth methods.
 * ref: https://oauth.net/2/pkce/

Client assertions are an optional, more secure replacement for client secrets
 * ref: https://oauth.net/private-key-jwt/

a change to the existing flow, even without these new options,
is that the oidc.Req is retained on the Nomad server (leader)
in between auth-url and complete-auth calls.

and some fields in auth method config are now more strictly required.
2025-03-10 13:32:53 -05:00
James Rasell
2eb35a4678 build: Update Go to v1.24.1 (#25249) 2025-03-06 10:33:14 +00:00
Michael Smithhisler
5c4d0e923d consul: Remove legacy token based authentication workflow (#25217) 2025-03-05 15:38:11 -05:00
Michael Smithhisler
f2b761f17c disconnected: removes deprecated disconnect fields (#25284)
The group level fields stop_after_client_disconnect,
max_client_disconnect, and prevent_reschedule_on_lost were deprecated in
Nomad 1.8 and replaced by field in the disconnect block. This change
removes any logic related to those deprecated fields.

---------

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2025-03-05 14:46:02 -05:00
James Rasell
7268053174 vault: Remove legacy token based authentication workflow. (#25155)
The legacy workflow for Vault whereby servers were configured
using a token to provide authentication to the Vault API has now
been removed. This change also removes the workflow where servers
were responsible for deriving Vault tokens for Nomad clients.

The deprecated Vault config options used byi the Nomad agent have
all been removed except for "token" which is still in use by the
Vault Transit keyring implementation.

Job specification authors can no longer use the "vault.policies"
parameter and should instead use "vault.role" when not using the
default workload identity.

---------

Co-authored-by: Tim Gross <tgross@hashicorp.com>
Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>
2025-02-28 07:40:02 +00:00
Piotr Kazmierczak
73a193f6d9 stateful deployments: task group host volume claims CLI (#25116)
CLI for interacting with task group host volume claims.
2025-02-27 17:04:48 +01:00
Piotr Kazmierczak
58c6387323 stateful deployments: task group host volume claims API (#25114)
This PR introduces API endpoints /v1/volumes/claims/ and /v1/volumes/claim/:id
for listing and deleting task group host volume claims, respectively.
2025-02-25 15:51:59 +01:00
Tim Gross
7b89c0ee28 template: fix client's default retry configuration (#25113)
In #20165 we fixed a bug where a partially configured `client.template` retry
block would set any unset fields to nil instead of their default values. But
this patch introduced a regression in the default values, so we were now
defaulting to unlimited retries if the retry block was unset. Restore the
correct behavior and add better test coverage at both the config parsing and
template configuration code.

Ref: https://github.com/hashicorp/nomad/pull/20165
Ref: https://github.com/hashicorp/nomad/issues/23305#issuecomment-2643731565
2025-02-14 09:25:41 -05:00
Jorge Marey
25426f0777 fingerprint: add config option to disable dmidecode (#25108) 2025-02-13 11:20:48 -05:00
hc-github-team-nomad-core
ac36990fe3 Generate files for 1.9.6 release 2025-02-11 17:03:45 -05:00
Matt Keeler
833e240597 Upgrade to using hashicorp/go-metrics@v0.5.4 (#24856)
* Upgrade to using hashicorp/go-metrics@v0.5.4

This also requires bumping the dependencies for:

* memberlist
* serf
* raft
* raft-boltdb
* (and indirectly hashicorp/mdns due to the memberlist or serf update)

Unlike some other HashiCorp products, Nomads root module is currently expected to be consumed by others. This means that it needs to be treated more like our libraries and upgrade to hashicorp/go-metrics by utilizing its compat packages. This allows those importing the root module to control the metrics module used via build tags.
2025-01-31 15:22:00 -05:00