276 Commits

Author SHA1 Message Date
Tim Gross
4e75e99f1a windows: use/accept platform-specific signal for stopping agent (#26780)
On Windows, the `os.Process.Signal` method returns an error when sending
`os.Interrupt` (SIGINT) because it isn't implemented. This causes test servers
in the `testutil` packages to break on Windows. Use the platform specific
syscalls to generate the SIGINT instead.

The agent's signal handler also did not correctly handle the Ctrl-C because we
were masking os.Interrupt instead of SIGINT.

Fixes: https://github.com/hashicorp/nomad/issues/26775

Co-authored-by: Chris Roberts <croberts@hashicorp.com>
2025-09-17 11:32:20 -04:00
Chris Roberts
61c36bdef7 [winsvc] Add support for Windows Eventlog (#26441)
Defines a `winsvc.Event` type which can be sent using the `winsvc.SendEvent`
function. If nomad is running on Windows and can send to the Windows
Eventlog the event will be sent. Initial event types are defined for
starting, ready, stopped, and log message.

The `winsvc.EventLogger` provides an `io.WriteCloser` that can be included
in the logger's writers collection. It will extract the log level from
log lines and write them appropriately to the eventlog. The eventlog
only supports error, warning, and info levels so messages with other
levels will be ignored.

A new configuration block is included for enabling logging to the
eventlog. Logging must be enabled with the `log_level` option and
the `eventlog.level` value can then be of the same or higher severity.
2025-09-02 16:40:31 -07:00
James Rasell
cddc1b0127 config: Validate keyring config to catch invalid provider types. (#26673) 2025-09-02 11:07:49 +01:00
Michael Smithhisler
da4cf07ff4 logs: skip logging SIGPIPE signal (#26582) 2025-08-21 09:08:49 -04:00
James Rasell
80a26306bf intro: Add node introduction flow for Nomad client registration. (#26405)
This change implements the client -> server workflow for Nomad
node introduction. A Nomad node can optionally be started with an
introduction token, which is a signed JWT containing claims for
the node registration. The server handles this according to the
enforcement configuration.

The introduction token can be provided by env var, cli flag, or
by placing it within a default filesystem location. The latter
option does not override the CLI or env var.

The region claims has been removed from the initial claims set of
the intro identity. This boundary is guarded by mTLS and aligns
with the node identity.
2025-08-05 08:23:44 +01:00
Chris Roberts
493e7b2faa command: prevent server panic on graceful shutdown (#26171)
When performing a graceful shutdown the client drain configuration
is checked for a deadline which is appended to the timeout. When
running as a server the client will not be set. Attempting to get
the drain deadline will result in a panic. This checks for the
client being available prior to fetching the deadline value.
2025-07-01 15:54:03 -07:00
Chris Roberts
4dbf645bf7 command: prevent panic on graceful shutdown (#26018)
When performing a graceful shutdown a channel is used to wait for
the agent to leave. The channel is closed when the agent leaves
successfully, but it also is closed within a deferral. If the
agent successfully leaves and closes the channel, a panic will
occur when the channel is closed the second time within the
deferral. To prevent this from occurring, the channel closing
is wrapped within a `OnceFunc` so the channel is only closed
once.
2025-06-12 09:35:57 -07:00
Chris Roberts
eeec603975 command: prevent early exit from graceful shutdown (#26023)
While waiting for the agent to leave during a graceful shutdown
the wait can be interrupted immediately if another signal is
received. It is common that while waiting a `SIGPIPE` is received
from journald causing the wait to end early. This results in the
agent not finishing the leave process and reporting an error when
the process has stopped. Instead of allowing any signal to interrupt
the wait, the signal is checked for a `SIGPIPE` and if matched will
continue waiting.
2025-06-12 08:56:55 -07:00
Juanadelacuesta
9288a3141a func and docs: Use the config from the client and not from the agent that is already parsed. Add the breaking change to the release notes 2025-04-30 10:53:02 +02:00
Juanadelacuesta
949571e313 func: read the config from the agent, dont reparse 2025-04-24 05:01:53 +02:00
Juanadelacuesta
46343ee56e func: use the client's configured drain deadline to calculate the graceful timeout when terminating an agent 2025-04-23 23:59:50 +02:00
Juanadelacuesta
c91f24681d style: add changelog 2025-04-23 23:28:54 +02:00
Juana De La Cuesta
9778a31e29 Update command/agent/command.go
Co-authored-by: Tim Gross <tgross@hashicorp.com>
2025-04-23 23:18:09 +02:00
Juana De La Cuesta
39b3d63172 Update command/agent/command.go
Co-authored-by: Tim Gross <tgross@hashicorp.com>
2025-04-23 23:18:02 +02:00
Juana De La Cuesta
313f430fdd Update command/agent/command.go
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
2025-04-23 23:17:36 +02:00
Juanadelacuesta
b47b962439 fix: use a timer instead of a time.After to avoid memory leaks 2025-04-23 16:26:46 +02:00
Juanadelacuesta
38c27b7e7f fix: make teh disctintion when for os.Interrupt 2025-04-23 16:20:04 +02:00
Juanadelacuesta
adf038b495 fix: correct the logic for LeaveOnTerm or LeaveOnInt depending on the incoming signal 2025-04-23 16:03:12 +02:00
Juanadelacuesta
b375974bc3 style: add comments 2025-04-23 15:47:37 +02:00
Juanadelacuesta
c5c4272aee func: force agent return if there is an error on reload 2025-04-23 15:14:48 +02:00
Arian van Putten
d28af58cbb agent: implement sd-notify reload correctly (#25636)
First of all, we should not send the unix time, but the monotonic time.
Second of all, RELOADING= and MONOTONIC_USEC fields should be sent in
*single* message not two separate messages.

From the man page of [systemd.service](https://www.freedesktop.org/software/systemd/man/latest/systemd.service.html#Type=)

> notification message via sd_notify(3) that contains the "RELOADING=1" field in
> combination with "MONOTONIC_USEC=" set to the current monotonic time (i.e.
> CLOCK_MONOTONIC in clock_gettime(2)) in μs, formatted as decimal string.

[sd_notify](https://www.freedesktop.org/software/systemd/man/latest/sd_notify.html)
now has code samples of the protocol to clarify.

Without these changes, if you'd set
Type=notify-reload on the agen'ts systemd unit, systemd
would kill the service due to the service not responding to reload
correctly.
2025-04-14 11:38:56 -04:00
Nikita Eliseev
76fb3eb9a1 rpc: added configuration for yamux session (#25466)
Fixes: https://github.com/hashicorp/nomad/issues/25380
2025-04-02 10:58:23 -04:00
James Rasell
61b2b9d3d0 agent: Improve retry joiner code with small refactor. (#25422)
The agent retry joiner implementation had different parameters
to control its execution for agents running in server and client
mode. The agent would set up individual joiners depending on the
agent mode, making the object parameter overhead unrequired.

This change removes the excess configuration options for the
joiner, reducing code complexity slighly and hopefully making
future modifications in this area easier to make.
2025-03-18 15:55:52 +00:00
Michael Smithhisler
5c4d0e923d consul: Remove legacy token based authentication workflow (#25217) 2025-03-05 15:38:11 -05:00
James Rasell
7268053174 vault: Remove legacy token based authentication workflow. (#25155)
The legacy workflow for Vault whereby servers were configured
using a token to provide authentication to the Vault API has now
been removed. This change also removes the workflow where servers
were responsible for deriving Vault tokens for Nomad clients.

The deprecated Vault config options used byi the Nomad agent have
all been removed except for "token" which is still in use by the
Vault Transit keyring implementation.

Job specification authors can no longer use the "vault.policies"
parameter and should instead use "vault.role" when not using the
default workload identity.

---------

Co-authored-by: Tim Gross <tgross@hashicorp.com>
Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>
2025-02-28 07:40:02 +00:00
Matt Keeler
833e240597 Upgrade to using hashicorp/go-metrics@v0.5.4 (#24856)
* Upgrade to using hashicorp/go-metrics@v0.5.4

This also requires bumping the dependencies for:

* memberlist
* serf
* raft
* raft-boltdb
* (and indirectly hashicorp/mdns due to the memberlist or serf update)

Unlike some other HashiCorp products, Nomads root module is currently expected to be consumed by others. This means that it needs to be treated more like our libraries and upgrade to hashicorp/go-metrics by utilizing its compat packages. This allows those importing the root module to control the metrics module used via build tags.
2025-01-31 15:22:00 -05:00
Daniel Bennett
49c147bcd7 dynamic host volumes: change env vars, fixup auto-delete (#24943)
* plugin env: DHV_HOST_PATH->DHV_VOLUMES_DIR
* client config: host_volumes_dir
* plugin env: add namespace+nodepool
* only auto-delete after error saving client state
  on *initial* create
2025-01-27 10:36:53 -06:00
Michael Schurter
63dacd2d6e update vault token warning from 1.9->1.10 (#24884)
Fixes #24847
2025-01-17 10:56:06 -08:00
James Rasell
63ea13be77 agent: Ensure logger set up method is public. (#24886)
This is needed by a Nomad Enterprise code path.
2025-01-17 13:47:06 +00:00
James Rasell
753f752cdd agent: remove unused log filter and unrequired library. (#24873)
The Nomad agent used a log filter to ensure logs were written at
the expected level. Since the use of hclog this is not required,
as hclog acts as the gate keeper and filter for logging. All log
writers accept messages from hclog which has already done the
filtering.
2025-01-17 07:51:27 +00:00
James Rasell
1ae9785f9b agent: Fix a bug where all syslog lines are notice when using JSON (#24865)
The agent syslog write handler was unable to handle JSON log lines
correctly, meaning all syslog entries when using JSON log format
showed as NOTICE level.

This change adds a new handler to the Nomad agent which can parse
JSON log lines and correctly understand the expected log level
entry.

The change also removes the use of a filter from the default log
format handler. This is not needed as the logs are fed into the
syslog handler via hclog, which is responsible for level
filtering.
2025-01-16 07:23:08 +00:00
James Rasell
8d201a82fd agent: Fixed a bug where syslog error messages marked as notice. (#24820)
The mapping between Nomad log level identifiers and syslog
priorities did not handle the error level string correctly.
2025-01-15 08:02:53 +00:00
Charlie Voiselle
30ab8897d2 deps: Switch from mitchellh/cli to hashicorp/cli (#19321)
Co-authored-by: James Rasell <jrasell@hashicorp.com>
2024-12-19 15:41:11 +00:00
Daniel Bennett
46a39560bb dynamic host volumes: fingerprint client plugins (#24589) 2024-12-19 09:25:54 -05:00
guifran001
1c44521543 client: Add a preferred address family option for network-interface (#23389)
to prefer ipv4 or ipv6 when deducing IP from network interface

Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>
2024-07-12 15:30:38 -05:00
Tim Gross
54fc146432 agent: add support for sdnotify protocol (#20528)
Nomad agents expect to receive `SIGHUP` to reload their configuration. The
signal handler for this is installed fairly late in agent startup, after the
client or server components are up and running. This means that configuration
management tools can potentially reload the configuration before the agent can
handle it, causing the agent to crash.

We don't want to allow configuration reload during client or server component
startup, because it would significantly complicate initialization. Instead,
we'll implement the systemd notify protocol. This causes systemd to block
sending configuration reload signals until the agent is actually ready. Users
can still bypass this by sending signals directly.

Note that there are several Go libraries that implement the sdnotify protocol,
but most are part of much larger projects which would create a lot of dependabot
burden. The bits of the protocol we need are extremely simple to implement in a
just a couple of functions.

For non-Linux or non-systemd Linux systems, this feature is a no-op. In future
work we could potentially implement service notification for Windows as well.

Fixes: https://github.com/hashicorp/nomad/issues/3885
2024-05-03 13:42:07 -04:00
James Rasell
facc3e8013 agent: allow configuration of in-memory telemetry sink. (#20166)
This change adds configuration options for setting the in-memory
telemetry sink collection and retention durations. This sink backs
the metrics JSON API and previously had hard-coded default values.

The new options are particularly useful when running development or
debug environments, where metrics collection is desired at a fast
and granular rate.
2024-03-25 15:00:18 +00:00
Seth Hoenig
05937ab75b exec2: add client support for unveil filesystem isolation mode (#20115)
* exec2: add client support for unveil filesystem isolation mode

This PR adds support for a new filesystem isolation mode, "Unveil". The
mode introduces a "alloc_mounts" directory where tasks have user-owned
directory structure which are bind mounts into the real alloc directory
structure. This enables a task driver to use landlock (and maybe the
real unveil on openbsd one day) to isolate a task to the task owned
directory structure, providing sandboxing.

* actually create alloc-mounts-dir directory

* fix doc strings about alloc mount dir paths
2024-03-13 08:24:17 -05:00
James Rasell
41555b6370 cli: Fix minor help formatting issue in agent command. (#19743) 2024-01-17 12:18:00 +00:00
Mike Nomitch
31f4296826 Adds support for failures before warning to Consul service checks (#19336)
Adds support for failures before warning and failures before critical
to the automatically created Nomad client and server services in Consul
2023-12-14 11:33:31 -08:00
Luiz Aoqui
099ee06a60 Revert "deps: update go-metrics to v0.5.3 (#19190)" (#19374)
* Revert "deps: update go-metrics to v0.5.3 (#19190)"

This reverts commit ddb060d8b3.

* changelog: add entry for #19374
2023-12-08 08:46:55 -05:00
Luiz Aoqui
c624dc2121 config: fix loading Vault token from env var (#19349)
The `defaultVault` variable is a pointer to the Vault configuration
named `default`. Initially, this variable points to the Vault
configuration that is used to load CLI flag values, but after those are
merged with the default and config file values the pointer reference
must be updated before mutating the config with environment variable
values.
2023-12-07 11:56:53 -05:00
Luiz Aoqui
27d2ad1baf cli: add -dev-consul and -dev-vault agent mode (#19327)
The `-dev-consul` and `-dev-vault` flags add default identities and
configuration to the Nomad agent to connect and use the workload
identity integration with Consul and Vault.
2023-12-07 11:51:20 -05:00
Luiz Aoqui
ddb060d8b3 deps: update go-metrics to v0.5.3 (#19190)
Update `go-metrics` to v0.5.3 to pick
https://github.com/hashicorp/go-metrics/pull/146.
2023-11-28 12:37:57 -05:00
Adriano Caloiaro
f66eb83fc0 Add go-netaddrs support to retry_join (#18745) 2023-11-15 10:07:18 -05:00
Tim Gross
9d075c44b2 config: remove old Vault/Consul config blocks from parser (#18997)
Remove the now-unused original configuration blocks for Consul and Vault from
the agent configuration parsing. When the agent needs to refer to a Consul or
Vault block it will always be for a specific cluster for the task/service (or
the default cluster for the agent's own use).

This is third of three changesets for this work.

Fixes: https://github.com/hashicorp/nomad/issues/18947
Ref: https://github.com/hashicorp/nomad/pull/18991
Ref: https://github.com/hashicorp/nomad/pull/18994
2023-11-08 09:30:08 -05:00
Tim Gross
8f8265fa6d add deprecation warning for Vault/Consul token usage (#18863)
Submitting a Consul or Vault token with a job is deprecated in Nomad 1.7 and
intended for removal in Nomad 1.9. Add a deprecation warning to the CLI when the
user passes in the appropriate flag or environment variable.

Nomad agents will no longer need a Vault token when configured with workload
identity, and we'll ignore Vault tokens in the agent config after Nomad 1.9. Log
a warning at agent startup.

Ref: https://github.com/hashicorp/nomad/issues/15617
Ref: https://github.com/hashicorp/nomad/issues/15618
2023-10-26 10:46:02 -04:00
Michael Schurter
a806363f6d OpenID Configuration Discovery Endpoint (#18691)
Added the [OIDC Discovery](https://openid.net/specs/openid-connect-discovery-1_0.html) `/.well-known/openid-configuration` endpoint to Nomad, but it is only enabled if the `server.oidc_issuer` parameter is set. Documented the parameter, but without a tutorial trying to actually _use_ this will be very hard.

I intentionally did *not* use https://github.com/hashicorp/cap for the OIDC configuration struct because it's built to be a *compliant* OIDC provider. Nomad is *not* trying to be compliant initially because compliance to the spec does not guarantee it will actually satisfy the requirements of third parties. I want to avoid the problem where in an attempt to be standards compliant we ship configuration parameters that lock us in to a certain behavior that we end up regretting. I want to add parameters and behaviors as there's a demonstrable need.

Users always have the escape hatch of providing their own OIDC configuration endpoint. Nomad just needs to know the Issuer so that the JWTs match the OIDC configuration. There's no reason the actual OIDC configuration JSON couldn't live in S3 and get served directly from there. Unlike JWKS the OIDC configuration should be static, or at least change very rarely.

This PR is just the endpoint extracted from #18535. The `RS256` algorithm still needs to be added in hopes of supporting third parties such as [AWS IAM OIDC Provider](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_providers_create_oidc.html).

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
2023-10-20 17:11:41 -07:00
James Rasell
1ffdd576bb agent: add config option to enable file and line log detail. (#18768) 2023-10-16 15:59:16 +01:00
Tim Gross
5001bf4547 consul: use constant instead of "default" literal (#18611)
Use the constant `structs.ConsulDefaultCluster` instead of the string literal
"default", as we've done for Vault.
2023-09-28 16:50:21 -04:00