First of all, we should not send the Unix time, but the monotonic time.
Second of all, the RELOADING= and MONOTONIC_USEC= fields should be sent in
a *single* message, not two separate messages.
From the man page of [systemd.service](https://www.freedesktop.org/software/systemd/man/latest/systemd.service.html#Type=)
> notification message via sd_notify(3) that contains the "RELOADING=1" field in
> combination with "MONOTONIC_USEC=" set to the current monotonic time (i.e.
> CLOCK_MONOTONIC in clock_gettime(2)) in μs, formatted as decimal string.
[sd_notify](https://www.freedesktop.org/software/systemd/man/latest/sd_notify.html)
now includes code samples that clarify the protocol.
Without these changes, if you set
Type=notify-reload on the agent's systemd unit, systemd
would kill the service because the service does not respond to the reload
correctly.
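For illustration only (not the exact code in this change), the combined message could be built like this with golang.org/x/sys/unix:

```go
package agent

import (
	"fmt"

	"golang.org/x/sys/unix"
)

// reloadingMessage builds the single datagram payload: both fields in one
// message, with the timestamp read from CLOCK_MONOTONIC in microseconds.
func reloadingMessage() (string, error) {
	var ts unix.Timespec
	if err := unix.ClockGettime(unix.CLOCK_MONOTONIC, &ts); err != nil {
		return "", err
	}
	return fmt.Sprintf("RELOADING=1\nMONOTONIC_USEC=%d", ts.Nano()/1000), nil
}
```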
The agent retry joiner implementation had different parameters
to control its execution for agents running in server and client
mode. The agent would set up an individual joiner depending on the
agent mode, making the extra object parameters unnecessary.
This change removes the excess configuration options for the
joiner, reducing code complexity slightly and hopefully making
future modifications in this area easier to make.
The legacy workflow for Vault whereby servers were configured
using a token to provide authentication to the Vault API has now
been removed. This change also removes the workflow where servers
were responsible for deriving Vault tokens for Nomad clients.
The deprecated Vault config options used by the Nomad agent have
all been removed, except for "token", which is still in use by the
Vault Transit keyring implementation.
Job specification authors can no longer use the "vault.policies"
parameter and should instead use "vault.role" when not using the
default workload identity.
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>
* Upgrade to using hashicorp/go-metrics@v0.5.4
This also requires bumping the dependencies for:
* memberlist
* serf
* raft
* raft-boltdb
* (and indirectly hashicorp/mdns due to the memberlist or serf update)
Unlike some other HashiCorp products, Nomad's root module is currently expected to be consumed by others. This means it needs to be treated more like our libraries and upgraded to hashicorp/go-metrics by utilizing its compat packages. This allows those importing the root module to control the metrics module used via build tags.
The Nomad agent used a log filter to ensure logs were written at
the expected level. Since the adoption of hclog this is no longer required,
as hclog acts as the gatekeeper and filter for logging. All log
writers accept messages from hclog, which has already done the
filtering.
The agent syslog write handler was unable to handle JSON log lines
correctly, meaning all syslog entries showed as the NOTICE level
when using the JSON log format.
This change adds a new handler to the Nomad agent which can parse
JSON log lines and correctly determine the intended log level.
The change also removes the use of a filter from the default log
format handler. This is not needed as the logs are fed into the
syslog handler via hclog, which is responsible for level
filtering.
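For illustration, assuming hclog's JSON keys (`@level`, `@message`) and the standard library syslog writer, the level mapping could look roughly like this (names here are illustrative, not the actual handler):

```go
package logging

import (
	"encoding/json"
	"log/syslog"
)

type jsonLine struct {
	Level   string `json:"@level"`
	Message string `json:"@message"`
}

// writeJSONToSyslog parses a JSON log line to recover its level rather than
// defaulting every entry to NOTICE.
func writeJSONToSyslog(w *syslog.Writer, raw []byte) error {
	var line jsonLine
	if err := json.Unmarshal(raw, &line); err != nil {
		return w.Notice(string(raw)) // not JSON; fall back to the old behavior
	}
	switch line.Level {
	case "error":
		return w.Err(string(raw))
	case "warn":
		return w.Warning(string(raw))
	case "debug", "trace":
		return w.Debug(string(raw))
	default:
		return w.Info(string(raw))
	}
}
```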
Nomad agents expect to receive `SIGHUP` to reload their configuration. The
signal handler for this is installed fairly late in agent startup, after the
client or server components are up and running. This means that configuration
management tools can potentially reload the configuration before the agent can
handle it, causing the agent to crash.
We don't want to allow configuration reload during client or server component
startup, because it would significantly complicate initialization. Instead,
we'll implement the systemd notify protocol. This causes systemd to block
sending configuration reload signals until the agent is actually ready. Users
can still bypass this by sending signals directly.
Note that there are several Go libraries that implement the sdnotify protocol,
but most are part of much larger projects which would create a lot of dependabot
burden. The bits of the protocol we need are extremely simple to implement in
just a couple of functions.
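As a sketch of how small the client side is (hypothetical helper, not necessarily the agent's actual function), it amounts to writing a state string as one datagram to `$NOTIFY_SOCKET`, if set:

```go
package agent

import (
	"net"
	"os"
)

// sdNotify sends a single sd_notify state message to systemd, if the agent
// is running under systemd with a notify socket configured.
func sdNotify(state string) error {
	path := os.Getenv("NOTIFY_SOCKET")
	if path == "" {
		return nil // not running under systemd; no-op
	}
	conn, err := net.Dial("unixgram", path)
	if err != nil {
		return err
	}
	defer conn.Close()
	_, err = conn.Write([]byte(state))
	return err
}
```

Calling something like `sdNotify("READY=1")` after the client or server components finish starting is what lets systemd hold back reload signals until the agent can actually handle them.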
For non-Linux or non-systemd Linux systems, this feature is a no-op. In future
work we could potentially implement service notification for Windows as well.
Fixes: https://github.com/hashicorp/nomad/issues/3885
This change adds configuration options for setting the in-memory
telemetry sink collection and retention durations. This sink backs
the metrics JSON API and previously had hard-coded default values.
The new options are particularly useful when running development or
debug environments, where metrics collection is desired at a fast
and granular rate.
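These two durations correspond to the parameters of go-metrics' in-memory sink; a rough sketch of the wiring (not the agent's actual setup code):

```go
package telemetry

import (
	"time"

	metrics "github.com/hashicorp/go-metrics"
)

// newInmemSink shows how the two options map onto the in-memory sink: the
// first duration is the aggregation (collection) interval, the second is how
// long finished intervals are retained for the metrics JSON API.
func newInmemSink(collectionInterval, retentionPeriod time.Duration) (*metrics.InmemSink, error) {
	sink := metrics.NewInmemSink(collectionInterval, retentionPeriod)
	_, err := metrics.NewGlobal(metrics.DefaultConfig("nomad"), sink)
	return sink, err
}
```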
* exec2: add client support for unveil filesystem isolation mode
This PR adds support for a new filesystem isolation mode, "Unveil". The
mode introduces an "alloc_mounts" directory in which each task gets a
user-owned directory structure that is bind-mounted into the real alloc
directory structure. This enables a task driver to use landlock (and maybe
the real unveil on OpenBSD one day) to isolate a task to the task-owned
directory structure, providing sandboxing.
* actually create alloc-mounts-dir directory
* fix doc strings about alloc mount dir paths
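A very rough sketch of the bind-mount mechanics, assuming golang.org/x/sys/unix; the mount direction, ownership handling, and cleanup here are illustrative, not the driver's actual code:

```go
package mounts

import "golang.org/x/sys/unix"

// bindMount makes src visible at dst via a bind mount; with unveil-style
// isolation, landlock rules can then be scoped to the task's own user-owned
// directory tree under alloc_mounts.
func bindMount(src, dst string) error {
	return unix.Mount(src, dst, "", unix.MS_BIND, "")
}
```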
The `defaultVault` variable is a pointer to the Vault configuration
named `default`. Initially, this variable points to the Vault
configuration that is used to load CLI flag values, but after those are
merged with the default and config file values the pointer reference
must be updated before mutating the config with environment variable
values.
The `-dev-consul` and `-dev-vault` flags add default identities and
configuration to the Nomad agent to connect and use the workload
identity integration with Consul and Vault.
Submitting a Consul or Vault token with a job is deprecated in Nomad 1.7 and
intended for removal in Nomad 1.9. Add a deprecation warning to the CLI when the
user passes in the appropriate flag or environment variable.
Nomad agents will no longer need a Vault token when configured with workload
identity, and we'll ignore Vault tokens in the agent config after Nomad 1.9. Log
a warning at agent startup.
Ref: https://github.com/hashicorp/nomad/issues/15617
Ref: https://github.com/hashicorp/nomad/issues/15618
Added the [OIDC Discovery](https://openid.net/specs/openid-connect-discovery-1_0.html) `/.well-known/openid-configuration` endpoint to Nomad, but it is only enabled if the `server.oidc_issuer` parameter is set. Documented the parameter, but without a tutorial trying to actually _use_ this will be very hard.
I intentionally did *not* use https://github.com/hashicorp/cap for the OIDC configuration struct because it's built to be a *compliant* OIDC provider. Nomad is *not* trying to be compliant initially because compliance with the spec does not guarantee it will actually satisfy the requirements of third parties. I want to avoid the problem where, in an attempt to be standards compliant, we ship configuration parameters that lock us into a certain behavior that we end up regretting. I want to add parameters and behaviors as there's a demonstrable need.
Users always have the escape hatch of providing their own OIDC configuration endpoint. Nomad just needs to know the Issuer so that the JWTs match the OIDC configuration. There's no reason the actual OIDC configuration JSON couldn't live in S3 and get served directly from there. Unlike JWKS the OIDC configuration should be static, or at least change very rarely.
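For illustration, the discovery document served by the endpoint only needs a handful of fields; the struct and handler below are a sketch, not Nomad's actual types:

```go
package oidc

import (
	"encoding/json"
	"net/http"
)

// discoveryDocument holds the standard OIDC discovery fields third parties
// typically require; field names follow the OIDC Discovery spec.
type discoveryDocument struct {
	Issuer        string   `json:"issuer"`
	JWKSURI       string   `json:"jwks_uri"`
	IDTokenAlgs   []string `json:"id_token_signing_alg_values_supported"`
	ResponseTypes []string `json:"response_types_supported"`
	Subjects      []string `json:"subject_types_supported"`
}

// discoveryHandler serves a static document derived from the configured issuer.
func discoveryHandler(issuer string) http.HandlerFunc {
	doc := discoveryDocument{
		Issuer:        issuer,
		JWKSURI:       issuer + "/.well-known/jwks.json",
		IDTokenAlgs:   []string{"RS256", "EdDSA"},
		ResponseTypes: []string{"code"},
		Subjects:      []string{"public"},
	}
	return func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		_ = json.NewEncoder(w).Encode(doc)
	}
}
```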
This PR is just the endpoint extracted from #18535. The `RS256` algorithm still needs to be added in hopes of supporting third parties such as [AWS IAM OIDC Provider](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_providers_create_oidc.html).
Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
* vault: update identity name to start with `vault_`
In the original proposal, workload identities used to derive Vault
tokens were expected to be called just `vault`. But in order to support
multiple Vault clusters it is necessary to associate identities with
specific Vault cluster configuration.
This commit implements a new proposal to have Vault identities named as
`vault_<cluster>`.
In Nomad Enterprise when multiple Vault/Consul clusters are configured, cluster admins can control access to clusters for jobs via namespace ACLs, similar to how we've done so for node pools. This changeset updates the ACL configuration structs, but doesn't wire them up.
Add the plumbing we need to accept multiple Consul clusters in Nomad agent
configuration, to support upcoming Nomad Enterprise features. The `consul` blocks
are differentiated by a new `name` field, and if the `name` is omitted it
becomes the "default" Consul configuration. All blocks with the same name are
merged together, as with the existing behavior.
As with the `vault` block, we're still using HCL1 for parsing configuration and
the `Decode` method doesn't parse multiple blocks differentiated only by a field
name without a label. So we've had to add an extra parsing pass, similar to what
we've done for HCL1 jobspecs. This also revealed a subtle bug in the `vault`
block handling of extra keys when there are multiple `vault` blocks, which I've
fixed here.
For now, all existing consumers will use the "default" Consul configuration, so
there's no user-facing behavior change in this changeset other than the contents
of the agent self API.
Ref: https://github.com/hashicorp/team-nomad/issues/404
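A sketch of the merge-by-name behavior with illustrative types (the real agent reuses its existing Consul config struct and Merge logic):

```go
package agentconfig

// ConsulBlock stands in for the agent's Consul configuration block.
type ConsulBlock struct {
	Name string
	Addr string
	// ... other consul fields
}

// Merge returns a copy of c with non-empty fields from o taking precedence.
func (c *ConsulBlock) Merge(o *ConsulBlock) *ConsulBlock {
	out := *c
	if o.Addr != "" {
		out.Addr = o.Addr
	}
	return &out
}

// mergeByName folds same-named blocks together, treating a missing name as
// the "default" Consul configuration, matching the behavior described above.
func mergeByName(blocks []*ConsulBlock) map[string]*ConsulBlock {
	merged := map[string]*ConsulBlock{}
	for _, b := range blocks {
		name := b.Name
		if name == "" {
			name = "default"
		}
		if prev, ok := merged[name]; ok {
			merged[name] = prev.Merge(b) // later blocks win field-by-field
		} else {
			merged[name] = b
		}
	}
	return merged
}
```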
Add the plumbing we need to accept multiple Vault clusters in Nomad agent
configuration, to support upcoming Nomad Enterprise features. The `vault` blocks
are differentiated by a new `name` field, and if the `name` is omitted it
becomes the "default" Vault configuration. All blocks with the same name are
merged together, as with the existing behavior.
Unfortunately we're still using HCL1 for parsing configuration and the `Decode`
method doesn't parse multiple blocks differentiated only by a field name without
a label. So we've had to add an extra parsing pass, similar to what we've done
for HCL1 jobspecs.
For now, all existing consumers will use the "default" Vault configuration, so
there's no user-facing behavior change in this changeset other than the contents
of the agent self API.
Ref: https://github.com/hashicorp/team-nomad/issues/404
When registering a node with a new node pool in a non-authoritative
region we can't create the node pool because this new pool will not be
replicated to other regions.
This commit modifies the node registration logic to only allow automatic
node pool creation in the authoritative region.
In non-authoritative regions, the client is registered, but the node
pool is not created. The client is kept in the `initializing` status until
its node pool is created in the authoritative region and replicated to
the client's region.
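Sketched with hypothetical helpers (not the real RPC handler or state store calls), the registration gate looks roughly like this:

```go
package state

const nodeStatusInit = "initializing"

// registerNodePoolGate only creates a missing node pool in the authoritative
// region; elsewhere the node is registered but held in the initializing
// status until the pool replicates in.
func registerNodePoolGate(region, authoritativeRegion, pool string,
	poolExists func(string) bool, createPool func(string), setStatus func(string)) {

	if poolExists(pool) {
		return
	}
	if region == authoritativeRegion {
		// Safe to create here: pools replicate outward from the
		// authoritative region.
		createPool(pool)
		return
	}
	// The pool can't be created in this region; keep the node initializing
	// until the pool is created upstream and replicated here.
	setStatus(nodeStatusInit)
}
```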
Nomad's security model requires mTLS in order to secure client-to-server and
server-to-server communications. Configuring ACLs alone is not enough. Loudly
warn the user if mTLS is not configured in non-dev modes.
Before this change, Client had 2 copies of the config object: config and configCopy. There was no guidance around which to use where (other than configCopy's comment to pass it to alloc runners), both were shared among goroutines and mutated in data-racy ways. At least at one point I think the idea was to have `config` be mutable and then grab a lock to overwrite `configCopy`'s pointer atomically. This would have allowed alloc runners to read their config copies in data-race-safe ways, but this isn't how the current implementation worked.
This change takes the following approach to safely handling configs in the client:
1. `Client.config` is the only copy of the config and all access must go through the `Client.configLock` mutex
2. Since the mutex *only protects the config pointer itself and not fields inside the Config struct:* all config mutation must be done on a *copy* of the config, and then Client's config pointer is overwritten while the mutex is acquired. Alloc runners and other goroutines with the old config pointer will not see config updates.
3. Deep copying is implemented on the Config struct to satisfy the previous approach. The TLS Keyloader is an exception because it has its own internal locking to support mutating in place. An unfortunate complication but one I couldn't find a way to untangle in a timely fashion.
4. To facilitate deep copying I made an *internally backward incompatible API change:* our `helper/funcs` used to turn containers (slices and maps) with 0 elements into nils. This probably saves a few memory allocations but makes it very easy to cause panics. Since my new config handling approach uses more copying, it became very difficult to ensure all code that used containers on configs could handle nils properly. Since this code has caused panics in the past, I fixed it: nil containers are copied as nil, but 0-element containers properly return a new 0-element container. No more "downgrading to nil!"
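A minimal sketch of the copy-then-swap pattern in point 2, with illustrative field and method names:

```go
package client

import "sync"

// Config stands in for the client's config struct.
type Config struct {
	Servers []string
	// ...
}

// Copy returns a deep copy; the real helper also preserves the nil vs.
// 0-element distinction described in point 4.
func (c *Config) Copy() *Config {
	out := *c
	out.Servers = append([]string(nil), c.Servers...)
	return &out
}

type Client struct {
	configLock sync.Mutex
	config     *Config
}

// UpdateConfig mutates a copy and swaps the pointer under the lock, so
// goroutines holding the old pointer keep reading a consistent snapshot.
func (c *Client) UpdateConfig(update func(*Config)) *Config {
	c.configLock.Lock()
	defer c.configLock.Unlock()

	newConfig := c.config.Copy()
	update(newConfig)
	c.config = newConfig
	return newConfig
}
```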
Fixes #13505
This fixes #13505 by treating reserved_ports like we treat a lot of jobspec settings: merging settings from more global stanzas (client.reserved.reserved_ports) "down" into more specific stanzas (client.host_networks[].reserved_ports).
As discussed in #13505 there are other options, and since it's totally broken right now we have some flexibility:
1. Treat overlapping reserved_ports on addresses as invalid and refuse to start agents. However, I'm not sure there's a cohesive model we want to publish right now since so much 0.9-0.12 compat code still exists! We would have to explain to folks that if their -network-interface and host_network addresses overlapped, they could only specify reserved_ports in one place or the other?! It gets ugly.
2. Use the global client.reserved.reserved_ports value as the default and treat host_network[].reserved_ports as overrides. My first suggestion in the issue, but @groggemans made me realize the addresses on the agent's interface (as configured by -network-interface) may overlap with host_networks, so you'd need to remove the global reserved_ports from addresses shared with a host_network?! This seemed really confusing and subtle for users to me.
So I think "merging down" creates the most expressive yet understandable approach. I've played around with it a bit, and it doesn't seem too surprising. The only frustrating part is how difficult it is to observe the available addresses and ports on a node! However that's a job for another PR.
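A sketch of the "merging down" behavior with illustrative types (not the actual client config structs):

```go
package client

// HostNetwork stands in for the host_network config block.
type HostNetwork struct {
	Name          string
	ReservedPorts string
}

// mergeReservedPortsDown merges the global reserved_ports value into each
// host_network, so the more specific block extends rather than replaces the
// global setting.
func mergeReservedPortsDown(global string, hostNetworks []*HostNetwork) {
	for _, hn := range hostNetworks {
		switch {
		case hn.ReservedPorts == "":
			hn.ReservedPorts = global
		case global != "":
			hn.ReservedPorts = global + "," + hn.ReservedPorts
		}
	}
}
```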