nomad

mirror of https://github.com/kemko/nomad.git synced 2026-01-03 00:45:43 +03:00

Author	SHA1	Message	Date
James Rasell	216140255d	cli: Do not always add global DNS name to certificate DNS names. (#26086 ) No matter the passed region identifier, the CLI was always adding "<role>.global.nomad" to the certificate DNS names. This is not what we expect and has been removed. While here, the long deprecated cluster-region flag has been removed. This removal only impacts CLI functionality, so is safe to do.	2025-06-25 07:35:56 +01:00
Piotr Kazmierczak	199d12865f	scheduler: isolate `feasibility` (#26031 ) This change isolates all the code that deals with node selection in the scheduler into its own package called feasible. --------- Co-authored-by: Tim Gross <tgross@hashicorp.com>	2025-06-11 20:11:04 +02:00
James Rasell	428f329cab	rpc: Fix data race in yamux config modification for conn handling. (#25978 ) The server RPC handler and RPC connection pool both use a shared configuration object for custom yamux configuration. Both sub-systems were modifying the shared object which could cause a data race. The passed object is now cloned before being modified. This changes also moves where the yamux configuration is cloned and modified to the relevant constructor function. This avoids performing a clone per connection handle or per new connection generated in the RPC pool.	2025-06-05 08:05:46 +01:00
Daniel Bennett	15c01e5a49	ipv6: normalize addrs per RFC-5942 §4 (#25921 ) https://datatracker.ietf.org/doc/html/rfc5952#section-4 * copy NormalizeAddr func from vault * PRs hashicorp/vault#29228 & hashicorp/vault#29517 * normalize bind/advertise addrs * normalize consul/vault addrs	2025-05-22 14:21:30 -04:00
James Rasell	21fd0bbb8a	ci: Regenerate TLS certificates used for testing. (#25804 )	2025-05-02 13:51:02 +01:00
James Rasell	85c30dfd1e	test: Remove use of "mitchellh/go-testing-interface" for stdlib. (#25640 ) The stdlib testing package now includes this interface, so we can remove our dependency on the external library.	2025-04-14 07:43:49 +01:00
Nikita Eliseev	76fb3eb9a1	rpc: added configuration for yamux session (#25466 ) Fixes: https://github.com/hashicorp/nomad/issues/25380	2025-04-02 10:58:23 -04:00
Allison Larson	d1d8945d2e	Add docker plugin config option image_pull_timeout value for default timeout (#25489 ) * Add docker plugin config image_pull_timeout value for default timeout * Add image_pull_timeout docker plugin config to docs * Add changelog	2025-03-24 13:03:14 -07:00
Michael Smithhisler	5c4d0e923d	consul: Remove legacy token based authentication workflow (#25217 )	2025-03-05 15:38:11 -05:00
James Rasell	7268053174	vault: Remove legacy token based authentication workflow. (#25155 ) The legacy workflow for Vault whereby servers were configured using a token to provide authentication to the Vault API has now been removed. This change also removes the workflow where servers were responsible for deriving Vault tokens for Nomad clients. The deprecated Vault config options used byi the Nomad agent have all been removed except for "token" which is still in use by the Vault Transit keyring implementation. Job specification authors can no longer use the "vault.policies" parameter and should instead use "vault.role" when not using the default workload identity. --------- Co-authored-by: Tim Gross <tgross@hashicorp.com> Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>	2025-02-28 07:40:02 +00:00
Tim Gross	716df52788	CNI: migrate from persistent state to ephemeral state during restart (#25093 ) In #24650 we switched to using ephemeral state for CNI plugins, so that when a host reboots and we lose all the allocations we don't end up trying to use IPs we created in network namespaces we just destroyed. Unfortunately upgrade testing missed that in a non-reboot scenario, the existing CNI state was being used by plugins like the ipam plugin to hand out the "next available" IP address. So with no state carried over, we might allocate new addresses that conflict with existing allocations. (This can be avoided by draining the node first.) As a compatibility shim, copy the old CNI state directory to the new CNI state directory during agent startup, if the new CNI state directory doesn't already exist. Ref: https://github.com/hashicorp/nomad/pull/24650	2025-02-12 09:25:50 -05:00
stswidwinski	871585ee90	18529 nomad executes any file in plugins (#18530 ) Co-authored-by: James Rasell <jrasell@hashicorp.com>	2025-02-10 16:08:22 +00:00
Piotr Kazmierczak	611452e1af	stateful deployments: use `TaskGroupVolumeClaim` table to associate volume requests with volume IDs (#24993 ) We introduce an alternative solution to the one presented in #24960 which is based on the state store and not previous-next allocation tracking in the reconciler. This new solution reduces cognitive complexity of the scheduler code at the cost of slightly more boilerplate code, but also opens up new possibilities in the future, e.g., allowing users to explicitly "un-stick" volumes with workloads still running. The diagram below illustrates the new logic: SetVolumes() upsertAllocsImpl() sets ns, job +-----------------checks if alloc requests tg in the scheduler v sticky vols and consults \| +-----------------------+ state. If there is no claim, \| \| TaskGroupVolumeClaim: \| it creates one. \| \| - namespace \| \| \| - jobID \| \| \| - tg name \| \| \| - vol ID \| v \| uniquely identify vol \| hasVolumes() +----+------------------+ consults the state \| ^ and returns true \| \| DeleteJobTxn() if there's a match <-----------+ +---------------removes the claim from or if there is no the state previous claim \| \| \| \| +-----------------------------+ +------------------------------------------------------+ scheduler state store	2025-02-07 17:41:01 +01:00
Matt Keeler	833e240597	Upgrade to using hashicorp/go-metrics@v0.5.4 (#24856 ) * Upgrade to using hashicorp/go-metrics@v0.5.4 This also requires bumping the dependencies for: * memberlist * serf * raft * raft-boltdb * (and indirectly hashicorp/mdns due to the memberlist or serf update) Unlike some other HashiCorp products, Nomads root module is currently expected to be consumed by others. This means that it needs to be treated more like our libraries and upgrade to hashicorp/go-metrics by utilizing its compat packages. This allows those importing the root module to control the metrics module used via build tags.	2025-01-31 15:22:00 -05:00
Daniel Bennett	46a39560bb	dynamic host volumes: fingerprint client plugins (#24589 )	2024-12-19 09:25:54 -05:00
Tim Gross	c3735127ae	allow FlattenMultierror to accept standard error	2024-12-19 09:25:54 -05:00
Tim Gross	6a3803c31e	dynamic host volumes: RPC handlers (#24373 ) This changeset implements the RPC handlers for Dynamic Host Volumes, including the plumbing needed to forward requests to clients. The client-side implementation is stubbed and will be done under a separate PR. Ref: https://hashicorp.atlassian.net/browse/NET-11549	2024-12-19 09:25:54 -05:00
Juana De La Cuesta	a9e7166b6b	[gh-24339] Move from streaming stats to polling for docker (#24525 ) * fix: dont stream the docker stats, read them one by one * func: add a NewSafeTicker to the herlper functions * style: remove commented code	2024-11-21 17:36:53 +01:00
Tim Gross	a7f2cb879e	command line tools for redacting keyring from snapshots (#24023 ) In #23977 we moved the keyring into Raft, which can expose key material in Raft snapshots when using the less-secure AEAD keyring instead of KMS. This changeset adds tools for redacting this material from snapshots: * The `operator snapshot state` command gains the ability to display key metadata (only), which respects the `-filter` option. * The `operator snapshot save` command gains a `-redact` option that removes key material from the snapshot after it's downloaded. * A new `operator snapshot redact` command allows removing key material from an existing snapshot.	2024-09-20 15:30:14 -04:00
Tim Gross	44f4970372	keyring in raft (#23977 ) In Nomad 1.4, we implemented a root keyring to support encrypting Variables and signing Workload Identities. The keyring was originally stored with the AEAD-wrapped DEKs and the KEK together in a JSON keystore file on disk. We recently added support for using an external KMS for the KEK to improve the security model for the keyring. But we've encountered multiple instances of the keystore files not getting backed up separately from the Raft snapshot, resulting in failure to restore clusters from backup. Move Nomad's root keyring into Raft (encrypted with a KMS/Vault where available) in order to eliminate operational problems with the separate on-disk keystore. Fixes: https://github.com/hashicorp/nomad/issues/23665 Ref: https://hashicorp.atlassian.net/browse/NET-10523	2024-09-19 13:56:42 -04:00
Seth Hoenig	51215bf102	deps: update to go-set/v3 and refactor to use custom iterators (#23971 ) * deps: update to go-set/v3 * deps: use custom set iterators for looping	2024-09-16 13:40:10 -05:00
Tim Gross	b25f1b66ce	resources: allow job authors to configure size of secrets tmpfs (#23696 ) On supported platforms, the secrets directory is a 1MiB tmpfs. But some tasks need larger space for downloading large secrets. This is especially the case for tasks using `templates`, which need extra room to write a temporary file to the secrets directory that gets renamed to the old file atomically. This changeset allows increasing the size of the tmpfs in the `resources` block. Because this is a memory resource, we need to include it in the memory we allocate for scheduling purposes. The task is already prevented from using more memory in the tmpfs than the `resources.memory` field allows, but can bypass that limit by writing to the tmpfs via `template` or `artifact` blocks. Therefore, we need to account for the size of the tmpfs in the allocation resources. Simply adding it to the memory needed when we create the allocation allows it to be accounted for in all downstream consumers, and then we'll subtract that amount from the memory resources just before configuring the task driver. For backwards compatibility, the default value of 1MiB is "free" and ignored by the scheduler. Otherwise we'd be increasing the allocated resources for every existing alloc, which could cause problems across upgrades. If a user explicitly sets `resources.secrets = 1` it will no longer be free. Fixes: https://github.com/hashicorp/nomad/issues/2481 Ref: https://hashicorp.atlassian.net/browse/NET-10070	2024-08-05 16:06:58 -04:00
Tim Gross	9d4686c0df	tls: remove deprecated `prefer_server_cipher_suites` field (#23712 ) The TLS configuration object includes a deprecated `prefer_server_cipher_suites` field. In version of Go prior to 1.17, this property controlled whether a TLS connection would use the cipher suites preferred by the server or by the client. This field is ignored as of 1.17 and, according to the `crypto/tls` docs: "Servers now select the best mutually supported cipher suite based on logic that takes into account inferred client hardware, server hardware, and security." This property has been long-deprecated and leaving it in place may lead to false assumptions about how cipher suites are negotiated in connection to a server. So we want to remove it in Nomad 1.9.0. Fixes: https://github.com/hashicorp/nomad-enterprise/issues/999 Ref: https://hashicorp.atlassian.net/browse/NET-10531	2024-08-01 08:52:05 -04:00
Tim Gross	2ee6043cab	tls: support setting min version to TLS1.3 (#23713 ) Nomad already supports TLS1.3, but not as a minimum version configuration. Update our config validation to allow setting `tls_min_version` to 1.3. Update the documentation to match Vault and warn that the `tls_cipher_suites` field is ignored when TLS is 1.3 Fixes: https://github.com/hashicorp/nomad/issues/20131 Ref: https://hashicorp.atlassian.net/browse/NET-10530	2024-08-01 08:46:32 -04:00
Deniz Onur Duzgun	c82dd76a1b	security: update tls cipher suites (#23551 )	2024-07-11 14:01:45 -04:00
Piotr Kazmierczak	cc01c09f8b	windows: remove winappcontainer and winexec helpers (#23448 ) This removes helper winappcontainer and winexec helper code, since it is no longer needed after #23432	2024-06-28 18:49:56 +02:00
Deniz Onur Duzgun	1cc99cc1b4	bug: resolve type conversion alerts (#20553 )	2024-05-15 13:22:10 -04:00
James Rasell	5041460043	core: do not create evaluations within batch deregister endpoint. (#20510 ) The batch deregister RPC endpoint is only used by the internal garbage collection process, it is not exposed via the HTTP API or used anywhere else. The GC process ensures that a job can only be removed from state if all related evaluations and allocations are in a state that means they can also be removed from state. This means that we do not need to create evaluations when jobs are being deregistered via this endpoint.	2024-05-07 07:39:13 +01:00
James Rasell	3f866a7e82	test: regenerate test TLS certificates. (#20511 )	2024-05-02 13:58:32 +01:00
Daniel Bennett	ca1860ae76	state: enable more reverse sorting (#20410 ) * mainly jobs endpoint * update call sites * add new sort helpers * put sorting in a separate file	2024-04-16 15:10:11 -05:00
Seth Hoenig	ae6c4c8e3f	deps: purge use of old x/exp packages (#20373 )	2024-04-12 08:29:00 -05:00
Tim Gross	76009d89af	tproxy: networking hook changes (#20183 ) When `transparent_proxy` block is present and the network mode is `bridge`, use a different CNI configuration that includes the `consul-cni` plugin. Before invoking the CNI plugins, create a Consul SDK `iptables.Config` struct for the allocation. This includes: * Use all the `transparent_proxy` block fields * The reserved ports are added to the inbound exclusion list so the alloc is reachable from outside the mesh * The `expose` blocks and `check` blocks with `expose=true` are added to the inbound exclusion list so health checks work. The `iptables.Config` is then passed as a CNI argument to the `consul-cni` plugin. Ref: https://github.com/hashicorp/nomad/issues/10628	2024-04-04 17:01:07 -04:00
Seth Hoenig	bd2a809135	subproc: lazy lookup nomad binary in self call (#20231 )	2024-03-26 12:33:06 -05:00
Seth Hoenig	77889a16fb	exec2: more tweaks to driver harness (#20221 ) Also add an explicit exit code to subproc package for when a child process is instructed to run an unrunnable command (i.e. cannot be found or is not executable) - with the 127 return code folks using bash are familiar with	2024-03-26 08:02:41 -05:00
Michael Schurter	23e4b7c9d2	Upgrade go-msgpack to v2 (#20173 ) Replaces #18812 Upgraded with: ``` find . -name '.go' -exec sed -i s/"github.com\/hashicorp\/go-msgpack\/codec"/"github.com\/hashicorp\/go-msgpack\/v2\/codec/" '{}' ';' find . -name '.go' -exec sed -i s/"github.com\/hashicorp\/net-rpc-msgpackrpc"/"github.com\/hashicorp\/net-rpc-msgpackrpc\/v2/" '{}' ';' go get go get -v -u github.com/hashicorp/raft-boltdb/v2 go get -v github.com/hashicorp/serf@5d32001edfaa18d1c010af65db707cdb38141e80 ``` see https://github.com/hashicorp/go-msgpack/releases/tag/v2.1.0 for details	2024-03-21 11:44:23 -07:00
Seth Hoenig	286dce7a2a	exec2: add a client.users configuration block (#20093 ) * exec: add a client.users configuration block For now just add min/max dynamic user values; soon we can also absorb the "user.denylist" and "user.checked_drivers" options from the deprecated client.options map. * give the no-op pool implementation a better name * use explicit error types to make referencing them cleaner in tests * use import alias to not shadow package name	2024-03-08 16:02:32 -06:00
Seth Hoenig	67554b8f91	exec2: implement dynamic workload users taskrunner hook (#20069 ) * exec2: implement dynamic workload users taskrunner hook This PR impelements a TR hook for allocating dynamic workload users from a pool managed by the Nomad client. This adds a new task driver Capability, DynamicWorkloadUsers - which a task driver must indicate in order to make use of this feature. The client config plumbing is coming in a followup PR - in the RFC we realized having a client.users block would be nice to have, with some additional unrelated options being moved from the deprecated client.options config. * learn to spell	2024-03-06 09:34:27 -06:00
Seth Hoenig	57bd39061b	exec2: implement a dynamic users pool (#20065 ) * exec2: implement a dynamic users pool This PR adds an implementation of a Pool from which dynamic users can be allocated on behalf of tasks making use of an upcoming feature of Nomad client (dynamic users). A task hook and client plumbing, etc. will be in follow up PRs. * no need for randomness assertion	2024-03-05 07:35:20 -06:00
James Rasell	4b46ff8ce0	test: fix test datarace within helper broker. (#19974 )	2024-02-15 08:54:56 +00:00
Tim Gross	df86503349	template: sandbox template rendering The Nomad client renders templates in the same privileged process used for most other client operations. During internal testing, we discovered that a malicious task can create a symlink that can cause template rendering to read and write to arbitrary files outside the allocation sandbox. Because the Nomad agent can be restarted without restarting tasks, we can't simply check that the path is safe at the time we write without encountering a time-of-check/time-of-use race. To protect Nomad client hosts from this attack, we'll now read and write templates in a subprocess: * On Linux/Unix, this subprocess is sandboxed via chroot to the allocation directory. This requires that Nomad is running as a privileged process. A non-root Nomad agent will warn that it cannot sandbox the template renderer. * On Windows, this process is sandboxed via a Windows AppContainer which has been granted access to only to the allocation directory. This does not require special privileges on Windows. (Creating symlinks in the first place can be prevented by running workloads as non-Administrator or non-ContainerAdministrator users.) Both sandboxes cause encountered symlinks to be evaluated in the context of the sandbox, which will result in a "file not found" or "access denied" error, depending on the platform. This change will also require an update to Consul-Template to allow callers to inject a custom `ReaderFunc` and `RenderFunc`. This design is intended as a workaround to allow us to fix this bug without creating backwards compatibility issues for running tasks. A future version of Nomad may introduce a read-only mount specifically for templates and artifacts so that tasks cannot write into the same location that the Nomad agent is. Fixes: https://github.com/hashicorp/nomad/issues/19888 Fixes: CVE-2024-1329	2024-02-08 10:40:24 -05:00
Tim Gross	0d3cd1427f	migration: check symlink sources during archive unpack During allocation directory migration, the client was not checking that any symlinks in the archive aren't pointing to somewhere outside the allocation directory. While task driver sandboxing will protect against processes inside the task from reading/writing thru the symlink, this doesn't protect against the client itself from performing unintended operations outside the sandbox. This changeset includes two changes: * Update the archive unpacking to check the source of symlinks and require that they fall within the sandbox. * Fix a bug in the symlink check where it was using `filepath.Rel` which doesn't work for paths in the sibling directories of the sandbox directory. This bug doesn't appear to be exploitable but caused errors in testing. Fixes: https://github.com/hashicorp/nomad/issues/19887	2024-02-08 10:40:24 -05:00
Luiz Aoqui	ce710d49fd	cli: fix `tls ca create` command with `-domain` (#19892 ) The current implementation of the `nomad tls ca create` command ovierrides the value of the `-domain` flag with `"nomad"` if no additional customization is provided. This results in a certificate for the wrong domain or an error if the `-name-constraint` flag is also used. THe logic for `IsCustom()` also seemed reversed. If all custom fields are empty then the certificate is _not_ customized, so `IsCustom()` should return false.	2024-02-07 16:40:51 -05:00
Seth Hoenig	b50b81e488	users: refactor method for getting UID from username (#19840 ) This PR refactors a helper function for getting the UID associated with a given username to also return the GID and home directory. Also adds unit tests on the known values of root and nobody user on Ubuntu Linux.	2024-01-29 13:56:30 -06:00
Luiz Aoqui	41277f823f	license: fix some imports of BUSL-1.1 in MPL-2.0 (#19832 ) Some packages licensed under MPL-2.0 were incorrectly importing code from packages licensed under BUSL-1.1. Not all imports are fixed here as they will require additional work to untangle them. To help track progress this commit adds a Semgrep rule that detects incorrect BUSL-1.1 imports in MPL-2.0 packages.	2024-01-29 12:04:12 -05:00
James Rasell	ff2d0d6453	cli: Fix dummy FSM create to ensure snapshot state command works. (#19630 ) The Nomad state store function was recently updated to validate certain parameters, fixing a panic condition. This change meant dummy FSM used for the snapshot state command was always failing this validation and the command no longer worked. This change adds the required parameter to pass validation and therefore makes the CLI command functional again.	2024-01-05 16:00:24 +00:00
Marvin Chin	be8575a8a2	Fix server shutdown not waiting for worker run completion (#19560 ) * Move group into a separate helper module for reuse * Add shutdownCh to worker The shutdown channel is used to signal that worker has stopped. * Make server shutdown block on workers' shutdownCh * Fix waiting for eval broker state change blocking indefinitely There was a race condition in the GenericNotifier between the Run and WaitForChange functions, where WaitForChange blocks trying to write to a full unsubscribeCh, but the Run function never reads from the unsubscribeCh as it has already stopped. This commit fixes it by unblocking if the notifier has been stopped. * Bound the amount of time server shutdown waits on worker completion * Fix lostcancel linter error * Fix worker test using unexpected worker constructor * Add changelog --------- Co-authored-by: Marvin Chin <marvinchin@users.noreply.github.com>	2024-01-05 08:45:07 -06:00
Luiz Aoqui	e0cea41e37	client: deprecate loading plugins without config (#19189 ) Nomad load all plugins from `plugin_dir` regardless if it is listed in the agent configuration file. This can cause unexpected binaries to be executed. This commit begins the deprecation process of this behaviour. The Nomad agent will emit a warning log for every plugin binary found without a corresponding agent configuration block. --------- Co-authored-by: Michael Schurter <mschurter@hashicorp.com>	2023-11-27 21:36:42 -05:00
Tim Gross	9d075c44b2	config: remove old Vault/Consul config blocks from parser (#18997 ) Remove the now-unused original configuration blocks for Consul and Vault from the agent configuration parsing. When the agent needs to refer to a Consul or Vault block it will always be for a specific cluster for the task/service (or the default cluster for the agent's own use). This is third of three changesets for this work. Fixes: https://github.com/hashicorp/nomad/issues/18947 Ref: https://github.com/hashicorp/nomad/pull/18991 Ref: https://github.com/hashicorp/nomad/pull/18994	2023-11-08 09:30:08 -05:00
Kerim Satirli	5e1bbf90fc	docs: update all URLs to `developer.hashicorp.com` (#16247 )	2023-10-24 11:00:11 -04:00
Seth Hoenig	e3c8700ded	deps: upgrade to go-set/v2 (#18638 ) No functional changes, just cleaning up deprecated usages that are removed in v2 and replace one call of .Slice with .ForEach to avoid making the intermediate copy.	2023-10-05 11:56:17 -05:00

1 2 3 4 5 ...

417 Commits