nomad

mirror of https://github.com/kemko/nomad.git synced 2026-01-07 02:45:42 +03:00

Author	SHA1	Message	Date
Aimee Ukasick	d293684d3d	Update rel notes, upgrade links to point to correct previous ver (#25652 )	2025-04-11 10:22:23 -05:00
Tim Gross	27caae2b2a	api: make attempting to remove peer by address a no-op (#25599 ) In Nomad 1.4.0 we removed support for Raft Protocol v2 entirely. But the `Operator.RemoveRaftPeerByAddress` RPC handler was left in place, along with its supporting HTTP API and command line flags. Using this API will always result in the Raft library error "operation not supported with current protocol version". Unfortunately it's still possible in unit tests to exercise this code path, and these tests are quite flaky. This changeset turns the RPC handler and HTTP API into a no-op, removes the associated command line flags, and removes the flaky tests. I've also cleaned up the test for `RemoveRaftPeerByID` to consolidate test servers and use `shoenig/test`. Fixes: https://hashicorp.atlassian.net/browse/NET-12413 Ref: https://github.com/hashicorp/nomad/pull/13467 Ref: https://developer.hashicorp.com/nomad/docs/upgrade/upgrade-specific#raft-protocol-version-2-unsupported Ref: https://github.com/hashicorp/nomad-enterprise/actions/runs/13201513025/job/36855234398?pr=2302	2025-04-10 09:19:25 -04:00
Aimee Ukasick	87aabc9af2	Docs: 1.10 release notes, some factoring, sentinel apply update (#25433 ) * Docs: 1.10 release notes and upgrade factoring * Update based on code review suggestions * add CLI for disabling UI URL hints * fix indentation * nav: list release notes in reverse order fix broken link to v1.6.x docs * Update PKCE section from Daniel's latest PR * update pkce per daniel's suggestion * Add dynamic host volumes governance section from blog	2025-04-09 15:43:58 -07:00
Daniel Bennett	6a0c4f5a3d	auth: oidc: enable pkce only on new auth methods (#25593 ) trying not to violate the principle of least astonishment. we want to only auto-enable PKCE on new auth methods, rather than new or updated auth methods, to avoid a scenario where a Nomad admin updates an auth method sometime in the future -- something innocent like a new client secret -- and their OIDC provider doesn't like PKCE. the main concern is that the provider won't like PKCE in a totally confusing way. error messages rarely say PKCE directly, so why the user's auth method suddenly broke would be a big mystery. this means that to enable it on existing auth methods, you would set `OIDCDisablePKCE = false`, and the double-negative doesn't feel right, so instead, swap the language, so enabling it on existing methods reads sensibly, and to disable it on new methods reads ok-enough: `OIDCEnablePKCE = false`	2025-04-03 10:56:17 -05:00
Aimee Ukasick	9778fa4912	Docs: Fix broken links in main for 1.10 release (#25540 ) * Docs: Fix broken links in main for 1.10 release * Implement Tim's suggestions * Remove link to Portworx from ecosystem page * remove "Portworx" since Portworx 3.2 no longer supports Nomad	2025-04-01 09:09:44 -05:00
Daniel Bennett	8c609ad762	docs: oidc client assertions and pkce (#25375 )	2025-03-20 09:14:17 -05:00
James Rasell	c53ba3e7d1	consul: Remove implicit workload identity when task has a template. (#25298 ) When a task included a template block, Nomad was adding a Consul identity by default which allowed the template to use Consul API template functions even when they were not needed or desired. This change removes the implicit addition of Consul identities to tasks when they include a template block. Job specification authors will now need to add a Consul identity or Consul block to their task if they have a template which uses Consul API functions. This change also removes the default addition of a Consul block to all task groups registered and processed by the API package.	2025-03-10 13:49:50 +00:00
Michael Smithhisler	5c4d0e923d	consul: Remove legacy token based authentication workflow (#25217 )	2025-03-05 15:38:11 -05:00
Michael Smithhisler	f2b761f17c	disconnected: removes deprecated disconnect fields (#25284 ) The group level fields stop_after_client_disconnect, max_client_disconnect, and prevent_reschedule_on_lost were deprecated in Nomad 1.8 and replaced by fields in the disconnect block. This change removes any logic related to those deprecated fields. --------- Co-authored-by: Tim Gross <tgross@hashicorp.com>	2025-03-05 14:46:02 -05:00
James Rasell	7268053174	vault: Remove legacy token based authentication workflow. (#25155 ) The legacy workflow for Vault whereby servers were configured using a token to provide authentication to the Vault API has now been removed. This change also removes the workflow where servers were responsible for deriving Vault tokens for Nomad clients. The deprecated Vault config options used by the Nomad agent have all been removed except for "token" which is still in use by the Vault Transit keyring implementation. Job specification authors can no longer use the "vault.policies" parameter and should instead use "vault.role" when not using the default workload identity. --------- Co-authored-by: Tim Gross <tgross@hashicorp.com> Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>	2025-02-28 07:40:02 +00:00
Paweł Bęza	43885f6854	Allow for in-place update when affinity or spread was changed (#25109 ) Similarly to #6732, it removes the affinity and spread checks for in-place updates. Both affinity and spread should act as a soft preference for the Nomad scheduler rather than a strict constraint. Therefore modifying them should not trigger job reallocation. Fixes #25070 Co-authored-by: Tim Gross <tgross@hashicorp.com>	2025-02-14 14:33:18 -05:00
stswidwinski	871585ee90	18529 nomad executes any file in plugins (#18530 ) Co-authored-by: James Rasell <jrasell@hashicorp.com>	2025-02-10 16:08:22 +00:00
Aimee Ukasick	d9bb241b43	Docs SEO: Update runtime, networking, Nomad vs K8s, Nomad Enterprise, upgrading, release notes, and sectionless pages (#24764 ) * Docs SEO: Updates CE-781,782,785,788 * CE-791 single pages * CE-786 enterprise section * CE-789 release notes * fix content-check error * Update description and add intro body paragraph when appropriate * fix typo * Apply suggestions from Jeff's code review Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com> --------- Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>	2025-02-03 10:03:36 -06:00
Michael Smithhisler	47c14ddf28	remove remote task execution code (#24909 )	2025-01-29 08:08:34 -05:00
Tim Gross	3a11a0b1e1	quotas: refactor storage limit specification (#24785 ) In anticipation of having quotas for dynamic host volumes, we want the user experience of the storage limits to feel integrated with the other resource limits. This is currently prevented by reusing the `Resources` type instead of having a specific type for `QuotaResources`. Update the quota limit/usage types to use a `QuotaResources` that includes a new storage resources quota block. The wire formats for the two types are compatible, so we can migrate the existing variables limit in the FSM. Also fixes improper parallelism in the quota init test where we change working directory to avoid file write conflicts but this breaks when multiple tests are executed in the same process. Ref: https://github.com/hashicorp/nomad-enterprise/pull/2096	2025-01-13 09:25:00 -05:00
Tim Gross	08a6f870ad	cni: use check command when restoring from restart (#24658 ) When the Nomad client restarts and restores allocations, the network namespace for an allocation may exist but no longer be correctly configured. For example, if the host is rebooted and the task was a Docker task using a pause container, the network namespace may be recreated by the docker daemon. When we restore an allocation, use the CNI "check" command to verify that any existing network namespace matches the expected configuration. This requires CNI plugins of at least version 1.2.0 to avoid a bug in older plugin versions that would cause the check to fail. If the check fails, destroy the network namespace and try to recreate it from scratch once. If that fails in the second pass, fail the restore so that the allocation can be recreated (rather than silently having networking fail). This should fix the gap left by #24650 for Docker task drivers and any other drivers with the `MustInitiateNetwork` capability. Fixes: https://github.com/hashicorp/nomad/issues/24292 Ref: https://github.com/hashicorp/nomad/pull/24650	2025-01-07 09:38:39 -05:00
Piotr Kazmierczak	f7a4ded2c0	security: add CT executeTemplate to default function_denylist (#24541 ) This PR adds Consul Template's executeTemplate function to the denylist by default, in order to prevent accidental or malicious infinitely recursive execution. --------- Co-authored-by: Tim Gross <tgross@hashicorp.com>	2024-11-22 19:33:56 +01:00
Piotr Kazmierczak	368241dbf2	security: a more comprehensive env.denylist (#24540 ) A more comprehensive env.denylist that now includes more token, token file and license variables. --------- Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>	2024-11-22 18:54:18 +01:00
Michael Schurter	8dd570d6ca	docs: upgrade docs should point at real version (#24438 ) Let users know what happened to 1.9.2 but label the gc change as the first working release (1.9.3).	2024-11-12 11:05:27 -08:00
Piotr Kazmierczak	f7847c6e5b	state: remove TimeTable and rely on objects' modify times instead (#24112 ) Core scheduler relies on a special table in the state store, the TimeTable, to figure out which objects can be GC'd. The TimeTable correlates Raft indices with objects' insertion time, a solution we used before most of the objects we store in the state contained timestamps. This introduced a bit of a memory overhead and complexity, but most importantly meant that any GC threshold users set greater than timeTableLimit = 72 * time.Hour was ignored. This PR removes the TimeTable and relies on object timestamps to determine whether they can be GC'd or not.	2024-11-01 19:38:04 +01:00
R.B. Boyer	4e8f596311	docs: update broken consul acl token links (#24287 )	2024-10-23 13:34:21 -04:00
Michael Schurter	34cb05d297	docs: explain how to use dots in docker labels (#24074 ) Nomad v1.9.0 (finally!) removes support for HCL1 and the `-hcl1` flag. See #23912 for details. One of the uses of HCL1 over HCL2 was that HCL1 allowed quoted keys in blocks such as env, meta, and Docker's labels: `some_block { "foo.bar" = "baz" }`. This works in HCL1 but is invalid HCL2. In HCL2 you must use a map instead of a block: `some_map = { "eggs.spam" = "works!" }`. This was such a hassle for users that we special-cased the `env` and `meta` blocks to be accepted as blocks or maps in #9936. However Docker `labels`, being a task config option, is much harder to special-case and commonly needs dots-in-keys for things like DataDog autodiscovery via Docker container labels: https://docs.datadoghq.com/containers/docker/integrations/?tab=labels Luckily `labels` can be specified as a list-of-maps instead: `labels = [{ "com.datadoghq.ad.check_names" = "[\"openmetrics\"]", "com.datadoghq.ad.init_configs" = "[{}]" }]`. So instead of adding more awkward hcl1/2 backward compat code to Nomad, I just updated the docs to hopefully help people hit by this. The only other known workaround is dropping HCL in favor of JSON jobspecs altogether, but that forces a huge migration and maintenance burden on users: https://discuss.hashicorp.com/t/docker-based-autodiscovery-with-datadog-how-can-we-make-it-work/18870	2024-09-27 10:02:50 -07:00
Tim Gross	a3a2028837	docs: update key management docs for keyring-in-Raft (#24026 ) In #23977 we moved the keyring into Raft. This changeset documents the operational changes and adds notes to the upgrade guide.	2024-09-25 10:48:14 -04:00
Tim Gross	192d70cee7	docker: update infra_image to new registry (#23927 ) The gcr.io container registry is shutting down in March. Update the default `infra_image` for Docker's "pause" containers to point to the new location hosted by the k8s project. Fixes: https://github.com/hashicorp/nomad/issues/23911 Ref: https://hashicorp.atlassian.net/browse/NET-10942	2024-09-06 14:34:03 -04:00
Tim Gross	06f5fbc5d6	auth: enforce use of node secret and remove legacy auth (#23838 ) As of Nomad 1.6.0, Nomad client agents send their secret with all the RPCs (other than registration). But for backwards compatibility we had to keep a legacy auth method that didn't require the node secret. We've previously announced that this legacy auth method would be removed and that nodes older than 1.6.0 would not be supported with Nomad 1.9.0. This changeset removes the legacy auth method. Ref: https://developer.hashicorp.com/nomad/docs/release-notes/nomad/upcoming#nomad-1-9-0	2024-09-05 14:24:28 -04:00
Tim Gross	a9beef7edd	jobspec: remove HCL1 support (#23912 ) This changeset removes support for parsing jobspecs via the long-deprecated HCLv1. Fixes: https://github.com/hashicorp/nomad/issues/20195 Ref: https://hashicorp.atlassian.net/browse/NET-10220	2024-09-05 09:02:45 -04:00
Austin Culter	ce3e159ee8	docs: update upgrade-specific.mdx (#23906 )	2024-09-04 08:42:27 -04:00
Tim Gross	d5ca07a247	docs: notices of upcoming deprecations and backports (#23683 ) Add a section to the docs describing planned upcoming deprecations and removals. Also added some missing upgrade guide sections missed during the last release.	2024-07-25 10:20:18 -04:00
Tim Gross	2f4353412d	keyring: support prepublishing keys (#23577 ) When a root key is rotated, the servers immediately start signing Workload Identities with the new active key. But workloads may be using those WI tokens to sign into external services, which may not have had time to fetch the new public key and which might try to fetch new keys as needed. Add support for prepublishing keys. Prepublished keys will be visible in the JWKS endpoint but will not be used for signing or encryption until their `PublishTime`. Update the periodic key rotation to prepublish keys at half the `root_key_rotation_threshold` window, and promote prepublished keys to active after the `PublishTime`. This changeset also fixes two bugs in periodic root key rotation and garbage collection, both of which can't be safely fixed without implementing prepublishing: * Periodic root key rotation would never happen because the default `root_key_rotation_threshold` of 720h exceeds the 72h maximum window of the FSM time table. We now compare the `CreateTime` against the wall clock time instead of the time table. (We expect to remove the time table in future work, ref https://github.com/hashicorp/nomad/issues/16359) * Root key garbage collection could GC keys that were used to sign identities. We now wait until `root_key_rotation_threshold` + `root_key_gc_threshold` before GC'ing a key. * When rekeying a root key, the core job did not mark the key as inactive after the rekey was complete. Ref: https://hashicorp.atlassian.net/browse/NET-10398 Ref: https://hashicorp.atlassian.net/browse/NET-10280 Fixes: https://github.com/hashicorp/nomad/issues/19669 Fixes: https://github.com/hashicorp/nomad/issues/23528 Fixes: https://github.com/hashicorp/nomad/issues/19368	2024-07-19 13:29:41 -04:00
Piotr Kazmierczak	d5e1515e80	docker: default to hyper-v isolation on Windows (#23452 )	2024-07-01 08:56:43 +02:00
Piotr Kazmierczak	863d42bc4b	docs: upgrade guide updates for backported Docker windows changes (#23453 ) Upgrade guide should be uniform across all supported versions, otherwise backporting breaking changes is tedious.	2024-06-27 19:35:56 +02:00
Piotr Kazmierczak	0ece7b5c16	docker: validate that containers do not run as ContainerAdmin on Windows (#23443 ) This enables checks for ContainerAdmin user on docker images on Windows. It's only checked if users run docker with process isolation and not hyper-v, because hyper-v provides its own, proper sandboxing. --------- Co-authored-by: Tim Gross <tgross@hashicorp.com>	2024-06-27 16:22:24 +02:00
Charlie Voiselle	07516c8159	[docs] Add Sentinel info to version-specific upgrade page (#23173 ) The upgrade to sentinel v0.26 is a breaking change, requiring users of custom Sentinel plugins to rebuild them using sentinel-sdk v4	2024-06-26 10:46:38 -04:00
Seth Hoenig	6ad648bec8	networking: Inject implicit constraints on CNI plugins when using bridge mode (#15473 ) This PR adds a job mutator which injects constraints on the job taskgroups that make use of bridge networking. Creating a bridge network makes use of the CNI plugins: bridge, firewall, host-local, loopback, and portmap. Starting with Nomad 1.5 these plugins are fingerprinted on each node, and as such we can ensure jobs are correctly scheduled only on nodes where they are available, when needed.	2024-03-27 16:11:39 -04:00
Juana De La Cuesta	56bf253474	Add docs for disconnected block (#20147 ) Expand the job settings to include the disconnect block and set as deprecated the fields that will be replaced by it.	2024-03-20 10:08:16 +01:00
Michael Schurter	3193ac204f	docs: skipping a major release is fine (#20075 ) Nomad has always placed an extremely high priority on backward compatibility. We have always aimed to support N-2 major releases and usually gone above and beyond that. The new https://www.hashicorp.com/long-term-support policy also mentions that N-2 is what we have always supported, so it's probably time for our docs to reflect that reality.	2024-03-06 08:57:12 -08:00
Seth Hoenig	9410c519ff	drivers/raw_exec: remove plumbing for ineffective no_cgroups configuration (#19599 ) * drivers/raw_exec: remove plumbing for ineffective no_cgroups configuration * fix tests	2024-01-11 08:20:15 -06:00
Seth Hoenig	4b3ee77d6b	docs: update raw_exec driver docs and 1.7 upgrade notes (#19598 )	2024-01-04 08:26:46 -06:00
Etienne Bruines	f18d5c7c32	docs: fix migration to workload identity links (#19508 ) Fixes #19507	2023-12-18 21:27:38 -05:00
Tim Gross	0e42569ffb	docs: note that 1.7.2 yanks 1.7.0-1.7.1 due to CPU fingerprint bug (#19474 )	2023-12-14 11:32:13 -05:00
Tim Gross	ad9520c240	docs: add warning not to use 1.7.0 (#19399 ) Nomad 1.7.0 should be considered "yanked". Add a note about this to the upgrade guide.	2023-12-08 15:19:27 -05:00
Seth Hoenig	39eb17f3ec	docs: describe the need for dmidecode in docs (#19348 )	2023-12-08 10:45:37 -06:00
Luiz Aoqui	e0cea41e37	client: deprecate loading plugins without config (#19189 ) Nomad loads all plugins from `plugin_dir` regardless of whether they are listed in the agent configuration file. This can cause unexpected binaries to be executed. This commit begins the deprecation process of this behaviour. The Nomad agent will emit a warning log for every plugin binary found without a corresponding agent configuration block. --------- Co-authored-by: Michael Schurter <mschurter@hashicorp.com>	2023-11-27 21:36:42 -05:00
Charlie Voiselle	659c0945fc	[core] Honor job's namespace when checking `distinct_hosts` feasibility (#19004 ) * Update distinct_host feasibility checking to honor the job's namespace. Fixes #9792 * Added test to verify original condition and that fix resolved it. * Added documentation	2023-11-17 11:25:10 -05:00
Seth Hoenig	61e21db2b4	docs: add 1.7 cpu upgrade notes and tweak cpu concepts doc (#18977 ) * docs: add 1.7 cpu upgrade notes and tweak cpu concepts doc * docs: fix spelling	2023-11-02 09:58:16 -05:00
Michael Schurter	0b0ae40199	docs: recommend rotating keys on upgrade (#18958 ) RIP EdDSA.	2023-11-01 10:57:33 -07:00
Tim Gross	ea3e711fa6	docs: upgrade guide for integrations deprecation warnings (#18928 ) The Consul and Vault integrations work shipping in Nomad 1.7 will deprecate the existing token-based workflows. These will be removed in Nomad 1.9, so add a note describing this to the upgrade guide.	2023-10-31 13:21:47 -04:00
James Rasell	b44cef0e66	docs: make upgrade version detail clearer. (#18608 ) Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>	2023-09-29 08:31:14 +01:00
Luiz Aoqui	54c45ed106	acl: fix parsing of policies with blocks w/o label An ACL policy with a block without a label generates unexpected results. For example, a policy such as `namespace { policy = "read" }` is applied to a namespace called `policy` instead of the documented behaviour of applying it to the `default` namespace. This happens because of the way HCL1 decodes blocks. Since it doesn't know if a block is expected to have a label, it applies the `key` tag to the content of the block and, in the example above, the first key is `policy`, so it sets that as the `namespace` block label. Since this happens internally in the HCL decoder it's not possible to detect the problem externally. Fixing the problem inside the decoder is challenging because the JSON and HCL parsers generate different ASTs that make it impossible to differentiate between a JSON tree and an invalid HCL tree within the decoder. The fix in this commit consists of manually parsing the policy after decoding to clear labels that were not set in the file. This allows the validation rules to consistently catch and return any errors, no matter if the policy is invalid HCL or JSON.	2023-07-19 10:38:08 -04:00
Michael Schurter	5169950562	docs: v1.6.0 requires ipc_lock cap for mlock (#17881 ) Fixes #17780	2023-07-10 11:53:07 -07:00


139 Commits