nomad

mirror of https://github.com/kemko/nomad.git synced 2026-01-04 01:15:43 +03:00

Author	SHA1	Message	Date
Tim Gross	08a6f870ad	cni: use check command when restoring from restart (#24658 ) When the Nomad client restarts and restores allocations, the network namespace for an allocation may exist but no longer be correctly configured. For example, if the host is rebooted and the task was a Docker task using a pause container, the network namespace may be recreated by the docker daemon. When we restore an allocation, use the CNI "check" command to verify that any existing network namespace matches the expected configuration. This requires CNI plugins of at least version 1.2.0 to avoid a bug in older plugin versions that would cause the check to fail. If the check fails, destroy the network namespace and try to recreate it from scratch once. If that fails in the second pass, fail the restore so that the allocation can be recreated (rather than silently having networking fail). This should fix the gap left by #24650 for Docker task drivers and any other drivers with the `MustInitiateNetwork` capability. Fixes: https://github.com/hashicorp/nomad/issues/24292 Ref: https://github.com/hashicorp/nomad/pull/24650	2025-01-07 09:38:39 -05:00
Piotr Kazmierczak	f7a4ded2c0	security: add CT executeTemplate to default function_denylist (#24541 ) This PR adds Consul Template's executeTemplate function to the denylist by default, in order to prevent accidental or malicious infinitely recursive execution. --------- Co-authored-by: Tim Gross <tgross@hashicorp.com>	2024-11-22 19:33:56 +01:00
Piotr Kazmierczak	368241dbf2	security: a more comprehensive env.denylist (#24540 ) A more comprehensive env.denylist that now includes more token, token file and license variables. --------- Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>	2024-11-22 18:54:18 +01:00
Michael Schurter	8dd570d6ca	docs: upgrade docs should point at real version (#24438 ) Let users know what happened to 1.9.2 but label the gc change as the first working release (1.9.3).	2024-11-12 11:05:27 -08:00
Piotr Kazmierczak	f7847c6e5b	state: remove TimeTable and rely on objects' modify times instead (#24112 ) Core scheduler relies on a special table in the state store—the TimeTable—to figure out which objects can be GC'd. The TimeTable correlates Raft indices with objects insertion time, a solution we used before most of the objects we store in the state contained timestamps. This introduced a bit of a memory overhead and complexity, but most importantly meant that any GC threshold users set greater than timeTableLimit = 72 * time.Hour was ignored. This PR removes the TimeTable and relies on object timestamps to determine whether they could be GCd or not.	2024-11-01 19:38:04 +01:00
R.B. Boyer	4e8f596311	docs: update broken consul acl token links (#24287 )	2024-10-23 13:34:21 -04:00
Michael Schurter	34cb05d297	docs: explain how to use dots in docker labels (#24074 ) Nomad v1.9.0 (finally!) removes support for HCL1 and the `-hcl1` flag. See #23912 for details. One of the uses of HCL1 over HCL2 was that HCL1 allowed quoted keys in blocks such as env, meta, and Docker's labels: ```hcl some_block { "foo.bar" = "baz" } ``` This works in HCL1 but is invalid HCL2. In HCL2 you must use a map instead of a block: ```hcl some_map = { "eggs.spam" = "works!" } ``` This was such a hassle for users we special cased the `env` and `meta` blocks to be accepted as blocks or maps in #9936. However Docker `labels`, being a task config option, is much harder to special case and commonly needs dots-in-keys for things like DataDog autodiscovery via Docker container labels: https://docs.datadoghq.com/containers/docker/integrations/?tab=labels Luckily `labels` can be specified as a list-of-maps instead: ```hcl labels = [ { "com.datadoghq.ad.check_names" = "[\"openmetrics\"]" "com.datadoghq.ad.init_configs" = "[{}]" } ] ``` So instead of adding more awkward hcl1/2 backward compat code to Nomad, I just updated the docs to hopefully help people hit by this. The only other known workaround is dropping HCL in favor of JSON jobspecs altogether, but that forces a huge migration and maintenance burden on users: https://discuss.hashicorp.com/t/docker-based-autodiscovery-with-datadog-how-can-we-make-it-work/18870	2024-09-27 10:02:50 -07:00
Tim Gross	a3a2028837	docs: update key management docs for keyring-in-Raft (#24026 ) In #23977 we moved the keyring into Raft. This changeset documents the operational changes and adds notes to the upgrade guide.	2024-09-25 10:48:14 -04:00
Tim Gross	192d70cee7	docker: update infra_image to new registry (#23927 ) The gcr.io container registry is shutting down in March. Update the default `infra_image` for Docker's "pause" containers to point to the new location hosted by the k8s project. Fixes: https://github.com/hashicorp/nomad/issues/23911 Ref: https://hashicorp.atlassian.net/browse/NET-10942	2024-09-06 14:34:03 -04:00
Tim Gross	06f5fbc5d6	auth: enforce use of node secret and remove legacy auth (#23838 ) As of Nomad 1.6.0, Nomad client agents send their secret with all the RPCs (other than registration). But for backwards compatibility we had to keep a legacy auth method that didn't require the node secret. We've previously announced that this legacy auth method would be removed and that nodes older than 1.6.0 would not be supported with Nomad 1.9.0. This changeset removes the legacy auth method. Ref: https://developer.hashicorp.com/nomad/docs/release-notes/nomad/upcoming#nomad-1-9-0	2024-09-05 14:24:28 -04:00
Tim Gross	a9beef7edd	jobspec: remove HCL1 support (#23912 ) This changeset removes support for parsing jobspecs via the long-deprecated HCLv1. Fixes: https://github.com/hashicorp/nomad/issues/20195 Ref: https://hashicorp.atlassian.net/browse/NET-10220	2024-09-05 09:02:45 -04:00
Austin Culter	ce3e159ee8	docs: update upgrade-specific.mdx (#23906 )	2024-09-04 08:42:27 -04:00
Tim Gross	d5ca07a247	docs: notices of upcoming deprecations and backports (#23683 ) Add a section to the docs describing planned upcoming deprecations and removals. Also added some missing upgrade guide sections missed during the last release.	2024-07-25 10:20:18 -04:00
Tim Gross	2f4353412d	keyring: support prepublishing keys (#23577 ) When a root key is rotated, the servers immediately start signing Workload Identities with the new active key. But workloads may be using those WI tokens to sign into external services, which may not have had time to fetch the new public key and which might try to fetch new keys as needed. Add support for prepublishing keys. Prepublished keys will be visible in the JWKS endpoint but will not be used for signing or encryption until their `PublishTime`. Update the periodic key rotation to prepublish keys at half the `root_key_rotation_threshold` window, and promote prepublished keys to active after the `PublishTime`. This changeset also fixes two bugs in periodic root key rotation and garbage collection, both of which can't be safely fixed without implementing prepublishing: * Periodic root key rotation would never happen because the default `root_key_rotation_threshold` of 720h exceeds the 72h maximum window of the FSM time table. We now compare the `CreateTime` against the wall clock time instead of the time table. (We expect to remove the time table in future work, ref https://github.com/hashicorp/nomad/issues/16359) * Root key garbage collection could GC keys that were used to sign identities. We now wait until `root_key_rotation_threshold` + `root_key_gc_threshold` before GC'ing a key. * When rekeying a root key, the core job did not mark the key as inactive after the rekey was complete. Ref: https://hashicorp.atlassian.net/browse/NET-10398 Ref: https://hashicorp.atlassian.net/browse/NET-10280 Fixes: https://github.com/hashicorp/nomad/issues/19669 Fixes: https://github.com/hashicorp/nomad/issues/23528 Fixes: https://github.com/hashicorp/nomad/issues/19368	2024-07-19 13:29:41 -04:00
Piotr Kazmierczak	d5e1515e80	docker: default to hyper-v isolation on Windows (#23452 )	2024-07-01 08:56:43 +02:00
Piotr Kazmierczak	863d42bc4b	docs: upgrade guide updates for backported Docker windows changes (#23453 ) Upgrade guide should be uniform across all supported versions, otherwise backporting breaking changes is tedious.	2024-06-27 19:35:56 +02:00
Piotr Kazmierczak	0ece7b5c16	docker: validate that containers do not run as ContainerAdmin on Windows (#23443 ) This enables checks for ContainerAdmin user on docker images on Windows. It's only checked if users run docker with process isolation and not hyper-v, because hyper-v provides its own, proper sandboxing. --------- Co-authored-by: Tim Gross <tgross@hashicorp.com>	2024-06-27 16:22:24 +02:00
Charlie Voiselle	07516c8159	[docs] Add Sentinel info to version-specific upgrade page (#23173 ) The upgrade to sentinel v0.26 is a breaking change, requiring users of custom Sentinel plugins to rebuild them using sentinel-sdk v4	2024-06-26 10:46:38 -04:00
Seth Hoenig	6ad648bec8	networking: Inject implicit constraints on CNI plugins when using bridge mode (#15473 ) This PR adds a job mutator which injects constraints on the job taskgroups that make use of bridge networking. Creating a bridge network makes use of the CNI plugins: bridge, firewall, host-local, loopback, and portmap. Starting with Nomad 1.5 these plugins are fingerprinted on each node, and as such we can ensure jobs are correctly scheduled only on nodes where they are available, when needed.	2024-03-27 16:11:39 -04:00
Juana De La Cuesta	56bf253474	Add docs for disconnected block (#20147 ) Expand the job settings to include the disconnect block and set as deprecated the fields that will be replaced by it.	2024-03-20 10:08:16 +01:00
Michael Schurter	3193ac204f	docs: skipping a major release is fine (#20075 ) Nomad has always placed an extremely high priority on backward compatibility. We have always aimed to support N-2 major releases and usually gone above and beyond that. The new https://www.hashicorp.com/long-term-support policy also mentions that N-2 is what we have always supported, so it's probably time for our docs to reflect that reality.	2024-03-06 08:57:12 -08:00
Seth Hoenig	9410c519ff	drivers/raw_exec: remove plumbing for ineffective no_cgroups configuration (#19599 ) * drivers/raw_exec: remove plumbing for ineffective no_cgroups configuration * fix tests	2024-01-11 08:20:15 -06:00
Seth Hoenig	4b3ee77d6b	docs: update raw_exec driver docs and 1.7 upgrade notes (#19598 )	2024-01-04 08:26:46 -06:00
Etienne Bruines	f18d5c7c32	docs: fix migration to workload identity links (#19508 ) Fixes #19507	2023-12-18 21:27:38 -05:00
Tim Gross	0e42569ffb	docs: note that 1.7.2 yanks 1.7.0-1.7.1 due to CPU fingerprint bug (#19474 )	2023-12-14 11:32:13 -05:00
Tim Gross	ad9520c240	docs: add warning not to use 1.7.0 (#19399 ) Nomad 1.7.0 should be considered "yanked". Add a note about this to the upgrade guide.	2023-12-08 15:19:27 -05:00
Seth Hoenig	39eb17f3ec	docs: describe the need for dmidecode in docs (#19348 )	2023-12-08 10:45:37 -06:00
Luiz Aoqui	e0cea41e37	client: deprecate loading plugins without config (#19189 ) Nomad loads all plugins from `plugin_dir` regardless of whether they are listed in the agent configuration file. This can cause unexpected binaries to be executed. This commit begins the deprecation process of this behaviour. The Nomad agent will emit a warning log for every plugin binary found without a corresponding agent configuration block. --------- Co-authored-by: Michael Schurter <mschurter@hashicorp.com>	2023-11-27 21:36:42 -05:00
Charlie Voiselle	659c0945fc	[core] Honor job's namespace when checking `distinct_hosts` feasibility (#19004 ) * Update distinct_host feasibility checking to honor the job's namespace. Fixes #9792 * Added test to verify original condition and that fix resolved it. * Added documentation	2023-11-17 11:25:10 -05:00
Seth Hoenig	61e21db2b4	docs: add 1.7 cpu upgrade notes and tweak cpu concepts doc (#18977 ) * docs: add 1.7 cpu upgrade notes and tweak cpu concepts doc * docs: fix spelling	2023-11-02 09:58:16 -05:00
Michael Schurter	0b0ae40199	docs: recommend rotating keys on upgrade (#18958 ) RIP EdDSA.	2023-11-01 10:57:33 -07:00
Tim Gross	ea3e711fa6	docs: upgrade guide for integrations deprecation warnings (#18928 ) The Consul and Vault integrations work shipping in Nomad 1.7 will deprecate the existing token-based workflows. These will be removed in Nomad 1.9, so add a note describing this to the upgrade guide.	2023-10-31 13:21:47 -04:00
James Rasell	b44cef0e66	docs: make upgrade version detail clearer. (#18608 ) Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>	2023-09-29 08:31:14 +01:00
Luiz Aoqui	54c45ed106	acl: fix parsing of policies with blocks w/o label An ACL policy with a block without label generates unexpected results. For example, a policy such as this: ``` namespace { policy = "read" } ``` Is applied to a namespace called `policy` instead of the documented behaviour of applying it to the `default` namespace. This happens because of the way HCL1 decodes blocks. Since it doesn't know if a block is expected to have a label it applies the `key` tag to the content of the block and, in the example above, the first key is `policy`, so it sets that as the `namespace` block label. Since this happens internally in the HCL decoder it's not possible to detect the problem externally. Fixing the problem inside the decoder is challenging because the JSON and HCL parsers generate different ASTs, which makes it impossible to differentiate a JSON tree from an invalid HCL tree within the decoder. The fix in this commit consists of manually parsing the policy after decoding to clear labels that were not set in the file. This allows the validation rules to consistently catch and return any errors, no matter if the policy is an invalid HCL or JSON.	2023-07-19 10:38:08 -04:00
Michael Schurter	5169950562	docs: v1.6.0 requires ipc_lock cap for mlock (#17881 ) Fixes #17780	2023-07-10 11:53:07 -07:00
Bruce Lok	8953e78dc4	fix typo peers.json (#17538 )	2023-06-19 07:56:51 +01:00
Tim Gross	bd59893956	build: remove 386 builds for Nomad 1.6.0 (#17239 ) The 32-bit Intel builds (aka "386") are not tested and likely have bugs involving platform-sized integers when operated at any non-trivial scale. Remove these builds from the upcoming Nomad 1.6.0 and provide recommendations in the upgrade notes for those users who might have hobbyist boards running 32-bit ARM (this will primarily be the RaspberryPi Zero or older spins of the RaspPi). DO NOT BACKPORT TO 1.5.x OR EARLIER!	2023-05-22 13:27:17 -04:00
Lance Haig	7e93f150b5	cli: tls certs not created with correct SANs (#16959 ) The `nomad tls cert` command did not create certificates with the correct SANs for them to work with non-default domain and region names. This changeset updates the code to support non-default domains and regions in the certificates.	2023-05-22 09:31:56 -04:00
Tim Gross	6155ba3bcf	docs: add note to upgrade guide about yanked version (#17115 ) Nomad 1.5.4 shipped with a logmon bug that we rolled out a fix for in Nomad 1.5.5. Unfortunately we can't yank the release but we should leave a note in the upgrade guide telling users to avoid it.	2023-05-08 13:28:45 -04:00
Tim Gross	3ee02ebc97	post release 1.5.5 (#17098 ) * changelog entries for 1.5.5 and missing merge of changelog for 1.5.4, 1.4.9, and 1.3.14 * note on deprecation of `logs.enabled` field	2023-05-05 11:46:08 -04:00
Tim Gross	c3002db815	client: allow `drain_on_shutdown` configuration (#16827 ) Adds a new configuration to clients to optionally allow them to drain their workloads on shutdown. The client sends the `Node.UpdateDrain` RPC targeting itself and then monitors the drain state as seen by the server until the drain is complete or the deadline expires. If it loses connection with the server, it will monitor local client status instead to ensure allocations are stopped before exiting.	2023-04-14 15:35:32 -04:00
Daniel Bennett	e6da5c70dc	Update enterprise licensing documentation (#16615 ) updated various docs for new expiration behavior and new command `nomad license inspect` to validate pre-upgrade	2023-03-30 16:40:19 -05:00
Luiz Aoqui	f2bfbfaf03	acl: update job eval requirement to `submit-job` (#16463 ) The job evaluate endpoint creates a new evaluation for the job which is a write operation. This change modifies the necessary capability from `read-job` to `submit-job` to better reflect this.	2023-03-13 17:13:54 -04:00
Seth Hoenig	95359b8c4c	client: disable running artifact downloader as nobody (#16375 ) * client: disable running artifact downloader as nobody This PR reverts a change from Nomad 1.5 where artifact downloads were executed as the nobody user on Linux systems. This was done as an attempt to improve the security model of artifact downloading where third party tools such as git or mercurial would be run as the root user with all the security implications thereof. However, doing so conflicts with Nomad's own advice for securing the Client data directory - which when setup with the recommended directory permissions structure prevents artifact downloads from working as intended. Artifact downloads are at least still now executed as a child process of the Nomad agent, and on modern Linux systems make use of the kernel Landlock feature for limiting filesystem access of the child process. * docs: update upgrade guide for 1.5.1 sandboxing * docs: add cl * docs: add title to upgrade guide fix	2023-03-08 15:58:43 -06:00
James Rasell	b677ec7e99	docs: add 1.5.0, 1.4.5, and 1.3.10 pause regression upgrade note. (#16358 )	2023-03-07 18:29:03 +01:00
Tim Gross	8373434b69	docs: clarify upgrade note on 1.4.0 panics (#16171 ) The panic bug for upgrades with older servers that shipped in 1.4.0 was fixed in 1.4.1, which makes the versions described in the warning in the upgrade guide misleading. Clarify the upgrade guide.	2023-02-14 11:26:33 -05:00
Seth Hoenig	511d0c1e70	artifact: protect against unbounded artifact decompression (1.5.0) (#16151 ) * artifact: protect against unbounded artifact decompression Starting with 1.5.0, set default values for artifact decompression limits. artifact.decompression_size_limit (default "100GB") - the maximum amount of data that will be decompressed before triggering an error and cancelling the operation artifact.decompression_file_count_limit (default 4096) - the maximum number of files that will be decompressed before triggering an error and cancelling the operation. * artifact: assert limits cannot be nil in validation	2023-02-14 09:28:39 -06:00
Tim Gross	88cd93bd94	docs: fix links in 1.5.0 upgrade guide (#16106 )	2023-02-09 09:39:49 -05:00
Tim Gross	6145cdcd11	cli: remove deprecated `keyring` and `keygen` commands (#16068 ) These commands were marked as deprecated in 1.4.0 with intent to remove in 1.5.0. Remove them and clean up the docs.	2023-02-07 09:49:52 -05:00
jmwilkinson	46f3977db2	Allow wildcard datacenters to be specified in job file (#11170 ) Also allows for default value of `datacenters = ["*"]`	2023-02-02 09:57:45 -05:00


124 Commits