Our vocabulary around scheduler behaviors outside of the `reschedule` and
`migrate` blocks leaves room for confusion around whether the reschedule tracker
should be propagated between allocations. There are effectively five different
behaviors we need to cover:
* restart: when the tasks of an allocation fail and we try to restart the tasks
in place.
* reschedule: when the `restart` block runs out of attempts (or the allocation
fails before tasks even start), and we need to move
the allocation to another node to try again.
* migrate: when the user has asked to drain a node and we need to move the
allocations. These are not failures, so we don't want to propagate the
reschedule tracker.
* replacement: when a node is lost, we don't count that against the `reschedule`
tracker for the allocations on the node (it's not the allocation's "fault",
after all). We don't want to run the `migrate` machinery here either, as we
can't contact the down node. To the scheduler, this is effectively the same as
if we bumped the `group.count`.
* replacement for `disconnect.replace = true`: this is a replacement, but the
replacement is intended to be temporary, so we propagate the reschedule tracker.
Add a section to the `reschedule`, `migrate`, and `disconnect` blocks explaining
when each item applies. Update the use of the word "reschedule" in several
places where "replacement" is correct, and vice-versa.
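For a quick summary, here is a minimal Go sketch of which of these cases carry
the reschedule tracker forward to the replacement allocation; the reason names
and helper are hypothetical, not Nomad's internals:

```go
// stopReason labels the five behaviors described above (hypothetical names).
type stopReason int

const (
	reasonRestart           stopReason = iota // tasks restarted in place; no new allocation
	reasonReschedule                          // restart attempts exhausted, or the alloc failed before tasks started
	reasonMigrate                             // node drain; not a failure
	reasonLostNodeReplace                     // node lost; not the allocation's fault
	reasonDisconnectReplace                   // disconnect.replace = true; temporary replacement
)

// propagatesRescheduleTracker reports whether a replacement allocation should
// inherit the previous allocation's reschedule tracker.
func propagatesRescheduleTracker(r stopReason) bool {
	switch r {
	case reasonReschedule, reasonDisconnectReplace:
		return true
	default:
		// Restarts happen in place, while migrations and lost-node
		// replacements are not failures, so the tracker is not carried over.
		return false
	}
}
```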
Fixes: https://github.com/hashicorp/nomad/issues/24918
Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>
* func: remove the lists used to override the nomad_local_binary for servers and clients
* docs: add a note to the terraform e2e readme
* fix: remove the extra 'windows' from the aws_ami filter
* style: hcl fmt
Similarly to #6732, this removes checking affinity and spread for in-place updates.
Both affinity and spread should be soft preferences for the Nomad scheduler rather than strict constraints, so modifying them should not trigger job reallocation.
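A rough sketch of that idea, using a simplified group type rather than Nomad's
actual structs:

```go
import "reflect"

// group is a simplified stand-in for a task group's placement-relevant fields.
type group struct {
	Constraints []string
	Affinities  []string
	Spreads     []string
	TaskConfig  map[string]string
}

// needsDestructiveUpdate intentionally ignores Affinities and Spreads: they are
// soft placement preferences, so editing them should not replace allocations.
func needsDestructiveUpdate(old, updated *group) bool {
	return !reflect.DeepEqual(old.Constraints, updated.Constraints) ||
		!reflect.DeepEqual(old.TaskConfig, updated.TaskConfig)
}
```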
Fixes #25070
Co-authored-by: Tim Gross <tgross@hashicorp.com>
* Add dead (stopped) to status mapping to clarify Stopped
CE-816
* Pull status mapping into partial and include in job status command
* change `complete` to dead in table after discussing with Michael
* added clarifications; add CLI status definitions
* fixed line endings
* fixed typo
In #20165 we fixed a bug where a partially configured `client.template` retry
block would set any unset fields to nil instead of their default values. But
this patch introduced a regression in the default values, so we were now
defaulting to unlimited retries if the retry block was unset. Restore the
correct behavior and add better test coverage to both the config parsing and
the template configuration code.
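A sketch of the intended default handling; the field names and default values
here are illustrative, not the exact `client.template` retry schema:

```go
import "time"

// retryConfig mirrors the shape of a template retry block; pointer fields let
// us tell "unset" apart from an explicit zero value.
type retryConfig struct {
	Attempts   *int
	Backoff    *time.Duration
	MaxBackoff *time.Duration
}

// defaultRetry returns the fallback values (illustrative numbers here).
func defaultRetry() *retryConfig {
	attempts := 12
	backoff := 250 * time.Millisecond
	maxBackoff := time.Minute
	return &retryConfig{Attempts: &attempts, Backoff: &backoff, MaxBackoff: &maxBackoff}
}

// withDefaults fills any unset field from the defaults. An absent block uses
// the defaults wholesale instead of silently becoming "retry forever".
func (r *retryConfig) withDefaults() *retryConfig {
	d := defaultRetry()
	if r == nil {
		return d
	}
	out := *r
	if out.Attempts == nil {
		out.Attempts = d.Attempts
	}
	if out.Backoff == nil {
		out.Backoff = d.Backoff
	}
	if out.MaxBackoff == nil {
		out.MaxBackoff = d.MaxBackoff
	}
	return &out
}
```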
Ref: https://github.com/hashicorp/nomad/pull/20165
Ref: https://github.com/hashicorp/nomad/issues/23305#issuecomment-2643731565
In #24526 we updated the Consul and Vault fingerprints so that they are no
longer periodic. This fixed a problem that cluster admins reported where rolling
updates of Vault or Consul would cause a thundering herd of fingerprint updates
across the whole cluster.
But if Consul/Vault is not available during the initial fingerprint, it will
never get fingerprinted again. This is challenging for cluster updates and black
starts because the implicit service startup ordering may require
reloads. Instead, have the fingerprinter run periodically but mark that it has
made its first successful fingerprint of all Consul/Vault clusters. At that
point, we can skip further periodic updates. The `Reload` method will reset the
mark and allow the subsequent fingerprint to run normally.
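A minimal sketch of that approach; the type and method names are illustrative
rather than the actual fingerprinter code:

```go
import (
	"sync"
	"time"
)

// consulStateSketch tracks whether the initial fingerprint of every configured
// Consul/Vault cluster has succeeded.
type consulStateSketch struct {
	mu          sync.Mutex
	initialDone bool
}

// Periodic reports whether another periodic fingerprint run is needed.
func (f *consulStateSketch) Periodic() (bool, time.Duration) {
	f.mu.Lock()
	defer f.mu.Unlock()
	if f.initialDone {
		return false, 0 // first success recorded; skip further periodic runs
	}
	return true, 15 * time.Second
}

// markSuccess records a fingerprint pass that covered all clusters.
func (f *consulStateSketch) markSuccess(allClustersOK bool) {
	f.mu.Lock()
	defer f.mu.Unlock()
	if allClustersOK {
		f.initialDone = true
	}
}

// Reload resets the mark so the next fingerprint runs normally again.
func (f *consulStateSketch) Reload() {
	f.mu.Lock()
	defer f.mu.Unlock()
	f.initialDone = false
}
```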
Fixes: https://github.com/hashicorp/nomad/issues/25097
Ref: https://github.com/hashicorp/nomad/pull/24526
Ref: https://github.com/hashicorp/nomad/issues/24049
* fix: change the value of the version used for testing to account for ent versions
* func: add more specific test for servers stability
* func: change the criteria we use to verify the cluster stability after server upgrades
* style: syntax
In #24650 we switched to using ephemeral state for CNI plugins, so that when a
host reboots and we lose all the allocations we don't end up trying to use IPs
we created in network namespaces we just destroyed. Unfortunately upgrade
testing missed that in a non-reboot scenario, the existing CNI state was being
used by plugins like the ipam plugin to hand out the "next available" IP
address. So with no state carried over, we might allocate new addresses that
conflict with existing allocations. (This can be avoided by draining the node
first.)
As a compatibility shim, copy the old CNI state directory to the new CNI state
directory during agent startup, if the new CNI state directory doesn't already
exist.
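Roughly, the shim looks like this; the helper name is illustrative and error
handling is simplified:

```go
import (
	"os"
	"path/filepath"
)

// copyLegacyCNIState copies the old CNI state directory into the new location
// at agent startup, but only if the new directory does not already exist, so
// the ipam plugin's "next available address" bookkeeping survives the upgrade.
func copyLegacyCNIState(oldDir, newDir string) error {
	if _, err := os.Stat(newDir); err == nil {
		return nil // new state dir already exists; nothing to do
	} else if !os.IsNotExist(err) {
		return err
	}
	return filepath.Walk(oldDir, func(path string, info os.FileInfo, err error) error {
		if err != nil {
			return err
		}
		rel, err := filepath.Rel(oldDir, path)
		if err != nil {
			return err
		}
		dst := filepath.Join(newDir, rel)
		if info.IsDir() {
			return os.MkdirAll(dst, info.Mode())
		}
		data, err := os.ReadFile(path)
		if err != nil {
			return err
		}
		return os.WriteFile(dst, data, info.Mode())
	})
}
```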
Ref: https://github.com/hashicorp/nomad/pull/24650
Add a README describing the setup required for running upgrade testing via
Enos. Also fix the authorization header of our `wget` call to use the proper header
for short-lived tokens, and fix the output path variable of the artifactory step.
Co-authored-by: Juanadelacuesta <8647634+Juanadelacuesta@users.noreply.github.com>
A return statement was missing in the sticky volume check: when we weren't able
to find a suitable volume, we did not return false. This was caught by an e2e
test.
This PR fixes the issue, and corrects and expands the unit test.
CE side of ENT PR:
task schedule: pauses are not restart "attempts"
distinguish between these two cases:
1. task dies because we "paused" it (on purpose)
- should not count against restarts,
because nothing is wrong.
2. task dies because it didn't work right
- should count against restart attempts,
so users can address application issues.
with this, the restart{} block is back to its normal
behavior, so its documentation applies without caveat.
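In sketch form (hypothetical names, not the client's actual restart tracker):

```go
// exitReason distinguishes why a task stopped.
type exitReason int

const (
	exitPausedBySchedule exitReason = iota // killed on purpose by the task schedule
	exitFailed                             // the task itself failed
)

// countsAgainstRestartAttempts reports whether the exit should consume one of
// the restart{} block's attempts. Pauses do not: nothing is wrong with the task.
func countsAgainstRestartAttempts(r exitReason) bool {
	return r == exitFailed
}
```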
* Docs SEO: task drivers and plugins; refactor virt section
* add redirects for virt driver files
* Some updates. committing rather than stashing
* fix content-check errors
* Remove docs/devices/ and redirect to plugins/devices
* Update docs/drivers descriptions
* Move USB device plugin up a level. Finish descriptions.
* Apply suggestions from Jeff's code review
Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>
* Apply title case suggestions from code review
Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>
* apply title case suggestions; fix indentation
---------
Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>
This dependency is only used to generate mock `Variables`. The only place the
faked values would be meaningful is in the state store and RPC handler tests,
where we always set the values directly so that we can control
unblocking behaviors. Remove most of the random generation and remove the
dependency.
Closes: https://github.com/hashicorp/nomad/pull/25066
* func: add a new output that merges both windows and linux clients, but add tags to distinguish them
* fix: outputs can't reference other outputs in terraform
* Update e2e/terraform/provision-infra/compute.tf
Co-authored-by: Tim Gross <tgross@hashicorp.com>
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
* func: add module to upgrade clients
* func: add polling to verify the metadata to make sure all clients are up
* style: remove unused code
* fix: Give the allocations a little time to get to the expected number on the test health check, to avoid possible flaky tests in the future
* fix: set the upgrade version as the clients' version for the last health check
I merged #24869 having forgotten we don't run these tests in PR CI, so there's a compile error in the test. Fix that error and add the no-op import we use to catch this kind of thing.
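The no-op import pattern looks roughly like this; blank-importing the package
keeps it in the build graph so compile errors surface in PR CI even though the
tests only run later. The package name and import path here are illustrative:

```go
// Package compilecheck exists only so PR CI compiles test-only packages whose
// tests are not executed there; a build break then fails CI immediately.
package compilecheck

import (
	_ "github.com/hashicorp/nomad/e2e/v3/cluster3" // illustrative import path
)
```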
Ref: https://github.com/hashicorp/nomad/pull/24869
Add tests for dynamic host volumes where the claiming jobs have `volume.sticky =
true`. Includes a test for forced rescheduling and a test for node drain.
This changeset includes a new `e2e/v3`-style package for creating dynamic host
volumes, so we can reuse that across other tests.
At least one bug has been created because it's easy to miss a future.set() in
pullImageImpl(). This pulls future.set() out to PullImage(), the same level
where the future is created and wait()ed.
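A sketch of the resulting shape; the future and driver types here are
simplified stand-ins, not the driver's real ones:

```go
// pullFuture is a minimal stand-in for the driver's image-pull future.
type pullFuture struct {
	done chan struct{}
	id   string
	err  error
}

func newPullFuture() *pullFuture { return &pullFuture{done: make(chan struct{})} }

// set resolves the future; it must be called exactly once.
func (f *pullFuture) set(id string, err error) {
	f.id, f.err = id, err
	close(f.done)
}

// wait blocks until set has been called.
func (f *pullFuture) wait() (string, error) {
	<-f.done
	return f.id, f.err
}

type driver struct{}

// pullImageImpl does the actual pull; elided here.
func (d *driver) pullImageImpl(name string) (string, error) { return name, nil }

// PullImage creates, resolves, and waits on the future in one place, so no
// early return inside pullImageImpl can forget to call set().
func (d *driver) PullImage(name string) (string, error) {
	fut := newPullFuture()
	go func() {
		fut.set(d.pullImageImpl(name))
	}()
	return fut.wait()
}
```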
We introduce an alternative solution to the one presented in #24960, one that is
based on the state store rather than previous-next allocation tracking in the
reconciler. This new solution reduces cognitive complexity of the scheduler
code at the cost of slightly more boilerplate code, but also opens up new
possibilities in the future, e.g., allowing users to explicitly "un-stick"
volumes with workloads still running.
The new logic works as follows:
* On the scheduler side, `SetVolumes()` records the namespace, job, and task group,
and `hasVolumes()` consults the state and returns true if there is a matching claim
or if there is no previous claim.
* On the state store side, `upsertAllocsImpl()` checks whether an allocation requests
sticky volumes and consults the state; if there is no claim, it creates one.
* A `TaskGroupVolumeClaim` holds the namespace, jobID, task group name, and volume ID,
which together uniquely identify the volume.
* `DeleteJobTxn()` removes the claim from the state.
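A minimal Go sketch of the claim record described above; field names are
illustrative of its shape, not necessarily the exact state store schema:

```go
import "strings"

// TaskGroupVolumeClaim ties a sticky volume to the task group that claimed it.
type TaskGroupVolumeClaim struct {
	Namespace     string
	JobID         string
	TaskGroupName string
	VolumeID      string
}

// key uniquely identifies the claimed volume for state store lookups.
func (c *TaskGroupVolumeClaim) key() string {
	return strings.Join([]string{c.Namespace, c.JobID, c.TaskGroupName, c.VolumeID}, "/")
}
```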
The variable definitions for Enos upgrade scenarios have a couple of unused
variables and some of the documentation strings are ambiguous:
* `nomad_region` and `binary_local_path` variables are unused and can be removed.
* `nomad_local_binary` refers to the directory where the binaries will be
downloaded, not the binaries themselves. Rename to make it clear this belongs to
the artifactory fetch and not the provisioning step (which uses the
artifactory fetch outputs).