nomad

mirror of https://github.com/kemko/nomad.git synced 2026-01-01 16:05:42 +03:00

Author	SHA1	Message	Date
Tim Gross	566164a321	state: nil-check waiting evals before attempting to cancel them (#26872 ) When we attempt to drop unneeded evals from the eval broker, if the eval has been GC'd before the check is made, we hit a nil pointer. Check that the eval actually exists before attempting to remove it from the broker. Fixes: https://github.com/hashicorp/nomad/issues/26871	2025-10-02 12:24:59 -04:00
Piotr Kazmierczak	3f7cf0b287	e2e: re-enable system scheduler tests (#26869 ) Do not merge until #26868 lands.	2025-10-02 16:18:41 +02:00
Piotr Kazmierczak	f9b95ae896	scheduler: account for infeasible nodes when reconciling system jobs (#26868 ) Node reconciler never took node feasibility into account. In cases when there were nodes excluded from allocation placement due to constraints not being met, for example, the desired total or desired canary numbers were never updated in the reconciler to account for that. Thus, deployments would never become successful.	2025-10-02 16:17:46 +02:00
James Rasell	c3dbb1c589	e2e: Add ability to skip known bad Consul versions in compat test. (#26867 ) If the latest Consul version has known bugs that cause failures of the test suite, it is useful to be able to skip this. Otherwise, CI will fail on all PRs and release branches until a new version is released.	2025-10-02 14:27:17 +01:00
Michael Smithhisler	f2b831a430	docs: add job spec and plugin authoring pages for secrets (#26529 ) --------- Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>	2025-10-01 10:46:12 -04:00
Piotr Kazmierczak	2f2a9eb4ce	chore: manual go-getter update (#26861 )	2025-10-01 10:34:09 +02:00
dependabot[bot]	c89e0dcb43	chore(deps): bump github.com/prometheus/client_golang (#26844 ) Bumps [github.com/prometheus/client_golang](https://github.com/prometheus/client_golang) from 1.23.0 to 1.23.2. - [Release notes](https://github.com/prometheus/client_golang/releases) - [Changelog](https://github.com/prometheus/client_golang/blob/main/CHANGELOG.md) - [Commits](https://github.com/prometheus/client_golang/compare/v1.23.0...v1.23.2) --- updated-dependencies: - dependency-name: github.com/prometheus/client_golang dependency-version: 1.23.2 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2025-10-01 08:49:52 +02:00
Chris Roberts	1cf8d35245	docs: fix broken link for plugin guide (#26843 )	2025-09-30 09:21:28 -05:00
Piotr Kazmierczak	eaa0fe0e27	scheduler: always set the right deployment status for system jobs that require promotion (#26851 ) In cases where system jobs had the same number of canary allocations deployed as there were eligible nodes, the scheduler would incorrectly mark the deployment as complete, as if auto promotion was set. This edge case uncovered a bug in the setDeploymentStatusAndUpdates method, and since we round up canary nodes, it may not be such an edge case after all. --------- Co-authored-by: Tim Gross <tgross@hashicorp.com>	2025-09-30 09:18:59 +02:00
James Rasell	e6a04e06d1	acl: Check for duplicate or invalid keys when writing new policies (#26836 ) ACL policies are parsed when creating, updating, or compiling the resulting ACL object when used. This parsing silently ignored duplicate singleton keys and invalid keys, which does not grant any additional access but is poor UX and can be unexpected. This change parses all new policy writes and updates, so that duplicate or invalid keys return an error to the caller. This is called strict parsing. In order to correctly handle upgrades of clusters which have existing policies that would fall foul of the change, a lenient parsing mode is also available. This allows the policy to continue to be parsed and compiled after an upgrade without the need for an operator to correct the policy document prior to further use. Co-authored-by: Tim Gross <tgross@hashicorp.com>	2025-09-30 08:16:59 +01:00
James Rasell	250b8f9d07	server: Gate node identity generation on server min version. (#26847 )	2025-09-29 15:17:00 +01:00
James Rasell	0f88530bd8	server: Gate node introduction generation on server min version. (#26849 )	2025-09-29 14:30:40 +01:00
James Rasell	f5c563155e	server: Only generate identities for nodes that meet min version. (#26842 ) In a cluster where the Nomad servers have been upgraded before the clients, the cluster leader will generate an identity for each client at every heartbeat. The clients will not have the code path to handle this response, so the object is thrown away. This wastes server resources. This change introduces a minimum version check to the logic which decides whether an identity should be generated. In the situation above, the leader will now decline to generate identities for Nomad clients running pre-1.11 versions.	2025-09-26 14:38:39 +01:00
James Rasell	61a4a02166	docs: Add node identity concepts page and other missing items. (#26830 ) Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>	2025-09-26 07:44:58 +01:00
Daniel Bennett	bdf08e1461	e2e: ui: fix playwright tag once and for all (#26840 ) I don't ever want to see this error again: ``` Error: browserType.launch: Executable doesn't exist at /ms-playwright/chromium_headless_shell-1193/chrome-linux/headless_shell ╔══════════════════════════════════════════════════════════════════════╗ ║ Looks like Playwright Test or Playwright was just updated to 1.55.1. ║ ║ Please update docker image as well. ║ ║ - current: mcr.microsoft.com/playwright:v1.55.0-jammy ║ ║ - required: mcr.microsoft.com/playwright:v1.55.1-jammy ║ ║ ║ ║ <3 Playwright Team ║ ╚══════════════════════════════════════════════════════════════════════╝ at global-setup.js:20 18 \| } 19 \| > 20 \| const browser = await chromium.launch(); \| ^ 21 \| const context = await browser.newContext({ ignoreHTTPSErrors: true }); 22 \| const page = await context.newPage(); 23 \| await page.goto(NOMAD_ADDR+'/ui/settings/tokens'); at module.exports (/src/global-setup.js:20:34) ``` I'm sure this will be the end of it!	2025-09-25 13:02:49 -04:00
Tim Gross	9bc2190508	CSI: serialize node plugin RPCs per-volume (#26832 ) In #26831 we're preventing unexpected node RPCs by ensuring that the volume watcher only unpublishes when allocations are client-terminal. To mitigate any remaining similar issues, add serialization of node plugin RPCs, as we did for controller plugin RPCs in #17996 and as recommended ("SHOULD") by the CSI specification. Here we can do per-volume serialization rather than per-plugin serialization. Reorder the methods of the `volumeManager` in the client so that each interface method and its directly-associated helper methods read from top-to-bottom, instead of a mix of directions. Ref: https://github.com/hashicorp/nomad/pull/17996 Ref: https://github.com/hashicorp/nomad/pull/26831	2025-09-25 11:29:44 -04:00
Tim Gross	fbcdb125da	end-to-end testing improvements for CSI (#26834 ) While working on #26831 and #26832 I made some minor improvements to our end-to-end test setup for CSI: * bump the AWS EBS plugin versions to latest release (1.48.0) * remove the unnecessary `datacenters` field from the AWS EBS plugin jobs * add a name tag to the EBS volumes we create * add a user-specific name tag to the cluster name when using the makefile to deploy a cluster * add volumes and other missing variables from the `provision-infra` module to the main E2E module Ref: https://github.com/hashicorp/nomad/pull/26832 Ref: https://github.com/hashicorp/nomad/pull/26831	2025-09-25 09:27:15 -04:00
Tim Gross	40241b261b	CSI: ensure only client-terminal allocs are treated as past claims (#26831 ) The volume watcher checks whether any allocations that have claims are terminal so that it knows if it's safe to unpublish the volume. This check was considering a claim as unpublishable if the allocation was terminal on either the server or client, rather than the client alone. In many circumstances this is safe. But if an allocation takes a while to stop (ex. it has a `shutdown_delay`), it's possible for garbage collection to run in the window between when the alloc is marked server-terminal and when the task is actually stopped. The server unpublishes the volume which sends a node plugin RPC. The plugin unmounts the volume while it's in use, and then unmounts it again when the allocation stops and the CSI postrun hook runs. If the task writes to the volume during the unmounting process, some providers end up in a broken state and the volume is not usable unless it's detached and reattached. Fix this by considering a claim a "past claim" only when the allocation is client terminal. This way if garbage collection runs while we're waiting for allocation shutdown, the alloc will only be server-terminal and we won't send the extra node RPCs. Fixes: https://github.com/hashicorp/nomad/issues/24130 Fixes: https://github.com/hashicorp/nomad/issues/25819 Ref: https://hashicorp.atlassian.net/browse/NMD-1001	2025-09-25 09:24:53 -04:00
James Rasell	c80c60965f	node pool: Allow specifying node identity ttl in HCL or JSON spec. (#26825 ) The node identity TTL defaults to 24hr but can be altered by setting the node identity TTL parameter. In order to allow setting and viewing the value, the field is now plumbed through the CLI and HTTP API. In order to parse the HCL, a new helper package has been created which contains generic parsing and decoding functionality for dealing with HCL that contains time durations. hclsimple can be used when this functionality is not needed. In order to parse the JSON, custom marshal and unmarshal functions have been created as used in many other places. The node pool init command has been updated to include this new parameter, although commented out, for reference. The info command now includes the TTL in its output too.	2025-09-24 14:20:34 +01:00
Aimee Ukasick	6d4c8b3efe	Update CODEOWNERS (#26827 ) change web-presence to web-devdot so web engineers not on the devdot team don't get assigned	2025-09-23 09:22:23 -05:00
Daniel Bennett	1d6fddd11f	build: ui: setup-node v4.4.0 (#26826 ) for actions/cache upgrade, specifically to account for https://github.com/actions/toolkit/discussions/1890	2025-09-22 15:35:09 -04:00
dependabot[bot]	ccd497b46f	chore(deps): bump github.com/shoenig/go-m1cpu from 0.1.6 to 0.1.7 (#26817 ) Bumps [github.com/shoenig/go-m1cpu](https://github.com/shoenig/go-m1cpu) from 0.1.6 to 0.1.7. - [Release notes](https://github.com/shoenig/go-m1cpu/releases) - [Commits](https://github.com/shoenig/go-m1cpu/compare/v0.1.6...v0.1.7) --- updated-dependencies: - dependency-name: github.com/shoenig/go-m1cpu dependency-version: 0.1.7 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2025-09-22 09:58:50 +02:00
dependabot[bot]	63e4376d3c	chore(deps): bump golang.org/x/mod from 0.27.0 to 0.28.0 (#26814 ) Bumps [golang.org/x/mod](https://github.com/golang/mod) from 0.27.0 to 0.28.0. - [Commits](https://github.com/golang/mod/compare/v0.27.0...v0.28.0) --- updated-dependencies: - dependency-name: golang.org/x/mod dependency-version: 0.28.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2025-09-22 09:58:28 +02:00
Tim Gross	b5530128df	docs: expand on allocation GC details (#26792 ) Expand on the documentation of allocation garbage collection: * Explain that server-side GC of allocations is tied to the GC of the evaluation that spawned the allocation. * Explain that server-side GC of allocations will force them to be immediately GC'd on the client regardless of the client-side configurations. Ref: https://github.com/hashicorp/nomad/issues/26765 Co-authored-by: Aimee Ukasick <Aimee.Ukasick@ibm.com> Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>	2025-09-19 12:17:17 -04:00
Aimee Ukasick	377674f93e	Contributing README: Add section for creating an issue (#26805 ) * Add section for creating an issue * incorporate feedback Co-authored-by: Tim Gross <tgross@hashicorp.com> * Update contributing/README.md --------- Co-authored-by: Tim Gross <tgross@hashicorp.com>	2025-09-19 10:49:54 -05:00
Piotr Kazmierczak	ceeee1f68c	e2e: set longer job submission time for failing system jobs (#26809 ) I cannot replicate this locally, but it appears that on CI some of our system jobs take longer than the default 20s to finish deploying. This PR is just to make sure this isn't the reason these tests fail.	2025-09-19 17:25:53 +02:00
Tim Gross	0367b60ca9	changelog for Nomad Enterprise 1.9.13+ent and 1.8.17+ent (#26806 ) Due to the delayed release of Nomad Enterprise, we didn't have the changelog entries for these two releases.	2025-09-19 11:22:58 -04:00
Piotr Kazmierczak	f767db5639	e2e: fix TestScaling/TestScaling_System (#26804 )	2025-09-19 15:31:32 +02:00
James Rasell	8e553ad95b	build: Add tzdata to Docker container final image. (#26794 ) Nomad's periodic block includes a "time_zone" parameter which lets operators set the time zone at which the next launch interval is checked against. For this to work, Nomad needs to use the "time.LoadLocation" which in turn can use multiple TZ data sources. When using the Docker image to trigger Nomad job registrations, it currently does not have access to any TZ data, meaning it is only aware of UTC. Adding the tzdata package contents to the release image provides the required data for this to work. It would have also been possible to set the "-tags" build tag when releasing Nomad which would embed a copy of the timezone database in the code. We decided against using the build tag approach as it is a subtle way that we could introduce bugs that are very difficult to track down and we prefer the commit approach.	2025-09-19 08:55:57 +01:00
ethel-hashicorp	6ea57a589d	SMRE-733: Updates post-install text to properly reflect the updated IPLA blurb (#26791 )	2025-09-19 07:35:58 +01:00
Piotr Kazmierczak	f42239bf6c	api: add DefaultUpdateStrategy to system jobs if missing (#26777 ) From 1.11, Nomad system jobs will feature deployments, and thus jobspecs missing an update block should be canonicalized to have one.	2025-09-18 15:21:23 +02:00
Tim Gross	3ef25e5867	ACL: allow workload identities to list/get their own policies (#26772 ) In most RPC endpoints we use the resolved ACL object to determine whether a given auth token or identity has access to the object of interest to the RPC. In #15870 we adjusted this across most of the RPCs to handle workload identity. But in the ACL endpoints that read policies, we can't use the resolved ACL object and have to go back to the original token and lookup the policies it has access to. So we need to resolve any workload-associated policies during that lookup as well. Fixes: https://github.com/hashicorp/nomad/issues/26764 Ref: https://hashicorp.atlassian.net/browse/NMD-990 Ref: https://github.com/hashicorp/nomad/pull/15870	2025-09-18 09:10:37 -04:00
James Rasell	a206ff3858	test: Fix test flake in client get registration token (#26796 ) The test was incorrectly writing to state that registration had been finished before writing the node identity token. This is the opposite of what happens in the client code and caused a timing issue which meant we read registration as completed before we had the identity available and therefore returned the secret ID.	2025-09-18 13:56:17 +01:00
Piotr Kazmierczak	46dfd9d992	scheduler: do not create deployments for system job reschedules (#26789 ) System jobs that get rescheduled should not get new deployments.	2025-09-18 14:54:54 +02:00
Tim Gross	3432b0a2d6	consul: only add fingerprint link if unique.consul.name is set (#26787 ) In Nomad Enterprise we can fingerprint multiple Consul datacenters. If neither is `"default"` then we end up with warning logs about adding a "link". The `Link` field on the `Node` struct is a map of attributes that only contributes to the node's computed hash. The `"consul"` key's value is derived from the `unique.consul.name` attribute, which only exists if there's a default Consul cluster. Update the fingerprint to skip setting the link field if there's no `unique.consul.name`, and lower the warning log for malformed fields to debug; this is a minor scheduling optimization largely captured by existing Consul fields in the node computed class. The only reason not to remove it entirely is to avoid changing computed classes on existing large clusters. Fixes: https://github.com/hashicorp/nomad/issues/26781 Ref: https://hashicorp.atlassian.net/browse/NMD-998	2025-09-17 13:23:01 -04:00
Jeff Boruszak	6dce21bc85	Merge pull request #26682 from hashicorp/docs/versioned-redirect-fix docs: Versioned docs redirect fixes	2025-09-17 08:58:37 -07:00
Tim Gross	4e75e99f1a	windows: use/accept platform-specific signal for stopping agent (#26780 ) On Windows, the `os.Process.Signal` method returns an error when sending `os.Interrupt` (SIGINT) because it isn't implemented. This causes test servers in the `testutil` packages to break on Windows. Use the platform specific syscalls to generate the SIGINT instead. The agent's signal handler also did not correctly handle the Ctrl-C because we were masking os.Interrupt instead of SIGINT. Fixes: https://github.com/hashicorp/nomad/issues/26775 Co-authored-by: Chris Roberts <croberts@hashicorp.com>	2025-09-17 11:32:20 -04:00
Aimee Ukasick	fca783c566	Add 1.10.5 release notes (#26782 )	2025-09-17 08:59:43 -05:00
James Rasell	ac5a77af56	docs: Add client identity HTTP API detail on api-docs page. (#26774 ) Co-authored-by: Aimee Ukasick <Aimee.Ukasick@ibm.com>	2025-09-17 14:05:37 +01:00
Piotr Kazmierczak	4874622ebd	e2e: test canary updates for system jobs (#26776 )	2025-09-17 10:20:03 +02:00
boruszak	8ab61f37b3	Fix accidental "s	2025-09-16 14:23:59 -07:00
Michael Smithhisler	1a19a16ee9	docs: fix link in multiregion job spec page (#26755 )	2025-09-16 13:00:42 -05:00
James Rasell	2abd72d433	http: Fix client identity renew call when node ID is in URI. (#26773 ) When calling the client identity renew API, it is possible the target node ID is provided by either the URI or within the request body. This change fixes a bug where all calls using a node_id query parameter would be rejected as it failed to decode the empty request body. Co-authored-by: Tim Gross <tgross@hashicorp.com>	2025-09-16 15:15:39 +01:00
Olli Janatuinen	6398ef9475	secrets: Support custom plugins in Windows (#26751 ) Signed-off-by: Olli Janatuinen <olli.janatuinen@gmail.com>	2025-09-16 09:14:50 -04:00
Daniel Bennett	f47cb5d10f	e2e: adjust flaky timings (#26771 ) hopefully fixes: ``` TestOversubscription/testExec: oversubscription_test.go:57: submitting job: "./input/exec.hcl" oversubscription_test.go:72: oversubscription_test.go:72: expected condition to pass within wait context ↪ error: wait: timeout exceeded: expect '31457280' in stdout, got: 'stat {...}/cat.stdout.0: no such file or directory' ``` and in separate runs, ``` TestTaskAPI/testTaskAPI_Auth: taskapi_test.go:85: taskapi_test.go:85: expected string to have suffix ↪ suffix: Unauthorized ↪ string: ``` ``` TestTaskAPI/testTaskAPI_Auth: taskapi_test.go:85: taskapi_test.go:85: expected string to have suffix ↪ suffix: Forbidden ↪ string: ```	2025-09-15 15:54:53 -04:00
dependabot[bot]	ababacc9ab	chore(deps): bump github.com/shoenig/test from 1.12.1 to 1.12.2 in /api (#26757 ) * chore(deps): bump github.com/shoenig/test from 1.12.1 to 1.12.2 in /api Bumps [github.com/shoenig/test](https://github.com/shoenig/test) from 1.12.1 to 1.12.2. - [Release notes](https://github.com/shoenig/test/releases) - [Commits](https://github.com/shoenig/test/compare/v1.12.1...v1.12.2) --- updated-dependencies: - dependency-name: github.com/shoenig/test dependency-version: 1.12.2 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> * root dep needs to be updated too --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Tim Gross <tgross@hashicorp.com>	2025-09-15 09:06:41 -04:00
dependabot[bot]	2baeffec92	chore(deps-dev): bump prettier from 3.5.3 to 3.6.2 in /website (#26162 ) Bumps [prettier](https://github.com/prettier/prettier) from 3.5.3 to 3.6.2. - [Release notes](https://github.com/prettier/prettier/releases) - [Changelog](https://github.com/prettier/prettier/blob/main/CHANGELOG.md) - [Commits](https://github.com/prettier/prettier/compare/3.5.3...3.6.2) --- updated-dependencies: - dependency-name: prettier dependency-version: 3.6.2 dependency-type: direct:development update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2025-09-15 08:51:31 -04:00
dependabot[bot]	be1fdc0d53	chore(deps): bump golang.org/x/crypto from 0.41.0 to 0.42.0 (#26758 ) Bumps [golang.org/x/crypto](https://github.com/golang/crypto) from 0.41.0 to 0.42.0. - [Commits](https://github.com/golang/crypto/compare/v0.41.0...v0.42.0) --- updated-dependencies: - dependency-name: golang.org/x/crypto dependency-version: 0.42.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2025-09-15 08:48:31 -04:00
dependabot[bot]	16533b3d34	chore(deps): bump google.golang.org/grpc from 1.75.0 to 1.75.1 (#26760 ) Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.75.0 to 1.75.1. - [Release notes](https://github.com/grpc/grpc-go/releases) - [Commits](https://github.com/grpc/grpc-go/compare/v1.75.0...v1.75.1) --- updated-dependencies: - dependency-name: google.golang.org/grpc dependency-version: 1.75.1 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2025-09-15 08:48:15 -04:00
dependabot[bot]	24ef9fa928	chore(deps): bump github.com/aws/aws-sdk-go-v2/feature/ec2/imds (#26762 ) Bumps [github.com/aws/aws-sdk-go-v2/feature/ec2/imds](https://github.com/aws/aws-sdk-go-v2) from 1.18.6 to 1.18.7. - [Release notes](https://github.com/aws/aws-sdk-go-v2/releases) - [Changelog](https://github.com/aws/aws-sdk-go-v2/blob/config/v1.18.7/CHANGELOG.md) - [Commits](https://github.com/aws/aws-sdk-go-v2/compare/config/v1.18.6...config/v1.18.7) --- updated-dependencies: - dependency-name: github.com/aws/aws-sdk-go-v2/feature/ec2/imds dependency-version: 1.18.7 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2025-09-15 08:16:49 -04:00

27525 Commits