Affinities and constraints use similar feasibility checking logic to determine
whether a given node matches (although affinities don't support all the same
operators). Most operators don't allow `value` to be unset. Update the docs to
reflect this.
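For illustration, a minimal Go sketch of the shared check (hypothetical helper,
not Nomad's actual internals): only the existence operators tolerate an unset
`value`.

```go
package main

import "fmt"

// checkAttributeOperand reports whether a node attribute satisfies a
// constraint or affinity operand. Only the existence-style operators
// tolerate an unset value; the rest reject it.
func checkAttributeOperand(operand, attr, value string) (bool, error) {
	switch operand {
	case "is_set":
		return attr != "", nil
	case "is_not_set":
		return attr == "", nil
	case "=", "==":
		if value == "" {
			return false, fmt.Errorf("operator %q requires a value", operand)
		}
		return attr == value, nil
	case "!=":
		if value == "" {
			return false, fmt.Errorf("operator %q requires a value", operand)
		}
		return attr != value, nil
	default:
		return false, fmt.Errorf("unsupported operator %q", operand)
	}
}

func main() {
	ok, err := checkAttributeOperand("is_set", "linux", "")
	fmt.Println(ok, err) // true <nil>
}
```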
Fixes: https://github.com/hashicorp/nomad/issues/24983
During the big docs rearchitecture, we split up the task driver pages into
separate job declaration and driver configuration pages. The `raw_exec`
driver's link to the configuration page is a self-reference.
The documentation for CSI and dynamic host volumes (DHV) has a list of the
available access modes, but doesn't explain what they mean in terms of what
jobs can request, the scheduler behavior, or the CSI plugin behavior. Expand on
the information available in the CSI specification and provide a description of
DHV's behavior as well.
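As a rough guide, the access modes defined by the CSI spec map to something
like the following (a sketch; the comments are a gloss, not the new
documentation text):

```go
package main

import "fmt"

// AccessMode mirrors the access modes from the CSI spec.
type AccessMode string

const (
	SingleNodeWriter      AccessMode = "single-node-writer"       // one claim, read/write
	SingleNodeReaderOnly  AccessMode = "single-node-reader-only"  // one claim, read-only
	MultiNodeReaderOnly   AccessMode = "multi-node-reader-only"   // many claims, all read-only
	MultiNodeSingleWriter AccessMode = "multi-node-single-writer" // many claims, one writer
	MultiNodeMultiWriter  AccessMode = "multi-node-multi-writer"  // many claims, many writers
)

func main() {
	fmt.Println(SingleNodeWriter)
}
```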
Ref: https://github.com/container-storage-interface/spec/blob/master/spec.md#createvolume
Update our E2E compatibility tests for Consul and Vault to only go back to the
oldest-supported LTS versions of Consul and Vault. This will still leave a few
unsupported non-LTS versions in the matrix between the two oldest LTS versions,
but this is a small number of tests, and fixing it would mean hard-coding the
LTS support matrix into our tests.
It seems the tool requires more attention: it does not run well across our
enterprise codebase. Roll back that makefile change so it does not block
enterprise work, backports, CI, etc.
The `computeUpdates` method returns four different values, some of which are
just different shapes of the same data and are only ever used to apply changes
to the result in the caller. Move the mutation of the result into
`computeUpdates` to match the work done in #26325. Clean up the return
signature so that only the slices we need downstream are returned, and fix the
incorrect docstring.
Also fix a silent bug where the `inplace` set includes the original alloc and
not the updated version. This has no functional change because all existing
callers only ever look at the length of this slice, but it will prevent future
bugs if that ever changes.
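A minimal sketch of the new shape, with hypothetical simplified types standing
in for the reconciler's real ones:

```go
package main

// Hypothetical, simplified types for illustration only.
type alloc struct{ ID string }
type allocSet map[string]*alloc

type reconcileResults struct {
	inplaceUpdate []*alloc
	ignore        int
}

// computeUpdates applies its changes to the result directly and returns only
// the sets that callers still need downstream.
func computeUpdates(result *reconcileResults, untainted allocSet) (inplace, destructive allocSet) {
	inplace, destructive = allocSet{}, allocSet{}
	for id, a := range untainted {
		updated, kind := tryInplaceUpdate(a) // hypothetical helper
		switch kind {
		case "ignore":
			result.ignore++
		case "inplace":
			// Track the *updated* alloc, not the original (the silent bug).
			inplace[id] = updated
			result.inplaceUpdate = append(result.inplaceUpdate, updated)
		default:
			destructive[id] = a
		}
	}
	return inplace, destructive
}

func tryInplaceUpdate(a *alloc) (*alloc, string) { return a, "inplace" }

func main() {}
```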
Ref: https://github.com/hashicorp/nomad/pull/26325
Ref: https://hashicorp.atlassian.net/browse/NMD-819
The current autoscaler docs imply that it has minimal or non-working support
for Nomad namespaces. In fact, namespace support works fine; it just doesn't
allow configuring multiple namespaces without using a wildcard (for now). Make
this clearer and fix the reference to the configuration "below", which is no
longer on the same page.
Ref: https://github.com/hashicorp/nomad-autoscaler/issues/65
Refactors of the `computeGroup` code in the reconciler to make its mutations
easier to understand. Some of this work makes mutation more consistent, but
more importantly it's intended to make mutation readily _detectable_ while
still being readable (a sketch of the pattern appears after the list). Includes:
* In the `computeCanaries` function, we mutate the dstate and the result and
then the return values are used to further mutate the result in the
caller. Move all this mutation into the function.
* In the `computeMigrations` function, we mutate the result and then the return
values are used to further mutate the result in the caller. Move all this
mutation into the function.
* In the `cancelUnneededCanaries` function, we mutate the result and then the
return values are used to further mutate the result in the caller. Move all
this mutation into the function, and annotate which `allocSet`s are mutated by
taking a pointer to the set.
* The `createRescheduleLaterEvals` function currently mutates the results and
  returns updates used to mutate the results in the caller. Move all this
  mutation into the function to help clean up `computeGroup`.
* Extract `computeReconnecting` method from `computeGroup`. There's some tangled
logic in `computeGroup` for determining changes to make for reconnecting
allocations. Pull this out into its own function. Annotate mutability in the
function by passing pointers to `allocSet` where needed, and mutate the result
to update counts. Rename the old `computeReconnecting` method to
`appendReconnectingUpdates` to mirror the naming of the similar logic for
disconnects.
* Extract `computeDisconnecting` method from `computeGroup`. There's some
tangled logic in `computeGroup` for determining changes to make for
disconnected allocations. Pull this out into its own function. Annotate
mutability in the function by passing pointers to `allocSet` where needed, and
mutate the result to update counts.
* The `appendUnknownDisconnectingUpdates` method, which creates updates for
  disconnected allocations, mutates one of its `allocSet` arguments to change
  the allocations that the reschedule-now set points to. Pull this update out
  into the caller.
* A handful of small docstring and helper function fixes
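A minimal sketch of the pattern applied throughout, with hypothetical
simplified types:

```go
package main

// Hypothetical, simplified types for illustration only.
type alloc struct{ ID string }
type allocSet map[string]*alloc

type reconcileResults struct{ stop []*alloc }

// Before: helpers returned values and the caller mutated the result.
// After: each helper takes the result and mutates it directly, and any
// allocSet the helper will modify is passed by pointer so that mutation
// is detectable at the call site.
func cancelUnneededCanaries(result *reconcileResults, canaries *allocSet) {
	for id, a := range *canaries {
		if unneeded(a) { // hypothetical predicate
			result.stop = append(result.stop, a)
			delete(*canaries, id)
		}
	}
}

func unneeded(a *alloc) bool { return false }

func main() {}
```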
Ref: https://hashicorp.atlassian.net/browse/NMD-819
The node introduction workflow will utilise JWTs that can be used as
authentication tokens on initial client registration. This change implements
the basic builder for this JWT claim type and the RPC and HTTP handler
functionality that will expose it to the operator.
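A minimal sketch of what a builder for such a claim type might look like,
using `github.com/golang-jwt/jwt/v5`; the claim fields are illustrative, not
the shipped claims:

```go
package main

import (
	"time"

	"github.com/golang-jwt/jwt/v5"
)

// NodeIntroductionClaims is a hypothetical shape for the claim type; the
// field names are illustrative, not Nomad's actual claims.
type NodeIntroductionClaims struct {
	NodePool string `json:"nomad_node_pool"`
	jwt.RegisteredClaims
}

// buildNodeIntroductionJWT signs a short-lived introduction token that a new
// client can present on initial registration.
func buildNodeIntroductionJWT(key []byte, pool string, ttl time.Duration) (string, error) {
	now := time.Now()
	claims := &NodeIntroductionClaims{
		NodePool: pool,
		RegisteredClaims: jwt.RegisteredClaims{
			Issuer:    "nomad",
			IssuedAt:  jwt.NewNumericDate(now),
			ExpiresAt: jwt.NewNumericDate(now.Add(ttl)),
		},
	}
	token := jwt.NewWithClaims(jwt.SigningMethodHS256, claims)
	return token.SignedString(key)
}

func main() {}
```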
The reconciler contains a large set of methods and functions that operate on
`allocSet` (a map of allocation IDs to their allocs). Update these so that they
are consistently methods and are documented not to consume the `allocSet`. This
sets the stage for further improvements around mutability in the reconciler.
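A minimal sketch of the convention, with hypothetical simplified types:

```go
package main

// Hypothetical, simplified types for illustration only.
type alloc struct {
	ID       string
	Terminal bool
}

type allocSet map[string]*alloc

// filterByTerminal returns the terminal and non-terminal allocations as new
// sets; the receiver is never consumed (mutated).
func (s allocSet) filterByTerminal() (terminal, live allocSet) {
	terminal, live = allocSet{}, allocSet{}
	for id, a := range s {
		if a.Terminal {
			terminal[id] = a
		} else {
			live[id] = a
		}
	}
	return terminal, live
}

func main() {}
```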
This changeset also includes a few related refactors:
* Use the `allocSet` alias in every location it's relevant in the reconciler,
for consistency and clarity.
* Move the filter functions and related helpers in the `allocs.go` file into the
`filters.go` file.
* Update the method receiver on `allocSet` to match everywhere and generally
improve the docstrings on the filter functions.
Ref: https://hashicorp.atlassian.net/browse/NMD-819
The new configuration block exposes key options that allow cluster
administrators to control certain client introduction behaviours.
This change introduces the new block and its plumbing, so that it is
available in the Nomad server for consumption by internal processes.
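A hypothetical sketch of the block's shape and merge plumbing; the option
names are illustrative, not the shipped configuration:

```go
package main

import "time"

// NodeIntroductionConfig is a hypothetical server configuration block; the
// field names are illustrative only.
type NodeIntroductionConfig struct {
	// Enabled controls whether clients may introduce themselves with an
	// introduction token.
	Enabled bool `hcl:"enabled"`

	// DefaultIdentityTTL caps how long an introduction identity is valid.
	DefaultIdentityTTL time.Duration `hcl:"default_identity_ttl"`
}

// Merge folds a second config into this one, following the usual Nomad
// config-merge pattern of later values winning when set.
func (c *NodeIntroductionConfig) Merge(o *NodeIntroductionConfig) *NodeIntroductionConfig {
	result := *c
	if o.Enabled {
		result.Enabled = true
	}
	if o.DefaultIdentityTTL != 0 {
		result.DefaultIdentityTTL = o.DefaultIdentityTTL
	}
	return &result
}

func main() {}
```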
When a task group is removed from a jobspec, the reconciler stops all
allocations and immediately returns from `computeGroup`. We can do the same
when the group has been scaled to zero, but doing so runs into an inconsistency
in the way that server-terminal allocations are handled.
Prior to this change, server-terminal allocations fell through `computeGroup`
without being marked as `ignore`, unless they were terminal canaries, in which
case they were marked `stop` (a no-op). This inconsistency causes a _tiny_
amount of extra `Plan.Submit`/Raft traffic, but more importantly it makes it
harder to write test assertions for `stop` vs `ignore` vs fallthrough. Remove
this inconsistency by filtering out server-terminal allocations early in
`computeGroup`.
This brings the cluster reconciler's behavior closer to the node reconciler's
behavior, except that the node reconciler discards _all_ terminal allocations
because it doesn't support rescheduling.
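A minimal sketch of the early filtering, with hypothetical simplified types:

```go
package main

// Hypothetical, simplified types for illustration only.
type alloc struct{ serverTerminal bool }
type allocSet map[string]*alloc

// filterServerTerminal splits out server-terminal allocations so they are
// consistently ignored rather than falling through or being no-op stopped.
func filterServerTerminal(all allocSet) (live, terminal allocSet) {
	live, terminal = allocSet{}, allocSet{}
	for id, a := range all {
		if a.serverTerminal {
			terminal[id] = a
		} else {
			live[id] = a
		}
	}
	return live, terminal
}

func computeGroup(all allocSet) {
	// Server-terminal allocs never reach the reconciliation logic below.
	all, _ = filterServerTerminal(all)
	_ = all
	// ... rest of reconciliation ...
}

func main() {}
```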
This changeset required adjustments to two tests, but the tests themselves were
a bit of a mess:
* In https://github.com/hashicorp/nomad/pull/25726 we added a test of how
canaries were treated when on draining nodes. But the test didn't correctly
configure the job with an update block, leading to misleading test
behavior. Fix the test to exercise the intended behavior and refactor for
clarity.
* While working on reconciler behaviors around stopped allocations, I found it
extremely hard to follow the intent of the disconnected client tests because
many of the fields in the table-driven test are switches for more complex
behavior or just tersely named. Attempt to make this a little more legible by
moving some branches directly into fields, renaming some fields, and
flattening out some branching.
Ref: https://hashicorp.atlassian.net/browse/NMD-819
The Nomad client will have its identity renewed according to the
TTL, which defaults to 24h. In certain situations, such as root
keyring rotation, operators may want to force clients to renew
their identities before the TTL threshold is met. This change
introduces a client HTTP and RPC endpoint which instructs the
node to request a new identity at its next heartbeat. This can be
used via the API or a new command.
While this is a manual intervention step on top of any keyring
rotation, it dramatically reduces the initial feature complexity,
as it provides an asynchronous and efficient method of renewal that
utilises existing functionality.
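A minimal sketch of the client-side mechanics (hypothetical names, not the
actual handlers): the endpoint only marks the node, and the renewal rides
along with the next heartbeat.

```go
package main

import "sync/atomic"

// Client is a stand-in for the Nomad client agent.
type Client struct {
	renewIdentity atomic.Bool
}

// IdentityRenew is the endpoint handler: it instructs the client to request
// a new identity at its next heartbeat rather than renewing synchronously.
func (c *Client) IdentityRenew() {
	c.renewIdentity.Store(true)
}

// heartbeat runs periodically; when the flag is set, the client asks the
// server for a fresh identity as part of the existing heartbeat exchange.
func (c *Client) heartbeat() {
	requestIdentity := c.renewIdentity.Swap(false)
	_ = requestIdentity // included in the heartbeat request (sketch)
}

func main() {}
```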
The `DesiredUpdates` struct that we send to the Read Eval API doesn't include
information about disconnect/reconnect and rescheduling. Annotate the
`DesiredUpdates` with this data, and adjust the `eval status` command to display
only those fields that have non-zero values in order to make the output width
manageable.
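A minimal sketch of the idea; the field names are illustrative, not the final
API:

```go
package main

import "fmt"

// DesiredUpdates gains disconnect/reconnect and reschedule counts (sketch).
type DesiredUpdates struct {
	Place           uint64
	Stop            uint64
	Disconnect      uint64
	Reconnect       uint64
	RescheduleNow   uint64
	RescheduleLater uint64
}

// nonZeroColumns keeps the eval status output narrow by rendering only the
// columns with non-zero values.
func nonZeroColumns(u DesiredUpdates) map[string]uint64 {
	all := map[string]uint64{
		"Place": u.Place, "Stop": u.Stop,
		"Disconnect": u.Disconnect, "Reconnect": u.Reconnect,
		"Reschedule Now": u.RescheduleNow, "Reschedule Later": u.RescheduleLater,
	}
	out := map[string]uint64{}
	for k, v := range all {
		if v != 0 {
			out[k] = v
		}
	}
	return out
}

func main() { fmt.Println(nonZeroColumns(DesiredUpdates{Place: 2})) }
```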
Ref: https://hashicorp.atlassian.net/browse/NMD-815
While investigating whether the deploymentwatcher would need updates to
implement system deployments, I discovered that some of the tests are racy and
make assertions about called functions without waiting.
Update these tests to wait where needed, and generally clean them up while
we're in here. In particular, I've removed the heavyweight mocking in favor of
checking the call counts and then asserting the expected state store changes.
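The fix pattern, reduced to a self-contained sketch (Nomad's
`testutil.WaitForResult` wraps the same polling idea):

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// waitForCallCount polls until the mocked call count reaches the expectation,
// instead of asserting immediately and racing the watcher's goroutines.
func waitForCallCount(count *atomic.Int64, want int64, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		if count.Load() == want {
			return nil
		}
		time.Sleep(10 * time.Millisecond)
	}
	return fmt.Errorf("expected %d calls, got %d", want, count.Load())
}

func main() {}
```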
Ref: https://hashicorp.atlassian.net/browse/NMD-892
System and sysbatch jobs don't support the reschedule block, because we'd
always place replacement allocations back onto the same node. The job
validation for system jobs asserts that the user hasn't set a `reschedule`
block, so that users aren't submitting jobs expecting it to be supported, but
this validation was missing for sysbatch jobs.
Validate that sysbatch jobs don't have a reschedule block.
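A minimal sketch of the check, with simplified stand-in types:

```go
package main

import "fmt"

// Simplified stand-ins for the job structs.
type TaskGroup struct {
	Name             string
	ReschedulePolicy *struct{}
}

type Job struct {
	Type       string // "service", "batch", "system", "sysbatch"
	TaskGroups []*TaskGroup
}

// validateReschedule rejects a reschedule block on both system and sysbatch
// jobs, since replacements would always land on the same node.
func validateReschedule(j *Job) error {
	if j.Type != "system" && j.Type != "sysbatch" {
		return nil
	}
	for _, tg := range j.TaskGroups {
		if tg.ReschedulePolicy != nil {
			return fmt.Errorf("task group %q: %s jobs do not support a reschedule block",
				tg.Name, j.Type)
		}
	}
	return nil
}

func main() {}
```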
Nomad servers, if upgraded, can return node identities as part of
the register and update/heartbeat response objects. The Nomad
client will now handle this and store it as appropriate within its
memory and statedb.
The client will now use any stored identity for RPC authentication,
with a fallback to the secretID. This supports upgrade paths where
the Nomad clients are updated before the Nomad servers.
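A minimal sketch of the token selection (hypothetical names, not the actual
client code):

```go
package main

// client is a stand-in for the Nomad client agent's auth state.
type client struct {
	identityJWT string // signed identity from an upgraded server, if any
	secretID    string
}

// rpcAuthToken prefers the stored identity and falls back to the node
// secretID, covering the path where clients upgrade before servers.
func (c *client) rpcAuthToken() string {
	if c.identityJWT != "" {
		return c.identityJWT
	}
	return c.secretID
}

func main() {}
```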