The current autoscaler docs imply that it has minimal or non-working support
for Nomad namespaces. In fact the namespace support works fine; it just
doesn't allow configuring multiple namespaces without using a wildcard (for
now). Make this clearer and fix the reference to the configuration "below",
which is no longer on that same page.
Ref: https://github.com/hashicorp/nomad-autoscaler/issues/65
Refactor the `computeGroup` code in the reconciler to make its mutations easier
to understand. Some of this work makes mutation more consistent, but more
importantly it's intended to make mutation readily _detectable_ while still
being readable; a rough sketch of the target pattern follows the list. Includes:
* In the `computeCanaries` function, we mutate the dstate and the result and
then the return values are used to further mutate the result in the
caller. Move all this mutation into the function.
* In the `computeMigrations` function, we mutate the result and then the return
values are used to further mutate the result in the caller. Move all this
mutation into the function.
* In the `cancelUnneededCanaries` function, we mutate the result and then the
return values are used to further mutate the result in the caller. Move all
this mutation into the function, and annotate which `allocSet`s are mutated by
taking a pointer to the set.
* The `createRescheduleLaterEvals` function currently mutates the results and
returns updates to mutate the results in the caller. Move all this mutation
into the function to help clean up `computeGroup`.
* Extract `computeReconnecting` method from `computeGroup`. There's some tangled
logic in `computeGroup` for determining changes to make for reconnecting
allocations. Pull this out into its own function. Annotate mutability in the
function by passing pointers to `allocSet` where needed, and mutate the result
to update counts. Rename the old `computeReconnecting` method to
`appendReconnectingUpdates` to mirror the naming of the similar logic for
disconnects.
* Extract `computeDisconnecting` method from `computeGroup`. There's some
tangled logic in `computeGroup` for determining changes to make for
disconnected allocations. Pull this out into its own function. Annotate
mutability in the function by passing pointers to `allocSet` where needed, and
mutate the result to update counts.
* The `appendUnknownDisconnectingUpdates` method used to create updates for
disconnected allocations mutates one of its `allocSet` arguments to change the
allocations that the reschedule-now set points to. Pull this mutation out into
the caller.
* A handful of small docstring and helper function fixes
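As a rough illustration of the pattern these changes move toward (the types and
function names here are simplified stand-ins, not the real reconciler code),
the mutation is pulled into the helper and announced by its pointer arguments:

```go
// Simplified stand-ins for the reconciler's real types, showing the shape of
// the change: helpers take pointers to everything they mutate instead of
// returning values the caller uses to mutate the result.
package main

import "fmt"

// allocSet is a placeholder for the real map of allocation IDs to allocations.
type allocSet map[string]string

type reconcileResults struct {
	stop int
}

// Before: the helper returns a count and the caller mutates the result, so
// the mutation is spread across two places.
func computeExampleBefore(canaries allocSet) (stop int) {
	return len(canaries)
}

// After: the helper takes pointers to what it mutates, so a reader of
// computeGroup can see at the call site exactly which values change.
func computeExampleAfter(result *reconcileResults, canaries *allocSet) {
	result.stop += len(*canaries)
}

func main() {
	canaries := allocSet{"alloc-1": "canary"}
	result := &reconcileResults{}

	// old style: mutation happens back in the caller
	result.stop += computeExampleBefore(canaries)

	// new style: the pointer arguments announce the mutation
	computeExampleAfter(result, &canaries)

	fmt.Println(result.stop) // 2
}
```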
Ref: https://hashicorp.atlassian.net/browse/NMD-819
The reconciler contains a large set of methods and functions that operate on
`allocSet` (a map of allocation IDs to their allocs). Update these so that they
are consistently methods that are documented not to consume the `allocSet` (a
sketch of the convention follows the list below). This sets the stage for
further improvements around mutability in the reconciler.
This changeset also includes a few related refactors:
* Use the `allocSet` alias in every location it's relevant in the reconciler,
for consistency and clarity.
* Move the filter functions and related helpers in the `allocs.go` file into the
`filters.go` file.
* Make the method receiver name on `allocSet` consistent everywhere and
generally improve the docstrings on the filter functions.
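A minimal sketch of the convention, with hypothetical types and a hypothetical
filter method rather than the real reconciler code:

```go
// Hypothetical types and a hypothetical filter method illustrating the
// convention: allocSet is a map of allocation IDs to allocations, and filter
// helpers are methods that return a new set without modifying the receiver.
package main

import "fmt"

type Allocation struct {
	ID           string
	ClientStatus string
}

// allocSet maps allocation IDs to allocations.
type allocSet map[string]*Allocation

// filterByClientStatus returns a new allocSet containing only allocations
// with the given client status. It does not consume the receiver.
func (a allocSet) filterByClientStatus(status string) allocSet {
	out := make(allocSet)
	for id, alloc := range a {
		if alloc.ClientStatus == status {
			out[id] = alloc
		}
	}
	return out
}

func main() {
	set := allocSet{
		"a1": {ID: "a1", ClientStatus: "running"},
		"a2": {ID: "a2", ClientStatus: "failed"},
	}
	fmt.Println(len(set.filterByClientStatus("running")), len(set)) // 1 2
}
```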
Ref: https://hashicorp.atlassian.net/browse/NMD-819
When a task group is removed from a jobspec, the reconciler stops all
allocations and immediately returns from `computeGroup`. We can do the same
when the group has been scaled to zero, but doing so runs into an inconsistency
in the way that server-terminal allocations are handled.
Prior to this change, server-terminal allocations fall through `computeGroup`
without being marked as `ignore`, unless they are terminal canaries, in which
case they are marked `stop` (but this is a no-op). This inconsistency causes a
_tiny_ amount of extra `Plan.Submit`/Raft traffic but, more importantly, makes
it more difficult to write test assertions for `stop` vs `ignore` vs
fallthrough. Remove this inconsistency by filtering out server-terminal
allocations early in `computeGroup`.
This brings the cluster reconciler's behavior closer to the node reconciler's
behavior, except that the node reconciler discards _all_ terminal allocations
because it doesn't support rescheduling.
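A hedged sketch of the filtering idea, using simplified stand-in types rather
than the real `computeGroup` code:

```go
// Simplified stand-ins, not the real computeGroup code: split server-terminal
// allocations out of the working set up front, so every later step can assume
// it only sees allocations the servers still consider live.
package main

import "fmt"

type Allocation struct {
	ID            string
	DesiredStatus string // e.g. "run", "stop", "evict"
}

type allocSet map[string]*Allocation

// serverTerminal is a stand-in for the real server-terminal check.
func serverTerminal(a *Allocation) bool {
	return a.DesiredStatus == "stop" || a.DesiredStatus == "evict"
}

// filterServerTerminal splits the set into live and server-terminal subsets.
func filterServerTerminal(all allocSet) (live, terminal allocSet) {
	live, terminal = make(allocSet), make(allocSet)
	for id, alloc := range all {
		if serverTerminal(alloc) {
			terminal[id] = alloc
			continue
		}
		live[id] = alloc
	}
	return live, terminal
}

func main() {
	all := allocSet{
		"a1": {ID: "a1", DesiredStatus: "run"},
		"a2": {ID: "a2", DesiredStatus: "stop"},
	}
	live, terminal := filterServerTerminal(all)
	fmt.Println(len(live), len(terminal)) // 1 1
}
```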
This changeset required adjustments to two tests, but the tests themselves were
a bit of a mess:
* In https://github.com/hashicorp/nomad/pull/25726 we added a test of how
canaries were treated when they are on draining nodes. But the test didn't
correctly configure the job with an `update` block, leading to misleading test
behavior. Fix the test to exercise the intended behavior and refactor it for
clarity.
* While working on reconciler behaviors around stopped allocations, I found it
extremely hard to follow the intent of the disconnected client tests, because
many of the fields in the table-driven test are switches for more complex
behavior or are just tersely named. Attempt to make this a little more legible
by moving some branches directly into fields, renaming some fields, and
flattening out some branching.
Ref: https://hashicorp.atlassian.net/browse/NMD-819
The `DesiredUpdates` struct that we send to the Read Eval API doesn't include
information about disconnect/reconnect and rescheduling. Annotate the
`DesiredUpdates` with this data, and adjust the `eval status` command to display
only those fields that have non-zero values in order to make the output width
manageable.
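A rough sketch of the display approach for `eval status`; the field names here
are illustrative examples rather than the real `DesiredUpdates` fields:

```go
// Example field names only; the real DesiredUpdates fields may differ. The
// point is that the command builds its output from non-zero counters only.
package main

import "fmt"

type desiredUpdates struct {
	Place      uint64
	Stop       uint64
	Ignore     uint64
	Disconnect uint64 // hypothetical example of one of the new counters
}

// nonZeroColumns returns "Name=value" cells for counters that are non-zero,
// keeping the rendered table narrow when most counters are zero.
func nonZeroColumns(u desiredUpdates) []string {
	cols := []struct {
		name  string
		value uint64
	}{
		{"Place", u.Place},
		{"Stop", u.Stop},
		{"Ignore", u.Ignore},
		{"Disconnect", u.Disconnect},
	}
	var out []string
	for _, c := range cols {
		if c.value > 0 {
			out = append(out, fmt.Sprintf("%s=%d", c.name, c.value))
		}
	}
	return out
}

func main() {
	fmt.Println(nonZeroColumns(desiredUpdates{Place: 3, Ignore: 2}))
	// Output: [Place=3 Ignore=2]
}
```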
Ref: https://hashicorp.atlassian.net/browse/NMD-815
While investigating whether the deploymentwatcher would need updates to
implement system deployments, I discovered that some of the tests are racy and
make assertions about called functions without waiting for those calls.
Update these tests to wait where needed, and generally clean them up while we're
in here. In particular, I've removed the heavyweight mocking in favor of
checking the call counts and then asserting the expected state store changes.
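As an example of the general fix, poll for the expected call count instead of
asserting it immediately (this uses testify's `require.Eventually`; the actual
tests may rely on Nomad's own test helpers instead):

```go
// Stand-in test showing the pattern: wait for asynchronous work to settle
// instead of asserting a call count immediately after triggering it.
package deploymentwatcher_test

import (
	"sync/atomic"
	"testing"
	"time"

	"github.com/stretchr/testify/require"
)

func TestWatcher_WaitsForAsyncCalls(t *testing.T) {
	var calls atomic.Int64

	// Stand-in for the asynchronous work the deploymentwatcher kicks off.
	go func() {
		time.Sleep(10 * time.Millisecond)
		calls.Add(1)
	}()

	// The racy version would be: require.Equal(t, int64(1), calls.Load())
	require.Eventually(t, func() bool {
		return calls.Load() == 1
	}, time.Second, 10*time.Millisecond, "expected the call to be observed")
}
```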
Ref: https://hashicorp.atlassian.net/browse/NMD-892
System and sysbatch jobs don't support the `reschedule` block, because we'd
always replace allocations back onto the same node. The job validation for
system jobs asserts that the user hasn't set a `reschedule` block, so that users
don't submit jobs expecting it to be supported. But this validation was missing
for sysbatch jobs.
Validate that sysbatch jobs don't have a reschedule block.
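A simplified sketch of the kind of check involved, with stand-in types rather
than the real jobspec structs:

```go
// Stand-in types, not the real jobspec structs: reject a reschedule policy on
// system and sysbatch jobs at validation time, since rescheduling would only
// place the allocation back onto the same node.
package main

import "fmt"

type ReschedulePolicy struct{ Attempts int }

type TaskGroup struct {
	Name             string
	ReschedulePolicy *ReschedulePolicy
}

type Job struct {
	Type       string // "service", "batch", "system", or "sysbatch"
	TaskGroups []*TaskGroup
}

// validateReschedule returns an error if a system or sysbatch job carries a
// reschedule policy on any of its task groups.
func validateReschedule(j *Job) error {
	if j.Type != "system" && j.Type != "sysbatch" {
		return nil
	}
	for _, tg := range j.TaskGroups {
		if tg.ReschedulePolicy != nil {
			return fmt.Errorf("task group %q: %s jobs should not have a reschedule policy",
				tg.Name, j.Type)
		}
	}
	return nil
}

func main() {
	job := &Job{Type: "sysbatch", TaskGroups: []*TaskGroup{
		{Name: "tg", ReschedulePolicy: &ReschedulePolicy{Attempts: 1}},
	}}
	fmt.Println(validateReschedule(job))
}
```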
A CSI volume cannot be deleted while it is claimed by an allocation or in the
process of being unpublished. This is documented in the CLI but not the
API. Also, the documentation says that the `volume delete` command silently
returns without error if the volume doesn't exist, but that's incorrect.
Fixes: https://github.com/hashicorp/nomad/issues/24756
When we originally implemented CSI, Nomad did not support the `CreateVolume`
workflow, so the volume name field was just a display name. The `CreateVolume`
CSI RPC requires that the volume name be unique. In retrospect, Nomad should
probably have mapped the namespace + ID to the volume name field, but because we
didn't, the name field must be unique per storage provider. In future work we
should try to figure out a way to unwind that decision, but in the meantime
let's make that requirement clear in the documentation.
Ref: https://gitlab.com/rocketduck/csi-plugin-nfs/-/issues/21
Refactor the reconciler property tests to extract functions for the safety
property assertions we'll share between different job types handled by the same
reconciler.
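A hypothetical sketch of the extraction, with illustrative names rather than
the real reconciler test code:

```go
// Illustrative names only, not the real reconciler test code: a safety
// property assertion pulled into a helper so property tests for different job
// types can share it.
package reconciler_test

import "testing"

type results struct {
	place int
}

// assertNoExcessPlacements is the kind of shared safety property: the
// reconciler must never place more allocations than the group's count.
func assertNoExcessPlacements(t *testing.T, r *results, count int) {
	t.Helper()
	if r.place > count {
		t.Fatalf("expected at most %d placements, got %d", count, r.place)
	}
}

func TestServiceJobProperties(t *testing.T) {
	assertNoExcessPlacements(t, &results{place: 3}, 3)
}

func TestBatchJobProperties(t *testing.T) {
	assertNoExcessPlacements(t, &results{place: 2}, 3)
}
```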
When the client handles an update status response from the server, it modifies
its heartbeat stop tracker with a time set once the RPC call returns. It also
optionally emits a log message if the client suspects it has missed a heartbeat.
These times were originally tracked by two different calls to the time function,
executed 2 microseconds apart. There is no reason we cannot use a single time
variable for both uses, which saves us one whole call to `time.Now`.
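A minimal sketch of the change, with illustrative names:

```go
// Illustrative names, not the real client code: capture time.Now() once and
// reuse it for both the heartbeat stop tracker and the missed-heartbeat check.
package main

import (
	"fmt"
	"time"
)

type heartbeatStop struct{ lastOK time.Time }

func (h *heartbeatStop) setLastOk(t time.Time) { h.lastOK = t }

func handleUpdateResponse(h *heartbeatStop, haltTime time.Time) {
	now := time.Now() // single call, shared by both uses below

	h.setLastOk(now)

	if now.After(haltTime) {
		fmt.Println("missed a heartbeat; the servers may consider this client down")
	}
}

func main() {
	handleUpdateResponse(&heartbeatStop{}, time.Now().Add(-time.Second))
}
```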
While working on property testing in #26172 we discovered there are scenarios
where the reconciler will produce more than the expected number of
placements. Testing of those scenarios at the whole-scheduler level shows that
the extra placements get handled correctly downstream of the reconciler, but
they make it harder to reason about reconciler behavior. Cap the number of
placements in the reconciler itself.
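A hedged sketch of the capping idea, simplified well beyond the real reconciler
code:

```go
// Simplified well beyond the real reconciler code: placements are capped at
// the headroom between the group's desired count and what already exists.
package main

import "fmt"

func cappedPlacements(desiredCount, existing, wanted int) int {
	headroom := desiredCount - existing
	if headroom < 0 {
		headroom = 0
	}
	if wanted > headroom {
		return headroom
	}
	return wanted
}

func main() {
	// The group wants 3 allocs, 2 already exist, and a naive computation
	// asked for 2 more placements; only 1 is allowed through.
	fmt.Println(cappedPlacements(3, 2, 2)) // 1
}
```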
Ref: https://github.com/hashicorp/nomad/pull/26172
While working on property testing in #26216, I discovered we had unreachable
code in the node reconciler. The `diffSystemAllocsForNode` function receives a
set of non-terminal allocations, but then has branches where it assumes the
allocations might be terminal. It's trivially provable that these allocs are
always live, as the system scheduler splits the set of known allocs into live
and terminal sets before passing them into the node reconciler.
Eliminate the unreachable code and improve the variable names to make the known
state of the allocs clearer in the reconciler code.
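An illustrative sketch of why those branches were unreachable, using simplified
stand-in types:

```go
// Simplified stand-in types: the system scheduler splits known allocations
// into live and terminal sets before the node reconciler ever sees them, so
// any "is this alloc terminal?" branch inside it can never be taken.
package main

import "fmt"

type Allocation struct {
	ID       string
	Terminal bool
}

func splitLiveTerminal(all []*Allocation) (live, terminal []*Allocation) {
	for _, a := range all {
		if a.Terminal {
			terminal = append(terminal, a)
			continue
		}
		live = append(live, a)
	}
	return live, terminal
}

// diffForNode receives only the live set; a terminal-allocation branch here
// would be unreachable.
func diffForNode(live []*Allocation) {
	for _, a := range live {
		fmt.Println("reconciling live alloc", a.ID)
	}
}

func main() {
	live, _ := splitLiveTerminal([]*Allocation{
		{ID: "a1"}, {ID: "a2", Terminal: true},
	})
	diffForNode(live)
}
```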
Ref: https://github.com/hashicorp/nomad/pull/26216
* Move commands from docs to its own root-level directory
* temporarily use modified dev-portal branch with nomad ia changes
* explicitly clone nomad ia exp branch
* retrigger build, fixed dev-portal broken build
* architecture, concepts and get started individual pages
* fix get started section destinations
* reference section
* update repo comment in website-build.sh to show branch
* docs nav file update capitalization
* update capitalization to force deploy
* remove nomad-vs-kubernetes dir; move content to what is nomad pg
* job section
* Nomad operations category, deploy section
* operations category, govern section
* operations - manage
* operations/scale; concepts scheduling fix
* networking
* monitor
* secure section
* remove auth-methods folder and move up pages to sso; linkcheck
* Fix install2deploy redirects
* fix architecture redirects
* Job section: Add missing section index pages
* Add section index pages so breadcrumbs build correctly
* concepts/index fix front matter indentation
* move task driver plugin config to new deploy section
* Finish adding full URL to tutorials links in nav
* change SSO to Authentication in nav and file system
* Docs NomadIA: Move tutorials into NomadIA branch (#26132)
* Move governance and policy from tutorials to docs
* Move tutorials content to job-declare section
* run jobs section
* stateful workloads
* advanced job scheduling
* deploy section
* manage section
* monitor section
* secure/acl and secure/authorization
* fix example that contains an unseal key in real format
* remove images from sso-vault
* secure/traffic
* secure/workload-identities
* vault-acl change unseal key and root token in command output sample
* remove lines from sample output
* fix front matter
* move nomad pack tutorials to tools
* search/replace /nomad/tutorials links
* update acl overview with content from deleted architecture/acl
* fix spelling mistake
* linkcheck - fix broken links
* fix link to Nomad variables tutorial
* fix link to Prometheus tutorial
* move who uses Nomad to use cases page; move spec/config shortcuts, add dividers
* Move Consul out of Integrations; move namespaces to govern
* move integrations/vault to secure/vault; delete integrations
* move ref arch to docs; rename Deploy Nomad back to Install Nomad
* address feedback
* linkcheck fixes
* Fixed raw_exec redirect
* add info from /nomad/tutorials/manage-jobs/jobs
* update page content with newer tutorial
* link updates for architecture sub-folders
* Add redirects for removed section index pages. Fix links.
* fix broken links from linkcheck
* Revert to use dev-portal main branch instead of nomadIA branch
* build workaround: add intro-nav-data.json with single entry
* fix content-check error
* add intro directory to get around Vercel build error
* workaround for empty directory
* remove mdx from /intro/ to fix content-check and git snafu
* Add intro index.mdx so Vercel build should work
---------
Co-authored-by: Tu Nguyen <im2nguyen@gmail.com>