If a Nomad job is started with a large number of instances (e.g. 4 billion),
then the Nomad servers that attempt to schedule it will run out of memory and
crash. While it's unlikely that anyone would intentionally schedule a job with 4
billion instances, we have occasionally run into issues with bugs in external
automation. For example, an automated deployment system running on a test
environment had an off-by-one error, and deployed a job with count = uint32(-1),
causing the Nomad servers for that environment to run out of memory and crash.
To prevent this, this PR introduces a job_max_count Nomad server configuration
parameter, which limits the number of allocs that may be created from a job.
The default value is 50000: low enough that a job with the maximum
possible number of allocs will not require much memory on the server, but is
still much higher than the number of allocs in the largest Nomad job we have
ever run.
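As a rough illustration of where such a ceiling applies (the function and field names below are assumptions, not the actual implementation), the check amounts to summing every task group's count during job validation:

```go
package nomad

import "fmt"

const defaultJobMaxCount = 50000

// validateJobCount sums the count of every task group and rejects the job if
// the total exceeds the server's configured ceiling.
func validateJobCount(groupCounts []int, jobMaxCount int) error {
	if jobMaxCount <= 0 {
		jobMaxCount = defaultJobMaxCount
	}
	total := 0
	for _, count := range groupCounts {
		total += count
	}
	if total > jobMaxCount {
		return fmt.Errorf("job requests %d allocations, which exceeds job_max_count (%d)",
			total, jobMaxCount)
	}
	return nil
}
```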
* Add preserve-resources flag when registering a job
* Add preserve-resources flag to website docs
* Add changelog
* Update tests, docs
* Preserve counts & resources in fsm
* Update doc
* Update preservation of resources/count to happen in StateStore
The transit keyring uses the go-kms-wrapping library for parsing the Vault
config, and that parsing errors if tlsSkipVerify is an empty string.
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
When we attempt to drop unneeded evals from the eval broker, if the eval has
been GC'd before the check is made, we hit a nil pointer. Check that the eval
actually exists before attempting to remove it from the broker.
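The shape of the guard is roughly the following sketch; the lookup and broker types here are simplified stand-ins, not Nomad's actual API:

```go
package nomad

// Evaluation is a minimal stand-in for the real eval struct.
type Evaluation struct{ ID string }

// evalLookup returns (nil, nil) when the eval no longer exists in state.
type evalLookup interface {
	EvalByID(id string) (*Evaluation, error)
}

// evalBroker removes an eval from the broker's pending work.
type evalBroker interface {
	Drop(eval *Evaluation) error
}

// dropEvalIfPresent confirms the eval still exists before asking the broker to
// drop it, so an eval that was garbage-collected in the meantime is a no-op
// rather than a nil pointer dereference.
func dropEvalIfPresent(lookup evalLookup, broker evalBroker, evalID string) error {
	eval, err := lookup.EvalByID(evalID)
	if err != nil {
		return err
	}
	if eval == nil {
		return nil // already GC'd; nothing to remove from the broker
	}
	return broker.Drop(eval)
}
```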
Fixes: https://github.com/hashicorp/nomad/issues/26871
ACL policies are parsed when they are created or updated, and again when
compiling the resulting ACL object at use time. This parsing silently
ignored duplicate singleton keys and invalid keys. Ignoring them does not
grant any additional access, but it is poor UX and can be unexpected.
This change parses all new policy writes and updates, so that
duplicate or invalid keys return an error to the caller. This is
called strict parsing. In order to correctly handle upgrades of
clusters which have existing policies that would fall foul of the
change, a lenient parsing mode is also available. This allows
the policy to continue to be parsed and compiled after an upgrade
without the need for an operator to correct the policy document
prior to further use.
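A rough sketch of the two modes follows; this is generic illustrative code, not Nomad's actual parser, and all names here (the mode constants, the checkKeys helper, and the key-count map it assumes the decoder can produce) are assumptions:

```go
package acl

import (
	"fmt"
	"log"
)

type ParseMode int

const (
	ParseModeStrict  ParseMode = iota // new policy writes and updates
	ParseModeLenient                  // policies already stored before the upgrade
)

// checkKeys validates the keys observed while decoding a policy document.
// seen maps each key to the number of times it appeared; valid is the set of
// keys the policy grammar supports.
func checkKeys(seen map[string]int, valid map[string]bool, mode ParseMode) error {
	for key, count := range seen {
		var problem string
		switch {
		case !valid[key]:
			problem = fmt.Sprintf("invalid key %q", key)
		case count > 1:
			problem = fmt.Sprintf("duplicate key %q", key)
		default:
			continue
		}
		if mode == ParseModeStrict {
			return fmt.Errorf("policy parse error: %s", problem)
		}
		log.Printf("[WARN] acl: ignoring %s in existing policy", problem)
	}
	return nil
}
```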
Co-authored-by: Tim Gross <tgross@hashicorp.com>
In #26831 we're preventing unexpected node RPCs by ensuring that the volume
watcher only unpublishes when allocations are client-terminal.
To mitigate any remaining similar issues, add serialization of node plugin RPCs,
as we did for controller plugin RPCs in #17996 and as recommended ("SHOULD") by
the CSI specification. Here we can do per-volume serialization rather than
per-plugin serialization.
Reorder the methods of the `volumeManager` in the client so that each interface
method and its directly-associated helper methods read from top-to-bottom,
instead of a mix of directions.
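Per-volume serialization can be done with one mutex per volume ID; the sketch below uses illustrative names and a sync.Map, not the exact code in the volumeManager:

```go
package csimanager

import "sync"

// volumeLocks hands out one mutex per volume ID so node RPCs for a given
// volume are serialized without blocking RPCs for other volumes served by
// the same plugin.
type volumeLocks struct {
	locks sync.Map // volume ID -> *sync.Mutex
}

// lock acquires the per-volume mutex and returns the matching unlock func.
func (v *volumeLocks) lock(volID string) func() {
	mu, _ := v.locks.LoadOrStore(volID, &sync.Mutex{})
	m := mu.(*sync.Mutex)
	m.Lock()
	return m.Unlock
}
```

A caller would wrap each node RPC in `unlock := locks.lock(volID); defer unlock()`, keeping the critical section scoped to a single volume.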
Ref: https://github.com/hashicorp/nomad/pull/17996
Ref: https://github.com/hashicorp/nomad/pull/26831
The volume watcher checks whether any allocations that have claims are terminal
so that it knows if it's safe to unpublish the volume. This check was
considering a claim as unpublishable if the allocation was terminal on either
the server or client, rather than the client alone. In many circumstances this
is safe.
But if an allocation takes a while to stop (ex. it has a `shutdown_delay`), it's
possible for garbage collection to run in the window between when the alloc is
marked server-terminal and when the task is actually stopped. The server
unpublishes the volume which sends a node plugin RPC. The plugin unmounts the
volume while it's in use, and then unmounts it again when the allocation stops
and the CSI postrun hook runs. If the task writes to the volume during the
unmounting process, some providers end up in a broken state and the volume is
not usable unless it's detached and reattached.
Fix this by considering a claim a "past claim" only when the allocation is
client terminal. This way if garbage collection runs while we're waiting for
allocation shutdown, the alloc will only be server-terminal and we won't send
the extra node RPCs.
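The corrected condition is roughly the following sketch, with a trimmed-down allocation type standing in for the real structs:

```go
package volumewatcher

// allocView is a simplified stand-in for the allocation fields the watcher
// needs; the real code uses *structs.Allocation.
type allocView struct {
	ClientStatus string
}

// isPastClaim reports whether it is safe to treat a claim as a past claim and
// unpublish the volume: only once the client reports the allocation terminal
// (complete, failed, or lost), not merely when the server has marked it for
// stopping.
func isPastClaim(alloc *allocView) bool {
	if alloc == nil {
		// The alloc has already been garbage-collected.
		return true
	}
	switch alloc.ClientStatus {
	case "complete", "failed", "lost":
		return true
	default:
		return false
	}
}
```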
Fixes: https://github.com/hashicorp/nomad/issues/24130
Fixes: https://github.com/hashicorp/nomad/issues/25819
Ref: https://hashicorp.atlassian.net/browse/NMD-1001
Nomad's periodic block includes a "time_zone" parameter which lets
operators set the time zone against which the next launch interval is
checked. For this to work, Nomad needs to use "time.LoadLocation",
which in turn can use multiple TZ data sources.
When the Docker image is used to run Nomad and handle job registrations,
it currently does not have access to any TZ data, meaning it is only
aware of UTC. Adding the tzdata package contents to the release image
provides the required data for this to work.
It would also have been possible to pass a build tag via "-tags" when
releasing Nomad, which would embed a copy of the timezone database in
the binary. We decided against the build tag approach as it is a subtle
way that we could introduce bugs that are very difficult to track down,
and we prefer the approach taken in this commit.
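As a concrete illustration (a minimal standalone program, not Nomad code), the dependency looks like this; the commented-out time/tzdata import is the embed alternative described above:

```go
package main

import (
	"fmt"
	"time"
	// Uncommenting the import below embeds the IANA database into the
	// binary; this is the build-time alternative we decided against:
	// _ "time/tzdata"
)

func main() {
	loc, err := time.LoadLocation("Europe/Berlin")
	if err != nil {
		// In an image with no /usr/share/zoneinfo and no embedded database,
		// LoadLocation fails and periodic time_zone values cannot be resolved.
		fmt.Println("LoadLocation failed:", err)
		return
	}
	fmt.Println("launch intervals would be evaluated in", loc)
}
```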
In most RPC endpoints we use the resolved ACL object to determine whether a
given auth token or identity has access to the object of interest to the
RPC. In #15870 we adjusted this across most of the RPCs to handle workload identity.
But in the ACL endpoints that read policies, we can't use the resolved ACL
object and have to go back to the original token and lookup the policies it has
access to. So we need to resolve any workload-associated policies during that
lookup as well.
Fixes: https://github.com/hashicorp/nomad/issues/26764
Ref: https://hashicorp.atlassian.net/browse/NMD-990
Ref: https://github.com/hashicorp/nomad/pull/15870
In Nomad Enterprise we can fingerprint multiple Consul datacenters. If none of
them is `"default"`, we end up with warning logs about adding a "link".
The `Link` field on the `Node` struct is a map of attributes that only
contributes to the node's computed hash. The `"consul"` key's value is derived
from the `unique.consul.name` attribute, which only exists if there's a default
Consul cluster.
Update the fingerprint to skip setting the link field if there's no
`unique.consul.name`, and lower the warning log for malformed fields to debug.
The link field is only a minor scheduling optimization, largely captured by
existing Consul fields in the node's computed class; the only reason not to
remove it entirely is to avoid changing computed classes on existing large
clusters.
Fixes: https://github.com/hashicorp/nomad/issues/26781
Ref: https://hashicorp.atlassian.net/browse/NMD-998
On Windows, the `os.Process.Signal` method returns an error when sending
`os.Interrupt` (SIGINT) because it isn't implemented. This causes test servers
in the `testutil` packages to break on Windows. Use the platform specific
syscalls to generate the SIGINT instead.
The agent's signal handler also did not correctly handle Ctrl-C because we
were masking os.Interrupt instead of SIGINT.
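Roughly, the platform-specific send path looks like the sketch below, assuming the golang.org/x/sys/windows helpers; the surrounding function name is illustrative, not the exact code:

```go
//go:build windows

package testutil

import "golang.org/x/sys/windows"

// sendInterrupt delivers a console control event to the test server's process
// group, since os.Process.Signal(os.Interrupt) is not implemented on Windows.
func sendInterrupt(pid int) error {
	return windows.GenerateConsoleCtrlEvent(windows.CTRL_BREAK_EVENT, uint32(pid))
}
```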
Fixes: https://github.com/hashicorp/nomad/issues/26775
Co-authored-by: Chris Roberts <croberts@hashicorp.com>
The metrics on the eval broker include labels for the job ID, but under a high
volume of dispatch workloads, this results in excessive heap usage on the
leader. Dispatch workloads should use their parent ID rather than their child ID
for any metrics we collect.
Also, eliminate an extra copy of the labels and remove the extremely high
cardinality `"eval_id"` label from the `nomad.broker.eval_waiting` metric.
Fixes: https://github.com/hashicorp/nomad/issues/26657
The allocation network hook was not properly restoring network status from state when the network had previously been set up. This led to missing environment variables, a misconfigured hosts file, and a misconfigured resolv.conf when a task was restarted after the Nomad agent had restarted.
---------
Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>
A small optimization in the scheduler required users to specify specific
models of devices if the required count was higher than the number available
from any individual model/vendor on the node. This change removes that
optimization to allow for more intuitive device scheduling when different
vendor/model device types exist on a node.
The `go-getter` update in https://github.com/hashicorp/nomad/pull/26713 is not passing tests upstream (apparently https://github.com/hashicorp/go-getter/pull/548 is the origin of the problem but that PR did not ever run tests). The issue being fixed isn't a critical vulnerability, so in the interest of preparing us for the next release, revert the `go-getter` change but keep the Go toolchain update.
We'll skip go-getter 1.8.0 and pick up the next patch version once its issues are fixed.
Reverts commit 8a96929870.
During a large volume dispatch load test, I discovered that a lot of the total
scheduling time is being spent calling `structs.ParsePortRanges` repeatedly, in
order to parse the reserved ports configuration of the node (ex. converting
`"80,8000-8001"` to `[]int{80, 8000, 8001}`). A close examination of the
profiles shows that the bulk of the time is being spent hashing the keys for the
map of ports we use for de-duplication, and then sorting the resulting slice.
The `(*NetworkIndex) SetNode` method that calls the offending `ParsePortRanges`
merges all the ports into the `UsedPorts` map of bitmaps at scheduling time,
which means the consumer of the slice is already de-duplicating and doesn't
care about the order. The only other caller of `ParsePortRanges` is when
we validate the configuration file, and that throws away the slice entirely.
By skipping de-duplication and not sorting, we can cut down the runtime of this
function by 30x and memory usage by 3x.
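The cheaper parse is essentially the stripped-down sketch below (not the actual function, and with error handling trimmed): expand `"80,8000-8001"` into a flat slice with no de-duplication map and no final sort, since the scheduler folds the result into a bitmap anyway.

```go
package structs

import (
	"strconv"
	"strings"
)

// parsePortRanges expands a spec like "80,8000-8001" into a flat slice,
// skipping the de-duplication and sorting the old implementation performed.
func parsePortRanges(spec string) ([]uint64, error) {
	var ports []uint64
	for _, part := range strings.Split(spec, ",") {
		part = strings.TrimSpace(part)
		if part == "" {
			continue
		}
		bounds := strings.SplitN(part, "-", 2)
		lo, err := strconv.ParseUint(bounds[0], 10, 64)
		if err != nil {
			return nil, err
		}
		hi := lo
		if len(bounds) == 2 {
			if hi, err = strconv.ParseUint(bounds[1], 10, 64); err != nil {
				return nil, err
			}
		}
		for p := lo; p <= hi; p++ {
			ports = append(ports, p)
		}
	}
	return ports, nil
}
```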
Ref: https://github.com/hashicorp/nomad/blob/v1.10.4/nomad/structs/network.go#L201
Fixes: https://github.com/hashicorp/nomad/issues/26654
In #8435 (shipped in 0.12.1), we updated the `Job.Register` RPC to atomically
write the eval along with the job. But this didn't get copied to
`Job.Dispatch`. Under excessive load testing we demonstrated this can result in
dispatched jobs without corresponding evals.
Update the dispatch RPC to write the eval in the same Raft log as the job
registration. Note that we don't need to version-check this change for upgrades,
because the register and dispatch RPCs share the same `JobRegisterRequestType`
Raft message, and therefore all supported server versions already look for the
eval in the FSM. If an updated leader includes the eval, older followers will
write the eval. If a non-updated leader writes the eval in a separate Raft
entry, updated followers will write those evals normally.
Fixes: https://github.com/hashicorp/nomad/issues/26655
Ref: https://hashicorp.atlassian.net/browse/NMD-947
Ref: https://github.com/hashicorp/nomad/pull/8435
Typically the `LOGNAME` environment variable should be set according
to the values within `/etc/passwd` and represents the name of the
logged-in user. This should be set, where possible, alongside the
USER and HOME variables for all drivers that use the shared
executor and do not use a sub-shell.
don't require "bridge" network mode when using connect{}
we document this as "at your own risk" because CNI configuration
is so flexible that we can't guarantee a user's network will work,
but Nomad's "bridge" CNI config may be used as a reference.
Currently, every time a client starts, it creates a new Consul token per service or task. This PR changes that behaviour: it persists the Consul ACL token to the client state, and the client looks up an existing token before creating a new one.
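The lookup-before-create flow is roughly the sketch below; the store and Consul interfaces are hypothetical stand-ins for the client state DB and token API, not the real method names:

```go
package consulhook

// tokenStore is a stand-in for the client state DB where the token is persisted.
type tokenStore interface {
	GetConsulToken(allocID, taskName string) (string, error)
	PutConsulToken(allocID, taskName, token string) error
}

// tokenCreator is a stand-in for the Consul token derivation call.
type tokenCreator interface {
	CreateToken(taskName string) (string, error)
}

// deriveToken reuses a previously persisted token before minting a new one,
// so an agent restart does not create another Consul ACL token per task.
func deriveToken(store tokenStore, consul tokenCreator, allocID, taskName string) (string, error) {
	if tok, err := store.GetConsulToken(allocID, taskName); err == nil && tok != "" {
		return tok, nil
	}
	tok, err := consul.CreateToken(taskName)
	if err != nil {
		return "", err
	}
	if err := store.PutConsulToken(allocID, taskName, tok); err != nil {
		return "", err
	}
	return tok, nil
}
```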
Fixes: #20184
Fixes: #20185
Adds a new `windows` command which is available when running on
a Windows host. The command includes two new subcommands:
* `service install`
* `service uninstall`
The `service install` command will install the called binary into
the Windows program files directory, create a new Windows service,
set up configuration and data directories, and register the service
with the Windows eventlog. If the service and/or binary already
exist, the service will be stopped, the service and eventlog updated
if needed, the binary replaced, and the service started again.
The `service uninstall` command will stop the service, remove the
Windows service, and deregister the service from the eventlog. It
will not remove the configuration/data directory, nor will it remove
the installed binary.
This adds artifact inspection after download to detect any issues
with the content fetched. Currently this means checking for any
symlinks within the artifact that resolve outside the task or
allocation directories. On platforms where lockdown is available
(some Linux) this inspection is not performed.
The inspection can be disabled with the DisableArtifactInspection
option. A dedicated option for disabling this behavior allows
the DisableFilesystemIsolation option to be enabled but still
have artifacts inspected after download.
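A self-contained sketch of the symlink check follows, assuming the goal is simply "no symlink may resolve outside the allowed roots"; the function name and option wiring (DisableArtifactInspection) in the real artifact subsystem differ, and symlink chains are not followed here:

```go
package artifact

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"strings"
)

// inspectSymlinks walks dir and returns an error for any symlink whose target
// escapes every directory in allowedRoots (e.g. the task and alloc dirs).
func inspectSymlinks(dir string, allowedRoots ...string) error {
	return filepath.WalkDir(dir, func(path string, d fs.DirEntry, err error) error {
		if err != nil || d.Type()&fs.ModeSymlink == 0 {
			return err
		}
		target, err := os.Readlink(path)
		if err != nil {
			return err
		}
		if !filepath.IsAbs(target) {
			target = filepath.Join(filepath.Dir(path), target)
		}
		target = filepath.Clean(target)
		for _, root := range allowedRoots {
			rel, err := filepath.Rel(root, target)
			if err == nil && rel != ".." &&
				!strings.HasPrefix(rel, ".."+string(os.PathSeparator)) {
				return nil // resolves inside an allowed root
			}
		}
		return fmt.Errorf("artifact symlink %s escapes task/alloc directories", path)
	})
}
```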
When a node misses a heartbeat and is marked down, Nomad deletes service
registration instances for that node. But if the node then successfully
heartbeats before its allocations are marked lost, the services are never
restored. The node is unaware that it has missed a heartbeat and there's no
anti-entropy on the node in any case.
We already delete services when the plan applier marks allocations as stopped,
so deleting the services when the node goes down is only an optimization to more
quickly divert service traffic. But because the state after a plan apply is the
"canonical" view of allocation health, this breaks correctness.
Remove the code path that deletes services from nodes when nodes go down. Retain
the state store code that deletes services when allocs are marked terminal by
the plan applier. Also add a path in the state store to delete services when
allocs are marked terminal by the client. This gets back some of the
optimization but avoids the correctness bug because marking the allocation
client-terminal is a one way operation.
Fixes: https://github.com/hashicorp/nomad/issues/16983
* Add -log-file-export and -log-lookback flags to add historical logs to the
debug capture
* use monitor.PrepFile() helper for other historical log tests
the executor dies, leaving an orphaned process still running.
the panic fix:
* don't `panic()`
* and return an empty, but non-nil, func on cgroup error
feature fix:
* allow non-root agent to proceed with exec when cgroups are off
* Add MonitorExport command and handlers
* Implement autocomplete
* Require nomad in serviceName
* Fix race in StreamReader.Read
* Add and use framer.Flush() to coordinate function exit
* Add LogFile to client/Server config and read NomadLogPath in rpcHandler instead of HTTPServer
* Parameterize StreamFixed stream size
Improved the `acl policy self` CLI command to handle both management and client tokens.
Management tokens now display a clear message indicating global access with no individual policies.
Fixes: https://github.com/hashicorp/nomad/issues/26389
When a task group is removed from a jobspec, the reconciler stops all
allocations and immediately returns from `computeGroup`. We can do the same for
when the group has been scaled-to-zero, but doing so runs into an inconsistency
in the way that server-terminal allocations are handled.
Prior to this change server-terminal allocations fall through `computeGroup`
without being marked as `ignore`, unless they are terminal canaries, in which
case they are marked `stop` (but this is a no-op). This inconsistency causes a
_tiny_ amount of extra `Plan.Submit`/Raft traffic, but more importantly makes it
more difficult to make test assertions for `stop` vs `ignore` vs
fallthrough. Remove this inconsistency by filtering out server-terminal
allocations early in `computeGroup`.
This brings the cluster reconciler's behavior closer to the node reconciler's
behavior, except that the node reconciler discards _all_ terminal allocations
because it doesn't support rescheduling.
This changeset required adjustments to two tests, but the tests themselves were
a bit of a mess:
* In https://github.com/hashicorp/nomad/pull/25726 we added a test of how
canaries were treated when on draining nodes. But the test didn't correctly
configure the job with an update block, leading to misleading test
behavior. Fix the test to exercise the intended behavior and refactor for
clarity.
* While working on reconciler behaviors around stopped allocations, I found it
extremely hard to follow the intent of the disconnected client tests because
many of the fields in the table-driven test are switches for more complex
behavior or just tersely named. Attempt to make this a little more legible by
moving some branches directly into fields, renaming some fields, and
flattening out some branching.
Ref: https://hashicorp.atlassian.net/browse/NMD-819
System and sysbatch jobs don't support the reschedule block, because we'd always
replace allocations back onto the same node. The job validation for system jobs
asserts that the user hasn't set a `reschedule` block so that users aren't
submitting jobs expecting it to be supported. But this validation was missing
for sysbatch jobs.
Validate that sysbatch jobs don't have a reschedule block.
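The added check is essentially the sketch below, with trimmed stand-ins for the real job and task group structs; the real validation lives alongside the existing system-job assertion:

```go
package nomad

import "fmt"

// ReschedulePolicy and TaskGroup are trimmed stand-ins for the real structs.
type ReschedulePolicy struct{ Attempts int }

type TaskGroup struct {
	Name             string
	ReschedulePolicy *ReschedulePolicy
}

// validateNoReschedule rejects a reschedule block on job types whose
// allocations would only ever be replaced onto the same node.
func validateNoReschedule(jobType string, groups []*TaskGroup) error {
	if jobType != "system" && jobType != "sysbatch" {
		return nil
	}
	for _, tg := range groups {
		if tg.ReschedulePolicy != nil {
			return fmt.Errorf("task group %q: %s jobs should not have a reschedule block",
				tg.Name, jobType)
		}
	}
	return nil
}
```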
The RPC handler for deleting dynamic host volumes has a check that any
allocations associated with a volume are client-terminal before deleting the
volume. But the state store delete that happens after we send client RPCs to the
plugin checks that the allocs are non-terminal on both server and client.
This can improperly allow deleting a volume from a client but then not being
able to delete it from the state store because of a time-of-check / time-of-use
bug. If the allocation fails/completes on the client before the server marks its
desired status as terminal, or if the allocation is marked server-terminal
during the client RPC, we can get a volume that passes the first check but not
the second check that happens in the state store and cannot be deleted.
Update the state store delete method to require that any allocation for a volume
is client terminal in order to delete the volume, not just server terminal.
Fixes: https://github.com/hashicorp/nomad/issues/26140
Ref: https://hashicorp.atlassian.net/browse/NMD-883
The output of the reconciler stage of scheduling is only visible via debug-level
logs, typically accessible only to the cluster admin. We can give job authors
better ability to understand what's happening to their jobs if we expose this
information to them in the `eval status` command.
Add the reconciler's desired updates to the evaluation struct so it can be
exposed in the API. This increases the size of evals by roughly 15% in the state
store, or a bit more when there are preemptions (but we expect this will be a
small minority of evals).
Ref: https://hashicorp.atlassian.net/browse/NMD-818
Fixes: https://github.com/hashicorp/nomad/issues/15564