nomad

mirror of https://github.com/kemko/nomad.git synced 2026-01-01 16:05:42 +03:00

Author	SHA1	Message	Date
Juana De La Cuesta	bdfd573fc4	Update the scaling policies when deregistering a job (#25911 ) * func: Update the scaling policies when deregistering a job * func: Add tests for updating the policy * docs: add changelog * func: set back the old order * style: rearrange for clarity and to reuse the watchset * func: set the policies to the last submitted when starting a job * func: expand tests of the start job command to include job submission * func: Expand the tests to verify the correct state of the scaling policy after job start * Update command/job_start.go Co-authored-by: Tim Gross <tgross@hashicorp.com> * Update nomad/fsm_test.go Co-authored-by: Tim Gross <tgross@hashicorp.com> * func: add warning when there is no previous job submission --------- Co-authored-by: Tim Gross <tgross@hashicorp.com>	2025-06-02 16:11:38 +02:00
Michael Smithhisler	4c8257d0c7	client: add once mode to template block (#25922 )	2025-05-28 11:45:11 -04:00
Tim Gross	3f59860254	host volumes: add configuration to GC on node GC (#25903 ) When a node is garbage collected, any dynamic host volumes on the node are orphaned in the state store. We generally don't want to automatically collect these volumes and risk data loss, and have provided a CLI flag to `-force` remove them in #25902. But for clusters running on ephemeral cloud instances (ex. AWS EC2 in an autoscaling group), deleting host volumes may add excessive friction. Add a configuration knob to the client configuration to remove host volumes from the state store on node GC. Ref: https://github.com/hashicorp/nomad/pull/25902 Ref: https://github.com/hashicorp/nomad/issues/25762 Ref: https://hashicorp.atlassian.net/browse/NMD-705	2025-05-27 10:22:08 -04:00
tehut	55523ecf8e	Add NodeMaxAllocations to client configuration (#25785 ) * Set MaxAllocations in client config Add NodeAllocationTracker struct to Node struct Evaluate MaxAllocations in AllocsFit function Set up cli config parsing Integrate maxAllocs into AllocatedResources view Co-authored-by: Tim Gross <tgross@hashicorp.com> --------- Co-authored-by: Tim Gross <tgross@hashicorp.com>	2025-05-22 12:49:27 -07:00
Daniel Bennett	15c01e5a49	ipv6: normalize addrs per RFC-5952 §4 (#25921 ) https://datatracker.ietf.org/doc/html/rfc5952#section-4 * copy NormalizeAddr func from vault * PRs hashicorp/vault#29228 & hashicorp/vault#29517 * normalize bind/advertise addrs * normalize consul/vault addrs	2025-05-22 14:21:30 -04:00
Chris Roberts	1aa416e2f2	Support applying policy to all jobs within namespace (#25871 ) Workflow identities currently support ACL policies being applied to a job ID within a namespace. With this update an ACL policy can be applied to a namespace. This results in the ACL policy being applied to all jobs within the namespace.	2025-05-21 07:44:14 -07:00
Tim Gross	41cf1b03b4	host volumes: -force flag for delete (#25902 ) When a node is garbage collected, we leave behind the dynamic host volume in the state store. We don't want to automatically garbage collect the volumes and risk data loss, but we should allow these to be removed via the API. Fixes: https://github.com/hashicorp/nomad/issues/25762 Fixes: https://hashicorp.atlassian.net/browse/NMD-705	2025-05-21 08:55:52 -04:00
Piotr Kazmierczak	cdc308a0eb	wi: new endpoint for listing workload attached ACL policies (#25588 ) This introduces a new HTTP endpoint (and an associated CLI command) for querying ACL policies associated with a workload identity. It allows users that want to learn about the ACL capabilities from within WI-tasks to know what sort of policies are enabled. --------- Co-authored-by: Tim Gross <tgross@hashicorp.com> Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>	2025-05-19 19:54:12 +02:00
Martina Santangelo	18eddf53a4	commands: adds job start command to start stopped jobs (#24150 ) --------- Co-authored-by: Michael Smithhisler <michael.smithhisler@hashicorp.com> Co-authored-by: Tim Gross <tgross@hashicorp.com>	2025-05-14 15:17:44 -04:00
James Rasell	ef25c3d55a	cli: Fix help indentation format on node meta commands. (#25851 )	2025-05-14 14:53:48 +01:00
Tim Gross	8a5a057d88	offline license utilization reporting (#25844 ) Nomad Enterprise users operating in air-gapped or otherwise secured environments don't want to send license reporting metrics directly from their servers. Implement manual/offline reporting by periodically recording usage metrics snapshots in the state store, and providing an API and CLI by which cluster administrators can download the snapshot for review and out-of-band transmission to HashiCorp. This is the CE portion of the work required for implementation in the Enterprise product. Nomad CE does not perform utilization reporting. Ref: https://github.com/hashicorp/nomad-enterprise/pull/2673 Ref: https://hashicorp.atlassian.net/browse/NMD-68 Ref: https://go.hashi.co/rfc/nmd-210	2025-05-14 09:51:13 -04:00
hc-github-team-nomad-core	9ef42e9807	Generate files for 1.10.1 release	2025-05-13 14:26:48 +02:00
James Rasell	0b265d2417	encrypter: Track initial tasks for is ready calculation. (#25803 ) The server startup could "hang" to the view of an operator if it had a key that could not be decrypted or replicated loaded from the FSM at startup. In order to prevent this happening, the server startup function will now use a timeout to wait for the encrypter to be ready. If the timeout is reached, the error is sent back to the caller which fails the CLI command. This bubbling of error message will also flush to logs which will provide additional operator feedback. The server only cares about keys loaded from the FSM snapshot and trailing logs before the encrypter should be classed as ready. So that the encrypter ready function does not get blocked by keys added outside of the initial Raft load, we take a snapshot of the decryption tasks as we enter the blocking call, and class these as our barrier.	2025-05-07 15:38:16 +01:00
Tim Gross	da592ab1b7	testing: fix vault setup test's reliance on specific Raft index (#25806 ) The test for `nomad setup vault` command expects a specific `CreateIndex` for the job it creates. Any Raft write when a server comes up or establishes leadership can cause this test to break. Interpolate the expected index as we've done for other indexes on the job to make this test less brittle. Ref: https://github.com/hashicorp/nomad-enterprise/pull/2673#issuecomment-2847619747	2025-05-02 14:30:10 -04:00
Juanadelacuesta	9288a3141a	func and docs: Use the config from the client and not from the agent that is already parsed. Add the breaking change to the release notes	2025-04-30 10:53:02 +02:00
Juanadelacuesta	949571e313	func: read the config from the agent, don't reparse	2025-04-24 05:01:53 +02:00
Juanadelacuesta	46343ee56e	func: use the client's configured drain deadline to calculate the graceful timeout when terminating an agent	2025-04-23 23:59:50 +02:00
Juanadelacuesta	c91f24681d	style: add changelog	2025-04-23 23:28:54 +02:00
Juana De La Cuesta	9778a31e29	Update command/agent/command.go Co-authored-by: Tim Gross <tgross@hashicorp.com>	2025-04-23 23:18:09 +02:00
Juana De La Cuesta	39b3d63172	Update command/agent/command.go Co-authored-by: Tim Gross <tgross@hashicorp.com>	2025-04-23 23:18:02 +02:00
Juana De La Cuesta	313f430fdd	Update command/agent/command.go Co-authored-by: Michael Schurter <mschurter@hashicorp.com>	2025-04-23 23:17:36 +02:00
Juanadelacuesta	b47b962439	fix: use a timer instead of a time.After to avoid memory leaks	2025-04-23 16:26:46 +02:00
Juanadelacuesta	38c27b7e7f	fix: make the distinction for os.Interrupt	2025-04-23 16:20:04 +02:00
Juanadelacuesta	adf038b495	fix: correct the logic for LeaveOnTerm or LeaveOnInt depending on the incoming signal	2025-04-23 16:03:12 +02:00
Juanadelacuesta	b375974bc3	style: add comments	2025-04-23 15:47:37 +02:00
Juanadelacuesta	c5c4272aee	func: force agent return if there is an error on reload	2025-04-23 15:14:48 +02:00
Piotr Kazmierczak	df3b00bce0	acl: use WhoAmI RPC endpoint in /acl/token/self (#25547 ) ResolveToken RPC endpoint was only used by the /acl/token/self API. We should migrate to the WI-aware WhoAmI instead. --------- Co-authored-by: Tim Gross <tgross@hashicorp.com>	2025-04-22 17:53:39 +02:00
Daniel Bennett	c46521a80d	cli: operator debug: respect NOMAD_REGION env var (#25716 ) properly filter out regions other than the one specified like the -namespace flag does	2025-04-21 17:06:50 -04:00
tehut	b11619010e	Add priority flag to Dispatch CLI and API (#25622 ) * Add priority flag to Dispatch CLI and DispatchOpts() helper to HTTP API	2025-04-18 13:24:52 -07:00
Arian van Putten	d28af58cbb	agent: implement sd-notify reload correctly (#25636 ) First of all, we should not send the unix time, but the monotonic time. Second of all, RELOADING= and MONOTONIC_USEC fields should be sent in a single message, not two separate messages. From the man page of [systemd.service](https://www.freedesktop.org/software/systemd/man/latest/systemd.service.html#Type=) > notification message via sd_notify(3) that contains the "RELOADING=1" field in > combination with "MONOTONIC_USEC=" set to the current monotonic time (i.e. > CLOCK_MONOTONIC in clock_gettime(2)) in μs, formatted as decimal string. [sd_notify](https://www.freedesktop.org/software/systemd/man/latest/sd_notify.html) now has code samples of the protocol to clarify. Without these changes, if you'd set Type=notify-reload on the agent's systemd unit, systemd would kill the service due to the service not responding to reload correctly.	2025-04-14 11:38:56 -04:00
Michael Schurter	c5451cf300	Merge pull request #25635 from hashicorp/post-1.10.0-release Post 1.10.0 release	2025-04-10 10:32:24 -07:00
Tim Gross	27caae2b2a	api: make attempting to remove peer by address a no-op (#25599 ) In Nomad 1.4.0 we removed support for Raft Protocol v2 entirely. But the `Operator.RemoveRaftPeerByAddress` RPC handler was left in place, along with its supporting HTTP API and command line flags. Using this API will always result in the Raft library error "operation not supported with current protocol version". Unfortunately it's still possible in unit tests to exercise this code path, and these tests are quite flaky. This changeset turns the RPC handler and HTTP API into a no-op, removes the associated command line flags, and removes the flaky tests. I've also cleaned up the test for `RemoveRaftPeerByID` to consolidate test servers and use `shoenig/test`. Fixes: https://hashicorp.atlassian.net/browse/NET-12413 Ref: https://github.com/hashicorp/nomad/pull/13467 Ref: https://developer.hashicorp.com/nomad/docs/upgrade/upgrade-specific#raft-protocol-version-2-unsupported Ref: https://github.com/hashicorp/nomad-enterprise/actions/runs/13201513025/job/36855234398?pr=2302	2025-04-10 09:19:25 -04:00
hc-github-team-nomad-core	71af41b4b1	Generate files for 1.10.0 release	2025-04-09 16:03:21 -07:00
hc-github-team-nomad-core	239c5f11ee	Generate files for 1.10.0 release	2025-04-09 16:03:21 -07:00
Daniel Bennett	5c8e436de9	auth: oidc: disable pkce by default (#25600 ) our goal of "enable by default, only for new auth methods" proved to be unwieldy, so instead make it a simple bool, disabled by default.	2025-04-07 12:36:09 -05:00
hc-github-team-nomad-core	a18faebda1	Generate files for 1.10.0-rc.1 release	2025-04-03 18:21:58 +00:00
Daniel Bennett	6a0c4f5a3d	auth: oidc: enable pkce only on new auth methods (#25593 ) trying not to violate the principle of least astonishment. we want to only auto-enable PKCE on new auth methods, rather than new or updated auth methods, to avoid a scenario where a Nomad admin updates an auth method sometime in the future -- something innocent like a new client secret -- and their OIDC provider doesn't like PKCE. the main concern is that the provider won't like PKCE in a totally confusing way. error messages rarely say PKCE directly, so why the user's auth method suddenly broke would be a big mystery. this means that to enable it on existing auth methods, you would set `OIDCDisablePKCE = false`, and the double- negative doesn't feel right, so instead, swap the language, so enabling it on existing methods reads sensibly, and to disable it on new methods reads ok-enough: `OIDCEnablePKCE = false`	2025-04-03 10:56:17 -05:00
Nikita Eliseev	76fb3eb9a1	rpc: added configuration for yamux session (#25466 ) Fixes: https://github.com/hashicorp/nomad/issues/25380	2025-04-02 10:58:23 -04:00
Allison Larson	17d191ae24	Add -group flag to `alloc exec`, `alloc logs` command (#25568 ) * Add -group flag to `alloc exec`, `alloc logs` command * fixup! Add -group flag to `alloc exec`, `alloc logs` command * Add -group option to alloc fs * Add changelog	2025-03-31 14:17:45 -07:00
Daniel Bennett	99c25fc635	dhv: mkdir plugin parameters: uid,guid,mode (#25533 ) also remove Error logs from client rpc and promote plugin Debug logs to Error (since they have more info in them)	2025-03-28 10:13:13 -05:00
James Rasell	ea25503705	cli: Use meta response index to start monitoring volume create. (#25514 )	2025-03-25 14:00:46 +00:00
James Rasell	27ad88ac17	test: Calculate agent endpoint scheduler count, not static. (#25473 )	2025-03-21 13:47:53 +00:00
James Rasell	b3f28f9387	test: Use runtime CPUs for test not static number. (#25458 )	2025-03-20 09:05:36 +00:00
James Rasell	5a157eb123	server: Validate config num schedulers is between 0 and num CPUs. (#25441 ) The `server.num_scheduler` configuration value should be a value between 0 and the number of CPUs on the machine. The Nomad agent was not validating the configuration parameter which meant you could use a negative value or a value much larger than the available machine CPUs. This change enforces validation of the configuration value both on server startup and when the agent is reloaded. The Nomad API was only performing negative value validation when updating the scheduler number via this method. This change adds to the validation to ensure the number is not greater than the CPUs on the machine.	2025-03-20 07:29:57 +00:00
James Rasell	61b2b9d3d0	agent: Improve retry joiner code with small refactor. (#25422 ) The agent retry joiner implementation had different parameters to control its execution for agents running in server and client mode. The agent would set up individual joiners depending on the agent mode, making the object parameter overhead unrequired. This change removes the excess configuration options for the joiner, reducing code complexity slightly and hopefully making future modifications in this area easier to make.	2025-03-18 15:55:52 +00:00
James Rasell	3e1f56c1c0	cli: Add volume type to delete error messages when API call fails. (#25392 )	2025-03-14 14:59:41 +00:00
Daniel Bennett	3322254e5b	cli: acl auth-method info: add client assertion (#25370 ) and pkce	2025-03-12 12:38:03 -05:00
hc-github-team-nomad-core	e1b9bd8ab0	Generate files for 1.10.0-beta.1 release	2025-03-12 10:37:46 +00:00
Daniel Bennett	04db81951f	test: fix go 1.24 test complaints (#25346 ) e.g. Error: nomad/leader_test.go:382:12: non-constant format string in call to (*testing.common).Fatalf	2025-03-11 11:01:39 -05:00
Tim Gross	1ffb7ab3fb	dynamic host volumes: allow plugins to return an error message (#25341 ) Errors from `volume create` or `volume delete` only get logged by the client agent, which may make it harder for volume authors to debug these tasks if they are not also the cluster administrator with access to host logs. Allow plugins to include an optional error message in their response. Because we can't count on receiving this response (the error could come before the plugin executes), we parse this message optimistically and include it only if available. Ref: https://hashicorp.atlassian.net/browse/NET-12087	2025-03-11 11:06:57 -04:00
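
Commit 15c01e5a49 above normalizes IPv6 addresses to the canonical text form of RFC 5952 section 4 (lowercase hex, leading zeros dropped, longest zero run compressed). A minimal sketch of that idea, assuming Go's standard net/netip package does the formatting; the function name normalizeAddr is hypothetical, and this is not the NormalizeAddr function copied from Vault:

```go
package main

import (
	"fmt"
	"net"
	"net/netip"
)

// normalizeAddr is a hypothetical helper for illustration only.
// It rewrites a bare IP or a host:port pair so the IP part uses the
// canonical text form produced by netip.Addr.String. Anything that is
// not an IP literal (for example a hostname) is returned unchanged.
func normalizeAddr(s string) string {
	// Case 1: a bare IP literal with no port.
	if ip, err := netip.ParseAddr(s); err == nil {
		return ip.String()
	}
	// Case 2: host:port, where an IPv6 host is wrapped in brackets.
	host, port, err := net.SplitHostPort(s)
	if err != nil {
		return s
	}
	ip, err := netip.ParseAddr(host)
	if err != nil {
		return s
	}
	// JoinHostPort adds the brackets back for IPv6 hosts.
	return net.JoinHostPort(ip.String(), port)
}

func main() {
	fmt.Println(normalizeAddr("2001:0DB8:0000:0000:0000:0000:0000:0001"))
	fmt.Println(normalizeAddr("[2001:0DB8::0001]:4646"))
	fmt.Println(normalizeAddr("127.0.0.1:4646"))
	fmt.Println(normalizeAddr("example.com"))
}
```

Normalizing before comparing or storing addresses means two spellings of the same IPv6 address are treated as equal, which is the concern for bind, advertise, Consul, and Vault addresses named in the commit.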


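Commit d28af58cbb above describes the reload notification that systemd expects for Type=notify-reload: one message carrying both RELOADING=1 and MONOTONIC_USEC. A minimal sketch of building and sending that message, assuming newline-separated fields as in the sd_notify protocol; the function names reloadingMessage and notify are hypothetical, and reading CLOCK_MONOTONIC is left to the caller because it needs a platform-specific call that is not shown here:

```go
package main

import (
	"fmt"
	"net"
	"os"
)

// reloadingMessage is a hypothetical helper for illustration only.
// It puts both fields in a single datagram payload, separated by a
// newline. monotonicUsec must come from CLOCK_MONOTONIC, in
// microseconds, not from the wall clock.
func reloadingMessage(monotonicUsec int64) string {
	return fmt.Sprintf("RELOADING=1\nMONOTONIC_USEC=%d", monotonicUsec)
}

// notify is a hypothetical helper that writes msg to the socket named
// by NOTIFY_SOCKET. It does nothing when the variable is unset, which
// is the case when the process is not supervised by systemd.
func notify(msg string) error {
	path := os.Getenv("NOTIFY_SOCKET")
	if path == "" {
		return nil
	}
	conn, err := net.Dial("unixgram", path)
	if err != nil {
		return err
	}
	defer conn.Close()
	_, err = conn.Write([]byte(msg))
	return err
}

func main() {
	// 1234567 is a placeholder value, not a real clock reading.
	fmt.Printf("%q\n", reloadingMessage(1234567))
}
```

After the reload finishes, a service of this type is expected to send READY=1 in a later message; that step is omitted from the sketch.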
3955 Commits