nomad

mirror of https://github.com/kemko/nomad.git synced 2026-01-01 16:05:42 +03:00

Author	SHA1	Message	Date
Michael Smithhisler	485356c3d3	csi: fix volume registration error (#26642 )	2025-08-27 15:00:16 -04:00
James Rasell	e5eb125264	agent: Ensure node identity renew handler decodes the request body. (#26638 ) The HTTP request body contains the node ID where the request should be routed and without decoding this, we cannot route to anything other than local nodes.	2025-08-27 14:06:12 +01:00
Chris Roberts	4b9597a31d	[agent] Fix error checking within retry join (#26434 ) The `RetryJoin` function checks for an error and logs it before retrying. The error variables were shadowed which resulted in the errors never being logged. This predefines the variables to prevent them from being shadowed. The testlog package was also updated to support providing a custom writer which allows logging output to be easily caught and inspected.	2025-08-26 14:18:12 -07:00
Juana De La Cuesta	e7868639d6	func: add the correct value for customer feedback on var error (#26601 )	2025-08-21 15:37:53 +02:00
Michael Smithhisler	da4cf07ff4	logs: skip logging SIGPIPE signal (#26582 )	2025-08-21 09:08:49 -04:00
James Rasell	3b0b7db1a1	client: Add client identity API, CLI, and RPC workflow. (#26543 ) The Nomad clients store their Nomad identity in memory and within their state store. While active, it is not possible to dump the state to view the stored identity token, so having a way to view the current claims while running aids debugging and operations. This change adds a client identity workflow, allowing operators to view the current claims of the node's identity. It does not return any of the signing key material.	2025-08-19 08:25:51 +01:00
James Rasell	1ae83114c1	ci: Run hclogvet across all codebase and fix found issue. (#26545 )	2025-08-18 15:06:11 +01:00
Daniel Bennett	9f806e3063	Post 1.10.4 release main (#26521 ) * Generate files for 1.10.4 release * Prepare for next release * Merge release 1.10.4 files --------- Co-authored-by: hc-github-team-nomad-core <github-team-nomad-core@hashicorp.com>	2025-08-14 12:22:32 -04:00
Joey	c997afe0de	chore: Fix function name in comment (#26511 )	2025-08-13 15:06:50 +01:00
Allison Larson	e16a3339ad	Add CSI Volume Sentinel Policy scaffolding (#26438 ) * Add ent policy enforcement stubs to CSI Volume create/register * Wire policy override/warnings through CSI volume register/create * Add new scope to sentinel apply * Sanitize CSISecrets & CSIMountOptions * Add sentinel policy scope to ui * Update docs for new sentinel scope/policy * Create new api funcs for CSI endpoints * fix sentinel csi ui test * Update sentinel-policy docs * Add changelog * Update docs from feedback	2025-08-07 12:03:18 -07:00
Aimee Ukasick	a30cb2f137	Update UI, code comment, and README links to docs, tutorials (#26429 ) * Update UI, code comment, and README links to docs, tutorials * fix typo in ephemeral disks learn more link url * feedback on typo Co-authored-by: Tim Gross <tgross@hashicorp.com> --------- Co-authored-by: Tim Gross <tgross@hashicorp.com>	2025-08-06 09:40:23 -05:00
James Rasell	1c63ad50d9	Merge pull request #26430 from hashicorp/f-NMD-763-introduction introduction: The initial implementation code for node introduction.	2025-08-06 14:41:16 +02:00
Tim Gross	0ae5b3f39b	eval status: sort plan annotations by task group (#26428 ) The plan annotations table isn't sorted by task group, which makes for a less beautiful UX and a flaky test.	2025-08-05 09:36:12 -04:00
James Rasell	ad508616dc	Merge branch 'main' into f-NMD-763-introduction	2025-08-05 08:56:51 +01:00
James Rasell	350662c88e	Merge pull request #26291 from hashicorp/f-NMD-763-identity identity: The initial implementation code for node identity.	2025-08-05 09:52:28 +02:00
James Rasell	80a26306bf	intro: Add node introduction flow for Nomad client registration. (#26405 ) This change implements the client -> server workflow for Nomad node introduction. A Nomad node can optionally be started with an introduction token, which is a signed JWT containing claims for the node registration. The server handles this according to the enforcement configuration. The introduction token can be provided by env var, cli flag, or by placing it within a default filesystem location. The latter option does not override the CLI or env var. The region claim has been removed from the initial claims set of the intro identity. This boundary is guarded by mTLS and aligns with the node identity.	2025-08-05 08:23:44 +01:00
tehut	21841d3067	Add historical journald and log export flags to operator debug command (#26410 ) * Add -log-file-export and -log-lookback commands to add historical log to debug capture * use monitor.PrepFile() helper for other historical log tests	2025-08-04 13:55:25 -07:00
tehut	d709accaf5	Add nomad monitor export command (#26178 ) * Add MonitorExport command and handlers * Implement autocomplete * Require nomad in serviceName * Fix race in StreamReader.Read * Add and use framer.Flush() to coordinate function exit * Add LogFile to client/Server config and read NomadLogPath in rpcHandler instead of HTTPServer * Parameterize StreamFixed stream size	2025-08-01 10:26:59 -07:00
Gautam Kumar	6f81222ec8	CL: improve `acl policy self` output for management tokens (#26396 ) Improved the acl policy self CLI command to handle both management and client tokens. Management tokens now display a clear message indicating global access with no individual policies. Fixes: https://github.com/hashicorp/nomad/issues/26389	2025-08-01 09:02:47 -04:00
James Rasell	20251b675d	Add CLI and API components for creating node introduction tokens via ACL endpoint. (#26332 )	2025-07-25 13:28:45 +01:00
James Rasell	5989d5862a	ci: Update golangci-lint to v2 and fix highlighted issues. (#26334 )	2025-07-25 10:44:08 +01:00
James Rasell	842f316615	Merge branch 'main' into f-NMD-763-introduction	2025-07-25 08:27:53 +01:00
James Rasell	2ef837f02f	cli: Ensure all no argument console messages are the same. (#26331 ) Use a constant to ensure consistency across the CLI when displaying a console message indicating the command was passed arguments when it takes none.	2025-07-25 07:05:10 +01:00
James Rasell	62f1dbebfb	server: Add RPC and HTTP functionality for node intro token gen. (#26320 ) The node introduction workflow will utilise JWTs that can be used as authentication tokens on initial client registration. This change implements the basic builder for this JWT claim type and the RPC and HTTP handler functionality that will expose this to the operator.	2025-07-23 14:32:26 +01:00
James Rasell	7466dd71b2	server: Add new `server.client_introduction` config block. (#26315 ) The new configuration block exposes some key options which allow cluster administrators to control certain client introduction behaviours. This change introduces the new block and plumbing, so that it is exposed in the Nomad server for consumption via internal processes.	2025-07-22 08:50:19 +01:00
James Rasell	dce4284361	Merge branch 'main' into f-NMD-763-identity	2025-07-17 07:35:16 +01:00
James Rasell	953a149180	client: Allow operators to force a client to renew its identity. (#26277 ) The Nomad client will have its identity renewed according to the TTL which defaults to 24h. In certain situations such as root keyring rotation, operators may want to force clients to renew their identities before the TTL threshold is met. This change introduces a client HTTP and RPC endpoint which will instruct the node to request a new identity at its next heartbeat. This can be used via the API or a new command. While this is a manual intervention step on top of any keyring rotation, it dramatically reduces the initial feature complexity as it provides an asynchronous and efficient method of renewal that utilises existing functionality.	2025-07-16 14:56:00 +01:00
Tim Gross	35f3f6ce41	scheduler: add disconnect and reschedule info to reconciler output (#26255 ) The `DesiredUpdates` struct that we send to the Read Eval API doesn't include information about disconnect/reconnect and rescheduling. Annotate the `DesiredUpdates` with this data, and adjust the `eval status` command to display only those fields that have non-zero values in order to make the output width manageable. Ref: https://hashicorp.atlassian.net/browse/NMD-815	2025-07-16 08:46:38 -04:00
Tim Gross	b23ab5ac15	docs: clarify requirements for deleting volumes (#26240 ) If you delete a CSI volume, the volume cannot be currently claimed by an allocation or in the process of being unpublished. This is documented in the CLI but not the API. Also, the documentation incorrectly says that the `volume delete` command silently returns without error if the volume doesn't exist. Fixes: https://github.com/hashicorp/nomad/issues/24756	2025-07-11 15:01:06 -04:00
hc-github-team-nomad-core	ccba3ae6a2	Generate files for 1.10.3 release	2025-07-08 16:47:39 -07:00
James Rasell	2f30205102	client: Add state functionality for set and get client identities. (#26184 ) The Nomad client will persist its own identity within its state store for restart persistence. The added benefit of using it over the filesystem is that it supports transactions. This is useful when considering the identity will be renewed periodically.	2025-07-07 15:28:27 +01:00
Tim Gross	5c909213ce	scheduler: add reconciler annotations to completed evals (#26188 ) The output of the reconciler stage of scheduling is only visible via debug-level logs, typically accessible only to the cluster admin. We can give job authors better ability to understand what's happening to their jobs if we expose this information to them in the `eval status` command. Add the reconciler's desired updates to the evaluation struct so it can be exposed in the API. This increases the size of evals by roughly 15% in the state store, or a bit more when there are preemptions (but we expect this will be a small minority of evals). Ref: https://hashicorp.atlassian.net/browse/NMD-818 Fixes: https://github.com/hashicorp/nomad/issues/15564	2025-07-07 09:40:21 -04:00
James Rasell	d6757609dc	cli: Fix a bug where self token lookups via token CLI flag failed. (#26183 ) The meta client looks for both an environment variable and a CLI flag when generating a client. The CLI UUID checker needs to do this also, so we account for users using both env vars and CLI flag tokens.	2025-07-03 13:50:42 +01:00
Chris Roberts	493e7b2faa	command: prevent server panic on graceful shutdown (#26171 ) When performing a graceful shutdown the client drain configuration is checked for a deadline which is appended to the timeout. When running as a server the client will not be set. Attempting to get the drain deadline will result in a panic. This checks for the client being available prior to fetching the deadline value.	2025-07-01 15:54:03 -07:00
James Rasell	d5b2d5078b	rpc: Generate node identities with node RPC handlers when needed. (#26165 ) When a Nomad client registers or re-registers, the RPC handler will generate and return a node identity if required. When an identity is generated, the signing key ID will be stored within the node object, to ensure a root key is not deleted until it is no longer used. During normal client operation it will periodically heartbeat to the Nomad servers to indicate aliveness. The RPC handler that is used for this action has also been updated to conditionally perform identity generation. Performing it here means no extra RPC handlers are required and we inherit the jitter in identity generation from the heartbeat mechanism. The identity generation check methods are performed from the RPC request arguments, so they are scoped to the required behaviour and can handle the nuance of each RPC. Failure to generate an identity is considered terminal to the RPC call. The client will include behaviour to retry this error which is always caused by the encrypter not being ready unless the servers' keyring has been corrupted.	2025-07-01 16:07:21 +01:00
Allison Larson	63f0788747	Expose Kind field for Consul Service Registrations (#26170 ) * consul: Add service kind to jobspec * consul: Add kind to service docs * Add changelog	2025-06-30 14:32:23 -07:00
Tim Gross	aa3c08d069	eval status: enrich with related evals and placed allocs tables (#26156 ) When debugging an evaluation, you almost always want to know about all the related evaluations and what allocations were placed by that evaluation (and where), not just failed placements. We can enrich the command by adding the `related` query parameter to the API, and having the command query for the evaluation's allocations automatically. Emit this data as a pair of new tables and expose fields like quota limits, and previous/next/blocked eval without the `-verbose` flag. Update the docs to include the full output and remove references to long-removed behavior of the `-json` flag. Ref: https://hashicorp.atlassian.net/browse/NMD-818 Ref: https://go.hashi.co/rfc/nmd-212	2025-06-30 09:23:36 -04:00
Piotr Kazmierczak	7647491588	cli: fix panic when starting stopped jobs with no scaling policies (#26131 ) Restoring scaling policies during the start of a stopped job did not account for jobs that didn't have any scaling policies, and led to a panic when users tried to restart such jobs.	2025-06-25 11:19:56 +02:00
James Rasell	7a5f5750b0	test: Wait for client when enabled in test agent if possible. (#26129 ) When a test starts an agent and the client is enabled, we can wait until this reaches the ready state within the set up method. This mimics what we already do with leadership and the root keyring and should reduce flaky tests that assume the client is ready as soon as the set up function returns, which is not guaranteed. The change exposed a couple of TLS reload tests which were not using the test agent correctly. They were setting up a client even though it would never be able to join the cluster due to TLS configuration issues. These have been fixed.	2025-06-25 10:00:28 +01:00
James Rasell	216140255d	cli: Do not always add global DNS name to certificate DNS names. (#26086 ) No matter the passed region identifier, the CLI was always adding "<role>.global.nomad" to the certificate DNS names. This is not what we expect and has been removed. While here, the long deprecated cluster-region flag has been removed. This removal only impacts CLI functionality, so is safe to do.	2025-06-25 07:35:56 +01:00
Chris Roberts	4dbf645bf7	command: prevent panic on graceful shutdown (#26018 ) When performing a graceful shutdown a channel is used to wait for the agent to leave. The channel is closed when the agent leaves successfully, but it also is closed within a deferral. If the agent successfully leaves and closes the channel, a panic will occur when the channel is closed the second time within the deferral. To prevent this from occurring, the channel closing is wrapped within a `OnceFunc` so the channel is only closed once.	2025-06-12 09:35:57 -07:00
Chris Roberts	eeec603975	command: prevent early exit from graceful shutdown (#26023 ) While waiting for the agent to leave during a graceful shutdown the wait can be interrupted immediately if another signal is received. It is common that while waiting a `SIGPIPE` is received from journald causing the wait to end early. This results in the agent not finishing the leave process and reporting an error when the process has stopped. Instead of allowing any signal to interrupt the wait, the signal is checked for a `SIGPIPE` and if matched will continue waiting.	2025-06-12 08:56:55 -07:00
hc-github-team-nomad-core	1e49d9eb44	Generate files for 1.10.2 release	2025-06-10 14:35:25 -07:00
James Rasell	e95148c10d	consul: Fix data race within test by using mutex to read map. (#25977 )	2025-06-04 15:09:37 +01:00
Juana De La Cuesta	bdfd573fc4	Update the scaling policies when deregistering a job (#25911 ) * func: Update the scaling policies when deregistering a job * func: Add tests for updating the policy * docs: add changelog * func: set back the old order * style: rearrange for clarity and to reuse the watchset * func: set the policies to the last submitted when starting a job * func: expand tests of the start job command to include job submission * func: Expand the tests to verify the correct state of the scaling policy after job start * Update command/job_start.go Co-authored-by: Tim Gross <tgross@hashicorp.com> * Update nomad/fsm_test.go Co-authored-by: Tim Gross <tgross@hashicorp.com> * func: add warning when there is no previous job submission --------- Co-authored-by: Tim Gross <tgross@hashicorp.com>	2025-06-02 16:11:38 +02:00
Michael Smithhisler	4c8257d0c7	client: add once mode to template block (#25922 )	2025-05-28 11:45:11 -04:00
Tim Gross	3f59860254	host volumes: add configuration to GC on node GC (#25903 ) When a node is garbage collected, any dynamic host volumes on the node are orphaned in the state store. We generally don't want to automatically collect these volumes and risk data loss, and have provided a CLI flag to `-force` remove them in #25902. But for clusters running on ephemeral cloud instances (ex. AWS EC2 in an autoscaling group), deleting host volumes may add excessive friction. Add a configuration knob to the client configuration to remove host volumes from the state store on node GC. Ref: https://github.com/hashicorp/nomad/pull/25902 Ref: https://github.com/hashicorp/nomad/issues/25762 Ref: https://hashicorp.atlassian.net/browse/NMD-705	2025-05-27 10:22:08 -04:00
tehut	55523ecf8e	Add NodeMaxAllocations to client configuration (#25785 ) * Set MaxAllocations in client config Add NodeAllocationTracker struct to Node struct Evaluate MaxAllocations in AllocsFit function Set up cli config parsing Integrate maxAllocs into AllocatedResources view Co-authored-by: Tim Gross <tgross@hashicorp.com> --------- Co-authored-by: Tim Gross <tgross@hashicorp.com>	2025-05-22 12:49:27 -07:00
Daniel Bennett	15c01e5a49	ipv6: normalize addrs per RFC-5952 §4 (#25921 ) https://datatracker.ietf.org/doc/html/rfc5952#section-4 * copy NormalizeAddr func from vault * PRs hashicorp/vault#29228 & hashicorp/vault#29517 * normalize bind/advertise addrs * normalize consul/vault addrs	2025-05-22 14:21:30 -04:00
Chris Roberts	1aa416e2f2	Support applying policy to all jobs within namespace (#25871 ) Workflow identities currently support ACL policies being applied to a job ID within a namespace. With this update an ACL policy can be applied to a namespace. This results in the ACL policy being applied to all jobs within the namespace.	2025-05-21 07:44:14 -07:00
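
Commit 4dbf645bf7 above describes a channel that was closed once on a successful leave and again in a deferral, which panics, and says the close was wrapped in a `OnceFunc`. The following is a minimal sketch of that pattern using `sync.OnceFunc` from the Go standard library (Go 1.21 or later). The names `newLeaveSignal` and `gracefulLeave` are hypothetical and are not taken from the Nomad source.

```go
package main

import (
	"fmt"
	"sync"
)

// newLeaveSignal returns a channel and a close function that is safe to call
// more than once. Both names are hypothetical, chosen for illustration.
func newLeaveSignal() (<-chan struct{}, func()) {
	ch := make(chan struct{})
	// sync.OnceFunc guarantees the wrapped function runs at most once,
	// so a second call cannot trigger "close of closed channel".
	closeOnce := sync.OnceFunc(func() { close(ch) })
	return ch, closeOnce
}

// gracefulLeave mimics the shape described in the commit message: the channel
// is closed on success and closed again in a deferral.
func gracefulLeave() bool {
	done, closeDone := newLeaveSignal()
	defer closeDone() // deferred close: a no-op if the channel is already closed

	closeDone() // simulated successful leave

	select {
	case <-done:
		return true
	default:
		return false
	}
}

func main() {
	fmt.Println("leave channel closed:", gracefulLeave())
}
```

Without the `sync.OnceFunc` wrapper, the deferred `close` would run on an already closed channel and panic at runtime.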


3999 Commits
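
Commit 4b9597a31d in the listing above attributes the missing `RetryJoin` log output to shadowed error variables and fixes it by predefining them. The sketch below shows the general Go behaviour only: `join`, `shadowed`, and `predeclared` are hypothetical names, and the loop is a stand-in rather than Nomad's actual retry logic.

```go
package main

import (
	"errors"
	"fmt"
)

var errJoin = errors.New("join failed")

// join is a hypothetical stand-in for a join attempt that always fails.
func join() (int, error) { return 0, errJoin }

// shadowed uses := inside the loop body, which declares a new err scoped to
// that block. The outer err is never assigned, so the failure is lost.
func shadowed() error {
	var err error
	for attempt := 0; attempt < 1; attempt++ {
		n, err := join() // new n and err, visible only inside the loop
		_, _ = n, err
	}
	return err // still nil
}

// predeclared declares the variables up front and assigns with =, so the
// error from join is visible after the loop.
func predeclared() error {
	var (
		n   int
		err error
	)
	for attempt := 0; attempt < 1; attempt++ {
		n, err = join()
		_ = n
	}
	return err
}

func main() {
	fmt.Println("shadowed:", shadowed())
	fmt.Println("predeclared:", predeclared())
}
```

The only difference between the two functions is `:=` versus `=` inside the loop, which is why this class of bug is easy to miss in review.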