This change isolates all the code that deals with node selection in the
scheduler into its own package, `feasible`.
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
The legacy workflow for Vault whereby servers were configured
using a token to provide authentication to the Vault API has now
been removed. This change also removes the workflow where servers
were responsible for deriving Vault tokens for Nomad clients.
The deprecated Vault config options used by the Nomad agent have
all been removed except for "token" which is still in use by the
Vault Transit keyring implementation.
Job specification authors can no longer use the "vault.policies"
parameter and should instead use "vault.role" when not using the
default workload identity.
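For job authors migrating off "vault.policies", a minimal sketch using
Nomad's Go API package; the role name here is illustrative, not a default:
```
package example

import "github.com/hashicorp/nomad/api"

// vaultBlock shows the replacement: no "policies" list, just a single
// Vault role. "nomad-workloads" is an assumed role name.
func vaultBlock() *api.Vault {
	return &api.Vault{
		Role: "nomad-workloads",
	}
}
```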
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>
We introduce an alternative to the solution presented in #24960, one based
on the state store rather than on previous-next allocation tracking in the
reconciler. This new solution reduces the cognitive complexity of the scheduler
code at the cost of slightly more boilerplate code, but also opens up new
possibilities in the future, e.g., allowing users to explicitly "un-stick"
volumes with workloads still running.
The new logic is as follows:
* In the scheduler, SetVolumes() records the namespace, job, and task group, and hasVolumes() consults the state and returns true if there is a matching claim or if there is no previous claim.
* In the state store, upsertAllocsImpl() checks whether an allocation requests sticky volumes and consults the state; if there is no claim, it creates one.
* A TaskGroupVolumeClaim uniquely identifies a claimed volume by its namespace, job ID, task group name, and volume ID.
* DeleteJobTxn() removes the claim from the state when the job is deleted.
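A minimal sketch of the claim object described above, with field names
inferred from the description rather than copied from the source:
```
package state

// TaskGroupVolumeClaim uniquely identifies a sticky volume claim.
type TaskGroupVolumeClaim struct {
	Namespace     string // job namespace
	JobID         string // owning job
	TaskGroupName string // task group that requested the sticky volume
	VolumeID      string // host volume being claimed
}
```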
This changeset implements the RPC handlers for Dynamic Host Volumes, including
the plumbing needed to forward requests to clients. The client-side
implementation is stubbed and will be done under a separate PR.
Ref: https://hashicorp.atlassian.net/browse/NET-11549
In #23977 we moved the keyring into Raft, which can expose key material in Raft
snapshots when using the less-secure AEAD keyring instead of KMS. This changeset
adds tools for redacting this material from snapshots:
* The `operator snapshot state` command gains the ability to display key
metadata (only), which respects the `-filter` option.
* The `operator snapshot save` command gains a `-redact` option that removes key
material from the snapshot after it's downloaded.
* A new `operator snapshot redact` command allows removing key material from an
existing snapshot.
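Taken together, an operator workflow might look like this (file names and
ordering are illustrative):
```
# download a snapshot with key material stripped
nomad operator snapshot save -redact backup.snap

# or redact key material from an existing snapshot
nomad operator snapshot redact backup.snap

# inspect the snapshot, including key metadata
nomad operator snapshot state backup.snap
```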
In Nomad 1.4, we implemented a root keyring to support encrypting Variables and
signing Workload Identities. The keyring was originally stored with the
AEAD-wrapped DEKs and the KEK together in a JSON keystore file on disk. We
recently added support for using an external KMS for the KEK to improve the
security model for the keyring. But we've encountered multiple instances of the
keystore files not getting backed up separately from the Raft snapshot,
resulting in failure to restore clusters from backup.
Move Nomad's root keyring into Raft (encrypted with a KMS/Vault where available)
in order to eliminate operational problems with the separate on-disk keystore.
Fixes: https://github.com/hashicorp/nomad/issues/23665
Ref: https://hashicorp.atlassian.net/browse/NET-10523
The batch deregister RPC endpoint is only used by the internal
garbage collection process; it is not exposed via the HTTP API or
used anywhere else.
The GC process ensures that a job can only be removed from state
once all related evaluations and allocations are themselves eligible
for removal from state. This means that we do not need to create
evaluations when jobs are being deregistered via this endpoint.
Replaces #18812
Upgraded with:
```
find . -name '*.go' -exec sed -i s/"github.com\/hashicorp\/go-msgpack\/codec"/"github.com\/hashicorp\/go-msgpack\/v2\/codec/" '{}' ';'
find . -name '*.go' -exec sed -i s/"github.com\/hashicorp\/net-rpc-msgpackrpc"/"github.com\/hashicorp\/net-rpc-msgpackrpc\/v2/" '{}' ';'
go get
go get -v -u github.com/hashicorp/raft-boltdb/v2
go get -v github.com/hashicorp/serf@5d32001edfaa18d1c010af65db707cdb38141e80
```
See https://github.com/hashicorp/go-msgpack/releases/tag/v2.1.0
for details.
The Nomad state store function was recently updated to validate
certain parameters, fixing a panic condition. This change meant the
dummy FSM used for the snapshot state command always failed this
validation and the command no longer worked.
This change adds the required parameter to pass validation and
therefore makes the CLI command functional again.
* Move group into a separate helper module for reuse
* Add shutdownCh to worker
The shutdown channel is used to signal that the worker has stopped.
* Make server shutdown block on workers' shutdownCh
* Fix waiting for eval broker state change blocking indefinitely
There was a race condition in the GenericNotifier between the
Run and WaitForChange functions: WaitForChange could block
trying to write to a full unsubscribeCh while the Run function,
having already stopped, would never read from the channel.
This commit fixes the race by unblocking whenever the notifier has
been stopped (see the sketch after this list).
* Bound the amount of time server shutdown waits on worker completion
* Fix lostcancel linter error
* Fix worker test using unexpected worker constructor
* Add changelog
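A minimal sketch of the shape of the fix, with illustrative names rather
than the actual GenericNotifier code:
```
package notifier

type genericNotifier struct {
	unsubscribeCh chan chan struct{}
	stopCh        chan struct{} // closed when the Run loop exits
}

// unsubscribe can no longer block forever: if Run has already stopped,
// the stopCh case fires instead of waiting on a channel nobody reads.
func (n *genericNotifier) unsubscribe(waitCh chan struct{}) {
	select {
	case n.unsubscribeCh <- waitCh:
	case <-n.stopCh:
	}
}
```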
---------
Co-authored-by: Marvin Chin <marvinchin@users.noreply.github.com>
Implementation of the base work for the new node pools feature. It includes a new `NodePool` struct and its corresponding state store table.
Upon start, the state store is populated with two built-in node pools that cannot be modified or deleted:
* `all` is a node pool that always includes all nodes in the cluster.
* `default` is the node pool where nodes that don't specify a node pool in their configuration are placed.
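A minimal sketch of the new object; the real struct carries additional
fields (such as metadata and Raft indexes), so this is illustrative only:
```
package structs

// NodePool is the unit of node grouping introduced by this change.
type NodePool struct {
	Name        string // unique name; "all" and "default" are reserved for the built-in pools
	Description string
}
```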
* Upgrade from hashicorp/go-msgpack v1.1.5 to v2.1.0
Fixes #16808
* Update hashicorp/net-rpc-msgpackrpc to v2 to match go-msgpack
* deps: use go-msgpack v2.0.0
go-msgpack v2.1.0 includes some code changes that we will need to
investigate further to assess their impact on Nomad, so we are keeping this
dependency at v2.0.0 for now since it is a no-op.
---------
Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
This change adds a new table that will store ACL binding rule
objects. The two indexes allow fast lookups by their ID, or by
which auth method they are linked to. Snapshot persist and
restore functionality ensures this table can be saved and
restored from snapshots.
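A sketch of what a go-memdb schema with these two indexes can look like;
the table and field names here are illustrative, not copied from the
source:
```
package state

import memdb "github.com/hashicorp/go-memdb"

func aclBindingRuleTableSchema() *memdb.TableSchema {
	return &memdb.TableSchema{
		Name: "acl_binding_rules",
		Indexes: map[string]*memdb.IndexSchema{
			// fast lookup by unique binding rule ID
			"id": {
				Name:    "id",
				Unique:  true,
				Indexer: &memdb.StringFieldIndex{Field: "ID"},
			},
			// fast lookup of all rules linked to an auth method
			"auth_method": {
				Name:    "auth_method",
				Indexer: &memdb.StringFieldIndex{Field: "AuthMethod"},
			},
		},
	}
}
```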
In order to write and delete the object to state, new Raft messages
have been added.
All RPC request and response structs, along with object functions
such as diff and canonicalize, have been included within this work,
as it is nicely separated from the other areas of work.
This PR implements the ACLAuthMethod type, the acl_auth_methods table schema, and CRUD state store methods. It also updates the nomadSnapshot.Persist and nomadSnapshot.Restore methods so that they work with the new table, and adds two new Raft messages: ACLAuthMethodsUpsertRequestType and ACLAuthMethodsDeleteRequestType
This PR is part of the SSO work captured under ☂️ ticket #13120.
Move conflict resolution implementation into the state store with a new Apply RPC.
This also makes the RPC for secure variables much more similar to Consul's KV,
which will help us support soft deletes in a post-1.4.0 version of Nomad.
Reimplement quotas in the state store functions.
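A hedged sketch of check-and-set conflict resolution of the kind the Apply
RPC performs; the types and names are illustrative, not Nomad's actual
implementation:
```
package state

import "errors"

var errCASConflict = errors.New("cas error: check index does not match stored variable")

type variable struct {
	Path        string
	ModifyIndex uint64
}

// applyCAS allows a write only when the caller's view of the variable
// (checkIndex) matches what the state store currently holds.
func applyCAS(existing *variable, checkIndex uint64) error {
	switch {
	case existing == nil && checkIndex == 0:
		return nil // fresh create
	case existing != nil && existing.ModifyIndex == checkIndex:
		return nil // update against the expected version
	default:
		return errCASConflict // caller must re-read and retry
	}
}
```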
Co-authored-by: Charlie Voiselle <464492+angrycub@users.noreply.github.com>
This commit includes the new state schema for ACL roles along with
state interaction functions for CRUD actions.
The change also includes snapshot persist and restore
functionality and the addition of FSM messages for Raft updates
which will come via RPC endpoints.
Stream snapshot to FSM when restoring from archive
The `RestoreFromArchive` helper decompresses the snapshot archive to a
temporary file before reading it into the FSM. For large snapshots
this performs a lot of disk IO. Stream decompress the snapshot as we
read it, without first writing to a temporary file.
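A minimal sketch of the streaming approach, where the fsm writer stands in
for the FSM restore and is illustrative:
```
package snapshot

import (
	"compress/gzip"
	"io"
)

// restoreFromArchive decompresses the snapshot as it is read, so the
// data never touches a temporary file on disk.
func restoreFromArchive(archive io.Reader, fsm io.Writer) error {
	gz, err := gzip.NewReader(archive)
	if err != nil {
		return err
	}
	defer gz.Close()
	_, err = io.Copy(fsm, gz)
	return err
}
```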
Add bexpr filters to the `RestoreFromArchive` helper.
The operator can pass these as `-filter` arguments to `nomad operator
snapshot state` (and other commands in the future) to include only
desired data when reading the snapshot.
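A small example of how a go-bexpr filter evaluates against a decoded
object; the struct and expression are illustrative, not Nomad's types:
```
package main

import (
	"fmt"

	bexpr "github.com/hashicorp/go-bexpr"
)

type object struct {
	ID        string
	Namespace string
}

func main() {
	// only objects matching the expression would be emitted
	eval, err := bexpr.CreateEvaluator(`Namespace == "prod"`)
	if err != nil {
		panic(err)
	}
	match, _ := eval.Evaluate(object{ID: "x", Namespace: "prod"})
	fmt.Println(match) // true
}
```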
These API endpoints now return results in reverse chronological order. They
can return results in chronological order by setting the
query parameter ascending=true.
- Eval.List
- Deployment.List
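For example (an illustrative request against a local agent):
```
# default: newest first; ascending=true returns oldest first
curl "http://localhost:4646/v1/evaluations?ascending=true"
```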
Before this change, trying to run `nomad operator raft {info,logs}` on an
in-use raft.db would cause the command to block until the agent using
raft.db was closed.
After this change the command will block for 1s before returning a
(hopefully) helpful error message.
This change also sets the ReadOnly mode on the underlying BoltDb to
ensure diagnostics make no changes to the underlying store. We have no
evidence this has ever occurred, but it seems like a useful safety
measure.
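A sketch of the open options this amounts to, using bbolt; the path and
mode are illustrative:
```
package raftutil

import (
	"time"

	bolt "go.etcd.io/bbolt"
)

func openForDiagnostics(path string) (*bolt.DB, error) {
	return bolt.Open(path, 0o600, &bolt.Options{
		Timeout:  time.Second, // fail fast instead of blocking on the file lock
		ReadOnly: true,        // diagnostics cannot mutate the store
	})
}
```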
No changelog added since this is a minor tweak in a "new" feature (it
was hidden in previous releases).
The `nomad operator raft logs` command uses a raft helper that reads
the logs from raft and serializes them to JSON. The previous
implementation returned the slice of all logs and then serialized the
entire object. Update the helper to stream the log entries and
serialize them as newline-delimited JSON.
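A minimal sketch of the streaming serialization; the channel-based entry
source is illustrative:
```
package raftutil

import (
	"encoding/json"
	"io"
)

// writeLogs encodes each entry as it arrives; json.Encoder appends a
// newline after every value, yielding newline-delimited JSON.
func writeLogs(w io.Writer, entries <-chan any) error {
	enc := json.NewEncoder(w)
	for e := range entries {
		if err := enc.Encode(e); err != nil {
			return err
		}
	}
	return nil
}
```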
RPC endpoints for the user-driven APIs (`UpsertOneTimeToken` and
`ExchangeOneTimeToken`) and token expiration (`ExpireOneTimeTokens`).
Includes adding expiration to the periodic core GC job.
* Process to send events to configured sinks
This PR adds a SinkManager to the server, which is responsible for
managing managed sinks. Managed sinks subscribe to the event broker and send
events to a sink writer (webhook). When changes to the event store are
made, the SinkManager and managed sinks are responsible for reloading or
starting a new managed sink.
* periodically check in sink progress to raft
Save progress on the last successfully sent index to raft. This allows a
managed sink to resume close to where it left off in the event of a lost
server or leadership change.
* Dereference the event sink so we can accurately use the watchCh.
When using a pointer to the event sink struct, it was updated immediately
and our reload logic would not trigger.
* network sink rpc/api plumbing:
  - state store methods and restore
  - upsert sink test
  - get sink
  - delete sink
  - event sink list and tests
  - go generate new msg types
  - validate sink on upsert
* go generate