This change isolates all the code that deals with node selection in the
scheduler into its own package, `feasible`.
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
The legacy workflow for Vault whereby servers were configured
using a token to provide authentication to the Vault API has now
been removed. This change also removes the workflow where servers
were responsible for deriving Vault tokens for Nomad clients.
The deprecated Vault config options used by the Nomad agent have
all been removed except for "token" which is still in use by the
Vault Transit keyring implementation.
Job specification authors can no longer use the "vault.policies"
parameter and should instead use "vault.role" when not using the
default workload identity.
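For job authors migrating off "vault.policies", a minimal sketch using
Nomad's Go API package; the role name here is illustrative, not a default:
```
package example

import "github.com/hashicorp/nomad/api"

// vaultBlock shows the replacement: no "policies" list, just a single
// Vault role. "nomad-workloads" is an assumed role name.
func vaultBlock() *api.Vault {
	return &api.Vault{
		Role: "nomad-workloads",
	}
}
```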
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>
We introduce an alternative to the solution presented in #24960, one based
on the state store rather than on previous-next allocation tracking in the
reconciler. This new solution reduces the cognitive complexity of the scheduler
code at the cost of slightly more boilerplate code, but also opens up new
possibilities in the future, e.g., allowing users to explicitly "un-stick"
volumes with workloads still running.
The new logic is as follows:
* In the scheduler, SetVolumes() records the namespace, job, and task group, and hasVolumes() consults the state and returns true if there is a matching claim or if there is no previous claim.
* In the state store, upsertAllocsImpl() checks whether an allocation requests sticky volumes and consults the state; if there is no claim, it creates one.
* A TaskGroupVolumeClaim uniquely identifies a claimed volume by its namespace, job ID, task group name, and volume ID.
* DeleteJobTxn() removes the claim from the state when the job is deleted.
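A minimal sketch of the claim object described above, with field names
inferred from the description rather than copied from the source:
```
package state

// TaskGroupVolumeClaim uniquely identifies a sticky volume claim.
type TaskGroupVolumeClaim struct {
	Namespace     string // job namespace
	JobID         string // owning job
	TaskGroupName string // task group that requested the sticky volume
	VolumeID      string // host volume being claimed
}
```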
This changeset implements the RPC handlers for Dynamic Host Volumes, including
the plumbing needed to forward requests to clients. The client-side
implementation is stubbed and will be done under a separate PR.
Ref: https://hashicorp.atlassian.net/browse/NET-11549
In #23977 we moved the keyring into Raft, which can expose key material in Raft
snapshots when using the less-secure AEAD keyring instead of KMS. This changeset
adds tools for redacting this material from snapshots:
* The `operator snapshot state` command gains the ability to display key
metadata (only), which respects the `-filter` option.
* The `operator snapshot save` command gains a `-redact` option that removes key
material from the snapshot after it's downloaded.
* A new `operator snapshot redact` command allows removing key material from an
existing snapshot.
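Taken together, an operator workflow might look like this (file names and
ordering are illustrative):
```
# download a snapshot with key material stripped
nomad operator snapshot save -redact backup.snap

# or redact key material from an existing snapshot
nomad operator snapshot redact backup.snap

# inspect the snapshot, including key metadata
nomad operator snapshot state backup.snap
```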
In Nomad 1.4, we implemented a root keyring to support encrypting Variables and
signing Workload Identities. The keyring was originally stored with the
AEAD-wrapped DEKs and the KEK together in a JSON keystore file on disk. We
recently added support for using an external KMS for the KEK to improve the
security model for the keyring. But we've encountered multiple instances of the
keystore files not getting backed up separately from the Raft snapshot,
resulting in failure to restore clusters from backup.
Move Nomad's root keyring into Raft (encrypted with a KMS/Vault where available)
in order to eliminate operational problems with the separate on-disk keystore.
Fixes: https://github.com/hashicorp/nomad/issues/23665
Ref: https://hashicorp.atlassian.net/browse/NET-10523
The batch deregister RPC endpoint is only used by the internal
garbage collection process; it is not exposed via the HTTP API or
used anywhere else.
The GC process ensures that a job can only be removed from state
once all related evaluations and allocations are themselves eligible
for removal from state. This means that we do not need to create
evaluations when jobs are being deregistered via this endpoint.
Replaces #18812
Upgraded with:
```
find . -name '*.go' -exec sed -i s/"github.com\/hashicorp\/go-msgpack\/codec"/"github.com\/hashicorp\/go-msgpack\/v2\/codec/" '{}' ';'
find . -name '*.go' -exec sed -i s/"github.com\/hashicorp\/net-rpc-msgpackrpc"/"github.com\/hashicorp\/net-rpc-msgpackrpc\/v2/" '{}' ';'
go get
go get -v -u github.com/hashicorp/raft-boltdb/v2
go get -v github.com/hashicorp/serf@5d32001edfaa18d1c010af65db707cdb38141e80
```
See https://github.com/hashicorp/go-msgpack/releases/tag/v2.1.0
for details.
The Nomad state store function was recently updated to validate
certain parameters, fixing a panic condition. This change meant the
dummy FSM used for the snapshot state command always failed this
validation and the command no longer worked.
This change adds the required parameter to pass validation and
therefore makes the CLI command functional again.
* Move group into a separate helper module for reuse
* Add shutdownCh to worker
The shutdown channel is used to signal that the worker has stopped.
* Make server shutdown block on workers' shutdownCh
* Fix waiting for eval broker state change blocking indefinitely
There was a race condition in the GenericNotifier between the
Run and WaitForChange functions: WaitForChange could block
trying to write to a full unsubscribeCh while the Run function,
having already stopped, would never read from the channel.
This commit fixes the race by unblocking whenever the notifier has
been stopped (see the sketch after this list).
* Bound the amount of time server shutdown waits on worker completion
* Fix lostcancel linter error
* Fix worker test using unexpected worker constructor
* Add changelog
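A minimal sketch of the shape of the fix, with illustrative names rather
than the actual GenericNotifier code:
```
package notifier

type genericNotifier struct {
	unsubscribeCh chan chan struct{}
	stopCh        chan struct{} // closed when the Run loop exits
}

// unsubscribe can no longer block forever: if Run has already stopped,
// the stopCh case fires instead of waiting on a channel nobody reads.
func (n *genericNotifier) unsubscribe(waitCh chan struct{}) {
	select {
	case n.unsubscribeCh <- waitCh:
	case <-n.stopCh:
	}
}
```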
---------
Co-authored-by: Marvin Chin <marvinchin@users.noreply.github.com>
Implementation of the base work for the new node pools feature. It includes a new `NodePool` struct and its corresponding state store table.
Upon start, the state store is populated with two built-in node pools that cannot be modified or deleted:
* `all` is a node pool that always includes all nodes in the cluster.
* `default` is the node pool where nodes that don't specify a node pool in their configuration are placed.
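A minimal sketch of the new object; the real struct carries additional
fields (such as metadata and Raft indexes), so this is illustrative only:
```
package structs

// NodePool is the unit of node grouping introduced by this change.
type NodePool struct {
	Name        string // unique name; "all" and "default" are reserved for the built-in pools
	Description string
}
```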
* Upgrade from hashicorp/go-msgpack v1.1.5 to v2.1.0
Fixes #16808
* Update hashicorp/net-rpc-msgpackrpc to v2 to match go-msgpack
* deps: use go-msgpack v2.0.0
go-msgpack v2.1.0 includes some code changes that we will need to
investigate further to assess their impact on Nomad, so we are keeping this
dependency at v2.0.0 for now since it is a no-op.
---------
Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
This change adds a new table that will store ACL binding rule
objects. The two indexes allow fast lookups by their ID, or by
which auth method they are linked to. Snapshot persist and
restore functionality ensures this table can be saved and
restored from snapshots.
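A sketch of what a go-memdb schema with these two indexes can look like;
the table and field names here are illustrative, not copied from the
source:
```
package state

import memdb "github.com/hashicorp/go-memdb"

func aclBindingRuleTableSchema() *memdb.TableSchema {
	return &memdb.TableSchema{
		Name: "acl_binding_rules",
		Indexes: map[string]*memdb.IndexSchema{
			// fast lookup by unique binding rule ID
			"id": {
				Name:    "id",
				Unique:  true,
				Indexer: &memdb.StringFieldIndex{Field: "ID"},
			},
			// fast lookup of all rules linked to an auth method
			"auth_method": {
				Name:    "auth_method",
				Indexer: &memdb.StringFieldIndex{Field: "AuthMethod"},
			},
		},
	}
}
```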
In order to write and delete the object to state, new Raft messages
have been added.
All RPC request and response structs, along with object functions
such as diff and canonicalize, have been included within this work,
as it is nicely separated from the other areas of work.
This PR implements the ACLAuthMethod type, the acl_auth_methods table schema, and CRUD state store methods. It also updates the nomadSnapshot.Persist and nomadSnapshot.Restore methods so that they work with the new table, and adds two new Raft messages: ACLAuthMethodsUpsertRequestType and ACLAuthMethodsDeleteRequestType
This PR is part of the SSO work captured under ☂️ ticket #13120.
Move conflict resolution implementation into the state store with a new Apply RPC.
This also makes the RPC for secure variables much more similar to Consul's KV,
which will help us support soft deletes in a post-1.4.0 version of Nomad.
Reimplement quotas in the state store functions.
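A hedged sketch of check-and-set conflict resolution of the kind the Apply
RPC performs; the types and names are illustrative, not Nomad's actual
implementation:
```
package state

import "errors"

var errCASConflict = errors.New("cas error: check index does not match stored variable")

type variable struct {
	Path        string
	ModifyIndex uint64
}

// applyCAS allows a write only when the caller's view of the variable
// (checkIndex) matches what the state store currently holds.
func applyCAS(existing *variable, checkIndex uint64) error {
	switch {
	case existing == nil && checkIndex == 0:
		return nil // fresh create
	case existing != nil && existing.ModifyIndex == checkIndex:
		return nil // update against the expected version
	default:
		return errCASConflict // caller must re-read and retry
	}
}
```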
Co-authored-by: Charlie Voiselle <464492+angrycub@users.noreply.github.com>
This commit includes the new state schema for ACL roles along with
state interaction functions for CRUD actions.
The change also includes snapshot persist and restore
functionality and the addition of FSM messages for Raft updates
which will come via RPC endpoints.
Stream snapshot to FSM when restoring from archive
The `RestoreFromArchive` helper decompresses the snapshot archive to a
temporary file before reading it into the FSM. For large snapshots
this performs a lot of disk IO. Stream decompress the snapshot as we
read it, without first writing to a temporary file.
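A minimal sketch of the streaming approach, where the fsm writer stands in
for the FSM restore and is illustrative:
```
package snapshot

import (
	"compress/gzip"
	"io"
)

// restoreFromArchive decompresses the snapshot as it is read, so the
// data never touches a temporary file on disk.
func restoreFromArchive(archive io.Reader, fsm io.Writer) error {
	gz, err := gzip.NewReader(archive)
	if err != nil {
		return err
	}
	defer gz.Close()
	_, err = io.Copy(fsm, gz)
	return err
}
```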
Add bexpr filters to the `RestoreFromArchive` helper.
The operator can pass these as `-filter` arguments to `nomad operator
snapshot state` (and other commands in the future) to include only
desired data when reading the snapshot.
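A small example of how a go-bexpr filter evaluates against a decoded
object; the struct and expression are illustrative, not Nomad's types:
```
package main

import (
	"fmt"

	bexpr "github.com/hashicorp/go-bexpr"
)

type object struct {
	ID        string
	Namespace string
}

func main() {
	// only objects matching the expression would be emitted
	eval, err := bexpr.CreateEvaluator(`Namespace == "prod"`)
	if err != nil {
		panic(err)
	}
	match, _ := eval.Evaluate(object{ID: "x", Namespace: "prod"})
	fmt.Println(match) // true
}
```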
These API endpoints now return results in reverse chronological order. They
can return results in chronological order by setting the
query parameter ascending=true.
- Eval.List
- Deployment.List
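For example (an illustrative request against a local agent):
```
# default: newest first; ascending=true returns oldest first
curl "http://localhost:4646/v1/evaluations?ascending=true"
```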
Before this change, trying to run `nomad operator raft {info,logs}` on an
in-use raft.db would cause the command to block until the agent using
raft.db was closed.
After this change the command will block for 1s before returning a
(hopefully) helpful error message.
This change also sets the ReadOnly mode on the underlying BoltDb to
ensure diagnostics make no changes to the underlying store. We have no
evidence this has ever occurred, but it seems like a useful safety
measure.
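A sketch of the open options this amounts to, using bbolt; the path and
mode are illustrative:
```
package raftutil

import (
	"time"

	bolt "go.etcd.io/bbolt"
)

func openForDiagnostics(path string) (*bolt.DB, error) {
	return bolt.Open(path, 0o600, &bolt.Options{
		Timeout:  time.Second, // fail fast instead of blocking on the file lock
		ReadOnly: true,        // diagnostics cannot mutate the store
	})
}
```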
No changelog added since this is a minor tweak in a "new" feature (it
was hidden in previous releases).
The `nomad operator raft logs` command uses a raft helper that reads
the logs from raft and serializes them to JSON. The previous
implementation returned the slice of all logs and then serialized the
entire object. Update the helper to stream the log entries and
serialize them as newline-delimited JSON.
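A minimal sketch of the streaming serialization; the channel-based entry
source is illustrative:
```
package raftutil

import (
	"encoding/json"
	"io"
)

// writeLogs encodes each entry as it arrives; json.Encoder appends a
// newline after every value, yielding newline-delimited JSON.
func writeLogs(w io.Writer, entries <-chan any) error {
	enc := json.NewEncoder(w)
	for e := range entries {
		if err := enc.Encode(e); err != nil {
			return err
		}
	}
	return nil
}
```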
RPC endpoints for the user-driven APIs (`UpsertOneTimeToken` and
`ExchangeOneTimeToken`) and token expiration (`ExpireOneTimeTokens`).
Includes adding expiration to the periodic core GC job.
* Process to send events to configured sinks
This PR adds a SinkManager to the server, which is responsible for
managing managed sinks. Managed sinks subscribe to the event broker and send
events to a sink writer (webhook). When changes to the event store are
made, the SinkManager and managed sinks are responsible for reloading or
starting a new managed sink.
* periodically check in sink progress to raft
Save progress on the last successfully sent index to raft. This allows a
managed sink to resume close to where it left off in the event of a lost
server or leadership change.
* Dereference the event sink so we can accurately use the watchCh.
When using a pointer to the event sink struct, it was updated immediately
and our reload logic would not trigger.
* network sink rpc/api plumbing:
  - state store methods and restore
  - upsert sink test
  - get sink
  - delete sink
  - event sink list and tests
  - go generate new msg types
  - validate sink on upsert
* go generate