The RPC handlers expect to see `nil` ACL objects whenever ACLs are disabled. By
using `nil` as a sentinel value, we have the risk of nil pointer exceptions and
improper handling of `nil` when returned from our various auth methods that can
lead to privilege escalation bugs. This is the third in a series to eliminate
the use of `nil` ACLs as a sentinel value for when ACLs are disabled.
This patch involves creating a new "virtual" ACL object for checking permissions
on client operations and a matching `AuthenticateClientOnly` method for
client-only RPCs that can produce that ACL.
Unlike the server ACLs PR, this also includes a special case for "legacy" client
RPCs where the client was not previously sending the secret as it
should (leaning on mTLS only). Those client RPCs were fixed in Nomad 1.6.0, but
it'll take a while before we can guarantee they'll be present during upgrades.
Ref: https://github.com/hashicorp/nomad-enterprise/pull/1218
Ref: https://github.com/hashicorp/nomad/pull/18703
Ref: https://github.com/hashicorp/nomad/pull/18715
Ref: https://github.com/hashicorp/nomad/pull/16799
The WID manager will only sign WI tokens for the allocation's task group. We're
accidentally looping over all the task groups, which for jobs with multiple task
groups results in a failure in the `consul_hook`.
Nomad imports the Vault SDK to get testing helpers, but it turns out the only
thing actually in use was a single string constant for the Vault namespace
header. Remove this dependency and hardcode the constant to reduce dependency
churn.
Includes changes to WID Manager that make it request signed identities for
services, as well as a few improvements to WIHandle introduced in #18672.
---------
Co-authored-by: Piotr Kazmierczak <470696+pkazmierczak@users.noreply.github.com>
When clients are restarted and the identity hook runs when we restore
allocations, the running allocations are likely to have already-signed Workload
Identities that are unexpired. Save these to the client's local state DB so that
we can avoid a thundering herd of RPCs during client restart. When we restore,
we'll check if there's at least one expired signed WI before making any initial
signing request.
Included:
* Renames `getIdentities` to `getInitialIdentities` to make the workflow more clear.
* Renames the existing `widmgr_test.go` file of integration tests, which is in its
own package to avoid circular imports to `widmgr_int_test.go`
Any code that tracks workloads and their identities should not rely on string
comparisons, especially since we support 2 types of workload identities: those
that identify tasks and those that identify services. This means we cannot rely
on task.Name for workload-identity pairs.
The new type structs.WIHandle solves this problem by providing a uniform way of
identifying workloads and their identities.
No functional changes, just cleaning up deprecated usages that are
removed in v2 and replace one call of .Slice with .ForEach to avoid
making the intermediate copy.
Setting a null value to a node metadata is expected to remove it from
subsequent reads. This is true both for static node metadata (defined in
the agent configuration file) as well as for dynamic node metadata
(defined via the Nomad API).
Null values for static metadata must be persisted to indicate that the
value has been removed, but strictly dynamic metadata null values can be
removed from state and client memory.
Since the allocation in the task runner is updated in a separate
goroutine, a race condition may happen where the task is started but the
prestart hooks are skipped because the allocation became terminal.
Checking for a terminal allocation before proceeding with the task start
ensures the task only runs if the prestart hooks are also executed.
Since `shouldShutdown()` only uses terminal allocation status, it
remains `true` after the first transition, so it's safe to check it
again after the prestart hooks as it will never revert to `false`.
Eliminates the vaultclient test import cycle by putting the test file into the
client package and making vaultclient objects public.
Ref hashicorp/team-nomad#404
When Workload Identity is being used with Consul, the `consul_hook` will add
Consul tokens to the alloc hook resources. Update the `group_service_hook` and
`service_hook` to use those tokens when available for registering and
deregistering Consul workloads.
The `sids_hook` runs for Connect sidecar/gateway tasks and gets Consul Service
Identity (SI) tokens for use by the Envoy bootstrap hook. When Workload Identity
is being used with Consul, the `consul_hook` will have already added these
tokens to the alloc hook resources. Update the `sids_hook` to use those tokens
instead and write them to the expected area of the taskdir.
When waiting on a previous alloc we must query against the leader before
switching to a stale query with index set.
Also check to ensure the response is fresh before using it like #18269
To support Workload Identity with Consul for templates, we want templates to be
able to use the WI created at the task scope (either implicitly or set by the
user). But to allow different tasks within a group to be assigned to different
clusters as we're doing for Vault, we need to be able to set the `consul` block
with its `cluster` field at the task level to override the group.
This PR introduces a new allocrunner-level consul_hook which iterates over
services and tasks, if their provider is consul, fetches consul tokens for all of
them, stores them in AllocHookResources and in task secret dirs.
Ref: hashicorp/team-nomad#404
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Since the identity_hook is meant to be the central place that makes signed
identities available to other hooks, it should also expose the default identity
that is signed by the plan applier.
Ref: hashicorp/team-nomad#404
Similar to #18269, it is possible that even if Node.GetClientAllocs
retrieves fresh allocs that the subsequent Alloc.GetAllocs call
retrieves stale allocs. While `diffAlloc(existing, updated)` properly
ignores stale alloc *updates*, alloc deletions have no such check.
So if a client retrieves an alloc created at index 123, and then a
subsequent Alloc.GetAllocs call hits a new server which returns results
at index 100, the client will stop the alloc created at 123 because it
will be missing from the stale response.
This change applies the same logic as #18269 and ensures only fresh
responses are used.
Glossary:
* fresh - modified at an index > the query index
* stale - modified at an index <= the query index
* vault: update identity name to start with `vault_`
In the original proposal, workload identities used to derive Vault
tokens were expected to be called just `vault`. But in order to support
multiple Vault clusters it is necessary to associate identities with
specific Vault cluster configuration.
This commit implements a new proposal to have Vault identities named as
`vault_<cluster>`.
* config: fix multi consul and vault config parse
Capture the loop variable when parsing multiple Consul and Vault
configuration blocks so the duration parse function uses the correct
field when it's called later on.
* client: build Vault client with right config
When setting up the multiple Vault clients, the code was always loading
the default configuration, resulting in all clients to be configured the
same way.
* config: fix WorkloadIdentityConfig.Copy() method
Ensure `WorkloadIdentityConfig.Copy()` does not return the original
pointer for the `TTL` field.
This is a work-in-progress changeset to provide workload-specific Consul tokens
that are created by the `consul_hook` and attached to workload registration
requests by the `group_service_hook` and `service_hook`.
This requires unreleased updates to Consul's `api` package, so this changeset
includes a temporary `replace` directive in the go.mod file.
and MockCSIManager to support the call counting
that csi_hook_test expects
instead of implementing csimanager
interfaces in two separate places:
* client/allocrunner/csi_hook_test
* client/csi_endpoint_test
they can both use the same mocks defined in
client/pluginmanager/csimanager/
alongside the actual implementations of them.
also refactor TestCSINode_DetachVolume
to use use it like Node_ExpandVolume
so we can also test the happy path there
This commit splits identity_hook between the allocrunner and taskrunner. The
allocrunner-level part of the hook signs each task identity, and the
taskrunner-level part picks it up and stores secrets for each task.
The code revamps the WIDMgr, which is now split into 2 interfaces:
IdentityManager which manages renewals of signatures and handles sending
updates to subscribers via Watch method, and IdentitySigner which only does the
signing.
This work is necessary for having a unified Consul login workflow that comes
with the new Consul integration. A new, allocrunner-level consul_hook will now
be the only hook doing Consul authentication.
Nomad Enterprise will support configuring multiple Vault clients. Instead of
having a single Vault client field in the Nomad client, we'll have a function
that callers can parameterize by the Vault cluster name that returns the
correctly configured Vault API client wrapper.
following ControllerExpandVolume
in c6dbba7cde,
which expands the disk at e.g. a cloud vendor,
the controller plugin may say that we also need
to issue NodeExpandVolume for the node plugin to
make the new disk space available to task(s) that
have claims on the volume by e.g. expanding
the filesystem on the node.
csi spec:
https://github.com/container-storage-interface/spec/blob/c918b7f/spec.md#nodeexpandvolume
* drivers: plumb hardware topology via grpc into drivers
This PR swaps out the temporary use of detecting system hardware manually
in each driver for using the Client's detected topology by plumbing the
data over gRPC. This ensures that Client configuration is taken to account
consistently in all references to system topology.
* cr: use enum instead of bool for core grade
* cr: fix test slit tables to be possible
the first half of volume expansion,
this allows a user to update requested capacity
("capacity_min" and "capacity_max") in a volume
specification file, and re-issue either Register
or Create volume commands (or api calls).
the requested capacity will now be "reconciled"
with the current real capacity of the volume,
issuing a ControllerExpandVolume RPC call
to a running controller plugin, if requested
"capacity_min" is higher than the current
capacity on the volume in state.
csi spec:
https://github.com/container-storage-interface/spec/blob/c918b7f/spec.md#controllerexpandvolume
note: this does not yet cover NodeExpandVolume
* client: refactor cpuset partitioning
This PR updates the way Nomad client manages the split between tasks
that make use of resources.cpus vs. resources.cores.
Previously, each task was explicitly assigned which CPU cores they were
able to run on. Every time a task was started or destroyed, all other
tasks' cpusets would need to be updated. This was inefficient and would
crush the Linux kernel when a client would try to run ~400 or so tasks.
Now, we make use of cgroup heirarchy and cpuset inheritence to efficiently
manage cpusets.
* cr: tweaks for feedback
This change deduplicates the ACL policy list generated from ACL
roles referenced within an ACL token on the client.
Previously the list could contain duplicates, which would cause
erronous permission denied errors when calling client related RPC/
HTTP API endpoints. This is because the client calls the ACL get
policies endpoint which subsequently ensures the caller has
permission to view the ACL policies. This check is performed by
comparing the requested list args with the policies referenced by
the caller ACL token. When a duplicate is present, this check
fails, as the check must ensure the slices match exactly.
In the original design of Consul fingerprinting, we would poll every period so
that we could change the client's fingerprint if Consul became unavailable. As
of 1.4.0 (ref #14673) we no longer update the fingerprint in order to avoid
excessive `Node.Register` RPCs when someone's Consul cluster is flapping.
This allows us to safely backoff Consul fingerprinting on success, just as we
have with Vault.
fingerprint: add support for fingerprinting multiple Consul clusters
Add fingerprinting we'll need to accept multiple Consul clusters in upcoming
Nomad Enterprise features. The fingerprinter will create a map of Consul clients
by cluster name. In Nomad CE, all but the default cluster will be ignored and
there will be no visible behavior change.
Ref: https://github.com/hashicorp/team-nomad/issues/404
When restoring an allocation `WIDMgr` was not being set in the alloc
runner config, resulting in a nil panic when the task runner attempted
to start.
Since we will often require the same configuration values when creating
or restoring a new allocation, this commit moves the logic to a shared
function to ensure that `addAlloc` and `restoreState` configure alloc
runners with the same values.
* Revert "client: include response body in output for successful HTTP checks (#18345)"
This reverts commit d0a93f12d1.
* cr: add comment about dropping ok output
Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
---------
Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
Ensure that the index processed by the client is at least as new as the last index processed so that stale data does not impact the running allocations.
When an allocation is garbage collected from the client, but not from
the servers, the API request is routed to the client and the client
does attempt to read the file, but the alloc dir has already been
deleted, resulting in a 500 error.
This happens because the client GC only destroys the alloc runner
(deleting the alloc dir), but it keeps a reference to the alloc runner
until the alloc is garbage collected from the servers as well.
This commit adjusts this logic by checking if the alloc runner (and the
alloc files) has been destroyed, returning a 404 if so.