* identity: support change_mode and change_signal
wip - just jobspec portion
* test struct
* cleanup some insignificant boogs
* actually implement change mode
* docs tweaks
* add changelog
* test identity.change_mode operations
* use more words in changelog
* job endpoint tests
* address comments from code review
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Allocations that were created before Nomad 1.7 will not have the cluster field
set for their Consul blocks. While this can be corrected server-side, that
doesn't help allocations already on clients.
Allocations that were created before Nomad 1.7 will not have the `cluster` field
set for their Vault blocks. While this can be corrected server-side, that
doesn't help allocations already on clients.
Also add extra safety on Consul cluster lookup too
The `group_service_hook` needs to supply the Consul service client with Consul
tokens for its services. The lookup in the hook resources was looking for the
wrong key. This would cause the service client to ignore the Consul token we've
received and use the agent's own token.
This changeset also moves the prefix formatting into `MakeUniqueIdentityName` method
to reduce the risk of this kind of bug in the future.
The allocrunner's `identity_hook` implements the interface for TaskStop, but
this interface is only ever called for task-level hooks. This results in a
leaked goroutine that tries to periodically renew WIs until the client shuts
down gracefully.
Add an implementation for the allocrunner's `PreKill` and `Destroy` hooks, so
that whenever an allocation is stopped or garbage collected we stop renewing its
Workload Identities. This also requires making the `Shutdown` method of `WIDMgr`
safe to call multiple times.
The `sids_hook` serves the legacy Connect workflow, and we want to bypass it
when using workload identities. So the hook checks that there's not already a
Consul token in the alloc hook resources derived from the Workload
Identity. This check was looking for the wrong key. This would cause the hook to
ignore the Consul token we already have and then fail to derive a SI token
unless the Nomad agent has its own token with `acl:write` permission.
Fix the lookup and add tests covering the bypass behavior.
* core: plumbing to support numa aware scheduling
* core: apply node resources compatibility upon fsm rstore
Handle the case where an upgraded server dequeus an evaluation before
a client triggers a new fingerprint - which would be needed to cause
the compatibility fix to run. By running the compat fix on restore the
server will immediately have the compatible pseudo topology to use.
* lint: learn how to spell pseudo
In Nomad Enterprise, a task may connect to a non-default Vault cluster,
requiring `consul-template` to be configured with a specific client
`vault` block.
This changeset makes two changes:
* Removes the `consul.use_identity` field from the agent configuration. This behavior is properly covered by the presence of `consul.service_identity` / `consul.task_identity` blocks.
* Adds a `consul.task_auth_method` and `consul.service_auth_method` fields to the agent configuration. This allows the cluster administrator to choose specific Consul Auth Method names for their environment, with a reasonable default.
When agents start, they create a shared Consul client that is then wrapped as
various interfaces for testability, and used in constructing the Nomad client
and server. The interfaces that support workload services (rather than the Nomad
agent itself) need to support multiple Consul clusters for Nomad
Enterprise. Update these interfaces to be factory functions that return the
Consul client for a given cluster name. Update the `ServiceClient` to split
workload updates between clusters by creating a wrapper around all the clients
that delegates to the cluster-specific `ServiceClient`.
Ref: https://github.com/hashicorp/team-nomad/issues/404
The WID manager will only sign WI tokens for the allocation's task group. We're
accidentally looping over all the task groups, which for jobs with multiple task
groups results in a failure in the `consul_hook`.
Includes changes to WID Manager that make it request signed identities for
services, as well as a few improvements to WIHandle introduced in #18672.
---------
Co-authored-by: Piotr Kazmierczak <470696+pkazmierczak@users.noreply.github.com>
When clients are restarted and the identity hook runs when we restore
allocations, the running allocations are likely to have already-signed Workload
Identities that are unexpired. Save these to the client's local state DB so that
we can avoid a thundering herd of RPCs during client restart. When we restore,
we'll check if there's at least one expired signed WI before making any initial
signing request.
Included:
* Renames `getIdentities` to `getInitialIdentities` to make the workflow more clear.
* Renames the existing `widmgr_test.go` file of integration tests, which is in its
own package to avoid circular imports to `widmgr_int_test.go`
Any code that tracks workloads and their identities should not rely on string
comparisons, especially since we support 2 types of workload identities: those
that identify tasks and those that identify services. This means we cannot rely
on task.Name for workload-identity pairs.
The new type structs.WIHandle solves this problem by providing a uniform way of
identifying workloads and their identities.
Since the allocation in the task runner is updated in a separate
goroutine, a race condition may happen where the task is started but the
prestart hooks are skipped because the allocation became terminal.
Checking for a terminal allocation before proceeding with the task start
ensures the task only runs if the prestart hooks are also executed.
Since `shouldShutdown()` only uses terminal allocation status, it
remains `true` after the first transition, so it's safe to check it
again after the prestart hooks as it will never revert to `false`.
When Workload Identity is being used with Consul, the `consul_hook` will add
Consul tokens to the alloc hook resources. Update the `group_service_hook` and
`service_hook` to use those tokens when available for registering and
deregistering Consul workloads.
The `sids_hook` runs for Connect sidecar/gateway tasks and gets Consul Service
Identity (SI) tokens for use by the Envoy bootstrap hook. When Workload Identity
is being used with Consul, the `consul_hook` will have already added these
tokens to the alloc hook resources. Update the `sids_hook` to use those tokens
instead and write them to the expected area of the taskdir.
To support Workload Identity with Consul for templates, we want templates to be
able to use the WI created at the task scope (either implicitly or set by the
user). But to allow different tasks within a group to be assigned to different
clusters as we're doing for Vault, we need to be able to set the `consul` block
with its `cluster` field at the task level to override the group.
This PR introduces a new allocrunner-level consul_hook which iterates over
services and tasks, if their provider is consul, fetches consul tokens for all of
them, stores them in AllocHookResources and in task secret dirs.
Ref: hashicorp/team-nomad#404
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
* vault: update identity name to start with `vault_`
In the original proposal, workload identities used to derive Vault
tokens were expected to be called just `vault`. But in order to support
multiple Vault clusters it is necessary to associate identities with
specific Vault cluster configuration.
This commit implements a new proposal to have Vault identities named as
`vault_<cluster>`.
and MockCSIManager to support the call counting
that csi_hook_test expects
instead of implementing csimanager
interfaces in two separate places:
* client/allocrunner/csi_hook_test
* client/csi_endpoint_test
they can both use the same mocks defined in
client/pluginmanager/csimanager/
alongside the actual implementations of them.
also refactor TestCSINode_DetachVolume
to use use it like Node_ExpandVolume
so we can also test the happy path there
This commit splits identity_hook between the allocrunner and taskrunner. The
allocrunner-level part of the hook signs each task identity, and the
taskrunner-level part picks it up and stores secrets for each task.
The code revamps the WIDMgr, which is now split into 2 interfaces:
IdentityManager which manages renewals of signatures and handles sending
updates to subscribers via Watch method, and IdentitySigner which only does the
signing.
This work is necessary for having a unified Consul login workflow that comes
with the new Consul integration. A new, allocrunner-level consul_hook will now
be the only hook doing Consul authentication.
Nomad Enterprise will support configuring multiple Vault clients. Instead of
having a single Vault client field in the Nomad client, we'll have a function
that callers can parameterize by the Vault cluster name that returns the
correctly configured Vault API client wrapper.
following ControllerExpandVolume
in c6dbba7cde,
which expands the disk at e.g. a cloud vendor,
the controller plugin may say that we also need
to issue NodeExpandVolume for the node plugin to
make the new disk space available to task(s) that
have claims on the volume by e.g. expanding
the filesystem on the node.
csi spec:
https://github.com/container-storage-interface/spec/blob/c918b7f/spec.md#nodeexpandvolume
* drivers: plumb hardware topology via grpc into drivers
This PR swaps out the temporary use of detecting system hardware manually
in each driver for using the Client's detected topology by plumbing the
data over gRPC. This ensures that Client configuration is taken to account
consistently in all references to system topology.
* cr: use enum instead of bool for core grade
* cr: fix test slit tables to be possible
* client: refactor cpuset partitioning
This PR updates the way Nomad client manages the split between tasks
that make use of resources.cpus vs. resources.cores.
Previously, each task was explicitly assigned which CPU cores they were
able to run on. Every time a task was started or destroyed, all other
tasks' cpusets would need to be updated. This was inefficient and would
crush the Linux kernel when a client would try to run ~400 or so tasks.
Now, we make use of cgroup heirarchy and cpuset inheritence to efficiently
manage cpusets.
* cr: tweaks for feedback
* lang: note that Stack is not concurrency-safe
* client: use more descriptive name for wrangler hook in logs
* numalib: use correct name for receiver parameter
We use capped exponential backoff in several places in the code when handling
failures. The code we've copy-and-pasted all over has a check to see if the
backoff is greater than the limit, but this check happens after the bitshift and
we always increment the number of attempts. This causes an overflow with a
fairly small number of failures (ex. at one place I tested it occurs after only
24 iterations), resulting in a negative backoff which then never recovers. The
backoff becomes a tight loop consuming resources and/or DoS'ing a Nomad RPC
handler or an external API such as Vault. Note this doesn't occur in places
where we cap the number of iterations so the loop breaks (usually to return an
error), so long as the number of iterations is reasonable.
Introduce a helper with a check on the cap before the bitshift to avoid overflow in all
places this can occur.
Fixes: #18199
Co-authored-by: stswidwinski <stan.swidwinski@gmail.com>
Allows for multiple `identity{}` blocks for tasks along with user-specified audiences. This is a building block to allow workload identities to be used with Consul, Vault and 3rd party JWT based auth methods.
Expiration is still unimplemented and is necessary for JWTs to be used securely, so that's up next.
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
* build: update to go1.21
* go: eliminate helpers in favor of min/max
* build: run go mod tidy
* build: swap depguard for semgrep
* command: fixup broken tls error check on go1.21
There are some refactorings that have to be made in the getter and state
where the api changed in `slices`
* Bump golang.org/x/exp
* Bump golang.org/x/exp in api
* Update job_endpoint_test
* [feedback] unexport sort function
This feature is necessary when user want to explicitly re-render all templates on task restart.
E.g. to fetch all new secrets from Vault, even if the lease on the existing secrets has not been expired.