Update the `nomad setup consul` command to include a `Selector` for the
`NamespaceRule` so the logic is only applied when the token has a claim
for `consul_namespace`.
Jobs without an explicit `consul.namespace` value receive a JWT without
the `consul_namespace` claim because Nomad is unable to determine which
Consul namespace should be used.
By using `NamespaceRules`, cluster operators are able to set a default
value for these jobs.
* API command and jobspec docs
* PR comments addressed
* API docs for job/jobid/action socket
* Removing a perhaps incorrect origin of job_id across the jobs api doc
* PR comments addressed
In order to correctly handle Consul namespaces, auth methods and binding rules
must always be created in the default namespace only.
---------
Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
The `-reschedule` flag stops allocations and assumes the Nomad scheduler
will create new allocations to replace them. But this is only true for
service and batch jobs.
Restarting non-service jobs with the `-reschedule` flag causes the
command to loop forever waiting for the allocations to be replaced,
which never happens.
Allocations for system jobs may be replaced by triggering an evaluation
after each stop to cause the reconciler to run again.
Sysbatch jobs should not be allowed to be rescheduled as they are never
replaced by the scheduler.
This simplifies the default setup of Nomad workloads WI-based
authentication for Consul by using a single auth method with 2 binding rules.
Users can still specify separate auth methods for services and tasks.
* Initial pass at a global actions instance queue
* Action card with a bunch of functionality that needs to be pared back a bit
* Happy little actions button
* runAction performs updated to use actions service
* Stop All and Clear Finished buttons
* Keyboard service now passes element, so we can pseudo-click the actions dropdown
* resizable sidebar code blocks
* Contextual actions within task and job levels
* runAction greatly consolidated
* Pluralize action text
* Peer grouping of flyout action intances
* ShortIDs instead of full alloc IDs
* Testfixes that previously depended on notifications
* Stop and stop all for peered action instances
* Job name in action instance card linkable
* Componentized actions global button
* scss consolidation
* Clear and Stop buttons become mutually exclusive in an action card
* Clean up action card title styles a bit
* todo-bashing
* stopAll and stopPeers separated and fixed up
* Socket handling functions moved to the Actions service
* Error handling on socket message
* Smarter import
* Documentation note: need alloc-exec and alloc-raw-exec for raw_exec jobs
* Tests for flyout and dropdown actions
* Docs link when in empty flyout/queue state and percy snapshot test for it
The `nomad job restart` command should skip allocations that already
have replacements. Restarting an allocation with a replacement is a
no-op because the allocation status is terminal and the command's
replacement monitor returns immediatelly.
But by not skipping them, the effective batch size is computed
incorrectly.
The allocrunner has a service registration handler that proxies various API
calls to Consul. With multi-cluster support (for ENT), the service registration
handler is what selects the correct Consul client. The name of this field in the
allocrunner and taskrunner code base looks like it's referring to the actual
Consul API client. This was actually the case before Nomad native service
discovery was implemented, but now the name is misleading.
When creating the binding rule, `BindName` must match the pattern used
for the role name, otherwise the task will not be able to login to
Consul.
Also update the equality check for the binding rule to ensure this
property is held even if the auth method already has existing binding
rules attached.
* make the little dots consistent
* don't trim delimiter as that over matches
* test jobspec2 package
* copy api/WorkloadIdentity.TTL -> structs
* test ttl parsing
* fix hcl1 v 2 parsing mismatch
* make jobspec(1) tests match jobspec2 tests
Remove the now-unused original configuration blocks for Consul and Vault from
the client. When the client needs to refer to a Consul or Vault block it will
always be for a specific cluster for the task/service. Add a helper for
accessing the default clusters (for the client's own use).
This is two of three changesets for this work. The remainder will implement the
same changes in the `command/agent` package.
As part of this work I discovered and fixed two bugs:
* The gRPC proxy socket that we create for Envoy is only ever created using the
default Consul cluster's configuration. This will prevent Connect from being
used with the non-default cluster.
* The Consul configuration we use for templates always comes from the default
Consul cluster's configuration, but will use the correct Consul token for the
non-default cluster. This will prevent templates from being used with the
non-default cluster.
Ref: https://github.com/hashicorp/nomad/issues/18947
Ref: https://github.com/hashicorp/nomad/pull/18991
Fixes: https://github.com/hashicorp/nomad/issues/18984
Fixes: https://github.com/hashicorp/nomad/issues/18983
Submitting a Consul or Vault token with a job is deprecated in Nomad 1.7 and
intended for removal in Nomad 1.9. We added a deprecation warning to the CLI
when the user passes in the appropriate flag or environment variable in
does not use Vault or Consul but happen to have the appropriate environment
variable in your environment. While this is generally a bad practice (because
the token is leaked to Nomad), it's also the existing practice for some users.
Move the warning to the job admission hook. This will allow us to warn only when
appropriate, and that will also help the migration process by producing warnings
only for the relevant jobs.
Remove the now-unused original configuration blocks for Consul and Vault from
the server. When the server needs to refer to a Consul or Vault block it will
always be for a specific cluster for the task/service. Add a helper for
accessing the default clusters (for the servers own use).
This is one of three changesets for this work. The remainder will implement the
same changes in the `client` package and on the `command/agent` package.
As part of this work I discovered that the job submission hook for Vault only
checks the enabled flag on the default cluster, rather than the clusters that
are used by the job being submitted. This will return an error on job
registration saying that Vault is disabled. Fix that to check only the
cluster(s) used by the job.
Ref: https://github.com/hashicorp/nomad/issues/18947
Fixes: https://github.com/hashicorp/nomad/issues/18990
The `operator client-state` command is mostly used for developer debugging of
the Nomad client state, but it hasn't been updated with several recent
additions. Add allocation identities, network status, and dynamic volumes to the
objects it outputs.
Also, fix a bug where reading the state for an allocation without task states
will crash the CLI. This can happen if the Nomad client stops after an alloc is
persisted to disk but before the task actually starts.
An interactive setup helper for configuring Consul to accept Nomad WI-enabled workloads.
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
* identity: support change_mode and change_signal
wip - just jobspec portion
* test struct
* cleanup some insignificant boogs
* actually implement change mode
* docs tweaks
* add changelog
* test identity.change_mode operations
* use more words in changelog
* job endpoint tests
* address comments from code review
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
OIDC mandates the support of the RS256 signing algorithm so in order to maximize workload identity's usefulness this change switches from using the EdDSA signing algorithm to RS256.
Old keys will continue to use EdDSA but new keys will use RS256. The EdDSA generation code was left in place because it's fast and cheap and I'm not going to lie I hope we get to use it again.
**Test Updates**
Most of our Variables and Keyring tests had a subtle assumption in them that the keyring would be initialized by the time the test server had elected a leader. ed25519 key generation is so fast that the fact that it was happening asynchronously with server startup didn't seem to cause problems. Sadly rsa key generation is so slow that basically all of these tests failed.
I added a new `testutil.WaitForKeyring` helper to replace `testutil.WaitForLeader` in cases where the keyring must be initialized before the test may continue. However this is mostly used in the `nomad/` package.
In the `api` and `command/agent` packages I decided to switch their helpers to wait for keyring initialization by default. This will slow down tests a bit, but allow those packages to not be as concerned with subtle server readiness details. On my machine rsa key generation takes 63ms, so hopefully the difference isn't significant on CI runners.
**TODO**
- Docs and changelog entries.
- Upgrades - right now upgrades won't get RS256 keys until their root key rotates either manually or after ~30 days.
- Observability - I'm not sure there's a way for operators to see if they're using EdDSA or RS256 unless they inspect a key. The JWKS endpoint can be inspected to see if EdDSA will be used for new identities, but it doesn't technically define which key is active. If upgrades can be fixed to automatically rotate keys, we probably don't need to worry about this.
**Requiem for ed25519**
When workload identities were first implemented we did not immediately consider OIDC compliance. Consul, Vault, and many other third parties support JWT auth methods without full OIDC compliance. For the machine<-->machine use cases workload identity is intended to fulfill, OIDC seemed like a bigger risk than asset.
EdDSA/ed25519 is the signing algorithm we chose for workload identity JWTs because of all these lovely properties:
1. Deterministic keys that can be derived from our preexisting root keys. This was perhaps the biggest factor since we already had a root encryption key around from which we could derive a signing key.
2. Wonderfully compact: 64 byte private key, 32 byte public key, 64 byte signatures. Just glorious.
3. No parameters. No choices of encodings. It's all well-defined by [RFC 8032](https://datatracker.ietf.org/doc/html/rfc8032).
4. Fastest performing signing algorithm! We don't even care that much about the performance of our chosen algorithm, but what a free bonus!
5. Arguably one of the most secure signing algorithms widely available. Not just from a cryptanalysis perspective, but from an API and usage perspective too.
Life was good with ed25519, but sadly it could not last.
[IDPs](https://en.wikipedia.org/wiki/Identity_provider), such as AWS's IAM OIDC Provider, love OIDC. They have OIDC implemented for humans, so why not reuse that OIDC support for machines as well? Since OIDC mandates RS256, many implementations don't bother implementing other signing algorithms (or at least not advertising their support). A quick survey of OIDC Discovery endpoints revealed only 2 out of 10 OIDC providers advertised support for anything other than RS256:
- [PayPal](https://www.paypalobjects.com/.well-known/openid-configuration) supports HS256
- [Yahoo](https://api.login.yahoo.com/.well-known/openid-configuration) supports ES256
RS256 only:
- [GitHub](https://token.actions.githubusercontent.com/.well-known/openid-configuration)
- [GitLab](https://gitlab.com/.well-known/openid-configuration)
- [Google](https://accounts.google.com/.well-known/openid-configuration)
- [Intuit](https://developer.api.intuit.com/.well-known/openid_configuration)
- [Microsoft](https://login.microsoftonline.com/fabrikamb2c.onmicrosoft.com/v2.0/.well-known/openid-configuration)
- [SalesForce](https://login.salesforce.com/.well-known/openid-configuration)
- [SimpleLogin (acquired by ProtonMail)](https://app.simplelogin.io/.well-known/openid-configuration/)
- [TFC](https://app.terraform.io/.well-known/openid-configuration)
Submitting a Consul or Vault token with a job is deprecated in Nomad 1.7 and
intended for removal in Nomad 1.9. Add a deprecation warning to the CLI when the
user passes in the appropriate flag or environment variable.
Nomad agents will no longer need a Vault token when configured with workload
identity, and we'll ignore Vault tokens in the agent config after Nomad 1.9. Log
a warning at agent startup.
Ref: https://github.com/hashicorp/nomad/issues/15617
Ref: https://github.com/hashicorp/nomad/issues/15618