Commit Graph

25251 Commits

Author SHA1 Message Date
James Rasell
5f98e6473c acl: use token locality consts when validating auth methods. (#18975) 2023-11-03 07:22:54 +00:00
Seth Hoenig
1604dba508 client: fingerprint cpu on raspberry pi (#18982)
This PR tweaks the linux cpu fingerprinter to handle the case where no
NUMA node data is found under /sys/devices/system/, in which case we
need to assume just one node, one socket.
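
A minimal sketch of the fallback described above, not the actual fingerprinter code. The helper name `countNUMANodes` is hypothetical, as is the assumption that node data sits in `nodeN` directories below a `node` directory under the given sysfs root:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"regexp"
)

// nodeDirPattern matches NUMA node directories such as "node0" or "node12".
var nodeDirPattern = regexp.MustCompile(`^node[0-9]+$`)

// countNUMANodes is a hypothetical helper (not Nomad's fingerprinter). It
// counts nodeN entries under <sysRoot>/node and falls back to a single node
// when the directory is missing or holds no node entries.
func countNUMANodes(sysRoot string) int {
	entries, err := os.ReadDir(filepath.Join(sysRoot, "node"))
	if err != nil {
		return 1 // no NUMA data exposed: assume one node
	}
	count := 0
	for _, e := range entries {
		if nodeDirPattern.MatchString(e.Name()) {
			count++
		}
	}
	if count == 0 {
		return 1 // directory exists but lists no nodes: same fallback
	}
	return count
}

func main() {
	fmt.Println("NUMA nodes:", countNUMANodes("/sys/devices/system"))
}
```

On a machine without that directory the function reports one node rather than failing.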
2023-11-02 15:52:37 -05:00
Michael Schurter
78f0c6b2a9 cli: update acl bootstrap help to match docs (#18961)
See https://developer.hashicorp.com/nomad/docs/commands/acl/bootstrap
2023-11-02 08:52:21 -07:00
Tim Gross
142884b384 ignore KEK wrapper struct for codegen (#18973)
Our codec code generation doesn't honor `json:"..."` tags which, if we were to
ever implement `json.Marshaler` for the `KeyEncryptionKeyWrapper` struct, would
break the on-disk format of all the existing KEKs.

As a precaution, add this struct to the code generator's ignore list (just like
we have done with `IdentityClaims`).
2023-11-02 11:25:40 -04:00
James Rasell
6d0893cf57 acl/client: fix incorrect denied error on calls with dangling policies. (#18972)
When a user performs a client API call, the Nomad client will
perform an RPC which looks up the ACL policies assigned to the
caller's ACL token. If the ACL token includes dangling (deleted)
policies, the call would previously fail with a permission denied
error.

This change ensures this error is not returned and that the lookup
will succeed in the event of dangling policies.
2023-11-02 15:23:42 +00:00
Luiz Aoqui
a907273557 vault: fix import cycle in vaultclient (#18965)
* Revert "vault: eliminate vaultclient test import cycle (#18652)"

This reverts commit 03cf9ae7ff.

* vault: remove import cycle in vaultclient_test.go
2023-11-02 11:07:04 -04:00
Seth Hoenig
61e21db2b4 docs: add 1.7 cpu upgrade notes and tweak cpu concepts doc (#18977)
* docs: add 1.7 cpu upgrade notes and tweak cpu concepts doc

* docs: fix spelling
2023-11-02 09:58:16 -05:00
Seth Hoenig
0dc9c49c6c docs: add a Concepts/CPU docs page (#18924)
* docs: add a Concepts/CPU docs page

* docs: cpu doc cr feedback

* docs: cpu fix image
2023-11-02 08:45:43 -05:00
Piotr Kazmierczak
d69a1238cd cli: consul setup command (#18820)
An interactive setup helper for configuring Consul to accept Nomad WI-enabled workloads.

---------

Co-authored-by: Tim Gross <tgross@hashicorp.com>
Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
2023-11-02 09:02:07 +01:00
James Rasell
0822af35af cli: remove unused raft tool helper. (#18954) 2023-11-02 07:43:44 +00:00
Michael Schurter
0040427c6d identity: don't generate codec for oidc config (#18964)
Our codec code generation doesn't honor `json:"..."` tags which breaks
the OIDC Discovery endpoint.

This adds the relevant struct to the code generator's ignore list (just
like we have done with IdentityClaims).
2023-11-01 13:20:00 -07:00
Tim Gross
feede21d9a test: make CSI bad state GC test synchronous (#18960)
One of our core scheduler tests for GC checks that volumes with invalid
allocations immediately have those claims marked as past claims and put
into the unpublishing state. This happens synchronously with the GC evaluation
processing, so there's no need for us to wait for the results.

Fixes: #18959
2023-11-01 15:31:42 -04:00
Seth Hoenig
51b8737ca9 Release/1.7.0 beta.1 (#18962)
* Prepare release 1.7.0-beta.1

* cl: tweak actions cl entry

* Generate files for 1.7.0-beta.1 release

* Prepare for next release

---------

Co-authored-by: hc-github-team-nomad-core <github-team-nomad-core@hashicorp.com>
2023-11-01 14:27:59 -05:00
Logan Attwood
0e643501de Fix the "Starting" allocations link (#18866)
Before this commit, it would bring you to the list of allocations
filtered by status=starting. This status does not exist in the Status
drop-down on the Allocations section of a job in the UI.
2023-11-01 15:23:43 -04:00
Michael Schurter
0b0ae40199 docs: recommend rotating keys on upgrade (#18958)
RIP EdDSA.
2023-11-01 10:57:33 -07:00
Tim Gross
483e78615d template: fix test assertion to be compatible between CE/ENT (#18957)
The template hook emits an error when the task has a Consul block that requires
WI but there's no WI. The exact error message we get depends on whether we're
running in CE or ENT. Update the test assertion so that we can tolerate this
difference without building ENT-specific test files.
2023-11-01 13:26:45 -04:00
Anthony
e1acf72eb5 Automated license utilization reporting docs (#17976) 2023-11-01 12:18:04 -04:00
Seth Hoenig
02d433225f cl: use caps for feature (#18956) 2023-11-01 10:56:39 -05:00
Tim Gross
dd62e8a319 consul/vault: use accessor method to get cluster name in client (#18955)
When looking up the Consul or Vault cluster from a client hook, we should always
use an accessor function rather than trying to look up the `Cluster` field, which
may be empty for jobs registered before Nomad 1.7.
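
The accessor pattern described above can be sketched as follows. `ConsulBlock`, `GetCluster`, and the `"default"` constant are illustrative stand-ins, not the actual Nomad types:

```go
package main

import "fmt"

// defaultClusterName is an assumed default for this sketch.
const defaultClusterName = "default"

// ConsulBlock is a hypothetical stand-in for a jobspec block whose Cluster
// field may be empty on objects created by an older version.
type ConsulBlock struct {
	Cluster string
}

// GetCluster is safe to call on a nil receiver and returns the default
// cluster name whenever the field was never populated.
func (c *ConsulBlock) GetCluster() string {
	if c == nil || c.Cluster == "" {
		return defaultClusterName
	}
	return c.Cluster
}

func main() {
	var missing *ConsulBlock
	fmt.Println(missing.GetCluster())                         // block absent
	fmt.Println((&ConsulBlock{}).GetCluster())                // field unset
	fmt.Println((&ConsulBlock{Cluster: "east"}).GetCluster()) // field set
}
```

Callers that read the field directly would see an empty string in the first two cases; routing every read through the accessor keeps the defaulting logic in one place.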
2023-11-01 10:59:59 -04:00
Michael Schurter
e49ca3c431 identity: Implement change_mode (#18943)
* identity: support change_mode and change_signal

wip - just jobspec portion

* test struct

* cleanup some insignificant boogs

* actually implement change mode

* docs tweaks

* add changelog

* test identity.change_mode operations

* use more words in changelog

* job endpoint tests

* address comments from code review

---------

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2023-11-01 09:41:11 -05:00
Tim Gross
d62213a135 consul: fix lookups of default cluster across upgrades (#18945)
Allocations that were created before Nomad 1.7 will not have the cluster field
set for their Consul blocks. While this can be corrected server-side, that
doesn't help allocations already on clients.
2023-11-01 10:11:54 -04:00
James Rasell
4ec27a97d1 docs: clarify ACL agent config TTL params apply to auth methods. (#18949) 2023-11-01 13:45:13 +00:00
Luiz Aoqui
bfb2dcd172 Vault small fixes (#18942)
* vault: remove `token_ttl` from `vaultcompat` setup

Since Nomad uses periodic tokens, the right value to set in the role is
`token_period`, not `token_ttl`.

* vault: set 1.11.0 as min version for JWT auth

In order to use workload identities JWT auth with Vault it's required to
have a Vault cluster running v1.11.0+, which is the version where
`user_claim_json_pointer` was introduced.
2023-11-01 08:23:19 -04:00
Seth Hoenig
5b56a5c5d1 client: fix cpu core/freq calculation on intel macs (#18934) 2023-11-01 07:16:26 -05:00
James Rasell
4a89a0a0f2 changelog: fix entry wording for #18873 (#18927) 2023-11-01 09:56:31 +00:00
Tim Gross
c1fa145765 vault: fix lookups of default cluster across upgrades (#18940)
Allocations that were created before Nomad 1.7 will not have the `cluster` field
set for their Vault blocks. While this can be corrected server-side, that
doesn't help allocations already on clients.

Also add extra safety to the Consul cluster lookup.
2023-10-31 17:30:01 -04:00
Luiz Aoqui
d7edbd44b7 api: handle redirect during websocket upgrade (#18903)
When attempting a WebSocket connection upgrade the client may receive a
redirect request from the server, in which case the request should be
reattempted using the new address present in the `Location` header.
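
A rough sketch of the redirect step using only the Go standard library. `redirectTarget` is a hypothetical helper and not part of the Nomad `api` package. It only computes the address to retry against; the dial and retry loop, and any mapping between `http(s)` and `ws(s)` schemes, are left to the caller:

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

// redirectTarget is a hypothetical helper. Given the URL of a failed upgrade
// attempt, the response status, and the Location header value, it returns the
// address to retry against. The second result is false when the response is
// not a redirect or the header cannot be used.
func redirectTarget(current string, status int, location string) (string, bool) {
	switch status {
	case http.StatusMovedPermanently, http.StatusFound, http.StatusSeeOther,
		http.StatusTemporaryRedirect, http.StatusPermanentRedirect:
		// redirect status: continue below
	default:
		return "", false
	}
	if location == "" {
		return "", false
	}
	base, err := url.Parse(current)
	if err != nil {
		return "", false
	}
	ref, err := url.Parse(location)
	if err != nil {
		return "", false
	}
	// ResolveReference handles both absolute and relative Location values.
	return base.ResolveReference(ref).String(), true
}

func main() {
	next, ok := redirectTarget(
		"wss://nomad.example:4646/v1/exec",
		http.StatusTemporaryRedirect,
		"/v1/other",
	)
	fmt.Println(next, ok)
}
```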
2023-10-31 17:12:11 -04:00
Luiz Aoqui
3ddf1ecf1d actions: minor bug fixes and improvements (#18904) 2023-10-31 17:06:02 -04:00
Tim Gross
2bff6d2a6a docs: fix token_period in example Vault role for WI (#18939)
Vault tokens requested for WI are "periodic" Vault tokens (ones that get
periodically renewed). The field we should be setting for the renewal window is
`token_period`.
2023-10-31 16:33:03 -04:00
Michael Schurter
9afc70ef5a Fix Vault docs to use HCL instead of JSON (#18938) 2023-10-31 13:25:20 -07:00
Michael Schurter
f8a65b6c29 docs: changelog & basic docs for 1.7 WI changes (#18936)
Changelog entries and bare minimum docs for workload identity changes in 1.7.
2023-10-31 13:06:08 -07:00
Michael Schurter
66fbc0f67e identity: default to RS256 for new workload ids (#18882)
OIDC mandates the support of the RS256 signing algorithm so in order to maximize workload identity's usefulness this change switches from using the EdDSA signing algorithm to RS256.

Old keys will continue to use EdDSA but new keys will use RS256. The EdDSA generation code was left in place because it's fast and cheap and I'm not going to lie I hope we get to use it again.

**Test Updates**

Most of our Variables and Keyring tests had a subtle assumption in them that the keyring would be initialized by the time the test server had elected a leader. ed25519 key generation is so fast that the fact that it was happening asynchronously with server startup didn't seem to cause problems. Sadly rsa key generation is so slow that basically all of these tests failed.

I added a new `testutil.WaitForKeyring` helper to replace `testutil.WaitForLeader` in cases where the keyring must be initialized before the test may continue. However this is mostly used in the `nomad/` package.

In the `api` and `command/agent` packages I decided to switch their helpers to wait for keyring initialization by default. This will slow down tests a bit, but allow those packages to not be as concerned with subtle server readiness details. On my machine rsa key generation takes 63ms, so hopefully the difference isn't significant on CI runners.

**TODO**

- Docs and changelog entries.
- Upgrades - right now upgrades won't get RS256 keys until their root key rotates either manually or after ~30 days.
- Observability - I'm not sure there's a way for operators to see if they're using EdDSA or RS256 unless they inspect a key. The JWKS endpoint can be inspected to see if EdDSA will be used for new identities, but it doesn't technically define which key is active. If upgrades can be fixed to automatically rotate keys, we probably don't need to worry about this.

**Requiem for ed25519**

When workload identities were first implemented we did not immediately consider OIDC compliance. Consul, Vault, and many other third parties support JWT auth methods without full OIDC compliance. For the machine<-->machine use cases workload identity is intended to fulfill, OIDC seemed like a bigger risk than asset.

EdDSA/ed25519 is the signing algorithm we chose for workload identity JWTs because of all these lovely properties:

1. Deterministic keys that can be derived from our preexisting root keys. This was perhaps the biggest factor since we already had a root encryption key around from which we could derive a signing key.
2. Wonderfully compact: 64 byte private key, 32 byte public key, 64 byte signatures. Just glorious.
3. No parameters. No choices of encodings. It's all well-defined by [RFC 8032](https://datatracker.ietf.org/doc/html/rfc8032).
4. Fastest performing signing algorithm! We don't even care that much about the performance of our chosen algorithm, but what a free bonus!
5. Arguably one of the most secure signing algorithms widely available. Not just from a cryptanalysis perspective, but from an API and usage perspective too.
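
To make the size comparison concrete, here is a small standard-library sketch that signs the same message with ed25519 and with RS256 (RSASSA-PKCS1-v1_5 over SHA-256). The 2048-bit RSA key size and the `measure` helper are assumptions for illustration; this is not Nomad's keyring code:

```go
package main

import (
	"crypto"
	"crypto/ed25519"
	"crypto/rand"
	"crypto/rsa"
	"crypto/sha256"
	"fmt"
)

// sizes records byte lengths observed for each algorithm.
type sizes struct {
	EdPrivate, EdPublic, EdSignature int
	RSASignature                     int
}

// measure is a hypothetical helper that generates one key per algorithm,
// signs msg with each, and reports the resulting lengths.
func measure(msg []byte) (sizes, error) {
	pub, priv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		return sizes{}, err
	}
	edSig := ed25519.Sign(priv, msg)

	// Assumed key size for this sketch: 2048 bits.
	rsaKey, err := rsa.GenerateKey(rand.Reader, 2048)
	if err != nil {
		return sizes{}, err
	}
	digest := sha256.Sum256(msg)
	rsaSig, err := rsa.SignPKCS1v15(rand.Reader, rsaKey, crypto.SHA256, digest[:])
	if err != nil {
		return sizes{}, err
	}
	return sizes{len(priv), len(pub), len(edSig), len(rsaSig)}, nil
}

func main() {
	s, err := measure([]byte("workload identity claims"))
	if err != nil {
		panic(err)
	}
	fmt.Printf("ed25519: private=%d public=%d signature=%d bytes\n",
		s.EdPrivate, s.EdPublic, s.EdSignature)
	fmt.Printf("RS256 (2048-bit key): signature=%d bytes\n", s.RSASignature)
}
```

The ed25519 figures match the list above; the RSA signature is as long as the key modulus, so it grows with the chosen key size.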

Life was good with ed25519, but sadly it could not last.

[IDPs](https://en.wikipedia.org/wiki/Identity_provider), such as AWS's IAM OIDC Provider, love OIDC. They have OIDC implemented for humans, so why not reuse that OIDC support for machines as well? Since OIDC mandates RS256, many implementations don't bother implementing other signing algorithms (or at least not advertising their support). A quick survey of OIDC Discovery endpoints revealed only 2 out of 10 OIDC providers advertised support for anything other than RS256:

- [PayPal](https://www.paypalobjects.com/.well-known/openid-configuration) supports HS256
- [Yahoo](https://api.login.yahoo.com/.well-known/openid-configuration) supports ES256

RS256 only:

- [GitHub](https://token.actions.githubusercontent.com/.well-known/openid-configuration)
- [GitLab](https://gitlab.com/.well-known/openid-configuration)
- [Google](https://accounts.google.com/.well-known/openid-configuration)
- [Intuit](https://developer.api.intuit.com/.well-known/openid_configuration)
- [Microsoft](https://login.microsoftonline.com/fabrikamb2c.onmicrosoft.com/v2.0/.well-known/openid-configuration)
- [SalesForce](https://login.salesforce.com/.well-known/openid-configuration)
- [SimpleLogin (acquired by ProtonMail)](https://app.simplelogin.io/.well-known/openid-configuration/)
- [TFC](https://app.terraform.io/.well-known/openid-configuration)
2023-10-31 11:25:20 -07:00
Tim Gross
01d050c36b identity: version check multiple and implicit identities (#18926)
Job submitters cannot set multiple identities prior to Nomad 1.7, and cluster
administrators should not set the identity configurations for their `consul` and
`vault` configuration blocks until all servers have been upgraded. Validate
these cases during job submission so as to prevent state store corruption when
jobs are submitted in the middle of a cluster upgrade.
2023-10-31 13:57:53 -04:00
Tim Gross
ea3e711fa6 docs: upgrade guide for integrations deprecation warnings (#18928)
The Consul and Vault integrations work shipping in Nomad 1.7 will deprecate the
existing token-based workflows. These will be removed in Nomad 1.9, so add a
note describing this to the upgrade guide.
2023-10-31 13:21:47 -04:00
Tim Gross
790d4d5d7a changelog entries for Integrations feature work (#18923) 2023-10-31 11:53:43 -04:00
Phil Renaud
d98ed87c1b Actions changelog update to feature (#18921) 2023-10-30 20:28:50 -04:00
Tim Gross
4850f07295 docs: name, audience, and TTL fields for identity blocks (#18916) 2023-10-30 13:45:40 -04:00
Tim Gross
6fd3143fe7 services: fix lookup for Consul tokens (#18914)
The `group_service_hook` needs to supply the Consul service client with Consul
tokens for its services. The lookup in the hook resources was looking for the
wrong key. This would cause the service client to ignore the Consul token we've
received and use the agent's own token.

This changeset also moves the prefix formatting into `MakeUniqueIdentityName` method
to reduce the risk of this kind of bug in the future.
2023-10-30 13:42:18 -04:00
Dave May
0748918a3a cli: Add file prediction for operator raft/snapshot commands (#18901) 2023-10-30 13:40:21 -04:00
Seth Hoenig
b5469dd0eb Post 1.6.3 release (#18918)
* Generate files for 1.6.3 release

* Prepare for next release

* Merge release 1.6.3 files

---------

Co-authored-by: hc-github-team-nomad-core <github-team-nomad-core@hashicorp.com>
2023-10-30 12:38:16 -05:00
Tim Gross
f0330d6df1 identity_hook: implement PreKill hook, not TaskStop hook (#18913)
The allocrunner's `identity_hook` implements the interface for TaskStop, but
this interface is only ever called for task-level hooks. This results in a
leaked goroutine that tries to periodically renew WIs until the client shuts
down gracefully.

Add an implementation for the allocrunner's `PreKill` and `Destroy` hooks, so
that whenever an allocation is stopped or garbage collected we stop renewing its
Workload Identities. This also requires making the `Shutdown` method of `WIDMgr`
safe to call multiple times.
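
One common way to make a shutdown method safe to call from several hooks is `sync.Once`. The sketch below uses a hypothetical `renewer` type and does not claim to mirror how `WIDMgr` is implemented:

```go
package main

import (
	"fmt"
	"sync"
)

// renewer is a hypothetical stand-in for a manager that runs a background
// renewal loop until its stop channel is closed.
type renewer struct {
	stopCh chan struct{}
	once   sync.Once
}

func newRenewer() *renewer {
	return &renewer{stopCh: make(chan struct{})}
}

// Shutdown closes the stop channel exactly once. Closing a channel twice
// would panic, so repeated calls are routed through sync.Once.
func (r *renewer) Shutdown() {
	r.once.Do(func() { close(r.stopCh) })
}

// stopped reports whether Shutdown has been called.
func (r *renewer) stopped() bool {
	select {
	case <-r.stopCh:
		return true
	default:
		return false
	}
}

func main() {
	r := newRenewer()
	r.Shutdown() // e.g. from a PreKill hook
	r.Shutdown() // e.g. from a Destroy hook; must not panic
	fmt.Println("stopped:", r.stopped())
}
```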
2023-10-30 10:54:22 -04:00
Dave May
1f4965e877 docs: Add code fence to Improvements example (#18902) 2023-10-30 14:13:19 +00:00
Tim Gross
9463d7f88a docs: add note about consul.service_identity ignoring fields (#18900)
The WI we get for Consul services is saved to the client state DB like all other
WIs, but the resulting JWT is never exposed to the task secrets directory
because (a) it's only intended for use with Consul service configuration,
and (b) for group services it could be ambiguous which task to expose it to.

Add a note to the `consul.service_identity` docs that these fields are ignored.
2023-10-30 09:19:15 -04:00
Luiz Aoqui
347389f9f9 vault: derive token using create_from_role (#18880)
Fall back to the ACL role defined in the client's `create_from_role`
configuration when using the JWT flow and the task does not specify a
role to use.
2023-10-27 13:03:44 -04:00
Luiz Aoqui
71a471b90a cli: deprecate -vault-token flag (#18881)
Apply the same deprecation notice from #18863 to the `nomad job plan`
command.
2023-10-27 12:48:11 -04:00
James Rasell
2daf49df9a server: use same receiver name for all server funcs. (#18896) 2023-10-27 16:36:10 +01:00
Tim Gross
694a5ec19d docs: remove stale note about generate_lease from template docs (#18895)
Prior to `consul-template` v0.22.0, automatic PKI renewal wouldn't work properly
based on the expiration of the cert. More recent versions of `consul-template`
can use the expiry to refresh the cert, so it's no longer necessary (and in fact
generates extra load on Vault) to set `generate_lease`. Remove this
recommendation from the docs.

Fixes: #18893
2023-10-27 11:09:09 -04:00
Justin Yang
b76e0429c4 client: add support for NetBSD clients (#18562)
Bumps `shirou/gopsutil` to v3.23.9
2023-10-27 10:33:00 -04:00
Tim Gross
139a96ad12 e2e: fix bind name to allow Connect reachability (#18878)
The `BindName` for JWT authentication should always bind to the `nomad_service` field in the JWT and not include the namespace, as the `nomad_service` is what's actually registered in Consul. 

* Fix the binding rule for the `consulcompat` test 
* Add a reachability assertion so that we don't miss regressions.
* Ensure we have a clean shutdown so that we don't leak state (containers and iptables) between tests.
2023-10-27 10:15:17 -04:00
James Rasell
3c8eb54dfc scheduler: ensure dup alloc names are fixed before plan submit. (#18873)
This change fixes a bug within the generic scheduler which meant
duplicate alloc indexes (names) could be submitted to the plan
applier and written to state. The bug originates from the
placement calculation's assumption that the names of allocations
being replaced can be blindly copied to their replacements. This is
not correct in all cases, particularly when dealing with canaries.

The fix updates the alloc name index tracker to include minor
duplicate tracking. This can be used when computing placements to
ensure duplicates are found, and a new name picked before the plan
is submitted. The name index tracking is now passed from the
reconciler to the generic scheduler via the results, so this does
not have to be regenerated, or another data structure used.
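
The idea of detecting duplicate indexes and picking a new one before submission can be illustrated with a simplified sketch. `fixDuplicateIndexes` and `allocName` are hypothetical helpers, and the "lowest unused index" policy is an assumption of this example rather than the scheduler's actual rule:

```go
package main

import "fmt"

// allocName formats an allocation name from its job, group, and index.
func allocName(job, group string, idx int) string {
	return fmt.Sprintf("%s.%s[%d]", job, group, idx)
}

// fixDuplicateIndexes is a hypothetical helper. It keeps the first use of
// each index and replaces any later duplicate with the lowest index that is
// not used anywhere in the input or by an earlier replacement.
func fixDuplicateIndexes(indexes []int) []int {
	used := make(map[int]bool, len(indexes))
	for _, idx := range indexes {
		used[idx] = true
	}
	seen := make(map[int]bool, len(indexes))
	out := make([]int, len(indexes))
	next := 0
	for i, idx := range indexes {
		if seen[idx] {
			// Duplicate found: advance to the next free index.
			for used[next] {
				next++
			}
			idx = next
			used[idx] = true
		}
		seen[idx] = true
		out[i] = idx
	}
	return out
}

func main() {
	for _, idx := range fixDuplicateIndexes([]int{0, 1, 1, 3}) {
		fmt.Println(allocName("example", "web", idx))
	}
}
```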
2023-10-27 14:16:41 +01:00