A series of errors may happen when a token is invalidated while the
Vault client is waiting to renew it. The token may have been invalidated
for several reasons, such as the alloc having finished running and now
being terminal, or the token having been changed directly in Vault
out-of-band.
Most of the errors are caused by retries that will never succeed until
Vault fully removes the token from its state.
This commit prevents the retries by making the error `invalid lease ID`
a fatal error.
In earlier versions of Vault, this case was covered by the error `lease
not found or lease is not renewable`, which is already considered to be
a fatal error by Nomad:
2d0cde4ccc/vault/expiration.go (L636-L639)
But https://github.com/hashicorp/vault/pull/5346 introduced an earlier
`nil` check that generates a different error message:
750ab337ea/vault/expiration.go (L1362-L1364)
Both errors happen for the same reason (`le == nil`) and so should be
considered fatal on renewal.
Previously, a Vault token could be renewed either periodically via the
renewal loop or immediately by calling `RenewToken()`.
But a race condition in the renewal loop could cause an attempt to renew
an expired token. If both `updateCh` and `renewalCh` are active (such as
when a task stops at the same time its token is waiting for renewal),
the following `select` picks a `case` at random.
78f0c6b2a9/client/vaultclient/vaultclient.go (L557-L564)
If `case <-renewalCh` is picked, the token is incorrectly re-added to
the heap, causing unnecessary renewals of a token that is already expired.
1604dba508/client/vaultclient/vaultclient.go (L505-L510)
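The pseudo-random behavior of Go's `select` when multiple cases are ready can be demonstrated standalone. The channel names mirror the ones above, but this is an isolated sketch, not Nomad's renewal loop:

```go
package main

import "fmt"

// pickCounts runs a select over two simultaneously-ready channels n times
// and counts which case the Go runtime picks. When more than one case is
// ready, the choice is pseudo-random, which is exactly what allows the
// stale-renewal race described above.
func pickCounts(n int) (updates, renewals int) {
	for i := 0; i < n; i++ {
		updateCh := make(chan struct{}, 1)
		renewalCh := make(chan struct{}, 1)
		updateCh <- struct{}{}
		renewalCh <- struct{}{}
		select {
		case <-updateCh:
			updates++
		case <-renewalCh:
			renewals++
		}
	}
	return updates, renewals
}

func main() {
	u, r := pickCounts(1000)
	fmt.Println(u > 0 && r > 0) // both branches get picked across many runs
}
```

Because neither case is preferred, correctness cannot depend on `select` ordering; the fix has to make the unwanted branch impossible rather than unlikely.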
To prevent this situation, the `renew()` function should only renew
tokens that are currently in the heap. `RenewToken()` must therefore
first push the token to the heap and wait for the renewal to happen
instead of calling `renew()` directly, since calling `renew()` directly
could cause another race condition in which the token is renewed twice:
once by `RenewToken()` itself and a second time if the renewal loop
happens to pick the token as soon as `RenewToken()` adds it to the heap.
* runAction model and adapter funcs
* Hacky but functional action running from job index
* remove proxy hack
* runAction added to taskSubRow
* Added tty and ws_handshake to job action endpoint call
* delog
* Bunch of streaming work
* action started, running, and finished notification titles, neutral color, and ansi escape
* Handle random alloc selection in the web ui
* Run on All implementation in web ui
* [ui] Helios two-step button and uniform title bar for Actions (#18912)
* Initial pass at title bar button uniformity
* Vertical align on actions dropdown toggle and small edits to prevent keynav overflow issue
* We represent loading state with text and disable now
* Pageheader component to align buttons
* Buttons standardized
* Actions dropdown reveal for multi-alloc job
* Notification code styles
* An action-having single alloc job
* Mirageed
* Actions-laden jobs in mirage
* Separating allocCount and taskCount in mirage mocks
* Unbreak stop job tests
* Permissions for actions dropdown
* tests for running actions from the job index page
* running from a task row actions tests
* some todo cleanup
* PR feedback addressed, including page helper for actions
Remove the now-unused original configuration blocks for Consul and Vault from
the client. When the client needs to refer to a Consul or Vault block it will
always be for a specific cluster for the task/service. Add a helper for
accessing the default clusters (for the client's own use).
This is two of three changesets for this work. The remainder will implement the
same changes in the `command/agent` package.
As part of this work I discovered and fixed two bugs:
* The gRPC proxy socket that we create for Envoy is only ever created using the
default Consul cluster's configuration. This will prevent Connect from being
used with the non-default cluster.
* The Consul configuration we use for templates always comes from the default
Consul cluster's configuration, but will use the correct Consul token for the
non-default cluster. This will prevent templates from being used with the
non-default cluster.
Ref: https://github.com/hashicorp/nomad/issues/18947
Ref: https://github.com/hashicorp/nomad/pull/18991
Fixes: https://github.com/hashicorp/nomad/issues/18984
Fixes: https://github.com/hashicorp/nomad/issues/18983
Submitting a Consul or Vault token with a job is deprecated in Nomad 1.7 and
intended for removal in Nomad 1.9. We added a deprecation warning to the CLI
when the user passes in the appropriate flag or environment variable, but this
warns even if the job does not use Vault or Consul and you happen to have the
appropriate environment variable set in your environment. While this is
generally a bad practice (because the token is leaked to Nomad), it's also the
existing practice for some users.
Move the warning to the job admission hook. This will allow us to warn only when
appropriate, and that will also help the migration process by producing warnings
only for the relevant jobs.
Remove the now-unused original configuration blocks for Consul and Vault from
the server. When the server needs to refer to a Consul or Vault block it will
always be for a specific cluster for the task/service. Add a helper for
accessing the default clusters (for the server's own use).
This is one of three changesets for this work. The remainder will implement the
same changes in the `client` package and on the `command/agent` package.
As part of this work I discovered that the job submission hook for Vault only
checks the enabled flag on the default cluster, rather than the clusters that
are used by the job being submitted. This caused job registration to
return an error saying that Vault is disabled. Fix that to check only the
cluster(s) used by the job.
Ref: https://github.com/hashicorp/nomad/issues/18947
Fixes: https://github.com/hashicorp/nomad/issues/18990
The `operator client-state` command is mostly used for developer debugging of
the Nomad client state, but it hasn't been updated with several recent
additions. Add allocation identities, network status, and dynamic volumes to the
objects it outputs.
Also, fix a bug where reading the state for an allocation without task states
will crash the CLI. This can happen if the Nomad client stops after an alloc is
persisted to disk but before the task actually starts.
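The shape of that fix can be sketched with a nil guard. The type and field names here are illustrative stand-ins, not Nomad's actual state structs:

```go
package main

import "fmt"

// allocState mimics the shape of a persisted allocation: TaskStates may be
// nil when the client stopped after the alloc was persisted to disk but
// before any task actually started.
type allocState struct {
	ID         string
	TaskStates map[string]string
}

// formatAlloc renders an alloc for debug output, tolerating a nil or empty
// TaskStates map instead of assuming at least one task has run.
func formatAlloc(a *allocState) string {
	if len(a.TaskStates) == 0 {
		return fmt.Sprintf("%s: (no task states recorded)", a.ID)
	}
	return fmt.Sprintf("%s: %d task state(s)", a.ID, len(a.TaskStates))
}

func main() {
	fmt.Println(formatAlloc(&allocState{ID: "web-1"}))
	fmt.Println(formatAlloc(&allocState{
		ID:         "web-2",
		TaskStates: map[string]string{"server": "running"},
	}))
}
```

Note that `len` on a nil map is safely zero in Go, so a single length check covers both the nil and the empty case.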
This PR tweaks the linux cpu fingerprinter to handle the case where no
NUMA node data is found under /sys/devices/system/, in which case we
need to assume just one node, one socket.
Our codec code generation doesn't honor `json:"..."` tags which, if we were to
ever implement `json.Marshaller` for the `KeyEncryptionKeyWrapper` struct, would
break the on-disk format of all the existing KEKs.
As a precaution, add this struct to the code generator's ignore list (just like
we have done with `IdentityClaims`).
When a user performs a client API call, the Nomad client performs an
RPC that looks up the ACL policies assigned to the caller's ACL
token. If the ACL token includes dangling (deleted) policies, the call
would previously fail with a permission denied error.
This change ensures this error is not returned and that the lookup
will succeed in the event of dangling policies.
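The behavior change amounts to skipping dangling policy names rather than treating a missed lookup as a failure. A minimal sketch with an illustrative in-memory store (not Nomad's actual policy resolution code):

```go
package main

import "fmt"

// resolvePolicies looks up each policy name attached to a token and skips
// dangling (deleted) policies instead of failing the whole request with a
// permission denied error.
func resolvePolicies(names []string, store map[string]string) []string {
	var found []string
	for _, n := range names {
		if _, ok := store[n]; ok {
			found = append(found, n)
		}
	}
	return found
}

func main() {
	store := map[string]string{"read-jobs": "policy rules here"}
	// "deleted-policy" dangles: it is named on the token but no longer exists.
	fmt.Println(resolvePolicies([]string{"read-jobs", "deleted-policy"}, store))
}
```

The caller's effective permissions are then computed only from the policies that still exist, which is strictly narrower than what the token originally granted.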
An interactive setup helper for configuring Consul to accept Nomad WI-enabled workloads.
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
Our codec code generation doesn't honor `json:"..."` tags, which breaks
the OIDC Discovery endpoint.
This adds the relevant struct to the code generator's ignore list (just
like we have done with `IdentityClaims`).
One of our core scheduler tests for GC checks that volumes with invalid
allocations immediately have those claims marked as past claims and are put
into the unpublishing state. This happens synchronously with the GC evaluation
processing, so there's no need for us to wait for the results.
Fixes: #18959
Before this commit, it would bring you to the list of allocations
filtered by status=starting. This status does not exist in the Status
drop-down on the Allocations section of a job in the UI.
The template hook emits an error when the task has a Consul block that requires
WI but there's no WI. The exact error message we get depends on whether we're
running in CE or ENT. Update the test assertion so that we can tolerate this
difference without building ENT-specific test files.
When looking up the Consul or Vault cluster from a client hook, we should always
use an accessor function rather than trying to lookup the `Cluster` field, which
may be empty for jobs registered before Nomad 1.7.
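The accessor pattern can be sketched as below. The type and method names are illustrative, not Nomad's actual API, but the shape is the point: every lookup goes through one method that handles the pre-1.7 empty case:

```go
package main

import "fmt"

// consulBlock mimics a task's Consul block; Cluster may be empty for
// jobs registered before Nomad 1.7.
type consulBlock struct {
	Cluster string
}

// ClusterName is the accessor: it never returns an empty cluster name,
// defaulting to "default" for pre-1.7 jobs (and tolerating a nil block).
func (c *consulBlock) ClusterName() string {
	if c == nil || c.Cluster == "" {
		return "default"
	}
	return c.Cluster
}

func main() {
	var legacy *consulBlock // pre-1.7 alloc: the block may even be nil
	fmt.Println(legacy.ClusterName())
	fmt.Println((&consulBlock{Cluster: "east"}).ClusterName())
}
```

Centralizing the default in one accessor means a forgotten call site fails loudly in review rather than silently reading an empty string.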
* identity: support change_mode and change_signal
wip - just jobspec portion
* test struct
* cleanup some insignificant boogs
* actually implement change mode
* docs tweaks
* add changelog
* test identity.change_mode operations
* use more words in changelog
* job endpoint tests
* address comments from code review
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Allocations that were created before Nomad 1.7 will not have the cluster field
set for their Consul blocks. While this can be corrected server-side, that
doesn't help allocations already on clients.
* vault: remove `token_ttl` from `vaultcompat` setup
Since Nomad uses periodic tokens, the right value to set in the role is
`token_period`, not `token_ttl`.
* vault: set 1.11.0 as min version for JWT auth
In order to use workload identity JWT auth with Vault it's required to
have a Vault cluster running v1.11.0+, which is the version where
`user_claim_json_pointer` was introduced.
Allocations that were created before Nomad 1.7 will not have the `cluster` field
set for their Vault blocks. While this can be corrected server-side, that
doesn't help allocations already on clients.
Also add extra safety on the Consul cluster lookup.
When attempting a WebSocket connection upgrade the client may receive a
redirect request from the server, in which case the request should be
reattempted using the new address present in the `Location` header.
Vault tokens requested for WI are "periodic" Vault tokens (ones that get
periodically renewed). The field we should be setting for the renewal window is
`token_period`.
OIDC mandates the support of the RS256 signing algorithm so in order to maximize workload identity's usefulness this change switches from using the EdDSA signing algorithm to RS256.
Old keys will continue to use EdDSA but new keys will use RS256. The EdDSA generation code was left in place because it's fast and cheap and I'm not going to lie I hope we get to use it again.
**Test Updates**
Most of our Variables and Keyring tests had a subtle assumption in them that the keyring would be initialized by the time the test server had elected a leader. ed25519 key generation is so fast that the fact that it was happening asynchronously with server startup didn't seem to cause problems. Sadly rsa key generation is so slow that basically all of these tests failed.
I added a new `testutil.WaitForKeyring` helper to replace `testutil.WaitForLeader` in cases where the keyring must be initialized before the test may continue. However this is mostly used in the `nomad/` package.
In the `api` and `command/agent` packages I decided to switch their helpers to wait for keyring initialization by default. This will slow down tests a bit, but allow those packages to not be as concerned with subtle server readiness details. On my machine rsa key generation takes 63ms, so hopefully the difference isn't significant on CI runners.
**TODO**
- Docs and changelog entries.
- Upgrades - right now upgrades won't get RS256 keys until their root key rotates either manually or after ~30 days.
- Observability - I'm not sure there's a way for operators to see if they're using EdDSA or RS256 unless they inspect a key. The JWKS endpoint can be inspected to see if EdDSA will be used for new identities, but it doesn't technically define which key is active. If upgrades can be fixed to automatically rotate keys, we probably don't need to worry about this.
**Requiem for ed25519**
When workload identities were first implemented we did not immediately consider OIDC compliance. Consul, Vault, and many other third parties support JWT auth methods without full OIDC compliance. For the machine<-->machine use cases workload identity is intended to fulfill, OIDC seemed like a bigger risk than asset.
EdDSA/ed25519 is the signing algorithm we chose for workload identity JWTs because of all these lovely properties:
1. Deterministic keys that can be derived from our preexisting root keys. This was perhaps the biggest factor since we already had a root encryption key around from which we could derive a signing key.
2. Wonderfully compact: 64 byte private key, 32 byte public key, 64 byte signatures. Just glorious.
3. No parameters. No choices of encodings. It's all well-defined by [RFC 8032](https://datatracker.ietf.org/doc/html/rfc8032).
4. Fastest performing signing algorithm! We don't even care that much about the performance of our chosen algorithm, but what a free bonus!
5. Arguably one of the most secure signing algorithms widely available. Not just from a cryptanalysis perspective, but from an API and usage perspective too.
Life was good with ed25519, but sadly it could not last.
[IDPs](https://en.wikipedia.org/wiki/Identity_provider), such as AWS's IAM OIDC Provider, love OIDC. They have OIDC implemented for humans, so why not reuse that OIDC support for machines as well? Since OIDC mandates RS256, many implementations don't bother implementing other signing algorithms (or at least not advertising their support). A quick survey of OIDC Discovery endpoints revealed only 2 out of 10 OIDC providers advertised support for anything other than RS256:
- [PayPal](https://www.paypalobjects.com/.well-known/openid-configuration) supports HS256
- [Yahoo](https://api.login.yahoo.com/.well-known/openid-configuration) supports ES256
RS256 only:
- [GitHub](https://token.actions.githubusercontent.com/.well-known/openid-configuration)
- [GitLab](https://gitlab.com/.well-known/openid-configuration)
- [Google](https://accounts.google.com/.well-known/openid-configuration)
- [Intuit](https://developer.api.intuit.com/.well-known/openid_configuration)
- [Microsoft](https://login.microsoftonline.com/fabrikamb2c.onmicrosoft.com/v2.0/.well-known/openid-configuration)
- [SalesForce](https://login.salesforce.com/.well-known/openid-configuration)
- [SimpleLogin (acquired by ProtonMail)](https://app.simplelogin.io/.well-known/openid-configuration/)
- [TFC](https://app.terraform.io/.well-known/openid-configuration)
Job submitters cannot set multiple identities prior to Nomad 1.7, and cluster
administrators should not set the identity configurations for their `consul` and
`vault` configuration blocks until all servers have been upgraded. Validate
these cases during job submission so as to prevent state store corruption when
jobs are submitted in the middle of a cluster upgrade.
The Consul and Vault integration work shipping in Nomad 1.7 deprecates the
existing token-based workflows. These will be removed in Nomad 1.9, so add a
note describing this to the upgrade guide.