nomad

mirror of https://github.com/kemko/nomad.git synced 2026-01-03 08:55:43 +03:00

Author	SHA1	Message	Date
James Rasell	facc3e8013	agent: allow configuration of in-memory telemetry sink. (#20166 ) This change adds configuration options for setting the in-memory telemetry sink collection and retention durations. This sink backs the metrics JSON API and previously had hard-coded default values. The new options are particularly useful when running development or debug environments, where metrics collection is desired at a fast and granular rate.	2024-03-25 15:00:18 +00:00
Tim Gross	bdf3ff301e	jobspec: add support for destination partition to `upstream` block (#20167 ) Adds support for specifying a destination Consul admin partition in the `upstream` block. Fixes: https://github.com/hashicorp/nomad/issues/19785	2024-03-22 16:15:22 -04:00
Tim Gross	10dd738a03	jobspec: update `gateway.ingress.service` Consul API fields (#20176 ) Add support for further configuring `gateway.ingress.service` blocks to bring this block up-to-date with currently available Consul API fields (except for namespace and admin partition, which will need to be handled under a different PR). These fields are sent to Consul as part of the job endpoint submission hook for Connect gateways. Co-authored-by: Horacio Monsalvo <horacio.monsalvo@southworks.com>	2024-03-22 13:50:48 -04:00
Michael Schurter	23e4b7c9d2	Upgrade go-msgpack to v2 (#20173 ) Replaces #18812 Upgraded with: ``` find . -name '*.go' -exec sed -i s/"github.com\/hashicorp\/go-msgpack\/codec"/"github.com\/hashicorp\/go-msgpack\/v2\/codec/" '{}' ';' find . -name '*.go' -exec sed -i s/"github.com\/hashicorp\/net-rpc-msgpackrpc"/"github.com\/hashicorp\/net-rpc-msgpackrpc\/v2/" '{}' ';' go get go get -v -u github.com/hashicorp/raft-boltdb/v2 go get -v github.com/hashicorp/serf@5d32001edfaa18d1c010af65db707cdb38141e80 ``` see https://github.com/hashicorp/go-msgpack/releases/tag/v2.1.0 for details	2024-03-21 11:44:23 -07:00
Tim Gross	7b9bce2d08	config: fix `client.template` config merging with defaults (#20165 ) When loading the client configuration, the user-specified `client.template` block was not properly merged with the default values. As a result, if the user set any `client.template` field, all the other fields defaulted to their zero values instead of the documented defaults. This changeset: * Adds the missing `Merge` method for the client template config and ensures it's called. * Makes a single source of truth for the default template configuration, instead of two different constructors. * Extends the tests to cover the merge of a partial block better. Fixes: https://github.com/hashicorp/nomad/issues/20164	2024-03-20 10:18:56 -04:00
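The merge behaviour described in 7b9bce2d08 can be sketched roughly as follows. This is a minimal illustration, not Nomad's actual code: the `TemplateConfig` type, its field names, the `ptr` helper, and the default values are all hypothetical. Pointer fields are assumed so that "unset" can be told apart from an explicit zero value.

```go
package main

import (
	"fmt"
	"time"
)

// TemplateConfig is a hypothetical, cut-down stand-in for a client template
// block. A nil field means "the user did not set this".
type TemplateConfig struct {
	MaxStale       *time.Duration
	WaitMin        *time.Duration
	DisableSandbox *bool
}

// ptr returns a pointer to v (illustrative helper).
func ptr[T any](v T) *T { return &v }

// DefaultTemplateConfig is the single source of defaults in this sketch.
// The values are made up for illustration.
func DefaultTemplateConfig() *TemplateConfig {
	return &TemplateConfig{
		MaxStale:       ptr(24 * time.Hour),
		WaitMin:        ptr(5 * time.Second),
		DisableSandbox: ptr(false),
	}
}

// Merge returns a copy of c in which every field set in o overrides c.
// Fields left nil in o keep the value from c, so a partial user block
// no longer resets the remaining fields to their zero values.
func (c *TemplateConfig) Merge(o *TemplateConfig) *TemplateConfig {
	result := *c
	if o == nil {
		return &result
	}
	if o.MaxStale != nil {
		result.MaxStale = o.MaxStale
	}
	if o.WaitMin != nil {
		result.WaitMin = o.WaitMin
	}
	if o.DisableSandbox != nil {
		result.DisableSandbox = o.DisableSandbox
	}
	return &result
}

func main() {
	// The user sets only one field; the others should come from the defaults.
	user := &TemplateConfig{DisableSandbox: ptr(true)}
	merged := DefaultTemplateConfig().Merge(user)
	fmt.Println(*merged.MaxStale, *merged.WaitMin, *merged.DisableSandbox)
}
```

Keeping one constructor for the defaults and always merging the user block onto it means a partially filled block can only override defaults, never erase them.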
Tim Gross	5138c1c82f	autopilot: add Enterprise health information to API endpoint (#20153 ) Add information about autopilot health to the `/operator/autopilot/health` API in Nomad Enterprise. I've pulled the CE changes required for this feature out of @lindleywhite's PR in the Enterprise repo. A separate PR will include a new `operator autopilot health` command that can present this information at the command line. Ref: https://github.com/hashicorp/nomad-enterprise/pull/1394 Co-authored-by: Lindley <lindley@hashicorp.com>	2024-03-18 11:38:17 -04:00
Amir Abbas	40b8f17717	Support insecure flag on artifact (#20126 )	2024-03-14 10:59:20 -05:00
Seth Hoenig	05937ab75b	exec2: add client support for unveil filesystem isolation mode (#20115 ) * exec2: add client support for unveil filesystem isolation mode This PR adds support for a new filesystem isolation mode, "Unveil". The mode introduces an "alloc_mounts" directory where tasks have user-owned directory structures which are bind mounts into the real alloc directory structure. This enables a task driver to use Landlock (and maybe the real unveil on OpenBSD one day) to isolate a task to the task-owned directory structure, providing sandboxing. * actually create alloc-mounts-dir directory * fix doc strings about alloc mount dir paths	2024-03-13 08:24:17 -05:00
hc-github-team-nomad-core	46182c2a83	Generate files for 1.7.6 release	2024-03-12 12:04:04 +01:00
Seth Hoenig	286dce7a2a	exec2: add a client.users configuration block (#20093 ) * exec: add a client.users configuration block For now just add min/max dynamic user values; soon we can also absorb the "user.denylist" and "user.checked_drivers" options from the deprecated client.options map. * give the no-op pool implementation a better name * use explicit error types to make referencing them cleaner in tests * use import alias to not shadow package name	2024-03-08 16:02:32 -06:00
Seth Hoenig	4d83733909	tests: swap testify for test in more places (#20028 ) * tests: swap testify for test in plugins/csi/client_test.go * tests: swap testify for test in testutil/ * tests: swap testify for test in host_test.go * tests: swap testify for test in plugin_test.go * tests: swap testify for test in utils_test.go * tests: swap testify for test in scheduler/ * tests: swap testify for test in parse_test.go * tests: swap testify for test in attribute_test.go * tests: swap testify for test in plugins/drivers/ * tests: swap testify for test in command/ * tests: fixup some test usages * go: run go mod tidy * windows: cpuset test only on linux	2024-02-29 12:11:35 -06:00
Juana De La Cuesta	20cfbc82d3	Introduces `Disconnect` block into the `TaskGroup` configuration (#19886 ) This PR is the first of two that will implement the new Disconnect block. In this PR the new block is introduced to be backwards compatible with the fields it will replace. For more information refer to this RFC and this ticket.	2024-02-19 16:41:35 +01:00
hc-github-team-nomad-core	6e08d9ffff	Generate files for 1.7.5 release	2024-02-13 11:32:59 -05:00
Tim Gross	e986c298ac	alloc exec: fix panics after stream close (#19932 ) In #19172 we added a check on websocket errors to see if they were one of several benign "close" messages. This change inadvertently assumed that other messages used for close would not implement `HTTPCodedError`. When errors like the following are received: > msgpack decode error [pos 0]: io: read/write on closed pipe they are sent from the inner loop as though they were a "real" error, but the channel is already being closed with a "close" message. This allowed many more attempts to pass through a previously-undiscovered race condition in the two goroutines that stream RPC responses to the websocket. When the input stream returns an error for any reason (for example, the command we're executing has exited), it will unblock the "outer" goroutine and cause a write to the websocket. If we're concurrently writing the "close error" discussed above, this results in a panic from the websocket library. This changeset includes two fixes: * Catch "closed pipe" error correctly so that we're not sending unnecessary error messages. * Move all writes to the websocket into the same response streaming goroutine. The main handler goroutine will block on a results channel, and the response streaming goroutine will send on that channel with the final error when it's done so it can be reported to the user.	2024-02-12 09:43:34 -05:00
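The two fixes in e986c298ac follow a common single-writer pattern, which might look roughly like the sketch below. It is a simplified illustration under stated assumptions, not the Nomad handler: `conn`, `streamResponses`, `run`, and `errClosedPipe` are hypothetical names, and the fake `conn` only records what would have been written to a websocket.

```go
package main

import (
	"errors"
	"fmt"
)

// errClosedPipe is a hypothetical sentinel for the benign "closed pipe" error.
var errClosedPipe = errors.New("io: read/write on closed pipe")

// conn is a stand-in for a websocket connection that must not be written
// to from two goroutines at once. It just records the writes.
type conn struct{ written []string }

func (c *conn) Write(msg string) { c.written = append(c.written, msg) }

// streamResponses is the only goroutine that ever writes to c. It forwards
// frames until the stream ends, then sends the final error on result.
func streamResponses(c *conn, frames <-chan string, errs <-chan error, result chan<- error) {
	for {
		select {
		case f, ok := <-frames:
			if !ok {
				c.Write("close")
				result <- nil
				return
			}
			c.Write(f)
		case err := <-errs:
			if errors.Is(err, errClosedPipe) {
				// Benign: the stream is already closing, so no error frame.
				err = nil
			} else {
				c.Write("error: " + err.Error())
			}
			c.Write("close")
			result <- err
			return
		}
	}
}

// run plays the role of the main handler goroutine: it feeds the stream and
// then blocks on the result channel instead of writing to c itself.
func run(input []string, final error) ([]string, error) {
	c := &conn{}
	frames := make(chan string)
	errs := make(chan error)
	result := make(chan error, 1)
	go streamResponses(c, frames, errs, result)
	for _, f := range input {
		frames <- f
	}
	if final != nil {
		errs <- final
	} else {
		close(frames)
	}
	err := <-result // all writes to c happened before this receive
	return c.written, err
}

func main() {
	out, err := run([]string{"stdout: hello"}, errClosedPipe)
	fmt.Println(out, err)
	out, err = run([]string{"stdout: hello"}, errors.New("boom"))
	fmt.Println(out, err)
}
```

Because the handler only reads from `result`, there is no second code path that can write to the connection while the close message is being sent.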
Tim Gross	110d93ab25	windows: remove LazyDLL calls for system modules (#19925 ) On Windows, Nomad uses `syscall.NewLazyDLL` and `syscall.LoadDLL` functions to load a few system DLL files, which does not prevent DLL hijacking attacks. Hypothetically a local attacker on the client host that can place an abusive library in a specific location could use this to escalate privileges to the Nomad process. Although this attack does not fall within the Nomad security model, it doesn't hurt to follow good practices here. We can remove two of these DLL loads by using wrapper functions provided by the stdlib in `x/sys/windows` Co-authored-by: dduzgun-security <deniz.duzgun@hashicorp.com>	2024-02-09 08:47:48 -05:00
hc-github-team-nomad-core	875e96cccc	Generate files for 1.7.4 release	2024-02-08 10:40:24 -05:00
Juana De La Cuesta	120c3ca3c9	Add granular control of SELinux labels for host mounts (#19839 ) Add new configuration option on task's volume_mounts, to give a fine grained control over SELinux "z" label * Update website/content/docs/job-specification/volume_mount.mdx Co-authored-by: Luiz Aoqui <luiz@hashicorp.com> * fix: typo * func: make volume mount verification happen even on mounts with no volume --------- Co-authored-by: Luiz Aoqui <luiz@hashicorp.com> Co-authored-by: Tim Gross <tgross@hashicorp.com>	2024-02-05 10:05:33 +01:00
James Rasell	41555b6370	cli: Fix minor help formatting issue in agent command. (#19743 )	2024-01-17 12:18:00 +00:00
hc-github-team-nomad-core	ddfc157c0a	Generate files for 1.7.3 release	2024-01-15 15:58:41 -05:00
Luiz Aoqui	e1e80f383e	vault: add new `nomad setup vault -check` command (#19720 ) The new `nomad setup vault -check` command can be used to retrieve information about the changes required before a cluster is migrated from the deprecated legacy authentication flow with Vault to use only workload identities.	2024-01-12 15:48:30 -05:00
Tim Gross	0935f443dc	vault: support allowing tokens to expire without refresh (#19691 ) Some users with batch workloads or short-lived prestart tasks want to derive a Vault token, use it, and then allow it to expire without requiring a constant refresh. Add the `vault.allow_token_expiration` field, which works only with the Workload Identity workflow and not the legacy workflow. When set to true, this disables the client's renewal loop in the `vault_hook`. When Vault revokes the token lease, the token will no longer be valid. The client will also now automatically detect if the Vault auth configuration does not allow renewals and will disable the renewal loop automatically. Note this should only be used when a secret is requested from Vault once at the start of a task or in a short-lived prestart task. Long-running tasks should never set `allow_token_expiration=true` if they obtain Vault secrets via `template` blocks, as the Vault token will expire and the template runner will continue to make failing requests to Vault until the `vault_retry` attempts are exhausted. Fixes: https://github.com/hashicorp/nomad/issues/8690	2024-01-10 14:49:02 -05:00
Tim Gross	d3e5cae1eb	consul: support admin partitions (#19665 ) Add support for Consul Enterprise admin partitions. We added fingerprinting in https://github.com/hashicorp/nomad/pull/19485. This PR adds a `consul.partition` field. The expectation is that most users will create a mapping of Nomad node pool to Consul admin partition. But we'll also create an implicit constraint for the fingerprinted value. Fixes: https://github.com/hashicorp/nomad/issues/13139	2024-01-10 10:41:29 -05:00
Mike Nomitch	31f4296826	Adds support for failures before warning to Consul service checks (#19336 ) Adds support for failures before warning and failures before critical to the automatically created Nomad client and server services in Consul	2023-12-14 11:33:31 -08:00
hc-github-team-nomad-core	b777013ff9	Generate files for 1.7.2 release	2023-12-14 11:23:55 +01:00
hc-github-team-nomad-core	180fd54918	Generate files for 1.7.1 release	2023-12-08 14:39:09 -05:00
Luiz Aoqui	099ee06a60	Revert "deps: update go-metrics to v0.5.3 (#19190 )" (#19374 ) * Revert "deps: update go-metrics to v0.5.3 (#19190)" This reverts commit `ddb060d8b3`. * changelog: add entry for #19374	2023-12-08 08:46:55 -05:00
Luiz Aoqui	c624dc2121	config: fix loading Vault token from env var (#19349 ) The `defaultVault` variable is a pointer to the Vault configuration named `default`. Initially, this variable points to the Vault configuration that is used to load CLI flag values, but after those are merged with the default and config file values the pointer reference must be updated before mutating the config with environment variable values.	2023-12-07 11:56:53 -05:00
Luiz Aoqui	27d2ad1baf	cli: add `-dev-consul` and `-dev-vault` agent mode (#19327 ) The `-dev-consul` and `-dev-vault` flags add default identities and configuration to the Nomad agent to connect and use the workload identity integration with Consul and Vault.	2023-12-07 11:51:20 -05:00
hc-github-team-nomad-core	e799b06f02	Generate files for 1.7.0 release	2023-12-07 16:43:02 +01:00
Tim Gross	3c4e2009f5	connect: deployments should wait for Connect sidecar checks (#19334 ) When a Connect service is registered with Consul, Nomad includes the nested `Connect.SidecarService` field that includes health checks for the Envoy proxy. Because these are not part of the job spec, the alloc health tracker created by `health_hook` doesn't know to read the value of these checks. In many circumstances this won't be noticed, but if the Envoy health check happens to take longer than the `update.min_healthy_time` (perhaps because it's been set low), it's possible for a deployment to progress too early such that there will briefly be no healthy instances of the service available in Consul. Update the Consul service client to find the nested sidecar service in the service catalog and attach it to the results provided to the tracker. The tracker can then check the sidecar health checks. Fixes: https://github.com/hashicorp/nomad/issues/19269	2023-12-06 16:59:51 -05:00
Juana De La Cuesta	cf539c405e	Add a new parameter to avoid starting a replacement for lost allocs (#19101 ) This commit introduces the parameter preventRescheduleOnLost which indicates that the task group can't afford to have multiple instances running at the same time. In the case of a node going down, its allocations will be registered as unknown but no replacements will be rescheduled. If the lost node comes back up, the allocs will reconnect and continue to run. In case of max_client_disconnect also being enabled, if there is a reschedule policy, an error will be returned. Implements issue #10366 Co-authored-by: Dom Lavery <dom@circleci.com> Co-authored-by: Tim Gross <tgross@hashicorp.com> Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>	2023-12-06 12:28:42 +01:00
Michael Schurter	4cb40433bb	Post 1.7.0 rc.1 release (#19252 ) * Prepare release 1.7.0-rc.1 * Generate files for 1.7.0-rc.1 release * Prepare for next release	2023-12-01 08:53:48 -05:00
James Rasell	81249ffe65	agent: log using error keyword not err in keyring endpoint (#19243 )	2023-11-30 16:40:13 +00:00
Luiz Aoqui	ddb060d8b3	deps: update go-metrics to v0.5.3 (#19190 ) Update `go-metrics` to v0.5.3 to pick https://github.com/hashicorp/go-metrics/pull/146.	2023-11-28 12:37:57 -05:00
Piotr Kazmierczak	248b2ba5cd	WI: use single auth method for Consul by default (#19169 ) This simplifies the default setup of Nomad workloads WI-based authentication for Consul by using a single auth method with 2 binding rules. Users can still specify separate auth methods for services and tasks.	2023-11-28 12:22:27 +01:00
Luiz Aoqui	5ff6cce3ab	vault: update default JWT auth method path (#19188 ) Update default auth method path to be `jwt-nomad` to avoid potential conflicts when Vault's `jwt` default is already being used for something else.	2023-11-27 17:48:12 -05:00
Piotr Kazmierczak	742651f2f7	agent: ignore websocket statuses 1000, 1001 and 1005 correctly (#19172 ) These are "close" messages and not actual errors.	2023-11-27 09:33:08 +01:00
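The check described in 742651f2f7 amounts to treating a few close status codes as a normal end of stream. A minimal sketch, assuming the status code has already been extracted from the websocket error; the constant and function names here are hypothetical, while the code meanings come from RFC 6455 section 7.4.1:

```go
package main

import "fmt"

// Close status codes defined in RFC 6455, section 7.4.1.
const (
	closeNormalClosure    = 1000 // the purpose of the connection was fulfilled
	closeGoingAway        = 1001 // endpoint is going away (e.g. tab closed)
	closeNoStatusReceived = 1005 // reserved: no status code was present
)

// isBenignClose reports whether code signals an ordinary close rather than
// a real error that should be logged or reported.
func isBenignClose(code int) bool {
	switch code {
	case closeNormalClosure, closeGoingAway, closeNoStatusReceived:
		return true
	}
	return false
}

func main() {
	for _, code := range []int{1000, 1001, 1005, 1006} {
		fmt.Println(code, isBenignClose(code))
	}
}
```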
James Rasell	532402aa2d	actions: use specific RPC request object and tighten naming. (#19149 )	2023-11-23 07:42:37 +00:00
James Rasell	0f0b9a1a3c	action: add job action name validation (#19145 )	2023-11-22 08:02:49 +00:00
hc-github-team-nomad-core	ea3f6cc879	Generate files for 1.7.0-beta.2 release	2023-11-15 22:47:41 +00:00
Adriano Caloiaro	f66eb83fc0	Add `go-netaddrs` support to `retry_join` (#18745 )	2023-11-15 10:07:18 -05:00
Tim Gross	7191c78928	refactor: rename allocrunner's Consul service reg handler (#19019 ) The allocrunner has a service registration handler that proxies various API calls to Consul. With multi-cluster support (for ENT), the service registration handler is what selects the correct Consul client. The name of this field in the allocrunner and taskrunner code base looks like it's referring to the actual Consul API client. This was actually the case before Nomad native service discovery was implemented, but now the name is misleading.	2023-11-08 15:39:32 -05:00
Michael Schurter	c4ae91f8be	Fix WorkloadIdentity.TTL handling, jobspec2 testing, and hcl1 vs 2 parsing (#19024 ) * make the little dots consistent * don't trim delimiter as that over matches * test jobspec2 package * copy api/WorkloadIdentity.TTL -> structs * test ttl parsing * fix hcl1 v 2 parsing mismatch * make jobspec(1) tests match jobspec2 tests	2023-11-08 09:01:16 -08:00
Tim Gross	9d075c44b2	config: remove old Vault/Consul config blocks from parser (#18997 ) Remove the now-unused original configuration blocks for Consul and Vault from the agent configuration parsing. When the agent needs to refer to a Consul or Vault block it will always be for a specific cluster for the task/service (or the default cluster for the agent's own use). This is third of three changesets for this work. Fixes: https://github.com/hashicorp/nomad/issues/18947 Ref: https://github.com/hashicorp/nomad/pull/18991 Ref: https://github.com/hashicorp/nomad/pull/18994	2023-11-08 09:30:08 -05:00
Tim Gross	50f0ce5412	config: remove old Vault/Consul config blocks from client (#18994 ) Remove the now-unused original configuration blocks for Consul and Vault from the client. When the client needs to refer to a Consul or Vault block it will always be for a specific cluster for the task/service. Add a helper for accessing the default clusters (for the client's own use). This is two of three changesets for this work. The remainder will implement the same changes in the `command/agent` package. As part of this work I discovered and fixed two bugs: * The gRPC proxy socket that we create for Envoy is only ever created using the default Consul cluster's configuration. This will prevent Connect from being used with the non-default cluster. * The Consul configuration we use for templates always comes from the default Consul cluster's configuration, but will use the correct Consul token for the non-default cluster. This will prevent templates from being used with the non-default cluster. Ref: https://github.com/hashicorp/nomad/issues/18947 Ref: https://github.com/hashicorp/nomad/pull/18991 Fixes: https://github.com/hashicorp/nomad/issues/18984 Fixes: https://github.com/hashicorp/nomad/issues/18983	2023-11-07 09:15:37 -05:00
Tim Gross	1ef99f0536	config: remove old Vault/Consul config blocks from server (#18991 ) Remove the now-unused original configuration blocks for Consul and Vault from the server. When the server needs to refer to a Consul or Vault block it will always be for a specific cluster for the task/service. Add a helper for accessing the default clusters (for the server's own use). This is one of three changesets for this work. The remainder will implement the same changes in the `client` package and on the `command/agent` package. As part of this work I discovered that the job submission hook for Vault only checks the enabled flag on the default cluster, rather than the clusters that are used by the job being submitted. This will return an error on job registration saying that Vault is disabled. Fix that to check only the cluster(s) used by the job. Ref: https://github.com/hashicorp/nomad/issues/18947 Fixes: https://github.com/hashicorp/nomad/issues/18990	2023-11-06 10:26:20 -05:00
Seth Hoenig	51b8737ca9	Release/1.7.0 beta.1 (#18962 ) * Prepare release 1.7.0-beta.1 * cl: tweak actions cl entry * Generate files for 1.7.0-beta.1 release * Prepare for next release --------- Co-authored-by: hc-github-team-nomad-core <github-team-nomad-core@hashicorp.com>	2023-11-01 14:27:59 -05:00
Michael Schurter	e49ca3c431	identity: Implement `change_mode` (#18943 ) * identity: support change_mode and change_signal wip - just jobspec portion * test struct * cleanup some insignificant boogs * actually implement change mode * docs tweaks * add changelog * test identity.change_mode operations * use more words in changelog * job endpoint tests * address comments from code review --------- Co-authored-by: Tim Gross <tgross@hashicorp.com>	2023-11-01 09:41:11 -05:00
Luiz Aoqui	3ddf1ecf1d	actions: minor bug fixes and improvements (#18904 )	2023-10-31 17:06:02 -04:00
Michael Schurter	66fbc0f67e	identity: default to RS256 for new workload ids (#18882 ) OIDC mandates the support of the RS256 signing algorithm so in order to maximize workload identity's usefulness this change switches from using the EdDSA signing algorithm to RS256. Old keys will continue to use EdDSA but new keys will use RS256. The EdDSA generation code was left in place because it's fast and cheap and I'm not going to lie I hope we get to use it again. Test Updates Most of our Variables and Keyring tests had a subtle assumption in them that the keyring would be initialized by the time the test server had elected a leader. ed25519 key generation is so fast that the fact that it was happening asynchronously with server startup didn't seem to cause problems. Sadly rsa key generation is so slow that basically all of these tests failed. I added a new `testutil.WaitForKeyring` helper to replace `testutil.WaitForLeader` in cases where the keyring must be initialized before the test may continue. However this is mostly used in the `nomad/` package. In the `api` and `command/agent` packages I decided to switch their helpers to wait for keyring initialization by default. This will slow down tests a bit, but allow those packages to not be as concerned with subtle server readiness details. On my machine rsa key generation takes 63ms, so hopefully the difference isn't significant on CI runners. TODO - Docs and changelog entries. - Upgrades - right now upgrades won't get RS256 keys until their root key rotates either manually or after ~30 days. - Observability - I'm not sure there's a way for operators to see if they're using EdDSA or RS256 unless they inspect a key. The JWKS endpoint can be inspected to see if EdDSA will be used for new identities, but it doesn't technically define which key is active. If upgrades can be fixed to automatically rotate keys, we probably don't need to worry about this. 
Requiem for ed25519 When workload identities were first implemented we did not immediately consider OIDC compliance. Consul, Vault, and many other third parties support JWT auth methods without full OIDC compliance. For the machine<-->machine use cases workload identity is intended to fulfill, OIDC seemed like a bigger risk than asset. EdDSA/ed25519 is the signing algorithm we chose for workload identity JWTs because of all these lovely properties: 1. Deterministic keys that can be derived from our preexisting root keys. This was perhaps the biggest factor since we already had a root encryption key around from which we could derive a signing key. 2. Wonderfully compact: 64 byte private key, 32 byte public key, 64 byte signatures. Just glorious. 3. No parameters. No choices of encodings. It's all well-defined by [RFC 8032](https://datatracker.ietf.org/doc/html/rfc8032). 4. Fastest performing signing algorithm! We don't even care that much about the performance of our chosen algorithm, but what a free bonus! 5. Arguably one of the most secure signing algorithms widely available. Not just from a cryptanalysis perspective, but from an API and usage perspective too. Life was good with ed25519, but sadly it could not last. [IDPs](https://en.wikipedia.org/wiki/Identity_provider), such as AWS's IAM OIDC Provider, love OIDC. They have OIDC implemented for humans, so why not reuse that OIDC support for machines as well? Since OIDC mandates RS256, many implementations don't bother implementing other signing algorithms (or at least not advertising their support). 
A quick survey of OIDC Discovery endpoints revealed only 2 out of 10 OIDC providers advertised support for anything other than RS256: - [PayPal](https://www.paypalobjects.com/.well-known/openid-configuration) supports HS256 - [Yahoo](https://api.login.yahoo.com/.well-known/openid-configuration) supports ES256 RS256 only: - [GitHub](https://token.actions.githubusercontent.com/.well-known/openid-configuration) - [GitLab](https://gitlab.com/.well-known/openid-configuration) - [Google](https://accounts.google.com/.well-known/openid-configuration) - [Intuit](https://developer.api.intuit.com/.well-known/openid_configuration) - [Microsoft](https://login.microsoftonline.com/fabrikamb2c.onmicrosoft.com/v2.0/.well-known/openid-configuration) - [SalesForce](https://login.salesforce.com/.well-known/openid-configuration) - [SimpleLogin (acquired by ProtonMail)](https://app.simplelogin.io/.well-known/openid-configuration/) - [TFC](https://app.terraform.io/.well-known/openid-configuration)	2023-10-31 11:25:20 -07:00
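The trade-off discussed in 66fbc0f67e can be seen with the Go standard library alone. The sketch below signs a message with ed25519 and with RS256 (RSASSA-PKCS1-v1_5 over SHA-256) and compares sizes. It is only an illustration: the helper names are hypothetical, the 2048-bit RSA key size is an assumption not stated in the commit, and a real JWT would sign the encoded header and payload rather than a raw message.

```go
package main

import (
	"crypto"
	"crypto/ed25519"
	"crypto/rand"
	"crypto/rsa"
	"crypto/sha256"
	"fmt"
)

// signEdDSA generates a fresh ed25519 key pair and signs msg with it.
func signEdDSA(msg []byte) (ed25519.PublicKey, ed25519.PrivateKey, []byte, error) {
	pub, priv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		return nil, nil, nil, err
	}
	return pub, priv, ed25519.Sign(priv, msg), nil
}

// signRS256 generates an RSA key of the given size and signs the SHA-256
// digest of msg using PKCS #1 v1.5, which is what JWS calls RS256.
func signRS256(msg []byte, bits int) (*rsa.PrivateKey, []byte, error) {
	key, err := rsa.GenerateKey(rand.Reader, bits)
	if err != nil {
		return nil, nil, err
	}
	digest := sha256.Sum256(msg)
	sig, err := rsa.SignPKCS1v15(rand.Reader, key, crypto.SHA256, digest[:])
	return key, sig, err
}

// verifyRS256 checks an RS256 signature over msg.
func verifyRS256(pub *rsa.PublicKey, msg, sig []byte) bool {
	digest := sha256.Sum256(msg)
	return rsa.VerifyPKCS1v15(pub, crypto.SHA256, digest[:], sig) == nil
}

// verifyEdDSA checks an ed25519 signature over msg.
func verifyEdDSA(pub ed25519.PublicKey, msg, sig []byte) bool {
	return ed25519.Verify(pub, msg, sig)
}

func main() {
	msg := []byte("workload identity claims")

	pub, priv, edSig, err := signEdDSA(msg)
	if err != nil {
		panic(err)
	}
	fmt.Println("ed25519 bytes (private, public, signature):", len(priv), len(pub), len(edSig))
	fmt.Println("ed25519 verifies:", verifyEdDSA(pub, msg, edSig))

	key, rsSig, err := signRS256(msg, 2048) // 2048 bits is an assumed size
	if err != nil {
		panic(err)
	}
	fmt.Println("RS256 signature bytes:", len(rsSig))
	fmt.Println("RS256 verifies:", verifyRS256(&key.PublicKey, msg, rsSig))
}
```

With a 2048-bit key the RS256 signature is 256 bytes, four times the 64-byte ed25519 signature, and RSA key generation is noticeably slower, which is consistent with the test slowdown the commit describes.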


2278 Commits