nomad

mirror of https://github.com/kemko/nomad.git synced 2026-01-03 17:05:43 +03:00

Author	SHA1	Message	Date
dependabot[bot]	40bbddf3d8	chore(deps): bump github.com/prometheus/client_golang (#19733 )	2024-01-15 08:24:43 +00:00
Luiz Aoqui	e1e80f383e	vault: add new `nomad setup vault -check` command (#19720 ) The new `nomad setup vault -check` command can be used to retrieve information about the changes required before a cluster is migrated from the deprecated legacy authentication flow with Vault to use only workload identities.	2024-01-12 15:48:30 -05:00
Seth Hoenig	5b7f4746ce	client/allocdir: use an interface in place of AllocDir structs (#19703 ) * client/allocdir: use an interface in place of AllocDir structs This PR replaces allocdir.AllocDir with allocdir.Interface such that we may eventually have another implementation of alloc directories. This is in support of the exec2 driver, which will need an implementation of the alloc directory incompatible with the current version. use rlock	2024-01-12 14:13:29 -06:00
Piotr Kazmierczak	858a805d7d	e2e: add a note about provisioning the infrastructure on macOS/Apple Silicon (#19727 )	2024-01-12 14:09:50 +01:00
Piotr Kazmierczak	5d12ca4f57	state store: better handling of job deletion (#19609 ) When jobs are deleted with -purge, all their deployments and allocations should be deleted from the state store, and the evals status should be set to complete. Otherwise we end up in a situation where users could re-submit previously failing jobs, but these new jobs would not get deployments allocated unless system gc got called.	2024-01-12 10:08:55 +01:00
Luiz Aoqui	b2aa6ffd05	docs: fix Consul ACL requirements (#19721 ) Even with the new workload identity based flow the Nomad servers still need the `acl = "write"` permission in order to revoke service identity tokens.	2024-01-11 15:52:23 -05:00
Seth Hoenig	a58f0eca8e	e2e: move rawexec oversub tests into oversubscription e2e test suite (#19717 ) * e2e: move rawexec oversub tests into oversubscription e2e test suite This PR moves two tests for raw_exec and memory oversubscription into the oversubscription test suite, which has the necessary plumbing to activate and restore the oversubscription configuration of the scheduler during the test. * cr: rename files for better readability	2024-01-11 14:27:05 -06:00
Luiz Aoqui	8d0a469000	vault: remove revoked Vault accessors from state (#19706 ) When using the no-op Vault client the Nomad server still needs to delete the revoked Vault accessors from state to prevent them from lingering forever after the cluster migrates to the workload identity flow.	2024-01-11 14:38:51 -05:00
Seth Hoenig	aad932eeee	build: update to go1.21.6 (#19709 )	2024-01-11 09:48:56 -06:00
Tim Gross	4c206d0b19	docs: changelog entry for ENT PR (#19705 ) Ref: https://github.com/hashicorp/nomad-enterprise/pull/1370	2024-01-11 10:36:08 -05:00
Seth Hoenig	0c08f94c8e	build: use setup-golang@v3 to handle auto caching (#19707 ) * wip: try on branch * build: use setup-golang@v3 to handle auto caching	2024-01-11 08:51:56 -06:00
Seth Hoenig	9410c519ff	drivers/raw_exec: remove plumbing for ineffective no_cgroups configuration (#19599 ) * drivers/raw_exec: remove plumbing for ineffective no_cgroups configuration * fix tests	2024-01-11 08:20:15 -06:00
Tim Gross	1254468600	consul: refactor job mutation hook (#19699 ) The job mutation logic for Nomad CE and Nomad ENT is nearly identical except for a prelude that grabs the correct default cluster. Factor this out into a method that can be shared between both code bases.	2024-01-10 16:29:05 -05:00
CJ	c9cd8480fa	docs: considerations for Stateful Workloads (#19077 ) Co-authored-by: Adrian Todorov <adrian.todorov@hashicorp.com>	2024-01-10 16:06:45 -05:00
Piotr Kazmierczak	930339a0fa	e2e: remove broken Consul WI test (#19697 )	2024-01-10 21:31:18 +01:00
Tim Gross	0935f443dc	vault: support allowing tokens to expire without refresh (#19691 ) Some users with batch workloads or short-lived prestart tasks want to derive a Vault token, use it, and then allow it to expire without requiring a constant refresh. Add the `vault.allow_token_expiration` field, which works only with the Workload Identity workflow and not the legacy workflow. When set to true, this disables the client's renewal loop in the `vault_hook`. When Vault revokes the token lease, the token will no longer be valid. The client will also now automatically detect if the Vault auth configuration does not allow renewals and will disable the renewal loop automatically. Note this should only be used when a secret is requested from Vault once at the start of a task or in a short-lived prestart task. Long-running tasks should never set `allow_token_expiration=true` if they obtain Vault secrets via `template` blocks, as the Vault token will expire and the template runner will continue to make failing requests to Vault until the `vault_retry` attempts are exhausted. Fixes: https://github.com/hashicorp/nomad/issues/8690	2024-01-10 14:49:02 -05:00
Luiz Aoqui	5267eec3ad	vault: fix token revocation during workflow migration (#19689 ) When transitioning from the legacy token-based workflow to the new JWT workflow for Vault the previous code would instantiate a no-op Vault client if the server configuration had a `default_identity` block. This no-op client returned an error when some of its operations were called, such as `LookupToken` and `RevokeTokens`. The original intention was that, in the new JWT workflow, none of these methods should be called, so returning an error could help surface potential bugs. But the `RevokeTokens` and `MarkForRevocation` methods _are_ called even in the JWT flow. When a leadership transition happens, the new server looks for unused Vault accessors from state and tries to revoke them. Similarly, the `RevokeTokens` method is called every time the `Node.UpdateStatus` and `Node.UpdateAlloc` RPCs are made by clients, as the Nomad server tries to find unused Vault tokens for the node/alloc. Since the new JWT flow does not require Nomad servers to contact Vault, calling `RevokeTokens` and `MarkForRevocation` is not able to complete without a Vault token, so this commit changes the logic to use the no-op Vault client when no token is configured. It also updates the client itself to not error if these methods are called, but to rather just log so operators can be made aware that there are Vault tokens created by Nomad that have not been force-expired. When migrating an existing cluster to the new workload identity based flow, Nomad operators must first upgrade the Nomad version without removing any of the existing Vault configuration. Removing it can prevent Nomad servers from managing and cleaning up existing Vault tokens during a leadership transition and node or alloc updates. Operators must also resubmit all jobs with a `vault` block so they are updated with an `identity` for Vault. 
Skipping this step may cause allocations to fail if their Vault token expires (if, for example, the Nomad client stops running for TTL/2) or if they are rescheduled, since the new client will try to follow the legacy flow which will fail if the Nomad server configuration for Vault has already been updated to remove the Vault address and token.	2024-01-10 13:28:46 -05:00
Tim Gross	d3e5cae1eb	consul: support admin partitions (#19665 ) Add support for Consul Enterprise admin partitions. We added fingerprinting in https://github.com/hashicorp/nomad/pull/19485. This PR adds a `consul.partition` field. The expectation is that most users will create a mapping of Nomad node pool to Consul admin partition. But we'll also create an implicit constraint for the fingerprinted value. Fixes: https://github.com/hashicorp/nomad/issues/13139	2024-01-10 10:41:29 -05:00
Daniel Peinhopf	9eb357020d	Docs: Alternative IIS Task Driver (#19411 )	2024-01-10 14:14:30 +00:00
Seth Hoenig	cb7d078c1d	drivers/raw_exec: enable configuring raw_exec task to have no memory limit (#19670 ) * drivers/raw_exec: enable configuring raw_exec task to have no memory limit This PR makes it possible to configure a raw_exec task to not have an upper memory limit, which is how the driver would behave pre-1.7. This is done by setting memory_max = -1. The cluster (or node pool) must have memory oversubscription enabled. * cl: add cl	2024-01-09 14:57:13 -06:00
Egor Mikhailov	18f49e015f	auth: add new optional `OIDCDisableUserInfo` setting for OIDC auth provider (#19566 ) Add new optional `OIDCDisableUserInfo` setting for OIDC auth provider which disables a request to the identity provider to get OIDC UserInfo. This option is helpful when your identity provider doesn't send any additional claims from the UserInfo endpoint, such as Microsoft AD FS OIDC Provider: > The AD FS UserInfo endpoint always returns the subject claim as specified in the > OpenID standards. AD FS doesn't support additional claims requested via the > UserInfo endpoint Fixes #19318	2024-01-09 13:41:46 -05:00
Tim Gross	c875f3e49a	docs: expand docs on implicit ACL capabilities grants (#19681 ) An audit of Nomad's ACLs resulted in some confusion around whether the `NamespaceValidator` method is conjunctive ("and", as implied by the docs) or disjunctive ("or", as it is by design). Clarify the ACL documentation as follows: * Call out where fine-grained capabilities imply grants to other capabilities (for example, that `csi-read-volume` grants `csi-list-volume`). * Fix an incorrectly documented ACL requirement for the CSI List External Volumes API. * Clarify how ACLs are expected to work for the two search API endpoints, such that you need list/read access to the objects in the search context.	2024-01-09 13:25:05 -05:00
James Rasell	a3a03dff78	acl: ensure auth method configs are correctly and fully hashed. (#19677 )	2024-01-09 14:03:26 +00:00
dependabot[bot]	f3bc9c7c41	chore(deps): bump github.com/docker/docker (#19672 )	2024-01-09 08:24:20 +00:00
Tim Gross	a399f16a31	docs: describe cgroup controller requirements (#19493 ) Nomad can only use cgroups to control resource requirements if all the cgroups controllers are actually enabled. Add this to our requirements documentation as well as the impacted `exec` and `java` task drivers.	2024-01-08 10:01:14 -05:00
am-ak	7dc82f233f	[DOCS] Update docker.mdx (#19657 ) Removed info regarding development of Nomad	2024-01-08 14:32:57 +00:00
James Rasell	fbea8d1051	server: Fix panic when validating non-service reschedule block. (#19652 )	2024-01-08 14:14:00 +00:00
Shantanu Gadgil	6bbd3b0cec	reschedule is at group level (#19653 ) Co-authored-by: James Rasell <jrasell@hashicorp.com>	2024-01-08 10:54:52 +00:00
dependabot[bot]	398b5000c1	chore(deps): bump github.com/hashicorp/go-plugin from 1.4.10 to 1.6.0 (#19646 ) Co-authored-by: James Rasell <jrasell@hashicorp.com>	2024-01-08 08:26:34 +00:00
James Rasell	ff2d0d6453	cli: Fix dummy FSM create to ensure snapshot state command works. (#19630 ) The Nomad state store function was recently updated to validate certain parameters, fixing a panic condition. This change meant dummy FSM used for the snapshot state command was always failing this validation and the command no longer worked. This change adds the required parameter to pass validation and therefore makes the CLI command functional again.	2024-01-05 16:00:24 +00:00
Marvin Chin	be8575a8a2	Fix server shutdown not waiting for worker run completion (#19560 ) * Move group into a separate helper module for reuse * Add shutdownCh to worker The shutdown channel is used to signal that worker has stopped. * Make server shutdown block on workers' shutdownCh * Fix waiting for eval broker state change blocking indefinitely There was a race condition in the GenericNotifier between the Run and WaitForChange functions, where WaitForChange blocks trying to write to a full unsubscribeCh, but the Run function never reads from the unsubscribeCh as it has already stopped. This commit fixes it by unblocking if the notifier has been stopped. * Bound the amount of time server shutdown waits on worker completion * Fix lostcancel linter error * Fix worker test using unexpected worker constructor * Add changelog --------- Co-authored-by: Marvin Chin <marvinchin@users.noreply.github.com>	2024-01-05 08:45:07 -06:00
James Rasell	5a00440b06	api: Fix operator snapshot API streaming. (#19608 )	2024-01-05 14:33:39 +00:00
dependabot[bot]	37af843b01	chore(deps): bump github.com/opencontainers/runc from 1.1.8 to 1.1.10 (#19289 )	2024-01-05 09:57:54 +00:00
dependabot[bot]	c2e6d8aee2	build(deps): bump github.com/containerd/containerd from 1.6.18 to 1.6.26 (#19531 )	2024-01-05 09:29:14 +00:00
James Rasell	f3ed406b0f	state: ensure the job submission table is persisted and restored. (#19605 )	2024-01-05 08:12:27 +00:00
James Rasell	2abbd7e485	cli: fix operator snapshot save help output examples. (#19606 )	2024-01-05 07:43:12 +00:00
Phil Renaud	a5881963dd	Error message typo fix: Filed to Failed (#19611 )	2024-01-04 21:56:23 -05:00
Phil Renaud	16876697a1	[ui] Adds group-name tooltips to deploying and steady-state job panels (#19601 ) * Adds group-name tooltips to deploying and steady-state job panels * Default tooltip text for mirage edge cases	2024-01-04 13:10:37 -05:00
Phil Renaud	75b830ef04	[ui] Changelog for multi-line variables (#19600 ) * Changelog for multi-line variables * Multi-entry changelog	2024-01-04 12:00:50 -05:00
Seth Hoenig	4b3ee77d6b	docs: update raw_exec driver docs and 1.7 upgrade notes (#19598 )	2024-01-04 08:26:46 -06:00
Seth Hoenig	ccfb13a72d	e2e: add test for raw_exec memory_max configuration (#19596 ) * e2e: add test for raw_exec memory_max configuration * docs: note raw_exec supports memory_max in resources documentation	2024-01-04 08:25:56 -06:00
Piotr Kazmierczak	aa197cf824	e2e: pass Nomad address to Consul WI test (#19603 )	2024-01-04 08:52:39 +01:00
Phil Renaud	89cceebb91	[ui] Multi-line variable values and helios upgrades generally (#19544 ) * Multi-line variable values and helios upgrades generally * Variables page titles and actions restyle * Hacky fix to keyboard shortcut otherwise bumping space on shift * Related entities heliosified * Namespace and path fields heliosed * Paths table heliosified * Variable view table * Fixups after design discussion * Monospaced editing * De-commented template placeholder * Acceptance tests updated for helios components across variables * Tests helios'd in variable-form-test * PR suggestions	2024-01-03 15:54:22 -05:00
Marvin Chin	d75293d2ab	Add OOM detection for exec driver (#19563 ) * Add OomKilled field to executor proto format * Teach linux executor to detect and report OOMs * Teach exec driver to propagate OOMKill information * Fix data race * use tail /dev/zero to create oom condition * use new test framework * minor tweaks to executor test * add cl entry * remove type conversion --------- Co-authored-by: Marvin Chin <marvinchin@users.noreply.github.com> Co-authored-by: Seth Hoenig <shoenig@duck.com>	2024-01-03 09:50:27 -06:00
Tim Gross	f2630add91	acl: remove timestamps from `WhoAmI` response (#19578 ) In Nomad 1.7 we updated our JWT library to go-jose, but this changed the wire format of the embedded struct we have in the `IdentityClaims` struct that we return as part of the `WhoAmI` RPC response. This wasn't originally intended to be sent over the wire but other changes in Nomad 1.5+ added a caller to the client. The library change causes a deserialization error on Nomad 1.5 and 1.6 clients, which prevents access to Nomad Variables and SD via template blocks. Removed the incompatible fields from the response, which are unused by any current caller. In a future version of Nomad, we'll likely remove the `WhoAmI` callers from the client in favor of using the public keys the clients have to check auth. Fixes: https://github.com/hashicorp/nomad/issues/19555	2024-01-03 08:24:38 -05:00
James Rasell	91cba75f5c	copywrite: fix and add copywrite config enterprise comments. (#19590 ) Nomad CI checks for copywrite headers using multiple config files for specific exemption paths. This means the top-level config file does not take effect when running the copywrite script within these sub-folders. Exempt files therefore need to be added to the sub-config files, along with the top level.	2024-01-03 08:58:53 +00:00
Piotr Kazmierczak	a87aa71f55	e2e: fix typo in Consul e2e (#19589 )	2024-01-03 09:34:38 +01:00
Tim Gross	e7ca2b51ad	vault: ignore `allow_unauthenticated` config if identity is set (#19585 ) When the server's `vault` block has a default identity, we don't check the user's Vault token (and in fact, we warn them on job submit if they've provided one). But the validation hook still checks for a token if `allow_unauthenticated` is set to true. This is a misconfiguration but there's no reason for Nomad not to do the expected thing here. Fixes: https://github.com/hashicorp/nomad/issues/19565	2024-01-02 16:46:34 -05:00
Luiz Aoqui	cd8a03431c	docs: add `scale_in_protection` to AWS Autoscaler (#19546 ) Document new `scale_in_protection` configuration of the AWS ASG Autoscaler target plugin.	2024-01-02 14:48:56 -05:00
Luiz Aoqui	0bef6f05a2	docs: add note about `*` namespace on autoscaling (#19547 ) Explain the behaviour when the wildcard namespace value `*` is used to configure the Nomad Autoscaler agent.	2024-01-02 14:48:20 -05:00
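
Commit be8575a8a2 above describes a race in which `WaitForChange` blocks writing to `unsubscribeCh` after `Run` has stopped reading from it, and fixes it by unblocking once the notifier has stopped. A minimal Go sketch of that pattern follows. It is not Nomad's actual `GenericNotifier`: the `notifier` type, its `stopCh` field, and the `run` and `unsubscribe` functions are hypothetical names chosen for illustration, and an unbuffered channel stands in for the full buffered channel mentioned in the commit message.

```go
package main

import "fmt"

// notifier is a hypothetical, simplified stand-in for the notifier described
// in the commit message; it is not Nomad's real implementation.
type notifier struct {
	unsubscribeCh chan chan struct{} // read only while run is active
	stopCh        chan struct{}      // closed when run exits (assumed field)
}

func newNotifier() *notifier {
	return &notifier{
		unsubscribeCh: make(chan chan struct{}),
		stopCh:        make(chan struct{}),
	}
}

// run drains unsubscribe requests until done is closed, then marks the
// notifier as stopped by closing stopCh.
func (n *notifier) run(done <-chan struct{}) {
	defer close(n.stopCh)
	for {
		select {
		case <-n.unsubscribeCh:
			// A real notifier would remove the subscriber here.
		case <-done:
			return
		}
	}
}

// unsubscribe hands the request to run. Without the stopCh case, the send
// would block forever once run has exited, because nothing reads the channel.
func (n *notifier) unsubscribe(sub chan struct{}) bool {
	select {
	case n.unsubscribeCh <- sub:
		return true
	case <-n.stopCh:
		return false
	}
}

func main() {
	n := newNotifier()
	done := make(chan struct{})
	go n.run(done)

	fmt.Println(n.unsubscribe(make(chan struct{}))) // run is active: true

	close(done)
	<-n.stopCh // wait until run has exited

	fmt.Println(n.unsubscribe(make(chan struct{}))) // stopped: false, no hang
}
```

The second call returns instead of blocking because the closed `stopCh` is always ready to receive from, which is the "unblock if the notifier has been stopped" behaviour the commit message describes.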

25539 Commits