nomad

mirror of https://github.com/kemko/nomad.git synced 2026-01-01 16:05:42 +03:00

Author	SHA1	Message	Date
Luiz Aoqui	41277f823f	license: fix some imports of BUSL-1.1 in MPL-2.0 (#19832 ) Some packages licensed under MPL-2.0 were incorrectly importing code from packages licensed under BUSL-1.1. Not all imports are fixed here as they will require additional work to untangle them. To help track progress this commit adds a Semgrep rule that detects incorrect BUSL-1.1 imports in MPL-2.0 packages.	2024-01-29 12:04:12 -05:00
James Rasell	10324566ae	driver/rawexec: populate OOM killed exit result. (#19829 )	2024-01-29 08:54:52 +00:00
James Rasell	8d6067e987	driver/qemu: populate OOM killed exit result. (#19830 )	2024-01-29 07:34:27 +00:00
James Rasell	34fe96a420	driver/java: populate OOM killed exit result. (#19818 )	2024-01-26 08:09:16 +00:00
James Rasell	9e6f12ef2d	stream: remove unused internal error definition from event stream. (#19819 )	2024-01-26 07:52:53 +00:00
Michael Schurter	a283a41613	docs: mention wildcards in namespace api docs (#19809 ) Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>	2024-01-24 11:52:28 -08:00
Michael Schurter	8f564182ef	connect: rewrite envoy bootstrap on every restart (#19787 ) Fixes #19781 Do not mark the envoy bootstrap hook as done after successfully running once. Since the bootstrap file is written to /secrets, which is a tmpfs on supported platforms, it is not persisted across reboots. This causes the task and allocation to fail on reboot (see #19781). This fixes it by always rewriting the envoy bootstrap file every time the Nomad agent starts. This does mean we may write a new bootstrap file to an already running Envoy task, but in my testing that doesn't have any impact. This commit doesn't necessarily fix every use of Done by hooks, but hopefully improves the situation. The comment on Done has been expanded to hopefully avoid misuse in the future. Done assertions were removed from tests as they add more noise than value. Alternative 1: Use a regular file An alternative approach would be to write the bootstrap file somewhere other than the tmpfs, but this is unsafe as when Consul ACLs are enabled the file will contain a secret token: https://developer.hashicorp.com/consul/commands/connect/envoy#bootstrap Alternative 2: Detect if file is already written An alternative approach would be to detect if the bootstrap file exists, and only write it if it doesn't. This is just a more complicated form of the current fix. I think in general in the absence of other factors task hooks should be idempotent and therefore able to rerun on any agent startup. This simplifies the code and our ability to reason about task restarts vs agent restarts vs node reboots by making them all take the same code path.	2024-01-24 11:26:31 -08:00
Piotr Kazmierczak	543ba16e61	e2e: more retries for RequireConsulDeregistered (#19801 )	2024-01-22 20:11:48 +01:00
Luiz Aoqui	b7fa4447bd	docs: autoscaler config for blocking query timeout (#19777 )	2024-01-22 13:08:10 -05:00
Piotr Kazmierczak	8a4bd61caf	e2e: WaitForJobStopped correction (#19749 )	2024-01-22 11:38:22 +01:00
dependabot[bot]	af2cdc98a5	chore(deps): bump golang.org/x/sync from 0.4.0 to 0.6.0 (#19792 )	2024-01-22 07:32:21 +00:00
Adrian Todorov	044eb0e048	docs: warnings about template dependencies, HCL2 clarifications (#19779 )	2024-01-19 14:07:15 -05:00
Luiz Aoqui	fce30f342c	docs: add `lock_namespace` autoscaler config (#19769 ) Document the `high_availability.lock_namespace` configuration of the Nomad Autoscaler.	2024-01-18 11:52:14 -05:00
Vijesh	3b4afea974	docs: note script checks don't support some Consul options (#19770 ) Script checks don't support Consul's `success_before_passing`, `failures_before_critical`, or `failures_before_warning` because they're run by Nomad and not by Consul	2024-01-18 08:38:57 -05:00
Piotr Kazmierczak	8f99ba6b2c	docs: add missing JWT auth method API documentation (#19757 )	2024-01-17 16:03:08 +01:00
Tom Davies	5a11a28cac	docs: updates link to Consul WLI migration docs (#19748 )	2024-01-17 09:57:02 -05:00
Piotr Kazmierczak	11ca21ca3c	cli: correct typos in setup consul (#19754 )	2024-01-17 14:13:07 +01:00
James Rasell	41555b6370	cli: Fix minor help formatting issue in agent command. (#19743 )	2024-01-17 12:18:00 +00:00
Mike Nomitch	bc039a7a8a	Adds Namespace UI to Access Control (#19402 ) Adds Namespace UI to Access Control - Also adds two step buttons to other Access Control pages --------- Co-authored-by: Phil Renaud <phil@riotindustries.com>	2024-01-16 09:20:50 -08:00
Luiz Aoqui	c0cfeb3ecd	Merge pull request #19746 from hashicorp/post-1.7.3-release Post 1.7.3 release	2024-01-16 10:59:02 -05:00
Luiz Aoqui	051202087b	Merge release 1.7.3 files	2024-01-15 16:00:12 -05:00
hc-github-team-nomad-core	ca93483626	Prepare for next release	2024-01-15 15:58:41 -05:00
hc-github-team-nomad-core	ddfc157c0a	Generate files for 1.7.3 release	2024-01-15 15:58:41 -05:00
Piotr Kazmierczak	8226a85263	e2e: remove deprecated template_file dependency for tf (#19313 ) This also allows running tf for our e2e suite locally on darwin.	2024-01-15 18:42:28 +01:00
Piotr Kazmierczak	609f3a60b5	e2e: purging jobs removes all allocs (#19744 ) There's no need to wait for allocs since #19609; in fact, waiting for allocs to stop will always fail, leading to e2e failures.	2024-01-15 17:54:35 +01:00
dependabot[bot]	d62280941d	chore(deps): bump github.com/hashicorp/go-immutable-radix/v2 (#19734 )	2024-01-15 10:27:31 +00:00
dependabot[bot]	40bbddf3d8	chore(deps): bump github.com/prometheus/client_golang (#19733 )	2024-01-15 08:24:43 +00:00
Luiz Aoqui	e1e80f383e	vault: add new `nomad setup vault -check` command (#19720 ) The new `nomad setup vault -check` command can be used to retrieve information about the changes required before a cluster is migrated from the deprecated legacy authentication flow with Vault to use only workload identities.	2024-01-12 15:48:30 -05:00
Seth Hoenig	5b7f4746ce	client/allocdir: use an interface in place of AllocDir structs (#19703 ) * client/allocdir: use an interface in place of AllocDir structs This PR replaces allocdir.AllocDir with allocdir.Interface such that we may eventually have another implementation of alloc directories. This is in support of the exec2 driver, which will need an implementation of the alloc directory incompatible with the current version. use rlock	2024-01-12 14:13:29 -06:00
Piotr Kazmierczak	858a805d7d	e2e: add a note about provisioning the infrastructure on macOS/Apple Silicon (#19727 )	2024-01-12 14:09:50 +01:00
Piotr Kazmierczak	5d12ca4f57	state store: better handling of job deletion (#19609 ) When jobs are deleted with -purge, all their deployments and allocations should be deleted from the state store, and the evals' status should be set to complete. Otherwise we end up in a situation where users could re-submit previously failing jobs, but these new jobs would not get deployments allocated unless system gc got called.	2024-01-12 10:08:55 +01:00
Luiz Aoqui	b2aa6ffd05	docs: fix Consul ACL requirements (#19721 ) Even with the new workload identity based flow, the Nomad servers still need the `acl = "write"` permission in order to revoke service identity tokens.	2024-01-11 15:52:23 -05:00
Seth Hoenig	a58f0eca8e	e2e: move rawexec oversub tests into oversubscription e2e test suite (#19717 ) * e2e: move rawexec oversub tests into oversubscription e2e test suite This PR moves two tests for raw_exec and memory oversubscription into the oversubscription test suite, which has the necessary plumbing to activate and restore the oversubscription configuration of the scheduler during the test. * cr: rename files for better readability	2024-01-11 14:27:05 -06:00
Luiz Aoqui	8d0a469000	vault: remove revoked Vault accessors from state (#19706 ) When using the no-op Vault client the Nomad server still needs to delete the revoked Vault accessors from state to prevent them from lingering forever after the cluster migrates to the workload identity flow.	2024-01-11 14:38:51 -05:00
Seth Hoenig	aad932eeee	build: update to go1.21.6 (#19709 )	2024-01-11 09:48:56 -06:00
Tim Gross	4c206d0b19	docs: changelog entry for ENT PR (#19705 ) Ref: https://github.com/hashicorp/nomad-enterprise/pull/1370	2024-01-11 10:36:08 -05:00
Seth Hoenig	0c08f94c8e	build: use setup-golang@v3 to handle auto caching (#19707 ) * wip: try on branch * build: use setup-golang@v3 to handle auto caching	2024-01-11 08:51:56 -06:00
Seth Hoenig	9410c519ff	drivers/raw_exec: remove plumbing for ineffective no_cgroups configuration (#19599 ) * drivers/raw_exec: remove plumbing for ineffective no_cgroups configuration * fix tests	2024-01-11 08:20:15 -06:00
Tim Gross	1254468600	consul: refactor job mutation hook (#19699 ) The job mutation logic for Nomad CE and Nomad ENT are nearly identical except for a prelude that grabs the correct default cluster. Factor this out into a method that can be shared between both code bases.	2024-01-10 16:29:05 -05:00
CJ	c9cd8480fa	docs: considerations for Stateful Workloads (#19077 ) Co-authored-by: Adrian Todorov <adrian.todorov@hashicorp.com>	2024-01-10 16:06:45 -05:00
Piotr Kazmierczak	930339a0fa	e2e: remove broken Consul WI test (#19697 )	2024-01-10 21:31:18 +01:00
Tim Gross	0935f443dc	vault: support allowing tokens to expire without refresh (#19691 ) Some users with batch workloads or short-lived prestart tasks want to derive a Vault token, use it, and then allow it to expire without requiring a constant refresh. Add the `vault.allow_token_expiration` field, which works only with the Workload Identity workflow and not the legacy workflow. When set to true, this disables the client's renewal loop in the `vault_hook`. When Vault revokes the token lease, the token will no longer be valid. The client will also now automatically detect if the Vault auth configuration does not allow renewals and will disable the renewal loop automatically. Note this should only be used when a secret is requested from Vault once at the start of a task or in a short-lived prestart task. Long-running tasks should never set `allow_token_expiration=true` if they obtain Vault secrets via `template` blocks, as the Vault token will expire and the template runner will continue to make failing requests to Vault until the `vault_retry` attempts are exhausted. Fixes: https://github.com/hashicorp/nomad/issues/8690	2024-01-10 14:49:02 -05:00
Luiz Aoqui	5267eec3ad	vault: fix token revocation during workflow migration (#19689 ) When transitioning from the legacy token-based workflow to the new JWT workflow for Vault, the previous code would instantiate a no-op Vault client if the server configuration had a `default_identity` block. This no-op client returned an error when some of its operations, such as `LookupToken` and `RevokeTokens`, were called. The original intention was that, in the new JWT workflow, none of these methods should be called, so returning an error could help surface potential bugs. But the `RevokeTokens` and `MarkForRevocation` methods _are_ called even in the JWT flow. When a leadership transition happens, the new server looks for unused Vault accessors from state and tries to revoke them. Similarly, the `RevokeTokens` method is called every time the `Node.UpdateStatus` and `Node.UpdateAlloc` RPCs are made by clients, as the Nomad server tries to find unused Vault tokens for the node/alloc. Since the new JWT flow does not require Nomad servers to contact Vault, calls to `RevokeTokens` and `MarkForRevocation` cannot complete without a Vault token, so this commit changes the logic to use the no-op Vault client when no token is configured. It also updates the client itself to not error if these methods are called, but rather to log, so operators can be made aware that there are Vault tokens created by Nomad that have not been force-expired. When migrating an existing cluster to the new workload identity based flow, Nomad operators must first upgrade the Nomad version without removing any of the existing Vault configuration. Removing it too early can prevent Nomad servers from managing and cleaning up existing Vault tokens during a leadership transition and node or alloc updates. Operators must also resubmit all jobs with a `vault` block so they are updated with an `identity` for Vault. Skipping this step may cause allocations to fail if their Vault token expires (if, for example, the Nomad client stops running for TTL/2) or if they are rescheduled, since the new client will try to follow the legacy flow, which will fail if the Nomad server configuration for Vault has already been updated to remove the Vault address and token.	2024-01-10 13:28:46 -05:00
Tim Gross	d3e5cae1eb	consul: support admin partitions (#19665 ) Add support for Consul Enterprise admin partitions. We added fingerprinting in https://github.com/hashicorp/nomad/pull/19485. This PR adds a `consul.partition` field. The expectation is that most users will create a mapping of Nomad node pool to Consul admin partition. But we'll also create an implicit constraint for the fingerprinted value. Fixes: https://github.com/hashicorp/nomad/issues/13139	2024-01-10 10:41:29 -05:00
Daniel Peinhopf	9eb357020d	Docs: Alternative IIS Task Driver (#19411 )	2024-01-10 14:14:30 +00:00
Seth Hoenig	cb7d078c1d	drivers/raw_exec: enable configuring raw_exec task to have no memory limit (#19670 ) * drivers/raw_exec: enable configuring raw_exec task to have no memory limit This PR makes it possible to configure a raw_exec task to not have an upper memory limit, which is how the driver would behave pre-1.7. This is done by setting memory_max = -1. The cluster (or node pool) must have memory oversubscription enabled. * cl: add cl	2024-01-09 14:57:13 -06:00
Egor Mikhailov	18f49e015f	auth: add new optional `OIDCDisableUserInfo` setting for OIDC auth provider (#19566 ) Add new optional `OIDCDisableUserInfo` setting for OIDC auth provider which disables a request to the identity provider to get OIDC UserInfo. This option is helpful when your identity provider doesn't send any additional claims from the UserInfo endpoint, such as Microsoft AD FS OIDC Provider: > The AD FS UserInfo endpoint always returns the subject claim as specified in the > OpenID standards. AD FS doesn't support additional claims requested via the > UserInfo endpoint Fixes #19318	2024-01-09 13:41:46 -05:00
Tim Gross	c875f3e49a	docs: expand docs on implicit ACL capabilities grants (#19681 ) An audit of Nomad's ACLs resulted in some confusion around whether the `NamespaceValidator` method is conjunctive ("and", as implied by the docs) or disjunctive ("or", as it is by design). Clarify the ACL documentation as follows: * Call out where fine-grained capabilities imply grants to other capabilities (for example, that `csi-read-volume` grants `csi-list-volume`). * Fix an incorrectly documented ACL requirement for the CSI List External Volumes API. * Clarify how ACLs are expected to work for the two search API endpoints, such that you need list/read access to the objects in the search context.	2024-01-09 13:25:05 -05:00
James Rasell	a3a03dff78	acl: ensure auth method configs are correctly and fully hashed. (#19677 )	2024-01-09 14:03:26 +00:00
dependabot[bot]	f3bc9c7c41	chore(deps): bump github.com/docker/docker (#19672 )	2024-01-09 08:24:20 +00:00
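Commit c875f3e49a distinguishes conjunctive ("and") from disjunctive ("or") capability checks and notes that `csi-read-volume` implies `csi-list-volume`. The Go sketch below illustrates those two ideas only. The helpers `anyCapability` and `expandImplicit` are hypothetical and are not Nomad's `acl` package API; a policy is modeled here as a plain map of granted capability names.

```go
package main

import "fmt"

// Capability names taken from the commit message; everything else here is a
// hypothetical illustration, not Nomad's actual ACL implementation.
const (
	capCSIReadVolume = "csi-read-volume"
	capCSIListVolume = "csi-list-volume"
)

// anyCapability returns a check that passes when at least one of the
// required capabilities is granted (disjunctive "or" semantics).
func anyCapability(required ...string) func(granted map[string]bool) bool {
	return func(granted map[string]bool) bool {
		for _, c := range required {
			if granted[c] {
				return true
			}
		}
		return false
	}
}

// expandImplicit copies the granted set and adds capabilities implied by
// finer-grained ones, e.g. csi-read-volume implies csi-list-volume.
func expandImplicit(granted map[string]bool) map[string]bool {
	out := make(map[string]bool, len(granted)+1)
	for c, ok := range granted {
		out[c] = ok
	}
	if out[capCSIReadVolume] {
		out[capCSIListVolume] = true
	}
	return out
}

func main() {
	policy := expandImplicit(map[string]bool{capCSIReadVolume: true})
	canList := anyCapability(capCSIListVolume)
	fmt.Println("can list volumes:", canList(policy))
}
```

Under "or" semantics, a check built from several capabilities passes as soon as any one of them is granted. This is why documenting implied grants matters: a reader assuming "and" semantics would expect a stricter requirement.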


25565 Commits