Commit Graph

25565 Commits

Author SHA1 Message Date
Luiz Aoqui
41277f823f license: fix some imports of BUSL-1.1 in MPL-2.0 (#19832)
Some packages licensed under MPL-2.0 were incorrectly importing code
from packages licensed under BUSL-1.1.

Not all imports are fixed here as they will require additional work to
untangle them. To help track progress this commit adds a Semgrep rule
that detects incorrect BUSL-1.1 imports in MPL-2.0 packages.
2024-01-29 12:04:12 -05:00
James Rasell
10324566ae driver/rawexec: populate OOM killed exit result. (#19829) 2024-01-29 08:54:52 +00:00
James Rasell
8d6067e987 driver/qemu: populate OOM killed exit result. (#19830) 2024-01-29 07:34:27 +00:00
James Rasell
34fe96a420 driver/java: populate OOM killed exit result. (#19818) 2024-01-26 08:09:16 +00:00
James Rasell
9e6f12ef2d stream: remove unused internal error definition from event stream. (#19819) 2024-01-26 07:52:53 +00:00
Michael Schurter
a283a41613 docs: mention wildcards in namespace api docs (#19809)
Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
2024-01-24 11:52:28 -08:00
Michael Schurter
8f564182ef connect: rewrite envoy bootstrap on every restart (#19787)
Fixes #19781

Do not mark the envoy bootstrap hook as done after successfully running once.
Since the bootstrap file is written to /secrets, which is a tmpfs on supported
platforms, it is not persisted across reboots. This causes the task and
allocation to fail on reboot (see #19781).

This fixes it by *always* rewriting the envoy bootstrap file every time the
Nomad agent starts. This does mean we may write a new bootstrap file to an
already running Envoy task, but in my testing that doesn't have any impact.

This commit doesn't necessarily fix every use of Done by hooks, but hopefully
improves the situation. The comment on Done has been expanded to hopefully
avoid misuse in the future.

Done assertions were removed from tests as they add more noise than value.

*Alternative 1: Use a regular file*

An alternative approach would be to write the bootstrap file somewhere
other than the tmpfs, but this is *unsafe* as when Consul ACLs are
enabled the file will contain a secret token:
https://developer.hashicorp.com/consul/commands/connect/envoy#bootstrap

*Alternative 2: Detect if file is already written*

An alternative approach would be to detect if the bootstrap file exists,
and only write it if it doesn't.

This is just a more complicated form of the current fix. I think in
general in the absence of other factors task hooks should be idempotent
and therefore able to rerun on any agent startup. This simplifies the
code and our ability to reason about task restarts vs agent restarts vs
node reboots by making them all take the same code path.
2024-01-24 11:26:31 -08:00
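
The idempotency argument in the envoy bootstrap commit above can be sketched in Go. This is a minimal illustration, not Nomad's hook code: `writeBootstrap`, `simulateReboot`, and the `envoy_bootstrap.json` file name are assumptions made for the example. It shows that a hook which rewrites its file unconditionally recovers after the file disappears, which is what happens to a tmpfs on reboot.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeBootstrap is a hypothetical stand-in for the hook body. It rewrites
// the file unconditionally and keeps no "done" flag, so running it again
// after the file vanished restores it.
func writeBootstrap(secretsDir string, config []byte) error {
	path := filepath.Join(secretsDir, "envoy_bootstrap.json")
	// 0600 because the file may contain a secret token.
	return os.WriteFile(path, config, 0o600)
}

// simulateReboot runs the hook, deletes the file as a tmpfs wipe would,
// runs the hook again, and reports whether the file is back.
func simulateReboot() (bool, error) {
	dir, err := os.MkdirTemp("", "secrets")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(dir)

	config := []byte(`{"node": {"id": "example"}}`)
	if err := writeBootstrap(dir, config); err != nil {
		return false, err
	}

	path := filepath.Join(dir, "envoy_bootstrap.json")
	if err := os.Remove(path); err != nil {
		return false, err
	}

	// Second run takes the same code path as the first.
	if err := writeBootstrap(dir, config); err != nil {
		return false, err
	}
	_, err = os.Stat(path)
	return err == nil, nil
}

func main() {
	restored, err := simulateReboot()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("bootstrap restored after reboot:", restored)
}
```

Because the second call does not depend on any state left by the first, task restarts, agent restarts, and node reboots can all share one code path.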
Piotr Kazmierczak
543ba16e61 e2e: more retries for RequireConsulDeregistered (#19801) 2024-01-22 20:11:48 +01:00
Luiz Aoqui
b7fa4447bd docs: autoscaler config for blocking query timeout (#19777) 2024-01-22 13:08:10 -05:00
Piotr Kazmierczak
8a4bd61caf e2e: WaitForJobStopped correction (#19749) 2024-01-22 11:38:22 +01:00
dependabot[bot]
af2cdc98a5 chore(deps): bump golang.org/x/sync from 0.4.0 to 0.6.0 (#19792) 2024-01-22 07:32:21 +00:00
Adrian Todorov
044eb0e048 docs: warnings about template dependencies, HCL2 clarifications (#19779) 2024-01-19 14:07:15 -05:00
Luiz Aoqui
fce30f342c docs: add lock_namespace autoscaler config (#19769)
Document the `high_availability.lock_namespace` configuration of the
Nomad Autoscaler.
2024-01-18 11:52:14 -05:00
Vijesh
3b4afea974 docs: note script checks don't support some Consul options (#19770)
Script checks don't support Consul's `success_before_passing`, `failures_before_critical`, or `failures_before_warning` because they're run by Nomad and not by Consul.
2024-01-18 08:38:57 -05:00
Piotr Kazmierczak
8f99ba6b2c docs: add missing JWT auth method API documentation (#19757) 2024-01-17 16:03:08 +01:00
Tom Davies
5a11a28cac docs: updates link to Consul WLI migration docs (#19748) 2024-01-17 09:57:02 -05:00
Piotr Kazmierczak
11ca21ca3c cli: correct typos in setup consul (#19754) 2024-01-17 14:13:07 +01:00
James Rasell
41555b6370 cli: Fix minor help formatting issue in agent command. (#19743) 2024-01-17 12:18:00 +00:00
Mike Nomitch
bc039a7a8a Adds Namespace UI to Access Control (#19402)
Adds Namespace UI to Access Control - Also adds two step buttons to other Access Control pages

---------

Co-authored-by: Phil Renaud <phil@riotindustries.com>
2024-01-16 09:20:50 -08:00
Luiz Aoqui
c0cfeb3ecd Merge pull request #19746 from hashicorp/post-1.7.3-release
Post 1.7.3 release
2024-01-16 10:59:02 -05:00
Luiz Aoqui
051202087b Merge release 1.7.3 files 2024-01-15 16:00:12 -05:00
hc-github-team-nomad-core
ca93483626 Prepare for next release 2024-01-15 15:58:41 -05:00
hc-github-team-nomad-core
ddfc157c0a Generate files for 1.7.3 release 2024-01-15 15:58:41 -05:00
Piotr Kazmierczak
8226a85263 e2e: remove deprecated template_file dependency for tf (#19313)
This also allows running tf for our e2e suite locally on darwin.
2024-01-15 18:42:28 +01:00
Piotr Kazmierczak
609f3a60b5 e2e: purging jobs removes all allocs (#19744)
There's no need to wait for allocs since #19609, in fact waiting for allocs to
stop will always fail leading to e2e failures.
2024-01-15 17:54:35 +01:00
dependabot[bot]
d62280941d chore(deps): bump github.com/hashicorp/go-immutable-radix/v2 (#19734) 2024-01-15 10:27:31 +00:00
dependabot[bot]
40bbddf3d8 chore(deps): bump github.com/prometheus/client_golang (#19733) 2024-01-15 08:24:43 +00:00
Luiz Aoqui
e1e80f383e vault: add new nomad setup vault -check command (#19720)
The new `nomad setup vault -check` command can be used to retrieve
information about the changes required before a cluster is migrated from
the deprecated legacy authentication flow with Vault to use only
workload identities.
2024-01-12 15:48:30 -05:00
Seth Hoenig
5b7f4746ce client/allocdir: use an interface in place of AllocDir structs (#19703)
* client/allocdir: use an interface in place of AllocDir structs

This PR replaces *allocdir.AllocDir with allocdir.Interface such that we
may eventually have another implementation of alloc directories. This is
in support of the exec2 driver, which will need an implementation of the
alloc directory incompatible with the current version.

* use rlock
2024-01-12 14:13:29 -06:00
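
The struct-to-interface change in the allocdir commit above follows a common Go pattern, sketched here under loud assumptions: `Dir`, `defaultDir`, `altDir`, `describe`, and the example paths are all hypothetical and far smaller than the real `allocdir.Interface`. The read-locked getter only illustrates the "use rlock" follow-up.

```go
package main

import (
	"fmt"
	"sync"
)

// Dir is a hypothetical, much smaller stand-in for allocdir.Interface.
type Dir interface {
	AllocDirPath() string
}

// defaultDir mimics a struct-based implementation whose fields are
// guarded by a read/write mutex.
type defaultDir struct {
	mu   sync.RWMutex
	path string
}

// AllocDirPath only reads, so a read lock is enough and lets
// concurrent readers proceed together.
func (d *defaultDir) AllocDirPath() string {
	d.mu.RLock()
	defer d.mu.RUnlock()
	return d.path
}

// altDir is a second, incompatible layout that a different driver
// might need. It is purely illustrative.
type altDir struct {
	root string
	id   string
}

func (a altDir) AllocDirPath() string { return a.root + "/" + a.id }

// describe depends only on the interface, so callers do not change
// when a new implementation is added.
func describe(d Dir) string {
	return "alloc dir at " + d.AllocDirPath()
}

func main() {
	fmt.Println(describe(&defaultDir{path: "/var/nomad/alloc/abc"}))
	fmt.Println(describe(altDir{root: "/run/example", id: "abc"}))
}
```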
Piotr Kazmierczak
858a805d7d e2e: add a note about provisioning the infrastructure on macOS/Apple Silicon (#19727) 2024-01-12 14:09:50 +01:00
Piotr Kazmierczak
5d12ca4f57 state store: better handling of job deletion (#19609)
When jobs are deleted with -purge, all their deployments and allocations should
be deleted from the state store, and the status of their evals should be set to complete.
Otherwise we end up in a situation where users could re-submit previously
failing jobs, but these new jobs would not get deployments allocated unless
system gc got called.
2024-01-12 10:08:55 +01:00
Luiz Aoqui
b2aa6ffd05 docs: fix Consul ACL requirements (#19721)
Even with the new workload identity based flow the Nomad servers still
need the `acl = "write"` permission in order to revoke service identity
tokens.
2024-01-11 15:52:23 -05:00
Seth Hoenig
a58f0eca8e e2e: move rawexec oversub tests into oversubscription e2e test suite (#19717)
* e2e: move rawexec oversub tests into oversubscription e2e test suite

This PR moves two tests for raw_exec and memory oversubscription into
the oversubscription test suite, which has the necessary plumbing to
activate and restore the oversubscription configuration of the scheduler
during the test.

* cr: rename files for better readability
2024-01-11 14:27:05 -06:00
Luiz Aoqui
8d0a469000 vault: remove revoked Vault accessors from state (#19706)
When using the no-op Vault client the Nomad server still needs to delete
the revoked Vault accessors from state to prevent them from lingering
forever after the cluster migrates to the workload identity flow.
2024-01-11 14:38:51 -05:00
Seth Hoenig
aad932eeee build: update to go1.21.6 (#19709) 2024-01-11 09:48:56 -06:00
Tim Gross
4c206d0b19 docs: changelog entry for ENT PR (#19705)
Ref: https://github.com/hashicorp/nomad-enterprise/pull/1370
2024-01-11 10:36:08 -05:00
Seth Hoenig
0c08f94c8e build: use setup-golang@v3 to handle auto caching (#19707)
* wip: try on branch

* build: use setup-golang@v3 to handle auto caching
2024-01-11 08:51:56 -06:00
Seth Hoenig
9410c519ff drivers/raw_exec: remove plumbing for ineffective no_cgroups configuration (#19599)
* drivers/raw_exec: remove plumbing for ineffective no_cgroups configuration

* fix tests
2024-01-11 08:20:15 -06:00
Tim Gross
1254468600 consul: refactor job mutation hook (#19699)
The job mutation logic for Nomad CE and Nomad ENT are nearly identical except
for a prelude that grabs the correct default cluster. Factor this out into a
method that can be shared between both code bases.
2024-01-10 16:29:05 -05:00
CJ
c9cd8480fa docs: considerations for Stateful Workloads (#19077)
Co-authored-by: Adrian Todorov <adrian.todorov@hashicorp.com>
2024-01-10 16:06:45 -05:00
Piotr Kazmierczak
930339a0fa e2e: remove broken Consul WI test (#19697) 2024-01-10 21:31:18 +01:00
Tim Gross
0935f443dc vault: support allowing tokens to expire without refresh (#19691)
Some users with batch workloads or short-lived prestart tasks want to derive a
Vault token, use it, and then allow it to expire without requiring a constant
refresh. Add the `vault.allow_token_expiration` field, which works only with the
Workload Identity workflow and not the legacy workflow.

When set to true, this disables the client's renewal loop in the
`vault_hook`. When Vault revokes the token lease, the token will no longer be
valid. The client will also now automatically detect if the Vault auth
configuration does not allow renewals and will disable the renewal loop
automatically.

Note this should only be used when a secret is requested from Vault once at the
start of a task or in a short-lived prestart task. Long-running tasks should
never set `allow_token_expiration=true` if they obtain Vault secrets via
`template` blocks, as the Vault token will expire and the template runner will
continue to make failing requests to Vault until the `vault_retry` attempts are
exhausted.

Fixes: https://github.com/hashicorp/nomad/issues/8690
2024-01-10 14:49:02 -05:00
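
The renewal decision described in the `allow_token_expiration` commit above can be summarized as a small Go function. This is a sketch of the stated behavior only: `shouldRunRenewalLoop` and its parameters are hypothetical names, not the `vault_hook` implementation.

```go
package main

import "fmt"

// shouldRunRenewalLoop is a hypothetical helper. The renewal loop is
// skipped when the job opts in to token expiration, or when the Vault
// auth configuration does not allow the token to be renewed.
func shouldRunRenewalLoop(allowTokenExpiration, tokenRenewable bool) bool {
	if allowTokenExpiration {
		return false
	}
	return tokenRenewable
}

func main() {
	fmt.Println("default, renewable token:", shouldRunRenewalLoop(false, true))
	fmt.Println("allow_token_expiration=true:", shouldRunRenewalLoop(true, true))
	fmt.Println("non-renewable token:", shouldRunRenewalLoop(false, false))
}
```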
Luiz Aoqui
5267eec3ad vault: fix token revocation during workflow migration (#19689)
When transitioning from the legacy token-based workflow to the new JWT
workflow for Vault the previous code would instantiate a no-op Vault if
the server configuration had a `default_identity` block.

This no-op client returned an error when some of its operations were
called, such as `LookupToken` and `RevokeTokens`. The original intention
was that, in the new JWT workflow, none of these methods should be
called, so returning an error could help surface potential bugs.

But the `RevokeTokens` and `MarkForRevocation` methods _are_ called even
in the JWT flow. When a leadership transition happens, the new server
looks for unused Vault accessors from state and tries to revoke them.
Similarly, the `RevokeTokens` method is called every time the
`Node.UpdateStatus` and `Node.UpdateAlloc` RPCs are made by clients, as
the Nomad server tries to find unused Vault tokens for the node/alloc.

Since the new JWT flow does not require Nomad servers to contact Vault,
calling `RevokeTokens` and `MarkForRevocation` is not able to complete
without a Vault token, so this commit changes the logic to use the no-op
Vault client when no token is configured. It also updates the client
itself to not error if these methods are called, but to rather just log
so operators can be made aware that there are Vault tokens created by
Nomad that have not been force-expired.

When migrating an existing cluster to the new workload identity based
flow, Nomad operators must first upgrade the Nomad version without
removing any of the existing Vault configuration. Removing it at this
stage can prevent Nomad servers from managing and cleaning up existing
Vault tokens during a leadership transition and node or alloc updates.

Operators must also resubmit all jobs with a `vault` block so they are
updated with an `identity` for Vault. Skipping this step may cause
allocations to fail if their Vault token expires (if, for example, the
Nomad client stops running for TTL/2) or if they are rescheduled, since
the new client will try to follow the legacy flow which will fail if the
Nomad server configuration for Vault has already been updated to remove
the Vault address and token.
2024-01-10 13:28:46 -05:00
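
The no-op client behavior described in the token revocation commit above can be sketched in Go. Everything here is a simplified assumption: `tokenRevoker`, `noopRevoker`, and `useNoop` are hypothetical names, and the real Vault client interface in Nomad has different signatures. The sketch only shows the two ideas from the message: select the no-op client when no token is configured, and log instead of returning an error.

```go
package main

import "fmt"

// tokenRevoker is a hypothetical, simplified interface for revocation.
type tokenRevoker interface {
	RevokeTokens(accessors []string) error
}

// noopRevoker stands in for the no-op Vault client. It cannot reach
// Vault, so it records a log line for operators and returns nil.
type noopRevoker struct {
	logs []string
}

func (n *noopRevoker) RevokeTokens(accessors []string) error {
	if len(accessors) > 0 {
		n.logs = append(n.logs, fmt.Sprintf(
			"%d Vault tokens were not force-expired: no Vault token configured",
			len(accessors)))
	}
	return nil
}

// useNoop mirrors the selection rule in the commit message: the no-op
// client is used when the server has no Vault token configured.
func useNoop(serverToken string) bool {
	return serverToken == ""
}

func main() {
	if useNoop("") {
		n := &noopRevoker{}
		var r tokenRevoker = n
		err := r.RevokeTokens([]string{"accessor-1", "accessor-2"})
		fmt.Println("error:", err)
		fmt.Println("log lines:", len(n.logs))
	}
}
```

Returning nil keeps the RPC paths that call revocation working during migration, while the log line preserves visibility into tokens that were left to expire on their own.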
Tim Gross
d3e5cae1eb consul: support admin partitions (#19665)
Add support for Consul Enterprise admin partitions. We added fingerprinting in
https://github.com/hashicorp/nomad/pull/19485. This PR adds a `consul.partition`
field. The expectation is that most users will create a mapping of Nomad node
pool to Consul admin partition. But we'll also create an implicit constraint for
the fingerprinted value.

Fixes: https://github.com/hashicorp/nomad/issues/13139
2024-01-10 10:41:29 -05:00
Daniel Peinhopf
9eb357020d Docs: Alternative IIS Task Driver (#19411) 2024-01-10 14:14:30 +00:00
Seth Hoenig
cb7d078c1d drivers/raw_exec: enable configuring raw_exec task to have no memory limit (#19670)
* drivers/raw_exec: enable configuring raw_exec task to have no memory limit

This PR makes it possible to configure a raw_exec task to not have an
upper memory limit, which is how the driver would behave pre-1.7.

This is done by setting memory_max = -1. The cluster (or node pool) must
have memory oversubscription enabled.

* cl: add cl
2024-01-09 14:57:13 -06:00
Egor Mikhailov
18f49e015f auth: add new optional OIDCDisableUserInfo setting for OIDC auth provider (#19566)
Add new optional `OIDCDisableUserInfo` setting for OIDC auth provider which
disables a request to the identity provider to get OIDC UserInfo.

This option is helpful when your identity provider doesn't send any additional
claims from the UserInfo endpoint, such as Microsoft AD FS OIDC Provider:

> The AD FS UserInfo endpoint always returns the subject claim as specified in the
> OpenID standards. AD FS doesn't support additional claims requested via the
> UserInfo endpoint

Fixes #19318
2024-01-09 13:41:46 -05:00
Tim Gross
c875f3e49a docs: expand docs on implicit ACL capabilities grants (#19681)
An audit of Nomad's ACLs resulted in some confusion around whether the
`NamespaceValidator` method is conjunctive ("and", as implied by the docs) or
disjunctive ("or", as it is by design). Clarify the ACL documentation as
follows:

* Call out where fine-grained capabilities imply grants to other
  capabilities (for example, that `csi-read-volume` grants `csi-list-volume`).
* Fix an incorrectly documented ACL requirement for the CSI List External
  Volumes API.
* Clarify how ACLs are expected to work for the two search API endpoints, such
  that you need list/read access to the objects in the search context.
2024-01-09 13:25:05 -05:00
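
The difference between a disjunctive and a conjunctive capability check, and the implied grant mentioned in the ACL docs commit above, can be illustrated with a short Go sketch. The helper names (`expand`, `allowsAny`, `allowsAll`) and the `implied` table are hypothetical; only the `csi-read-volume` to `csi-list-volume` pair comes from the commit message, and `csi-write-volume` is used purely as a second example capability.

```go
package main

import "fmt"

// implied maps a capability to the capabilities it also grants. Only
// the pair named in the commit message is listed here.
var implied = map[string][]string{
	"csi-read-volume": {"csi-list-volume"},
}

// expand returns the granted capabilities plus anything they imply.
func expand(granted []string) map[string]bool {
	set := make(map[string]bool)
	for _, c := range granted {
		set[c] = true
		for _, i := range implied[c] {
			set[i] = true
		}
	}
	return set
}

// allowsAny is disjunctive: one matching capability is enough.
func allowsAny(granted map[string]bool, required ...string) bool {
	for _, r := range required {
		if granted[r] {
			return true
		}
	}
	return false
}

// allowsAll is conjunctive and is shown only for contrast.
func allowsAll(granted map[string]bool, required ...string) bool {
	for _, r := range required {
		if !granted[r] {
			return false
		}
	}
	return true
}

func main() {
	caps := expand([]string{"csi-read-volume"})
	fmt.Println("any:", allowsAny(caps, "csi-list-volume", "csi-write-volume"))
	fmt.Println("all:", allowsAll(caps, "csi-list-volume", "csi-write-volume"))
}
```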
James Rasell
a3a03dff78 acl: ensure auth method configs are correctly and fully hashed. (#19677) 2024-01-09 14:03:26 +00:00
dependabot[bot]
f3bc9c7c41 chore(deps): bump github.com/docker/docker (#19672) 2024-01-09 08:24:20 +00:00