25405 Commits

Author SHA1 Message Date
Tim Gross
3c4e2009f5 connect: deployments should wait for Connect sidecar checks (#19334)
When a Connect service is registered with Consul, Nomad includes the nested
`Connect.SidecarService` field that includes health checks for the Envoy
proxy. Because these are not part of the job spec, the alloc health tracker
created by `health_hook` doesn't know to read the value of these checks.

In many circumstances this won't be noticed, but if the Envoy health check
happens to take longer than the `update.min_healthy_time` (perhaps because it's
been set low), it's possible for a deployment to progress too early such that
there will briefly be no healthy instances of the service available in Consul.

Update the Consul service client to find the nested sidecar service in the
service catalog and attach it to the results provided to the tracker. The
tracker can then check the sidecar health checks.

Fixes: https://github.com/hashicorp/nomad/issues/19269
2023-12-06 16:59:51 -05:00
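The idea behind 3c4e2009f5 can be pictured with a small Go sketch. This is not Nomad's actual code: the `Registration` type and the `Healthy` function are hypothetical names, used only to show a health decision that also consults the checks of the nested sidecar service.

```go
package main

import "fmt"

// Registration is a hypothetical, simplified view of a service as seen in the
// catalog: its own check statuses plus an optional nested sidecar service.
type Registration struct {
	Name    string
	Checks  []string // check statuses, e.g. "passing" or "critical"
	Sidecar *Registration
}

// allPassing reports whether every check status is "passing".
func allPassing(checks []string) bool {
	for _, c := range checks {
		if c != "passing" {
			return false
		}
	}
	return true
}

// Healthy treats a service as healthy only when its own checks and the checks
// of its sidecar (if any) are all passing.
func Healthy(r Registration) bool {
	if !allPassing(r.Checks) {
		return false
	}
	if r.Sidecar != nil && !allPassing(r.Sidecar.Checks) {
		return false
	}
	return true
}

func main() {
	svc := Registration{
		Name:   "web",
		Checks: []string{"passing"},
		Sidecar: &Registration{
			Name:   "web-sidecar-proxy",
			Checks: []string{"critical"}, // the proxy is not ready yet
		},
	}
	fmt.Println(Healthy(svc)) // false: the sidecar check is still failing

	svc.Sidecar.Checks[0] = "passing"
	fmt.Println(Healthy(svc)) // true
}
```

Without the sidecar branch in `Healthy`, the first call would report the service as healthy, which is the early deployment progression the commit message describes.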
Tim Gross
340c9ebd47 E2E: extend timeout on CSI snapshot test (#19338)
The EBS snapshot operation can take a long time to complete. Recent runs have
shown we sometimes get up to the 10s timeout on the context we're giving the CLI
command. Extend this so that we're not getting spurious timeouts.

Fixes: https://github.com/hashicorp/nomad/issues/19118
2023-12-06 16:34:54 -05:00
Daniel Bennett
36f69a8e88 e2e: more occasionally slow exec tasks (#19337) 2023-12-06 15:22:15 -06:00
Daniel Bennett
9fe1f0aadc e2e: fix ConsulNamespaces tests (#19325)
* cleanup consul tokens by accessor id
rather than secret id, which has been failing for some time with:
> 404 (Cannot find token to delete)

* expect subset of consul namespaces
the consul test cluster may have namespaces from other unrelated tests
2023-12-06 12:21:27 -06:00
Juana De La Cuesta
cf539c405e Add a new parameter to avoid starting a replacement for lost allocs (#19101)
This commit introduces the parameter preventRescheduleOnLost which indicates that the task group can't afford to have multiple instances running at the same time. In the case of a node going down, its allocations will be registered as unknown but no replacements will be rescheduled. If the lost node comes back up, the allocs will reconnect and continue to run.

In case of max_client_disconnect also being enabled, if there is a reschedule policy, an error will be returned.
Implements issue #10366

Co-authored-by: Dom Lavery <dom@circleci.com>
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
2023-12-06 12:28:42 +01:00
Michael Schurter
b0e55b4ba6 Merge pull request #19320 from hashicorp/go1.21.5
Update to Go 1.21.5
2023-12-05 11:48:13 -08:00
Michael Schurter
f97806c5ea cl 2023-12-05 11:27:02 -08:00
Michael Schurter
7ef5c9e906 Update to Go 1.21.5 2023-12-05 11:23:31 -08:00
Seth Hoenig
87e7bf4ab2 e2e: skip connect test that does a restart of nomad agent (#19316) 2023-12-05 09:15:09 -06:00
Seth Hoenig
35ccb7ecdb e2e: use correct url to download zip file from go-getter repository (#19315) 2023-12-05 09:11:08 -06:00
Seth Hoenig
cc65f39c82 e2e/v3: dump eval if detected as cancelled (#19310) 2023-12-05 09:07:12 -06:00
Daniel Bennett
c7d01705f5 e2e: push nomad token to servers (#19312)
so humans with root shell access can use it to debug

not ideal security, but this is a short-lived test cluster
2023-12-05 08:54:57 -06:00
Phil Renaud
c381781b42 [ui] Helios upgraded to 3.3.0 (#19247) 2023-12-05 09:25:28 -05:00
Tim Gross
1e51379e56 docs: clarify behavior and recommendations for mTLS vs TLS for HTTP (#19282)
Some of our documentation on `tls` configuration could be more clear as to
whether we're referring to mTLS or TLS. Also, when ACLs are enabled it's fine to
have `verify_https_client=false` (the default). Make it clear that this is an
acceptably secure configuration and that it's in fact recommended in order to
avoid pain of distributing client certs to user browsers.
2023-12-04 15:03:43 -05:00
Phil Renaud
646445d4ac [ui] example job with actions (#19153)
* An example job with a few interesting actions

* A pretty different example job

* Tests updated with const'd number of default templates

* Removed default jobspec params and formatted
2023-12-04 13:40:00 -05:00
Seth Hoenig
6779d7c7b4 e2e: add a ShowState() option to cluster3.Establish options (#19303)
This will dump much of the interesting parts of cluster state, including
available nodes and their status, existing allocations and their status,
and existing evaluations and their status.
2023-12-04 12:37:21 -06:00
Tim Gross
37df614da6 docs: fix recommended binding rules for Consul integration (#19299)
Fixes some errors in the documentation for the Consul integration, based on
tests locally without using the `nomad setup consul` command and updating the
docs to match.

* Consul CE doesn't support the `-namespace-rule-bind-namespace` option.
* The binding rule for services should not include the Nomad namespace in the
  `bind-name` parameter (the service is registered in the appropriate Consul
  namespace).
* The role for tasks should include the suffix "-tasks" in the name to match the
  binding rule we create.
* Fix the Consul bound audiences to be a list of strings
* Fix some quoting issues in the commands.
2023-12-04 11:56:03 -05:00
Piotr Kazmierczak
0a783d0046 wi: change setup cmds -cleanup flag to -destroy (#19295) 2023-12-04 15:28:17 +01:00
Piotr Kazmierczak
9d209d6725 vault: claims for WI workloads should not contain nomad_group (#19296) 2023-12-04 15:25:22 +01:00
Piotr Kazmierczak
0ff190fa38 docs: setup helpers documentation (#19267) 2023-12-04 09:59:07 +01:00
James Rasell
d041ddc4ee docs: fix up HCL formatting on agent config examples. (#19254) 2023-12-04 08:44:00 +00:00
Daniel Bennett
d34788896f e2e: jobs3-submitted jobs automatically cleanup (#19284)
so that cleanup occurs even if the job fails to run
(unless configured not to)
2023-12-01 15:57:23 -06:00
Luiz Aoqui
125dd4af38 docs: small updates to agent consul (#19285) 2023-12-01 16:40:06 -05:00
Daniel Bennett
bfb2263f30 e2e: give isolation test jobs more time to start (#19276) 2023-12-01 14:03:40 -06:00
Seth Hoenig
b83c1e14c1 docs: fix documentation of client.reserved.cores (#19266) 2023-12-01 13:06:55 -06:00
Tim Gross
d2518b1c3a docs: changelog entry for bugfix introduced in #18754 (#19275)
In #18754 we accidentally fixed a bug that prevented poststop tasks from getting
access to Variables. This was fixed in the 1.6.x branch in #19270, at which
point we discovered the fix had been done in main already as part of the auth
refactor. Add a changelog entry for it.
2023-12-01 13:55:09 -05:00
Tim Gross
0bc2ea8d98 client version constraints for implicit identities for WI (#18932)
Clients prior to Nomad 1.7 cannot support the new workload identity-based
authentication to Consul and Vault. Add an implicit Nomad version constraint on
job submission for task groups that use the new workflow.

Includes a constraint test showing same-version prerelease handling.
2023-12-01 13:51:21 -05:00
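A minimal Go sketch of the implicit constraint added in 0bc2ea8d98 is shown below. The types and the `addImplicitConstraints` function are hypothetical, and the constraint values are illustrative only: the exact version string Nomad uses (and how it treats prereleases) is not stated in the commit message.

```go
package main

import "fmt"

// Constraint is a simplified stand-in for a job constraint.
type Constraint struct {
	LTarget, Operand, RTarget string
}

// TaskGroup is a hypothetical task group; UsesWorkloadIdentity marks groups
// that rely on the new workload identity workflow for Consul or Vault.
type TaskGroup struct {
	Name                 string
	UsesWorkloadIdentity bool
	Constraints          []Constraint
}

// minVersionConstraint is an assumed form of the implicit constraint.
var minVersionConstraint = Constraint{
	LTarget: "${attr.nomad.version}",
	Operand: "semver",
	RTarget: ">= 1.7.0",
}

// addImplicitConstraints appends the version constraint to every group that
// uses workload identity, skipping groups that already carry it.
func addImplicitConstraints(groups []*TaskGroup) {
	for _, tg := range groups {
		if !tg.UsesWorkloadIdentity {
			continue
		}
		found := false
		for _, c := range tg.Constraints {
			if c == minVersionConstraint {
				found = true
				break
			}
		}
		if !found {
			tg.Constraints = append(tg.Constraints, minVersionConstraint)
		}
	}
}

func main() {
	groups := []*TaskGroup{
		{Name: "api", UsesWorkloadIdentity: true},
		{Name: "batch"},
	}
	addImplicitConstraints(groups)
	addImplicitConstraints(groups) // a second pass must not duplicate it
	fmt.Println(len(groups[0].Constraints), len(groups[1].Constraints)) // 1 0
}
```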
Tim Gross
2ba459c73a docs: split consul config params into client vs server sections (#19258)
Some sections of the `consul` configuration are relevant only for clients or
servers. We updated our Vault docs to split these parameters out into their own
sections for clarity. Match that for the Consul docs.
2023-12-01 13:37:39 -05:00
Tim Gross
5c9a851f5f vault: fix legacy token workflow for poststop tasks (#19268)
The new Workload Identity workflow for Vault tokens correctly handles post-stop
tasks, however the legacy workflow does not. Attempts to get a Vault token are
rejected if the allocation is server-terminal or client-terminal, but we should
be waiting until the allocation is client-terminal (only) so that poststop tasks
get a chance to get Vault tokens too.

Fixes: https://github.com/hashicorp/nomad/issues/16886
2023-12-01 13:25:43 -05:00
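The difference between the old and fixed checks in 5c9a851f5f can be sketched as two predicates. The `Alloc` struct and both function names are hypothetical; the sketch only captures the condition described in the commit message, not the surrounding RPC handling.

```go
package main

import "fmt"

// Alloc is a hypothetical allocation reduced to its two terminal flags.
type Alloc struct {
	ServerTerminal bool // the server wants the allocation stopped
	ClientTerminal bool // the client has finished running every task
}

// legacyAllowsToken mirrors the old behavior: any terminal state is rejected.
func legacyAllowsToken(a Alloc) bool {
	return !(a.ServerTerminal || a.ClientTerminal)
}

// fixedAllowsToken mirrors the fix: only a client-terminal allocation is
// rejected, so poststop tasks can still obtain a token.
func fixedAllowsToken(a Alloc) bool {
	return !a.ClientTerminal
}

func main() {
	// An allocation that has been told to stop but still has poststop
	// tasks to run on the client.
	stopping := Alloc{ServerTerminal: true, ClientTerminal: false}

	fmt.Println(legacyAllowsToken(stopping)) // false
	fmt.Println(fixedAllowsToken(stopping))  // true
}
```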
Seth Hoenig
5b3416bd97 e2e: set e2e/v3 debug logging on metrics test (#19263) 2023-12-01 10:03:55 -06:00
Phil Renaud
a35acdb84e Title bar job start button now observes job submission variables data (#19220) 2023-12-01 10:57:30 -05:00
Adrian Todorov
af71f4a55a Clarify docs around CSI volume context updates (#19216)
Co-authored-by: Tim Gross <tgross@hashicorp.com>
2023-12-01 15:19:04 +00:00
Luiz Aoqui
d12dc36c3b cli: add Consul namespace selector (#19251)
Update the `nomad setup consul` command to include a `Selector` for the
`NamespaceRule` so the logic is only applied when the token has a claim
for `consul_namespace`.

Jobs without an explicit `consul.namespace` value receive a JWT without
the `consul_namespace` claim because Nomad is unable to determine which
Consul namespace should be used.

By using `NamespaceRules`, cluster operators are able to set a default
value for these jobs.
2023-12-01 09:29:08 -05:00
Tim Gross
05fe2ad191 E2E: fix assertion in CT native service lookup test (#19249)
When porting the `ConsulTemplate` test, I made a last-minute refactor to the
assertions for waiting on files, and accidentally inverted the test assertion in
the process.

Also, when running `jobs3.Submit` you need to include the `Namespace` option so
that the cleanup function that is returned deletes the job from the correct
namespace. This was causing the namespace cleanup to fail because the job
deletion had failed.
2023-12-01 08:54:55 -05:00
Michael Schurter
4cb40433bb Post 1.7.0 rc.1 release (#19252)
* Prepare release 1.7.0-rc.1

* Generate files for 1.7.0-rc.1 release

* Prepare for next release
2023-12-01 08:53:48 -05:00
Phil Renaud
d104432cd3 Actions: API, command, and jobspec docs (#19166)
* API command and jobspec docs

* PR comments addressed

* API docs for job/jobid/action socket

* Removing a perhaps incorrect origin of job_id across the jobs api doc

* PR comments addressed
2023-11-30 14:13:37 -05:00
Piotr Kazmierczak
67bbcc4a4f cli: setup consul proper ns handling (#19237)
In order to correctly handle Consul namespaces, auth methods and binding rules
must always be created in the default namespace only.

---------

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
2023-11-30 20:09:19 +01:00
Piotr Kazmierczak
e57dcdf106 docs: adjust claim mappings for Consul auth method (#19244) 2023-11-30 20:01:18 +01:00
James Rasell
81249ffe65 agent: log using error keyword not err in keyring endpoint (#19243) 2023-11-30 16:40:13 +00:00
Daniel Bennett
639c3f53c9 e2e: give node drain KillTimeout test more time (#19226)
and error more verbosely if it fails

also, add extra information to a failed evaluation
for more error visibility in other tests

---------

Co-authored-by: Juanadelacuesta <juanita.delacuestamorales@hashicorp.com>
2023-11-30 10:37:20 -06:00
Tim Gross
13eda8bfdd consul: respect task-level namespace when checking permissions (#19236)
In the legacy Consul token workflow, we check the user's token's permissions in
Consul at the time of job submit. The new task-level `consul` block was not
being respected when checking the list of namespaces.
2023-11-30 11:14:12 -05:00
Tim Gross
79c74bf125 service hook: get correct NS for task-level consul (#19242)
Ensure that the `ServiceProviderNamespace` correctly picks the task-level
`consul.namespace` and falls back to the group if set.
2023-11-30 11:13:47 -05:00
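The fallback described in 79c74bf125 amounts to a short lookup, sketched here in Go. `ConsulBlock` and `resolveConsulNamespace` are hypothetical names and do not reproduce the real `ServiceProviderNamespace` signature.

```go
package main

import "fmt"

// ConsulBlock is a hypothetical stand-in for a `consul` block in a job spec.
type ConsulBlock struct {
	Namespace string
}

// resolveConsulNamespace prefers the task-level namespace and falls back to
// the group-level one. It returns "" when neither is set.
func resolveConsulNamespace(task, group *ConsulBlock) string {
	if task != nil && task.Namespace != "" {
		return task.Namespace
	}
	if group != nil {
		return group.Namespace
	}
	return ""
}

func main() {
	group := &ConsulBlock{Namespace: "team-a"}
	task := &ConsulBlock{Namespace: "team-b"}

	fmt.Println(resolveConsulNamespace(task, group)) // team-b
	fmt.Println(resolveConsulNamespace(nil, group))  // team-a
}
```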
Tim Gross
ae403dcb4b script_check_hook: handle task-level Consul namespace (#19241)
The `script_check_hook` runs at the task level but can create script checks for
both task-level services and group-level services. Now that we allow the Consul
namespace to be set at the task-level `consul.namespace`, we need to have both
possible namespaces handy when creating and updating checks.
2023-11-30 11:13:30 -05:00
Luiz Aoqui
1a2d41d30b consul: refactor allocrunner consul hook (#19229)
Refactor the JWT token derivation logic to only take a single request
since it was only ever called with a map of length one.

The original implementation received multiple requests to match the
legacy flow, but the legacy flow requests were batched from the Nomad
client to the server, which doesn't happen for JWT. Each JWT request
goes directly from the Nomad client to the Consul agent, so there is no
batching involved.
2023-11-30 10:55:03 -05:00
Luiz Aoqui
e741e93304 identity: add Consul and Vault namespace claims (#19228)
Token claims are used in several dynamic configurations in Consul and
Vault, such as Consul ACL bind and namespace rules, and Vault templated
policies.

Adding a claim for the Consul and Vault namespace defined for the
service or task allows cluster operators to create more flexible and
precise rules.

The `consul_namespace` claim is added to workload identities for Consul
services and to task workload identities that have the `consul_` name
prefix and are affected by a task or group `consul` block.

The `vault_namespace` claim is added to task workload identities that
have the `vault_` name prefix and are affected by a `vault` block.
2023-11-30 10:41:32 -05:00
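The claim rules in e741e93304 can be sketched as a small Go function. `namespaceClaims` is a hypothetical helper, not Nomad's identity code; it assumes that an empty namespace means no claim is added, which matches the note in d12dc36c3b that jobs without an explicit `consul.namespace` receive a JWT without `consul_namespace`.

```go
package main

import (
	"fmt"
	"strings"
)

// namespaceClaims returns the namespace claims for a task workload identity,
// based on the identity's name prefix and the namespaces in effect for it.
func namespaceClaims(identityName, consulNS, vaultNS string) map[string]string {
	claims := map[string]string{}
	if strings.HasPrefix(identityName, "consul_") && consulNS != "" {
		claims["consul_namespace"] = consulNS
	}
	if strings.HasPrefix(identityName, "vault_") && vaultNS != "" {
		claims["vault_namespace"] = vaultNS
	}
	return claims
}

func main() {
	c := namespaceClaims("consul_default", "prod", "secrets")
	fmt.Println(c["consul_namespace"]) // prod
	_, hasVault := c["vault_namespace"]
	fmt.Println(hasVault) // false: the identity has no vault_ prefix

	v := namespaceClaims("vault_default", "prod", "secrets")
	fmt.Println(v["vault_namespace"]) // secrets
}
```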
Phil Renaud
7ab7edf9cd [ui] Display job plan warnings alongside dry run info when attempting to run a job through the web UI (#19225)
* init

* Warnings shown at plan stage

* testfixes for new hds class

* New tests for warning block presence
2023-11-30 10:41:23 -05:00
Seth Hoenig
5f3aae7340 website: fix spellcheck path and cleanup some misspellings (#19238) 2023-11-30 09:38:19 -06:00
Piotr Kazmierczak
d699b82df6 docs: update consul-integration to include ns changes (#19239)
Co-authored-by: Tim Gross <tgross@hashicorp.com>
2023-11-30 16:37:48 +01:00
Luiz Aoqui
59aa860c13 scheduler: fix task-level consul diff (#19230)
Fix `tasksUpdated()` to compare the task level `consul` blocks instead
of the group.
2023-11-30 10:13:17 -05:00
Luiz Aoqui
969cdb0f46 test: add consul namespace rules to consulcompat (#19227)
When configuring Consul for multi-namespace support, the JWT auth method
needs to specify namespace rules. This attribute is set to `nil` in CE
but is used in Nomad ENT.
2023-11-30 10:13:08 -05:00