
25445 Commits

Author SHA1 Message Date
Seth Hoenig
7e43317e37 core: account for linux systems with no reservable cores (#19458)
* core: account for linux systems with no reservable cores

* cl: add cl

* core: remove condition on reservable cores for legacy empty check
2023-12-13 13:06:45 -06:00
Seth Hoenig
6e4d57b330 numalib: provide a fallback for topology scanning on linux (#19457)
* numalib: provide a fallback for topology scanning on linux

* numalib: better package var names

* cl: add cl

* lint: fix my sloppy code

* cl: fixup wording
2023-12-13 13:06:30 -06:00
Piotr Kazmierczak
b6dd376100 numa: account for incorrect core number on topology.insert (#19383)
Unsupported environments like containers or guest OSes inside LXD can
report an incorrect number of available cores, which leads numalib to have
trouble detecting cores and to panic. This code adds tests for the Linux sysfs
detection methods and fixes the panic.
2023-12-13 17:40:26 +01:00
Charlie Voiselle
d2fc7cc0c4 [docs] Note reboot to update bridge_network_hairpin_mode (#19304) 2023-12-12 19:49:15 -05:00
Luiz Aoqui
0bc822db40 vault: load default config for tasks without vault (#19439)
It is often expected that a task that needs access to Vault defines a
`vault` block to specify the Vault policy to use to derive a token.

But in some scenarios, like when the Nomad client is connected to a
local Vault agent that is responsible for authn/authz, the task is not
required to define a `vault` block.

In these situations, the `default` Vault cluster should be used to
render the template.
2023-12-12 14:06:55 -05:00
Tim Gross
a76daf61c1 consul: fix constraints for non-default clusters (ENT) (#19449)
The Connect-related constraints injected by the Connect job mutating hook do not
account for non-default `consul` blocks (for Nomad Enterprise). This works when
both the default and non-default clusters are available and are the same
version, but not when they are not.

Fixes: https://github.com/hashicorp/nomad/issues/19442
2023-12-12 11:44:54 -05:00
James Rasell
71ea1deda7 cli: Fix bug in var put command using mix of flags and spec. (#19423) 2023-12-12 08:31:22 +00:00
Daniel Bennett
eb23add189 e2e: sleep in docker job (#19434) 2023-12-11 15:38:14 -06:00
Tim Gross
7f87ede1e2 auth: respect default tls.verify_server_hostname=false (#19425)
In Nomad 1.7.0, we refactored much of the authentication code to eliminate nil
ACLs and create "virtual" ACL objects that can be used to reduce the risk of
fail-open security bugs. In doing so, we accidentally dropped support for the
default `tls.verify_server_hostname=false` option.

Fix the bug by including the field in the set of conditions we check for the TLS
argument we pass into the constructor (this keeps "policy" separate from
"mechanism" in the auth code and reduces the number of dimensions we need to
test). Change the field name in the Authenticator to better match the intent.
2023-12-11 14:20:13 -05:00
Phil Renaud
b1654016c0 Fix a small regression where accessor is hidden by default when editing a token (#19432) 2023-12-11 13:00:11 -05:00
Tom Davies
c983a8f0ad Fixes Consul token checking when policies exist within namespaces (#18516)
* e2e/connect: adds test for namespace policies

* consul: use token namespace when fetching policies

* changelog

* fixup! e2e/connect: adds test for namespace policies
2023-12-11 10:07:32 -06:00
Phil Renaud
268e92eaba [ui] Small-screen styles for exec window (#19332)
* Phones can exec too

* De-magic breakpoints
2023-12-11 09:42:25 -05:00
Luiz Aoqui
99d72b7154 docs: fix placement of Consul auth method configs (#19404)
The auth method names are used by Nomad clients, not servers.
2023-12-11 09:16:57 -05:00
James Rasell
e4812738e8 changelog: add entry for #19415 (#19416) 2023-12-11 11:25:06 +00:00
Tim Gross
e551814df5 docs: add warnings about backing up keyring to snapshot commands (#19400)
The `operator snapshot` commands and agent don't back up Nomad's key
material. Add some warnings about this to places where users might be looking
for information on cluster recovery.

Fixes: https://github.com/hashicorp/nomad/issues/19389
2023-12-08 16:05:05 -05:00
Tim Gross
ad9520c240 docs: add warning not to use 1.7.0 (#19399)
Nomad 1.7.0 should be considered "yanked". Add a note about this to the upgrade
guide.
2023-12-08 15:19:27 -05:00
Tim Gross
78f7e40636 Post 1.7.1 release (#19398) 2023-12-08 14:54:47 -05:00
hc-github-team-nomad-core
72940e8cfb Prepare for next release 2023-12-08 14:39:09 -05:00
hc-github-team-nomad-core
180fd54918 Generate files for 1.7.1 release 2023-12-08 14:39:09 -05:00
Adrian Todorov
1eb1dbfa36 docs: update PKI example in template block with the new pkiCert function (#19394) 2023-12-08 14:23:12 -05:00
Seth Hoenig
39eb17f3ec docs: describe the need for dmidecode in docs (#19348) 2023-12-08 10:45:37 -06:00
Seth Hoenig
f3cbe2e29a e2e: sleep a bit in short lived docker jobs (#19384) 2023-12-08 10:44:43 -06:00
Phil Renaud
5d2688a257 [ui] Two small UI quality of life changes (#19377)
* Jobs index without groups

* Download button only appears if you have content in your template

* No longer need to test for the group count in jobs index
2023-12-08 11:21:14 -05:00
Daniel Bennett
e9ff6d74d3 e2e: unflake oversubscription.testExec (#19373)
Poll with must.Wait() instead of a hard-coded sleep while waiting for the
poststart task to run, and wait longer.
2023-12-08 10:20:18 -06:00
Tim Gross
8e8309e58e UI: fix column header typo on job services page (#19370) 2023-12-08 10:58:23 -05:00
Luiz Aoqui
099ee06a60 Revert "deps: update go-metrics to v0.5.3 (#19190)" (#19374)
* Revert "deps: update go-metrics to v0.5.3 (#19190)"

This reverts commit ddb060d8b3.

* changelog: add entry for #19374
2023-12-08 08:46:55 -05:00
Tim Gross
f1be76b8b8 keyring: replicate RSA private key via GetKey RPC (#19350)
When we added an RSA key for signing Workload Identities, we added it to the
keystore serialization but did not also add it to the `GetKey` RPC. This means
that when a key is rotated, the RSA key will not come along. The Nomad leader
signs all Workload Identities, but external consumers of WI (like Consul or
Vault) will verify the WI against any of the servers. If the request to verify
hits a follower, the follower will not have the RSA private key and cannot use
the existing ed25519 key to verify WIs with the `RS256` algorithm.

Add the RSA key material to the `GetKey` RPC.

Also remove an extraneous write to disk that happens for each key each time we
restart the Nomad server.

Fixes: #19340
2023-12-07 14:15:08 -05:00
Tim Gross
d7a5274164 client: allow incomplete allocrunners to be removed on restore (#16638)
If an allocrunner is persisted to the client state but the client stops before
the task runner can start, we end up with an allocation in the database with
allocrunner state but no taskrunner state. This ends up mimicking an old
pre-0.9.5 state where this state was not recorded and that hits a backwards
compatibility shim. This leaves allocations in the client state that can never
be restored, but won't ever be removed either.

Update the backwards compatibility shim so that we fail the restore for the
allocrunner and remove the allocation from the client state. Taskrunners persist
state during graceful shutdown, so it shouldn't be possible to leak tasks that
have actually started. This lets us "start over" with the allocation, if the
server still wants to place it on the client.
2023-12-07 14:04:55 -05:00
Tim Gross
fb58dd835d docs: expand on Sentinel policy reference (#19335) 2023-12-07 14:04:43 -05:00
Seth Hoenig
f146678f43 ci: use go-modtool with config file (#19333) 2023-12-07 11:12:39 -06:00
Luiz Aoqui
c624dc2121 config: fix loading Vault token from env var (#19349)
The `defaultVault` variable is a pointer to the Vault configuration
named `default`. Initially, this variable points to the Vault
configuration that is used to load CLI flag values, but after those are
merged with the default and config file values the pointer reference
must be updated before mutating the config with environment variable
values.
2023-12-07 11:56:53 -05:00
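The stale-pointer problem described above can be illustrated with a minimal Go sketch. `VaultConfig`, `merge`, and `findDefault` are hypothetical simplifications, not Nomad's actual types or functions; the sketch only shows why a pointer taken before a merge must be refreshed before environment variable values are applied.

```go
package main

import "fmt"

// VaultConfig is a hypothetical, simplified stand-in for a named Vault block.
type VaultConfig struct {
	Name  string
	Token string
}

// merge returns copies of base with non-empty override tokens applied, so
// pointers into either input slice do not refer to the merged configs.
func merge(base, override []*VaultConfig) []*VaultConfig {
	out := make([]*VaultConfig, 0, len(base))
	for _, b := range base {
		c := *b // copy so the inputs are not mutated
		for _, o := range override {
			if o.Name == c.Name && o.Token != "" {
				c.Token = o.Token
			}
		}
		out = append(out, &c)
	}
	return out
}

// findDefault returns the config named "default", or nil if absent.
func findDefault(cfgs []*VaultConfig) *VaultConfig {
	for _, c := range cfgs {
		if c.Name == "default" {
			return c
		}
	}
	return nil
}

func main() {
	flags := []*VaultConfig{{Name: "default"}}
	defaultVault := findDefault(flags) // points at the CLI flag config

	merged := merge([]*VaultConfig{{Name: "default"}}, flags)

	// Bug: writing through the stale pointer does not reach the merged config.
	defaultVault.Token = "from-env"
	fmt.Println(findDefault(merged).Token == "") // true

	// Fix: refresh the pointer after merging, then apply env var values.
	defaultVault = findDefault(merged)
	defaultVault.Token = "from-env"
	fmt.Println(findDefault(merged).Token) // from-env
}
```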
Luiz Aoqui
27d2ad1baf cli: add -dev-consul and -dev-vault agent mode (#19327)
The `-dev-consul` and `-dev-vault` flags add default identities and
configuration to the Nomad agent to connect and use the workload
identity integration with Consul and Vault.
2023-12-07 11:51:20 -05:00
Daniel Bennett
7baf3c012c e2e: even more time for exec+java tests (#19347) 2023-12-07 10:23:39 -06:00
Piotr Kazmierczak
92bc568c44 Merge pull request #19345 from hashicorp/post-1.7.0-release
Post 1.7.0 release
2023-12-07 17:22:19 +01:00
Piotr Kazmierczak
b737b5125c Merge release 1.7.0 files 2023-12-07 16:48:19 +01:00
hc-github-team-nomad-core
d6f1a60178 Prepare for next release 2023-12-07 16:43:02 +01:00
hc-github-team-nomad-core
e799b06f02 Generate files for 1.7.0 release 2023-12-07 16:43:02 +01:00
Piotr Kazmierczak
cff80bbdc0 prepare release 1.7.0 2023-12-07 16:43:01 +01:00
Juana De La Cuesta
8eee5277b9 style: add missing changelog entry for prevent reschedule (#19341) 2023-12-07 15:41:15 +01:00
Seth Hoenig
8cde7a4f70 e2e: turn off extremely verbose metrics test logging (#19330) 2023-12-06 16:08:49 -06:00
Tim Gross
3c4e2009f5 connect: deployments should wait for Connect sidecar checks (#19334)
When a Connect service is registered with Consul, Nomad includes the nested
`Connect.SidecarService` field that includes health checks for the Envoy
proxy. Because these are not part of the job spec, the alloc health tracker
created by `health_hook` doesn't know to read the value of these checks.

In many circumstances this won't be noticed, but if the Envoy health check
happens to take longer than the `update.min_healthy_time` (perhaps because it's
been set low), it's possible for a deployment to progress too early such that
there will briefly be no healthy instances of the service available in Consul.

Update the Consul service client to find the nested sidecar service in the
service catalog and attach it to the results provided to the tracker. The
tracker can then check the sidecar health checks.

Fixes: https://github.com/hashicorp/nomad/issues/19269
2023-12-06 16:59:51 -05:00
Tim Gross
340c9ebd47 E2E: extend timeout on CSI snapshot test (#19338)
The EBS snapshot operation can take a long time to complete. Recent runs have
shown we sometimes get up to the 10s timeout on the context we're giving the CLI
command. Extend this so that we're not getting spurious timeouts.

Fixes: https://github.com/hashicorp/nomad/issues/19118
2023-12-06 16:34:54 -05:00
Daniel Bennett
36f69a8e88 e2e: more occasionally slow exec tasks (#19337) 2023-12-06 15:22:15 -06:00
Daniel Bennett
9fe1f0aadc e2e: fix ConsulNamespaces tests (#19325)
* clean up consul tokens by accessor id
rather than by secret id, which has been failing for some time with:
> 404 (Cannot find token to delete)

* expect subset of consul namespaces
the consul test cluster may have namespaces from other unrelated tests
2023-12-06 12:21:27 -06:00
Juana De La Cuesta
cf539c405e Add a new parameter to avoid starting a replacement for lost allocs (#19101)
This commit introduces the parameter preventRescheduleOnLost which indicates that the task group can't afford to have multiple instances running at the same time. In the case of a node going down, its allocations will be registered as unknown but no replacements will be rescheduled. If the lost node comes back up, the allocs will reconnect and continue to run.

If max_client_disconnect is also enabled and there is a reschedule policy, an error will be returned.
Implements issue #10366

Co-authored-by: Dom Lavery <dom@circleci.com>
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
2023-12-06 12:28:42 +01:00
Michael Schurter
b0e55b4ba6 Merge pull request #19320 from hashicorp/go1.21.5
Update to Go 1.21.5
2023-12-05 11:48:13 -08:00
Michael Schurter
f97806c5ea cl 2023-12-05 11:27:02 -08:00
Michael Schurter
7ef5c9e906 Update to Go 1.21.5 2023-12-05 11:23:31 -08:00
Seth Hoenig
87e7bf4ab2 e2e: skip connect test that does a restart of nomad agent (#19316) 2023-12-05 09:15:09 -06:00
Seth Hoenig
35ccb7ecdb e2e: use correct url to download zip file from go-getter repository (#19315) 2023-12-05 09:11:08 -06:00