Commit Graph

25079 Commits

Author SHA1 Message Date
Juana De La Cuesta
72acaf6623 [17449] Introduces a locking mechanism over variables (#18207)
It includes the work over the state store, the PRC server, the HTTP server, the go API package and the CLI's  command. To read more on the actuall functionality, refer to the RFCs [NMD-178] Locking with Nomad Variables and [NMD-179] Leader election using locking mechanism for the Autoscaler.
2023-09-21 17:56:33 +02:00
Piotr Kazmierczak
86d2cdcf80 client: split identity_hook across allocrunner and taskrunner (#18431)
This commit splits identity_hook between the allocrunner and taskrunner. The
allocrunner-level part of the hook signs each task identity, and the
taskrunner-level part picks it up and stores secrets for each task.

The code revamps the WIDMgr, which is now split into 2 interfaces:
IdentityManager which manages renewals of signatures and handles sending
updates to subscribers via Watch method, and IdentitySigner which only does the
signing.

This work is necessary for having a unified Consul login workflow that comes
with the new Consul integration. A new, allocrunner-level consul_hook will now
be the only hook doing Consul authentication.
2023-09-21 17:31:27 +02:00
Phil Renaud
cf8dde0850 [ui] Color indicators for server/client status (#18318)
* Color the status cell for servers and nodes

* Testfix and changelog

* Leader indicator moved post-word

* Icon and badge treatment

* Capitalizing test checks

* HDS badges dont expose statusClass like we used to, so stop checking for it
2023-09-20 17:05:04 -04:00
Tim Gross
d7bd47d60f config: remove consul.template_identity in lieu of task_identity (#18540)
The original thinking for Workload Identity integration with Consul and Vault
was that we'd allow `template` blocks to specify their own identity. But because
the login to Consul/Vault to get tokens happens at the task level, this would
involve making the `template` block a new WID watcher on its own rather than
using the Consul and Vault hooks we're building at the group/task level.

So it doesn't make sense to have separate identities for individual `template`
blocks rather than at the level of tasks. Update the agent configuration to
rename the `template_identity` to the more accurate `task_identity`, which will
be used for any non-service hooks (just `template` today).

Update the implicit identities job mutation hook to create the identity we'll
need as well.
2023-09-20 15:43:08 -04:00
Tim Gross
fdc6c2151d vault: select Vault API client by cluster name (#18533)
Nomad Enterprise will support configuring multiple Vault clients. Instead of
having a single Vault client field in the Nomad client, we'll have a function
that callers can parameterize by the Vault cluster name that returns the
correctly configured Vault API client wrapper.
2023-09-19 14:35:01 -04:00
Tim Gross
fcb9c4a39c job endpoint: implicit constraints for multi-Vault/Consul (#18528)
Update the implicit constraint mutating hook to support multiple Vault and
Consul clusters in Nomad Enterprise. This requires moving the Vault/Consul
mutating hooks earlier in the list as well, because that'll ensure we've
canonicalized properly for multiple clusters.
2023-09-19 12:19:44 -04:00
Daniel Bennett
4895d708b4 csi: implement NodeExpandVolume (#18522)
following ControllerExpandVolume
in c6dbba7cde,
which expands the disk at e.g. a cloud vendor,
the controller plugin may say that we also need
to issue NodeExpandVolume for the node plugin to
make the new disk space available to task(s) that
have claims on the volume by e.g. expanding
the filesystem on the node.

csi spec:
https://github.com/container-storage-interface/spec/blob/c918b7f/spec.md#nodeexpandvolume
2023-09-18 10:30:15 -05:00
dependabot[bot]
d564d7811b chore(website/content): update content-conformance version (#17482) 2023-09-18 11:08:51 -04:00
Seth Hoenig
591394fb62 drivers: plumb hardware topology via grpc into drivers (#18504)
* drivers: plumb hardware topology via grpc into drivers

This PR swaps out the temporary use of detecting system hardware manually
in each driver for using the Client's detected topology by plumbing the
data over gRPC. This ensures that Client configuration is taken to account
consistently in all references to system topology.

* cr: use enum instead of bool for core grade

* cr: fix test slit tables to be possible
2023-09-18 08:58:07 -05:00
Tim Gross
b105e41265 job endpoint: reorder check for disabled job registrations (#18523)
When job registrations are disabled, there's no reason to do the potentially
expensive job mutation and admission hooks. Move the ACL resolution and this
check before those hooks.
2023-09-18 09:15:02 -04:00
Tim Gross
5bd8b89c19 helper: reduce size of buffer used by template connections (#18524)
In #12458 we added an in-memory connection buffer so that template runners that
want access to the Nomad API for Service Registration and Variables can
communicate with Nomad without having to create a real HTTP client. The size of
this buffer (1 MiB) was taken directly from its usage in Vault, and each
connection makes 2 such buffers (send and receive). Because each template runner
has its own connection, when there are large numbers of allocations this adds up
to significant memory usage.

The largest Nomad Variable payload is 64KiB, and a small amount of
metadata. Service Registration responses are much smaller, and we don't include
check results in them (as Consul does), so the size is relatively bounded. We
should be able to safely reduce the size of the buffer by a factor of 10 or more
without forcing the template runner to make multiple read calls over the buffer.

Fixes: #18508
2023-09-18 09:12:09 -04:00
Tim Gross
ad4436ffff job endpoint hooks to enforce access to vault/consul clusters (CE) (#18521)
In Nomad Enterprise, namespace rules can control access to Vault and Consul
clusters. Add job endpoint mutating and validating hooks for both Vault and
Consul so that ENT can enforce these namespace rules. This changeset includes
the stub behaviors for CE.

Ref: https://github.com/hashicorp/nomad-enterprise/pull/1234
2023-09-15 13:58:37 -04:00
Shantanu Gadgil
f37f84182d docs: example of multiple crons (#18511) 2023-09-15 10:10:56 -04:00
Gerard Nguyen
1339599185 cli: Add prune flag for nomad server force-leave command (#18463)
This feature will help operator to remove a failed/left node from Serf layer immediately
without waiting for 24 hours for the node to be reaped

* Update CLI with prune flag
* Update API /v1/agent/force-leave with prune query string parameter
* Update CLI and API doc
* Add unit test
2023-09-15 08:45:11 -04:00
Shantanu Gadgil
d2dd64f2c4 point to hashicorp's cronexpr (#18510)
point to hashicorp's cronexpr
2023-09-15 09:23:58 +01:00
Luiz Aoqui
5f951d506a docs: update Vault config for workload identity (#18503)
Update documentation for the agent configuration `vault` block for
workload identity support.
2023-09-14 19:38:36 -03:00
Daniel Bennett
c6dbba7cde csi: implement ControllerExpandVolume (#18359)
the first half of volume expansion,
this allows a user to update requested capacity
("capacity_min" and "capacity_max") in a volume
specification file, and re-issue either Register
or Create volume commands (or api calls).

the requested capacity will now be "reconciled"
with the current real capacity of the volume,
issuing a ControllerExpandVolume RPC call
to a running controller plugin, if requested
"capacity_min" is higher than the current
capacity on the volume in state.

csi spec:
https://github.com/container-storage-interface/spec/blob/c918b7f/spec.md#controllerexpandvolume

note: this does not yet cover NodeExpandVolume
2023-09-14 14:13:04 -05:00
wrli20
0329393a28 docs: fix link to alicloud autoscaler plugin (#18495) 2023-09-14 09:23:58 -04:00
stswidwinski
bd519dcbf4 Fix for https://github.com/hashicorp/nomad/issues/18493 (#18494)
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
2023-09-14 13:35:15 +01:00
Luiz Aoqui
da7525d9f7 consul: check for warnings on service identity (#18466)
Apply workload identity warnings to group and task level Consul services
that have an identity assigned.
2023-09-13 17:58:34 -03:00
Luiz Aoqui
9e094f64b0 Merge pull request #18489 from hashicorp/post-1.6.2-release
Post 1.6.2 release
2023-09-13 15:50:37 -03:00
Luiz Aoqui
a68a15d97c Merge release 1.6.2 files 2023-09-13 15:44:49 -03:00
hc-github-team-nomad-core
648a53fc49 Prepare for next release 2023-09-13 15:41:21 -03:00
hc-github-team-nomad-core
297de953e0 Generate files for 1.6.2 release 2023-09-13 15:41:21 -03:00
Luiz Aoqui
b9ec271463 changelog: move entry #17858 to improvement (#18484) 2023-09-13 13:35:03 -03:00
Luiz Aoqui
391a6af979 changelog: add entry for #18184 (#18483) 2023-09-13 13:03:11 -03:00
Pavel Aminov
5ddada2973 Adding node_pool to job key validation (#18366) 2023-09-13 11:52:04 -03:00
Tim Gross
756a22f4d5 lint: fix a missing gofmt -s (#18480) 2023-09-13 10:22:19 -04:00
Joshua Timmons
4b6cc14216 Add more links from Variables doc to examples (#18468) 2023-09-13 10:21:41 -04:00
Shantanu Gadgil
12580c345a bubble up the error message from go-getter (#18444) 2023-09-13 09:36:39 -04:00
wrli20
46e72aa8d5 add new target plugin for aliyun (#18473) 2023-09-13 13:39:35 +01:00
James Rasell
532911c380 csi: remove unused internal funcs. (#18459) 2023-09-13 08:11:34 +01:00
Luiz Aoqui
3534307d0d vault: add use_identity and default_identity agent configuration and implicit workload identity (#18343) 2023-09-12 13:53:37 -03:00
James Rasell
1b74f8f9cf scripts: update CNI plugins install version to v1.3.0 (#18460) 2023-09-12 16:20:28 +01:00
Luiz Aoqui
82372fecb8 config: add TTL to agent identity config (#18457)
Add support for identity token TTL in agent configuration fields such as
Consul `service_identity` and `template_identity`.

Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
2023-09-12 11:13:09 -03:00
Seth Hoenig
2e1974a574 client: refactor cpuset partitioning (#18371)
* client: refactor cpuset partitioning

This PR updates the way Nomad client manages the split between tasks
that make use of resources.cpus vs. resources.cores.

Previously, each task was explicitly assigned which CPU cores they were
able to run on. Every time a task was started or destroyed, all other
tasks' cpusets would need to be updated. This was inefficient and would
crush the Linux kernel when a client would try to run ~400 or so tasks.

Now, we make use of cgroup heirarchy and cpuset inheritence to efficiently
manage cpusets.

* cr: tweaks for feedback
2023-09-12 09:11:11 -05:00
Tim Gross
77ca0bb8af docs: support multiple Vault and Consul clusters (ENT-only) (#18432)
This changeset is the documentation for supporting multiple Vault and Consul
clusters in Nomad Enterprise. It includes documentation changes for the agent
configuration (#18255), the namespace specification (#18425), and the vault,
consul, and service blocks of the jobspec (#18409).
2023-09-12 09:33:14 -04:00
Daniel Bennett
a51d46c65c e2e: packer windows from "ECS_Optimized" image (#18453)
"Containers" AMIs evaporated at some point...
https://aws.amazon.com/marketplace/pp/prodview-yfve3zjgfjtug
> This version has been removed and is no longer
> available to new customers.
2023-09-11 12:26:32 -05:00
James Rasell
d923fc554d consul/connect: add new fields to Consul Connect upstream block (#18430)
Co-authored-by: Horacio Monsalvo <horacio.monsalvo@southworks.com>
2023-09-11 16:02:52 +01:00
James Rasell
668dc5f7a7 client: fix role permission issue with duplicate policies. (#18419)
This change deduplicates the ACL policy list generated from ACL
roles referenced within an ACL token on the client.

Previously the list could contain duplicates, which would cause
erronous permission denied errors when calling client related RPC/
HTTP API endpoints. This is because the client calls the ACL get
policies endpoint which subsequently ensures the caller has
permission to view the ACL policies. This check is performed by
comparing the requested list args with the policies referenced by
the caller ACL token. When a duplicate is present, this check
fails, as the check must ensure the slices match exactly.
2023-09-11 12:52:08 +01:00
Michael Schurter
ef24e40b39 identity: support jwt expiration and rotation (#18262)
Implements expirations and renewals for alternate workload identity tokens.
2023-09-08 14:50:34 -07:00
Daniel Bennett
22cbb913db csi: rename volume Mounter to Manager (#18434)
to align with its broader purpose,
and the volumeManager implementation
2023-09-08 15:33:46 -05:00
Tim Gross
3ee6c31241 ACLs: allow/deny/default config for Consul/Vault clusters by namespace (#18425)
In Nomad Enterprise when multiple Vault/Consul clusters are configured, cluster admins can control access to clusters for jobs via namespace ACLs, similar to how we've done so for node pools. This changeset updates the ACL configuration structs, but doesn't wire them up.
2023-09-08 11:37:20 -04:00
Tim Gross
b022346575 fingerprint: backoff on Consul fingerprint after initial success (#18426)
In the original design of Consul fingerprinting, we would poll every period so
that we could change the client's fingerprint if Consul became unavailable. As
of 1.4.0 (ref #14673) we no longer update the fingerprint in order to avoid
excessive `Node.Register` RPCs when someone's Consul cluster is flapping.

This allows us to safely backoff Consul fingerprinting on success, just as we
have with Vault.
2023-09-08 08:17:07 -04:00
Tim Gross
a8e68e6479 fingerprint: add support for fingerprinting multiple Consul clusters (#18392)
fingerprint: add support for fingerprinting multiple Consul clusters

Add fingerprinting we'll need to accept multiple Consul clusters in upcoming
Nomad Enterprise features. The fingerprinter will create a map of Consul clients
by cluster name. In Nomad CE, all but the default cluster will be ignored and
there will be no visible behavior change.

Ref: https://github.com/hashicorp/team-nomad/issues/404
2023-09-07 14:05:35 -04:00
Tim Gross
7cdd592809 jobspec: support cluster field for Vault block (#18408)
This field supports the upcoming ENT-only multiple Vault clusters feature. The
job validation and mutation hooks will come in a separate PR.

Ref: https://github.com/hashicorp/team-nomad/issues/404
2023-09-07 10:15:28 -04:00
Tim Gross
c145e8b30f fingerprint: add warning in CE when there are multiple vaults (#18412)
Nomad CE only supports a single (default) Vault cluster, so log a warning if the
user has configured multiple Vaults.
2023-09-07 09:51:48 -04:00
Tim Gross
7863d7bcbb jobspec: support cluster field for Consul and Service blocks (#18409)
This field supports the upcoming ENT-only multiple Consul clusters feature. The
job validation and mutation hooks will come in a separate PR.

Ref: https://github.com/hashicorp/team-nomad/issues/404
2023-09-07 09:48:49 -04:00
James Rasell
0f94bc2482 test: fix name of state service registration test file. (#18406) 2023-09-07 10:30:05 +01:00
James Rasell
4f3a2e1a7d docs: fix broken link to Consul DNS overview page (#18410) 2023-09-07 08:39:49 +01:00