Commit Graph

25069 Commits

Author SHA1 Message Date
Tim Gross
5bd8b89c19 helper: reduce size of buffer used by template connections (#18524)
In #12458 we added an in-memory connection buffer so that template runners that
want access to the Nomad API for Service Registration and Variables can
communicate with Nomad without having to create a real HTTP client. The size of
this buffer (1 MiB) was taken directly from its usage in Vault, and each
connection makes 2 such buffers (send and receive). Because each template runner
has its own connection, when there are large numbers of allocations this adds up
to significant memory usage.

The largest Nomad Variable payload is 64KiB, and a small amount of
metadata. Service Registration responses are much smaller, and we don't include
check results in them (as Consul does), so the size is relatively bounded. We
should be able to safely reduce the size of the buffer by a factor of 10 or more
without forcing the template runner to make multiple read calls over the buffer.

Fixes: #18508
2023-09-18 09:12:09 -04:00
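As a rough illustration of the memory arithmetic described in 5bd8b89c19, the sketch below estimates the buffer memory held by template runner connections. The runner count and the reduced buffer size are hypothetical values chosen for the example; the commit message only says the buffer should shrink "by a factor of 10 or more" and must still hold a 64 KiB Variable payload plus metadata.

```go
package main

import "fmt"

const (
	kib = 1024
	mib = 1024 * kib
)

// templateBufferBytes estimates the memory held by in-memory connection
// buffers: one connection per template runner, and two buffers
// (send and receive) per connection.
func templateBufferBytes(runners, bufferSize int) int {
	return runners * 2 * bufferSize
}

func main() {
	runners := 1000 // hypothetical number of template runners on a client

	before := templateBufferBytes(runners, 1*mib)  // original 1 MiB buffers
	after := templateBufferBytes(runners, 100*kib) // hypothetical reduced size

	fmt.Printf("before: %d bytes\n", before)
	fmt.Printf("after:  %d bytes\n", after)
	fmt.Printf("reduction factor: %.1f\n", float64(before)/float64(after))
}
```

Any reduced size still has to exceed the largest expected payload, otherwise the template runner would need multiple read calls over the buffer.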
Tim Gross
ad4436ffff job endpoint hooks to enforce access to vault/consul clusters (CE) (#18521)
In Nomad Enterprise, namespace rules can control access to Vault and Consul
clusters. Add job endpoint mutating and validating hooks for both Vault and
Consul so that ENT can enforce these namespace rules. This changeset includes
the stub behaviors for CE.

Ref: https://github.com/hashicorp/nomad-enterprise/pull/1234
2023-09-15 13:58:37 -04:00
Shantanu Gadgil
f37f84182d docs: example of multiple crons (#18511) 2023-09-15 10:10:56 -04:00
Gerard Nguyen
1339599185 cli: Add prune flag for nomad server force-leave command (#18463)
This feature helps operators remove a failed or left node from the Serf layer
immediately, without waiting 24 hours for the node to be reaped.

* Update CLI with prune flag
* Update API /v1/agent/force-leave with prune query string parameter
* Update CLI and API doc
* Add unit test
2023-09-15 08:45:11 -04:00
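A minimal sketch of how a request URL for the API change in 1339599185 might be built. The `/v1/agent/force-leave` path and the `prune` query string parameter come from the commit message; the `node` parameter name, the agent address, and the helper name `forceLeaveURL` are assumptions made for illustration, and the sketch only constructs the URL without sending a request.

```go
package main

import (
	"fmt"
	"net/url"
)

// forceLeaveURL builds the force-leave URL for a node. The helper itself is
// hypothetical; it is not part of the Nomad API package.
func forceLeaveURL(agentAddr, node string, prune bool) (string, error) {
	u, err := url.Parse(agentAddr)
	if err != nil {
		return "", err
	}
	u.Path = "/v1/agent/force-leave"

	q := url.Values{}
	q.Set("node", node) // assumed parameter name for the target node
	if prune {
		q.Set("prune", "true") // query parameter named in the commit message
	}
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	// Address and node name are placeholder values.
	s, err := forceLeaveURL("http://127.0.0.1:4646", "server-1.global", true)
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```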
Shantanu Gadgil
d2dd64f2c4 point to hashicorp's cronexpr (#18510)
2023-09-15 09:23:58 +01:00
Luiz Aoqui
5f951d506a docs: update Vault config for workload identity (#18503)
Update documentation for the agent configuration `vault` block for
workload identity support.
2023-09-14 19:38:36 -03:00
Daniel Bennett
c6dbba7cde csi: implement ControllerExpandVolume (#18359)
This is the first half of volume expansion.
It allows a user to update the requested capacity
("capacity_min" and "capacity_max") in a volume
specification file and re-issue either the Register
or Create volume command (or API call).

The requested capacity is now "reconciled"
with the current real capacity of the volume:
a ControllerExpandVolume RPC call is issued
to a running controller plugin if the requested
"capacity_min" is higher than the current
capacity of the volume in state.

csi spec:
https://github.com/container-storage-interface/spec/blob/c918b7f/spec.md#controllerexpandvolume

note: this does not yet cover NodeExpandVolume
2023-09-14 14:13:04 -05:00
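The reconciliation rule described in c6dbba7cde can be sketched as a single comparison. The type and function names below are hypothetical and are not taken from the Nomad source; the sketch only encodes the condition stated in the commit message, that expansion is requested when "capacity_min" is higher than the current capacity in state.

```go
package main

import "fmt"

// capacityRequest mirrors the "capacity_min" and "capacity_max" fields of a
// volume specification. The struct is a stand-in for illustration.
type capacityRequest struct {
	Min int64 // requested minimum capacity, in bytes
	Max int64 // requested maximum capacity, in bytes
}

// needsExpansion reports whether a ControllerExpandVolume call would be
// issued: only when the requested minimum exceeds the current capacity.
func needsExpansion(currentBytes int64, req capacityRequest) bool {
	return req.Min > currentBytes
}

func main() {
	const gib = int64(1) << 30
	current := 10 * gib

	fmt.Println(needsExpansion(current, capacityRequest{Min: 20 * gib, Max: 40 * gib}))
	fmt.Println(needsExpansion(current, capacityRequest{Min: 5 * gib, Max: 40 * gib}))
}
```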
wrli20
0329393a28 docs: fix link to alicloud autoscaler plugin (#18495) 2023-09-14 09:23:58 -04:00
stswidwinski
bd519dcbf4 Fix for https://github.com/hashicorp/nomad/issues/18493 (#18494)
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
2023-09-14 13:35:15 +01:00
Luiz Aoqui
da7525d9f7 consul: check for warnings on service identity (#18466)
Apply workload identity warnings to group and task level Consul services
that have an identity assigned.
2023-09-13 17:58:34 -03:00
Luiz Aoqui
9e094f64b0 Merge pull request #18489 from hashicorp/post-1.6.2-release
Post 1.6.2 release
2023-09-13 15:50:37 -03:00
Luiz Aoqui
a68a15d97c Merge release 1.6.2 files 2023-09-13 15:44:49 -03:00
hc-github-team-nomad-core
648a53fc49 Prepare for next release 2023-09-13 15:41:21 -03:00
hc-github-team-nomad-core
297de953e0 Generate files for 1.6.2 release 2023-09-13 15:41:21 -03:00
Luiz Aoqui
b9ec271463 changelog: move entry #17858 to improvement (#18484) 2023-09-13 13:35:03 -03:00
Luiz Aoqui
391a6af979 changelog: add entry for #18184 (#18483) 2023-09-13 13:03:11 -03:00
Pavel Aminov
5ddada2973 Adding node_pool to job key validation (#18366) 2023-09-13 11:52:04 -03:00
Tim Gross
756a22f4d5 lint: fix a missing gofmt -s (#18480) 2023-09-13 10:22:19 -04:00
Joshua Timmons
4b6cc14216 Add more links from Variables doc to examples (#18468) 2023-09-13 10:21:41 -04:00
Shantanu Gadgil
12580c345a bubble up the error message from go-getter (#18444) 2023-09-13 09:36:39 -04:00
wrli20
46e72aa8d5 add new target plugin for aliyun (#18473) 2023-09-13 13:39:35 +01:00
James Rasell
532911c380 csi: remove unused internal funcs. (#18459) 2023-09-13 08:11:34 +01:00
Luiz Aoqui
3534307d0d vault: add use_identity and default_identity agent configuration and implicit workload identity (#18343) 2023-09-12 13:53:37 -03:00
James Rasell
1b74f8f9cf scripts: update CNI plugins install version to v1.3.0 (#18460) 2023-09-12 16:20:28 +01:00
Luiz Aoqui
82372fecb8 config: add TTL to agent identity config (#18457)
Add support for identity token TTL in agent configuration fields such as
Consul `service_identity` and `template_identity`.

Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
2023-09-12 11:13:09 -03:00
Seth Hoenig
2e1974a574 client: refactor cpuset partitioning (#18371)
* client: refactor cpuset partitioning

This PR updates the way Nomad client manages the split between tasks
that make use of resources.cpus vs. resources.cores.

Previously, each task was explicitly assigned the CPU cores it was
able to run on. Every time a task was started or destroyed, the cpusets
of all other tasks needed to be updated. This was inefficient and would
crush the Linux kernel when a client tried to run ~400 or so tasks.

Now, we make use of cgroup hierarchy and cpuset inheritance to efficiently
manage cpusets.

* cr: tweaks for feedback
2023-09-12 09:11:11 -05:00
Tim Gross
77ca0bb8af docs: support multiple Vault and Consul clusters (ENT-only) (#18432)
This changeset is the documentation for supporting multiple Vault and Consul
clusters in Nomad Enterprise. It includes documentation changes for the agent
configuration (#18255), the namespace specification (#18425), and the vault,
consul, and service blocks of the jobspec (#18409).
2023-09-12 09:33:14 -04:00
Daniel Bennett
a51d46c65c e2e: packer windows from "ECS_Optimized" image (#18453)
"Containers" AMIs evaporated at some point...
https://aws.amazon.com/marketplace/pp/prodview-yfve3zjgfjtug
> This version has been removed and is no longer
> available to new customers.
2023-09-11 12:26:32 -05:00
James Rasell
d923fc554d consul/connect: add new fields to Consul Connect upstream block (#18430)
Co-authored-by: Horacio Monsalvo <horacio.monsalvo@southworks.com>
2023-09-11 16:02:52 +01:00
James Rasell
668dc5f7a7 client: fix role permission issue with duplicate policies. (#18419)
This change deduplicates the ACL policy list generated from ACL
roles referenced within an ACL token on the client.

Previously the list could contain duplicates, which would cause
erroneous permission denied errors when calling client-related RPC/HTTP
API endpoints. This is because the client calls the ACL get
policies endpoint, which subsequently ensures the caller has
permission to view the ACL policies. This check is performed by
comparing the requested list args with the policies referenced by
the caller ACL token. When a duplicate is present, this check
fails, as the check must ensure the slices match exactly.
2023-09-11 12:52:08 +01:00
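To illustrate why the duplicates mentioned in 668dc5f7a7 break an exact-match check, here is a small sketch. Both `dedupe` and `policiesMatch` are hypothetical stand-ins written for this example, not the functions used in Nomad, and the policy names are invented.

```go
package main

import "fmt"

// dedupe returns names with duplicates removed, preserving first-seen order.
func dedupe(names []string) []string {
	seen := make(map[string]struct{}, len(names))
	out := make([]string, 0, len(names))
	for _, n := range names {
		if _, ok := seen[n]; ok {
			continue
		}
		seen[n] = struct{}{}
		out = append(out, n)
	}
	return out
}

// policiesMatch is a simplified exact comparison of two policy lists.
func policiesMatch(requested, allowed []string) bool {
	if len(requested) != len(allowed) {
		return false
	}
	for i := range requested {
		if requested[i] != allowed[i] {
			return false
		}
	}
	return true
}

func main() {
	// Two roles on the same token reference "read-jobs", so it appears twice.
	requested := []string{"read-jobs", "read-nodes", "read-jobs"}
	allowed := []string{"read-jobs", "read-nodes"}

	fmt.Println(policiesMatch(requested, allowed))         // duplicate breaks the check
	fmt.Println(policiesMatch(dedupe(requested), allowed)) // deduplicated list passes
}
```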
Michael Schurter
ef24e40b39 identity: support jwt expiration and rotation (#18262)
Implements expirations and renewals for alternate workload identity tokens.
2023-09-08 14:50:34 -07:00
Daniel Bennett
22cbb913db csi: rename volume Mounter to Manager (#18434)
to align with its broader purpose,
and the volumeManager implementation
2023-09-08 15:33:46 -05:00
Tim Gross
3ee6c31241 ACLs: allow/deny/default config for Consul/Vault clusters by namespace (#18425)
In Nomad Enterprise when multiple Vault/Consul clusters are configured, cluster admins can control access to clusters for jobs via namespace ACLs, similar to how we've done so for node pools. This changeset updates the ACL configuration structs, but doesn't wire them up.
2023-09-08 11:37:20 -04:00
Tim Gross
b022346575 fingerprint: backoff on Consul fingerprint after initial success (#18426)
In the original design of Consul fingerprinting, we would poll every period so
that we could change the client's fingerprint if Consul became unavailable. As
of 1.4.0 (ref #14673) we no longer update the fingerprint in order to avoid
excessive `Node.Register` RPCs when someone's Consul cluster is flapping.

This allows us to safely back off Consul fingerprinting on success, just as we
have with Vault.
2023-09-08 08:17:07 -04:00
Tim Gross
a8e68e6479 fingerprint: add support for fingerprinting multiple Consul clusters (#18392)
Add fingerprinting we'll need to accept multiple Consul clusters in upcoming
Nomad Enterprise features. The fingerprinter will create a map of Consul clients
by cluster name. In Nomad CE, all but the default cluster will be ignored and
there will be no visible behavior change.

Ref: https://github.com/hashicorp/team-nomad/issues/404
2023-09-07 14:05:35 -04:00
Tim Gross
7cdd592809 jobspec: support cluster field for Vault block (#18408)
This field supports the upcoming ENT-only multiple Vault clusters feature. The
job validation and mutation hooks will come in a separate PR.

Ref: https://github.com/hashicorp/team-nomad/issues/404
2023-09-07 10:15:28 -04:00
Tim Gross
c145e8b30f fingerprint: add warning in CE when there are multiple vaults (#18412)
Nomad CE only supports a single (default) Vault cluster, so log a warning if the
user has configured multiple Vaults.
2023-09-07 09:51:48 -04:00
Tim Gross
7863d7bcbb jobspec: support cluster field for Consul and Service blocks (#18409)
This field supports the upcoming ENT-only multiple Consul clusters feature. The
job validation and mutation hooks will come in a separate PR.

Ref: https://github.com/hashicorp/team-nomad/issues/404
2023-09-07 09:48:49 -04:00
James Rasell
0f94bc2482 test: fix name of state service registration test file. (#18406) 2023-09-07 10:30:05 +01:00
James Rasell
4f3a2e1a7d docs: fix broken link to Consul DNS overview page (#18410) 2023-09-07 08:39:49 +01:00
James Rasell
b6f6541f50 test: add test for state custom iterator. (#18407) 2023-09-07 08:35:05 +01:00
Daniel Bennett
c28cd59655 fix panic from zero JobTrackedVersions config (#18393)
that occurred during server fsm restore,
which later produced a negative slice index
when trying to upsertJobVersion.
2023-09-06 11:11:52 -05:00
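The failure mode described in c28cd59655 can be sketched in isolation. The function below is hypothetical and not the Nomad state store code; it assumes, purely for illustration, that the version limit is used to compute an index of the form `limit-1`, and shows how a zero limit would yield a negative slice index unless it is validated first.

```go
package main

import (
	"errors"
	"fmt"
)

// oldestKept returns the oldest version that is still retained when at most
// limit versions are tracked. versions is ordered newest first.
func oldestKept(versions []uint64, limit int) (uint64, error) {
	if limit < 1 {
		// Without this guard, limit == 0 would index versions[-1] and panic.
		return 0, fmt.Errorf("tracked versions limit must be at least 1, got %d", limit)
	}
	if len(versions) == 0 {
		return 0, errors.New("no versions to inspect")
	}
	if len(versions) < limit {
		limit = len(versions)
	}
	return versions[limit-1], nil
}

func main() {
	versions := []uint64{9, 8, 7, 6, 5}

	v, err := oldestKept(versions, 3)
	fmt.Println(v, err)

	_, err = oldestKept(versions, 0) // zero-value config is rejected, no panic
	fmt.Println(err)
}
```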
Dao Thanh Tung
82cbbacf69 Update the order of docker auth method (#18399)
Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg>
2023-09-06 11:24:37 +01:00
James Rasell
652532b8ca docs: improve diagram on jobspec overview for multi group + tasks. (#18394) 2023-09-06 08:54:05 +01:00
Piotr Kazmierczak
2fffb96604 client: new Consul client (#18370)
This PR introduces a new Consul client that returns SI tokens based on requests
that contain JWTs.
2023-09-05 20:55:36 +02:00
dependabot[bot]
a03aa0cebb build(deps-dev): bump word-wrap from 1.2.3 to 1.2.5 in /website (#18107) 2023-09-05 13:37:10 +01:00
Dao Thanh Tung
6ba600cbf1 Add unit test for api/deployments.go (#18380) 2023-09-05 07:44:54 +01:00
James Rasell
290a310544 fsm: tidy up wording and func signature in fsm ce file. (#18383) 2023-09-04 11:31:55 +01:00
Luiz Aoqui
b614ef3b01 client: fix panic on alloc restore (#18356)
When restoring an allocation, `WIDMgr` was not being set in the alloc
runner config, resulting in a nil panic when the task runner attempted
to start.

Since we will often require the same configuration values when creating
or restoring a new allocation, this commit moves the logic to a shared
function to ensure that `addAlloc` and `restoreState` configure alloc
runners with the same values.
2023-09-01 11:42:00 -03:00
James Rasell
776a26bce7 raft: remove use of deprecated Leader func. (#18352)
Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
2023-09-01 10:01:34 +01:00