Commit Graph

25110 Commits

Author SHA1 Message Date
James Rasell
df16c96a9f cli: use same offset when following single or multiple alloc logs. (#18604) 2023-10-03 08:43:14 +01:00
Piotr Kazmierczak
3d62438876 consul: consul taskrunner hook should only write tokens that belong to its task (#18635)
Ref hashicorp/team-nomad#404
2023-10-02 19:49:02 +02:00
Piotr Kazmierczak
62a0768775 consul: make service and task identity names unique (#18634)
Ref: hashicorp/team-nomad#404
2023-10-02 19:48:34 +02:00
Kevin Wang
e7b70adc2c cli: improve job and status text (#18628) 2023-10-02 10:31:57 -04:00
dependabot[bot]
ccafb94645 chore(deps): bump github.com/cyphar/filepath-securejoin (#18545) 2023-10-02 08:25:35 +01:00
Luiz Aoqui
7267be719f config: apply defaults to extra Consul and Vault (#18623)
* config: apply defaults to extra Consul and Vault

Apply the expected default values when loading additional Consul and
Vault cluster configuration. Without these defaults some fields would be
left empty.

* config: retain pointer of multi Consul and Vault

When calling `Copy()` the pointer reference from the `"default"` key of
the `Consuls` and `Vaults` maps to the `Consul` and `Vault` field of
`Config` was being lost.

* test: ensure TestAgent has the right reference to the default Consul config
2023-09-29 17:15:20 -03:00
Michael Schurter
3f9bd17687 client: prevent watching stale alloc state (#18612)
When waiting on a previous alloc we must query against the leader before
switching to a stale query with index set.

Also check that the response is fresh before using it, as in #18269.
2023-09-29 12:46:28 -07:00
Tim Gross
aaee3076c2 consul: allow consul block in task scope (#18597)
To support Workload Identity with Consul for templates, we want templates to be
able to use the WI created at the task scope (either implicitly or set by the
user). But to allow different tasks within a group to be assigned to different
clusters as we're doing for Vault, we need to be able to set the `consul` block
with its `cluster` field at the task level to override the group.
2023-09-29 15:03:48 -04:00
Phil Renaud
8da40465af fallback to get definition if submission 404s when restarting job in ui (#18621) 2023-09-29 14:52:21 -04:00
Phil Renaud
badaecea66 Access Control CRUD: Make name fields for Policies and Roles required (#18605) 2023-09-29 12:33:03 -04:00
Piotr Kazmierczak
5dab41881b client: new consul_hook (#18557)
This PR introduces a new allocrunner-level consul_hook which iterates over
services and tasks and, if their provider is consul, fetches Consul tokens for
them and stores the tokens in AllocHookResources and in task secret dirs.

Ref: hashicorp/team-nomad#404

---------

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2023-09-29 17:41:48 +02:00
Piotr Kazmierczak
0a75a42d94 WI: WIDMgr should expose default identity signatures (#18610)
Since the identity_hook is meant to be the central place that makes signed
identities available to other hooks, it should also expose the default identity
that is signed by the plan applier.

Ref: hashicorp/team-nomad#404
2023-09-29 15:17:59 +02:00
James Rasell
b44cef0e66 docs: make upgrade version detail clearer. (#18608)
Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
2023-09-29 08:31:14 +01:00
Luiz Aoqui
a4b29a29cb vault: add jwt_backend_path agent config (#18606)
Add agent configuration to allow cluster operators to define the path
where the JWT auth method backend is mounted.
2023-09-28 18:02:30 -03:00
Tim Gross
5001bf4547 consul: use constant instead of "default" literal (#18611)
Use the constant `structs.ConsulDefaultCluster` instead of the string literal
"default", as we've done for Vault.
2023-09-28 16:50:21 -04:00
Juana De La Cuesta
e0407b4cf2 server: Add reporting configuration to nomad server (#18609)
* func: add reporting config to server

* func: add reporting manager for ce

* func: add reporting to quick tests
2023-09-28 22:00:43 +02:00
Michael Schurter
e73026dd4c client: prevent using stale allocs (#18601)
Similar to #18269, it is possible that even if Node.GetClientAllocs
retrieves fresh allocs, the subsequent Alloc.GetAllocs call
retrieves stale allocs. While `diffAlloc(existing, updated)` properly
ignores stale alloc *updates*, alloc deletions have no such check.

So if a client retrieves an alloc created at index 123, and then a
subsequent Alloc.GetAllocs call hits a new server which returns results
at index 100, the client will stop the alloc created at 123 because it
will be missing from the stale response.

This change applies the same logic as #18269 and ensures only fresh
responses are used.

Glossary:
* fresh - modified at an index > the query index
* stale - modified at an index <= the query index
2023-09-28 11:42:57 -07:00
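The freshness rule from the glossary in e73026dd4c (#18601) can be sketched as follows. This is a minimal illustration, not Nomad's client code: `isFresh`, `allocsToStop`, and the alloc IDs are hypothetical names, and the indexes reuse the 123/100 scenario from the commit message.

```go
package main

import (
	"fmt"
	"sort"
)

// isFresh follows the glossary: a response is fresh only when its index is
// strictly greater than the index the client queried at.
func isFresh(respIndex, queryIndex uint64) bool {
	return respIndex > queryIndex
}

// allocsToStop returns the known alloc IDs that are absent from resp.
// The bool is false when the response is stale; the caller should then
// discard the response instead of stopping anything.
func allocsToStop(known map[string]bool, resp []string, respIndex, queryIndex uint64) ([]string, bool) {
	if !isFresh(respIndex, queryIndex) {
		return nil, false
	}
	seen := make(map[string]bool, len(resp))
	for _, id := range resp {
		seen[id] = true
	}
	var stop []string
	for id := range known {
		if !seen[id] {
			stop = append(stop, id)
		}
	}
	sort.Strings(stop)
	return stop, true
}

func main() {
	known := map[string]bool{"alloc-a": true} // hypothetical alloc seen at index 123

	// A server answering at index 100 omits the alloc: stale, so ignored.
	stop, ok := allocsToStop(known, nil, 100, 123)
	fmt.Println(stop, ok)

	// A server answering at index 130 omits the alloc: fresh, so it is stopped.
	stop, ok = allocsToStop(known, nil, 130, 123)
	fmt.Println(stop, ok)
}
```

Without the freshness guard, the first call would report `alloc-a` as deleted, which is the failure mode the commit describes.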
Phil Renaud
859087640a [ui] Simplify times in task events (#18595)
* Regexy time simplification in task events

* Oops, dont assume these are all task restart messages

* Update mirage to provide displayMessage instead of message

* Have a few acceptance tests look for .displayMessage instead of .message for equality now
2023-09-27 17:01:34 -04:00
Luiz Aoqui
fed1992cea vault: remove use_identity agent config (#18592)
The initial intention behind the `vault.use_identity` configuration was
to indicate to Nomad servers that they would need to sign workload
identities for allocs with a `vault` block.

But in order to support identity renewal, #18262 and #18431 moved the
token signing logic to the alloc runner since a new token needs to be
signed prior to the TTL expiring.

So #18343 implemented `use_identity` as a flag to indicate that the
workload identity JWT flow should be used when deriving Vault tokens for
tasks.

But this configuration value is set on servers so it is not available to
clients at the time of token derivation, making its meaning not clear: a
job may end up using the identity-based flow even when `use_identity` is
`false`.

The only reliable signal available to clients at token derivation time
is the presence of an `identity` block for Vault, and this is already
configured with the `vault.default_identity` configuration block, making
`vault.use_identity` redundant.

This commit removes the `vault.use_identity` configuration and
simplifies the logic on when an implicit Vault identity is injected into
tasks.
2023-09-27 17:44:07 -03:00
Luiz Aoqui
868aba57bb vault: update identity name to start with vault_ (#18591)
* vault: update identity name to start with `vault_`

In the original proposal, workload identities used to derive Vault
tokens were expected to be called just `vault`. But in order to support
multiple Vault clusters it is necessary to associate identities with
specific Vault cluster configuration.

This commit implements a new proposal to have Vault identities named as
`vault_<cluster>`.
2023-09-27 15:53:28 -03:00
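The `vault_<cluster>` naming from 868aba57bb (#18591) could look like the sketch below. Only the name format comes from the commit message; the constant and both helper functions are hypothetical and are not Nomad's identifiers.

```go
package main

import (
	"fmt"
	"strings"
)

// vaultIdentityPrefix is a hypothetical constant holding the prefix from the
// commit message.
const vaultIdentityPrefix = "vault_"

// vaultIdentityName builds the identity name for a Vault cluster.
func vaultIdentityName(cluster string) string {
	return vaultIdentityPrefix + cluster
}

// clusterFromIdentity recovers the cluster name, reporting false when the
// identity name does not follow the vault_<cluster> format.
func clusterFromIdentity(name string) (string, bool) {
	if !strings.HasPrefix(name, vaultIdentityPrefix) {
		return "", false
	}
	cluster := strings.TrimPrefix(name, vaultIdentityPrefix)
	return cluster, cluster != ""
}

func main() {
	name := vaultIdentityName("default")
	fmt.Println(name)
	fmt.Println(clusterFromIdentity(name))
	fmt.Println(clusterFromIdentity("vault"))
}
```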
Phil Renaud
ef7bccbd40 [ui] ACL Roles in the UI, plus Role, Policy and Token management (#17770)
* Rename pages to include roles

* Models and adapters

* [ui] Any policy checks in the UI now check for roles' policies as well as token policies (#18346)

* combinedPolicies as a concept

* Classic decorator on role adapter

* We added a new request for roles, so the test based on a specific order of requests got fickle fast

* Mirage roles cluster scaffolded

* Acceptance test for roles and policies on the login page

* Update mirage mock for nodes fetch to account for role policies / empty token.policies

* Roles-derived policies checks

* [ui] Access Control with Roles and Tokens (#18413)

* top level policies routes moved into access control

* A few more routes and name cleanup

* Delog and test fixes to account for new url prefix and document titles

* Overview page

* Tokens and Roles routes

* Tokens helios table

* Add a role

* Hacky role page and deletion

* New policy keyboard shortcut and roles breadcrumb nav

* If you leave New Role but havent made any changes, remove the newly-created record from store

* Roles index list and general role route crud

* Roles index actually links to roles now

* Helios button styles for new roles and policies

* Handle when you try to create a new role without having any policies

* Token editing generally

* Create Token functionality

* Cant delete self-token but management token editing and deleting is fine

* Upgrading helios caused codemirror to explode, shimmed

* Policies table fix

* without bang-element condition, modifier would refire over and over

* Token TTL or Time setting

* time will take you on

* Mirage hooks for create and list roles

* Ensure policy names only use allowed characters in mirage mocks

* Mirage mocked roles and policies in the default cluster

* log and lintfix

* chromedriver to 2.1.2

* unused unit tests removed

* Nice profile dropdown

* With the HDS accordion, rename our internal component scss ref

* design revisions after discussion

* Tooltip on deleted-policy tokens

* Two-step button peripheral isDeleting gcode removed

* Never to null on token save

* copywrite headers added and empty routefiles removed

* acceptance test fixes for policies endpoint

* Route for updating a token

* Policies testfixes

* Ember on-click-outside modifier upgraded with general ember-modifier upgrade

* Test adjustments to account for new profile header dropdown

* Test adjustments for tokens via policy pages

* Removed an unused route

* Access Control index page tests

* a11y tests

* Tokens index acceptance tests generally

* Lintfix

* Token edit page tests

* Token editing tests

* New token expiration tests

* Roles Index tests

* Role editing policies tests

* A complete set of Access Control Roles tests

* Policies test

* Be more specific about which row to check for expiration time

* Nil check on expirationTime equality

* Management tokens shouldnt show No Roles/Policies, give them their own designation

* Route guard on selftoken, conditional columns, and afterModel at parent to prevent orphaned policies on tokens/roles from stopping a new save

* Policy unloading on delete and other todos plus autofocus conditionally re-enabled

* Invalid policies non-links now a concept for Roles index

* HDS style links to make job.variables.alert links look like links again

* Mirage finding looks weird so making model async in hash even though redundant

* Drop rsvp

* RSVP wasnt the problem, cached lookups were

* remove old todo comments

* de-log
2023-09-27 14:53:09 -04:00
Luiz Aoqui
19241964a4 config: fix some issues with workload identity and multi Consul and Vault (#18590)
* config: fix multi consul and vault config parse

Capture the loop variable when parsing multiple Consul and Vault
configuration blocks so the duration parse function uses the correct
field when it's called later on.

* client: build Vault client with right config

When setting up the multiple Vault clients, the code was always loading
the default configuration, resulting in all clients being configured the
same way.

* config: fix WorkloadIdentityConfig.Copy() method

Ensure `WorkloadIdentityConfig.Copy()` does not return the original
pointer for the `TTL` field.
2023-09-27 14:41:11 -03:00
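The loop-variable fix described in 19241964a4 (#18590) follows a common Go pattern, sketched here under assumptions: `deferredTTLs` and the cluster names are hypothetical, and the real parse code may be structured differently. Before Go 1.22, a `for` loop reused one variable across iterations, so a closure invoked later would see the last value unless a per-iteration copy was captured.

```go
package main

import (
	"fmt"
	"time"
)

// deferredTTLs builds one deferred duration parser per cluster block.
func deferredTTLs(raw map[string]string) map[string]func() (time.Duration, error) {
	out := make(map[string]func() (time.Duration, error), len(raw))
	for name, ttl := range raw {
		// Capture per-iteration copies so each closure parses its own field,
		// even on Go versions that share the loop variable across iterations.
		name, ttl := name, ttl
		out[name] = func() (time.Duration, error) {
			d, err := time.ParseDuration(ttl)
			if err != nil {
				return 0, fmt.Errorf("cluster %q: %w", name, err)
			}
			return d, nil
		}
	}
	return out
}

func main() {
	parsers := deferredTTLs(map[string]string{"default": "1h", "other": "30m"})
	for _, name := range []string{"default", "other"} {
		d, err := parsers[name]()
		fmt.Println(name, d, err)
	}
}
```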
Juana De La Cuesta
124272c050 server: Add reporting option to agent (#18572)
* func: add reporting option to agent

* func: add test for merge and fix comments

* Update config_ce.go

* Update config_ce.go

* Update config_ce.go

* fix: add reporting config to default configuration and update to use must over require

* Update command/agent/config_parse.go

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>

* Update nomad/structs/config/reporting.go

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>

* Update nomad/structs/config/reporting.go

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>

* style: rename license and reporting config

* fix: use default function instead of empty struct

---------

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
2023-09-27 00:11:32 +02:00
Daniel Bennett
9b74e11f06 csi: fix volume updating behavior (#18588)
fix for part of: c6dbba7cde
which allowed updating volumes while in use,
because CSI expand may occur while in use.
but it mistakenly stopped copying other
important values that may be updated
whether in use or not.

this moves some of the in-use validation to
happen during Merge(), before writing to state,
leaving UpsertCSIVolume with only minimal final
sanity-checking.
2023-09-26 14:58:17 -05:00
Daniel Bennett
fab968a748 csi: document volume expansion (#18573)
and show Capacity in `volume status` command.
2023-09-26 14:49:15 -05:00
Tim Gross
02a5aab359 consul: provide workload's Consul token to service client (#18559)
This is a work-in-progress changeset to provide workload-specific Consul tokens
that are created by the `consul_hook` and attached to workload registration
requests by the `group_service_hook` and `service_hook`.

This requires unreleased updates to Consul's `api` package, so this changeset
includes a temporary `replace` directive in the go.mod file.
2023-09-26 14:13:29 -04:00
Tim Gross
8b8a9f6497 WI: add test to verify we don't allow empty signatures for JWT (#18586)
Encoded JWTs include an `alg` header key that tells the verifier which signature
algorithm to use. Bafflingly, the JWT standard allows a value of `"none"` here
which bypasses signature verification.

In all shipped versions of Nomad, we explicitly configure verification to a
specific algorithm and ignore the header value entirely to avoid this protocol
flaw. But in #18123 we updated our JWT library to `go-jose`, which rightfully
doesn't support `"none"` but this detail isn't encoded anywhere in our code
base. Add a test that ensures we catch any regressions in the library.
2023-09-26 14:09:57 -04:00
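The `alg` header concern in 8b8a9f6497 (#18586) can be illustrated with a standard-library-only check. This is a sketch, not the test added in the PR and not `go-jose` usage: `makeToken` and `checkHeader` are hypothetical helpers, and the code only inspects the header and signature segment without verifying any signature.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

var errUnsigned = errors.New("jwt: unsigned token (alg none or empty signature) rejected")

// makeToken builds a compact token with the given alg header and raw
// signature segment. It exists only to produce inputs for the check below.
func makeToken(alg, sig string) string {
	hdr, _ := json.Marshal(map[string]string{"alg": alg, "typ": "JWT"})
	enc := base64.RawURLEncoding
	return enc.EncodeToString(hdr) + "." + enc.EncodeToString([]byte(`{}`)) + "." + sig
}

// checkHeader rejects tokens that declare alg "none" or carry an empty
// signature, and tokens whose alg differs from the one the verifier expects.
func checkHeader(token, wantAlg string) error {
	parts := strings.Split(token, ".")
	if len(parts) != 3 {
		return errors.New("jwt: expected three dot-separated segments")
	}
	raw, err := base64.RawURLEncoding.DecodeString(parts[0])
	if err != nil {
		return fmt.Errorf("jwt: bad header encoding: %w", err)
	}
	var hdr struct {
		Alg string `json:"alg"`
	}
	if err := json.Unmarshal(raw, &hdr); err != nil {
		return fmt.Errorf("jwt: bad header JSON: %w", err)
	}
	if strings.EqualFold(hdr.Alg, "none") || parts[2] == "" {
		return errUnsigned
	}
	if hdr.Alg != wantAlg {
		return fmt.Errorf("jwt: unexpected alg %q, want %q", hdr.Alg, wantAlg)
	}
	return nil
}

func main() {
	fmt.Println(checkHeader(makeToken("none", ""), "RS256"))
	fmt.Println(checkHeader(makeToken("RS256", "c2ln"), "RS256"))
}
```

Pinning the expected algorithm on the verifier side, as the commit message describes, means the header value is never trusted to choose the verification method.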
Jose Merchan
20f6ec75ef Update consul-connect.mdx (#18575)
The hyperlink points to a nonexistent URL. I suggest changing it to this one (https://developer.hashicorp.com/consul/docs/install/ports), which at least lists port 8503 (gRPC TLS).
2023-09-26 10:04:54 +01:00
Tim Gross
20eadc7b29 config: move Consul getter out of fingerprinter (#18556) 2023-09-22 10:58:39 -04:00
Daniel Bennett
7bd5c6e84e test: Refactor mock CSI manager (#18554)
and MockCSIManager to support the call counting
that csi_hook_test expects

instead of implementing csimanager
interfaces in two separate places:
* client/allocrunner/csi_hook_test
* client/csi_endpoint_test

they can both use the same mocks defined in
client/pluginmanager/csimanager/
alongside the actual implementations of them.

also refactor TestCSINode_DetachVolume
to use it like Node_ExpandVolume
so we can also test the happy path there
2023-09-21 16:03:53 -05:00
Charlie Voiselle
70fc8df787 [sentinel] Add existing job to enforceSubmitJob (#18553)
* Add existing job to enforceSubmitJob (CE)
* Add changelog
2023-09-21 14:12:51 -04:00
Juana De La Cuesta
72acaf6623 [17449] Introduces a locking mechanism over variables (#18207)
It includes the work on the state store, the RPC server, the HTTP server, the Go API package, and the CLI's command. To read more about the actual functionality, refer to the RFCs [NMD-178] Locking with Nomad Variables and [NMD-179] Leader election using locking mechanism for the Autoscaler.
2023-09-21 17:56:33 +02:00
Piotr Kazmierczak
86d2cdcf80 client: split identity_hook across allocrunner and taskrunner (#18431)
This commit splits identity_hook between the allocrunner and taskrunner. The
allocrunner-level part of the hook signs each task identity, and the
taskrunner-level part picks it up and stores secrets for each task.

The code revamps the WIDMgr, which is now split into 2 interfaces:
IdentityManager which manages renewals of signatures and handles sending
updates to subscribers via Watch method, and IdentitySigner which only does the
signing.

This work is necessary for having a unified Consul login workflow that comes
with the new Consul integration. A new, allocrunner-level consul_hook will now
be the only hook doing Consul authentication.
2023-09-21 17:31:27 +02:00
Phil Renaud
cf8dde0850 [ui] Color indicators for server/client status (#18318)
* Color the status cell for servers and nodes

* Testfix and changelog

* Leader indicator moved post-word

* Icon and badge treatment

* Capitalizing test checks

* HDS badges dont expose statusClass like we used to, so stop checking for it
2023-09-20 17:05:04 -04:00
Tim Gross
d7bd47d60f config: remove consul.template_identity in lieu of task_identity (#18540)
The original thinking for Workload Identity integration with Consul and Vault
was that we'd allow `template` blocks to specify their own identity. But because
the login to Consul/Vault to get tokens happens at the task level, this would
involve making the `template` block a new WID watcher on its own rather than
using the Consul and Vault hooks we're building at the group/task level.

So it doesn't make sense to have separate identities for individual `template`
blocks rather than at the level of tasks. Update the agent configuration to
rename the `template_identity` to the more accurate `task_identity`, which will
be used for any non-service hooks (just `template` today).

Update the implicit identities job mutation hook to create the identity we'll
need as well.
2023-09-20 15:43:08 -04:00
Tim Gross
fdc6c2151d vault: select Vault API client by cluster name (#18533)
Nomad Enterprise will support configuring multiple Vault clients. Instead of
having a single Vault client field in the Nomad client, we'll have a function
that callers can parameterize by the Vault cluster name that returns the
correctly configured Vault API client wrapper.
2023-09-19 14:35:01 -04:00
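The lookup-by-cluster-name idea in fdc6c2151d (#18533) might be shaped like this sketch. All names here (`vaultClient`, `clientFor`, `newClientFor`, the addresses) are hypothetical stand-ins, not Nomad's types; the point is only that callers pass a cluster name and get back the client configured for that cluster.

```go
package main

import "fmt"

// vaultClient is a hypothetical stand-in for a configured Vault API client
// wrapper.
type vaultClient struct{ addr string }

// clientFor is the function callers parameterize by Vault cluster name.
type clientFor func(cluster string) (*vaultClient, error)

// newClientFor builds one client per configured cluster and returns the
// lookup function over them.
func newClientFor(addrs map[string]string) clientFor {
	clients := make(map[string]*vaultClient, len(addrs))
	for name, addr := range addrs {
		// Each client is built from its own cluster's config, not the default's.
		clients[name] = &vaultClient{addr: addr}
	}
	return func(cluster string) (*vaultClient, error) {
		c, ok := clients[cluster]
		if !ok {
			return nil, fmt.Errorf("no Vault client configured for cluster %q", cluster)
		}
		return c, nil
	}
}

func main() {
	lookup := newClientFor(map[string]string{
		"default": "https://vault.example.internal:8200",
		"infra":   "https://vault-infra.example.internal:8200",
	})
	c, err := lookup("infra")
	fmt.Println(c.addr, err)
	_, err = lookup("missing")
	fmt.Println(err)
}
```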
Tim Gross
fcb9c4a39c job endpoint: implicit constraints for multi-Vault/Consul (#18528)
Update the implicit constraint mutating hook to support multiple Vault and
Consul clusters in Nomad Enterprise. This requires moving the Vault/Consul
mutating hooks earlier in the list as well, because that'll ensure we've
canonicalized properly for multiple clusters.
2023-09-19 12:19:44 -04:00
Daniel Bennett
4895d708b4 csi: implement NodeExpandVolume (#18522)
following ControllerExpandVolume
in c6dbba7cde,
which expands the disk at e.g. a cloud vendor,
the controller plugin may say that we also need
to issue NodeExpandVolume for the node plugin to
make the new disk space available to task(s) that
have claims on the volume by e.g. expanding
the filesystem on the node.

csi spec:
https://github.com/container-storage-interface/spec/blob/c918b7f/spec.md#nodeexpandvolume
2023-09-18 10:30:15 -05:00
dependabot[bot]
d564d7811b chore(website/content): update content-conformance version (#17482) 2023-09-18 11:08:51 -04:00
Seth Hoenig
591394fb62 drivers: plumb hardware topology via grpc into drivers (#18504)
* drivers: plumb hardware topology via grpc into drivers

This PR swaps out the temporary use of detecting system hardware manually
in each driver for using the Client's detected topology by plumbing the
data over gRPC. This ensures that Client configuration is taken into account
consistently in all references to system topology.

* cr: use enum instead of bool for core grade

* cr: fix test slit tables to be possible
2023-09-18 08:58:07 -05:00
Tim Gross
b105e41265 job endpoint: reorder check for disabled job registrations (#18523)
When job registrations are disabled, there's no reason to do the potentially
expensive job mutation and admission hooks. Move the ACL resolution and this
check before those hooks.
2023-09-18 09:15:02 -04:00
Tim Gross
5bd8b89c19 helper: reduce size of buffer used by template connections (#18524)
In #12458 we added an in-memory connection buffer so that template runners that
want access to the Nomad API for Service Registration and Variables can
communicate with Nomad without having to create a real HTTP client. The size of
this buffer (1 MiB) was taken directly from its usage in Vault, and each
connection makes 2 such buffers (send and receive). Because each template runner
has its own connection, when there are large numbers of allocations this adds up
to significant memory usage.

The largest Nomad Variable payload is 64KiB, and a small amount of
metadata. Service Registration responses are much smaller, and we don't include
check results in them (as Consul does), so the size is relatively bounded. We
should be able to safely reduce the size of the buffer by a factor of 10 or more
without forcing the template runner to make multiple read calls over the buffer.

Fixes: #18508
2023-09-18 09:12:09 -04:00
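The memory arithmetic behind 5bd8b89c19 (#18524) can be worked through in a few lines. The 1 MiB buffer size, the two buffers per connection, and the 64 KiB maximum Variable payload come from the commit message; the runner count and the 100 KiB candidate size are assumptions chosen only for illustration, since the message does not state the new size.

```go
package main

import "fmt"

const (
	kib = 1 << 10
	mib = 1 << 20
)

// bufferBytes returns the total buffer memory for n template-runner
// connections, each holding a send and a receive buffer of bufSize bytes.
func bufferBytes(n, bufSize int) int {
	return n * 2 * bufSize
}

// fitsInOneRead reports whether a payload fits in a buffer without forcing
// multiple read calls.
func fitsInOneRead(bufSize, payload int) bool {
	return payload <= bufSize
}

func main() {
	const runners = 5000       // hypothetical number of template runners
	const candidate = 100 * kib // hypothetical reduced size, roughly 1/10 of 1 MiB

	fmt.Printf("1 MiB buffers:   %.1f MiB total\n", float64(bufferBytes(runners, mib))/mib)
	fmt.Printf("100 KiB buffers: %.1f MiB total\n", float64(bufferBytes(runners, candidate))/mib)
	// The largest Variable payload is 64 KiB plus a small amount of metadata.
	fmt.Println("64 KiB payload fits:", fitsInOneRead(candidate, 64*kib))
}
```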
Tim Gross
ad4436ffff job endpoint hooks to enforce access to vault/consul clusters (CE) (#18521)
In Nomad Enterprise, namespace rules can control access to Vault and Consul
clusters. Add job endpoint mutating and validating hooks for both Vault and
Consul so that ENT can enforce these namespace rules. This changeset includes
the stub behaviors for CE.

Ref: https://github.com/hashicorp/nomad-enterprise/pull/1234
2023-09-15 13:58:37 -04:00
Shantanu Gadgil
f37f84182d docs: example of multiple crons (#18511) 2023-09-15 10:10:56 -04:00
Gerard Nguyen
1339599185 cli: Add prune flag for nomad server force-leave command (#18463)
This feature helps operators remove a failed/left node from the Serf layer immediately
without waiting 24 hours for the node to be reaped

* Update CLI with prune flag
* Update API /v1/agent/force-leave with prune query string parameter
* Update CLI and API doc
* Add unit test
2023-09-15 08:45:11 -04:00
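A request URL for the `prune` parameter added in 1339599185 (#18463) could be built as below. The `/v1/agent/force-leave` path and the `prune` query string parameter come from the commit message; the `node` parameter name, the agent address, and the node name are assumptions to check against the API documentation, and `forceLeaveURL` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// forceLeaveURL builds the force-leave request URL for an agent address.
// The "node" parameter name is an assumption; "prune" is from the commit.
func forceLeaveURL(agentAddr, node string, prune bool) (string, error) {
	u, err := url.Parse(agentAddr)
	if err != nil {
		return "", err
	}
	u.Path = "/v1/agent/force-leave"
	q := url.Values{}
	q.Set("node", node)
	if prune {
		q.Set("prune", strconv.FormatBool(prune))
	}
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	// Hypothetical agent address and node name.
	s, err := forceLeaveURL("http://127.0.0.1:4646", "server-2.global", true)
	fmt.Println(s, err)
}
```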
Shantanu Gadgil
d2dd64f2c4 point to hashicorp's cronexpr (#18510)
2023-09-15 09:23:58 +01:00
Luiz Aoqui
5f951d506a docs: update Vault config for workload identity (#18503)
Update documentation for the agent configuration `vault` block for
workload identity support.
2023-09-14 19:38:36 -03:00
Daniel Bennett
c6dbba7cde csi: implement ControllerExpandVolume (#18359)
the first half of volume expansion,
this allows a user to update requested capacity
("capacity_min" and "capacity_max") in a volume
specification file, and re-issue either Register
or Create volume commands (or api calls).

the requested capacity will now be "reconciled"
with the current real capacity of the volume,
issuing a ControllerExpandVolume RPC call
to a running controller plugin, if requested
"capacity_min" is higher than the current
capacity on the volume in state.

csi spec:
https://github.com/container-storage-interface/spec/blob/c918b7f/spec.md#controllerexpandvolume

note: this does not yet cover NodeExpandVolume
2023-09-14 14:13:04 -05:00
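The capacity reconciliation described in c6dbba7cde (#18359) can be sketched as follows. The rule (expand only when the requested `capacity_min` exceeds the current capacity in state) is from the commit message; the `volume` struct, `expandFunc`, and `reconcileCapacity` are hypothetical names, and the callback stands in for the ControllerExpandVolume RPC to a controller plugin.

```go
package main

import (
	"errors"
	"fmt"
)

// volume is a hypothetical, trimmed-down view of a CSI volume in state.
type volume struct {
	ID           string
	Capacity     int64 // current real capacity, in bytes
	RequestedMin int64 // "capacity_min" from the volume specification
	RequestedMax int64 // "capacity_max" from the volume specification
}

// expandFunc stands in for the ControllerExpandVolume RPC and returns the
// capacity the plugin reports after expansion.
type expandFunc func(volID string, minBytes, maxBytes int64) (int64, error)

// reconcileCapacity issues an expand call only when the requested minimum is
// higher than the current capacity, and reports whether the volume changed.
func reconcileCapacity(v *volume, expand expandFunc) (bool, error) {
	if v.RequestedMin <= v.Capacity {
		return false, nil
	}
	newCap, err := expand(v.ID, v.RequestedMin, v.RequestedMax)
	if err != nil {
		return false, err
	}
	if newCap < v.RequestedMin {
		return false, errors.New("plugin returned less capacity than requested")
	}
	v.Capacity = newCap
	return true, nil
}

func main() {
	fake := func(id string, minBytes, maxBytes int64) (int64, error) {
		return minBytes, nil // pretend the plugin grows to exactly the minimum
	}
	v := &volume{ID: "vol-1", Capacity: 10 << 30, RequestedMin: 20 << 30, RequestedMax: 40 << 30}
	fmt.Println(reconcileCapacity(v, fake))
	fmt.Println(reconcileCapacity(v, fake)) // already large enough, no second call
}
```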
wrli20
0329393a28 docs: fix link to alicloud autoscaler plugin (#18495) 2023-09-14 09:23:58 -04:00
stswidwinski
bd519dcbf4 Fix for https://github.com/hashicorp/nomad/issues/18493 (#18494)
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
2023-09-14 13:35:15 +01:00