Commit Graph

25180 Commits

Author SHA1 Message Date
Seth Hoenig
043b1a95a7 deps: bump go-set/v2 to alpha.3 (#18844)
fixes a rather critical bug in .Equals implementation
2023-10-24 08:23:25 -05:00
James Rasell
b55dcb3967 test: use must lib for bitmap tests. (#18834) 2023-10-24 07:40:02 +01:00
Luiz Aoqui
70b1862026 test: add E2E vaultcompat test for JWT auth flow (#18822)
Test the JWT auth flow using real Nomad and Vault agents.
2023-10-23 20:00:55 -04:00
Tim Gross
1b3920f96b cli: add prefix ID and wildcard namespace support for service info (#18836)
The `nomad service info` command doesn't support using a wildcard namespace with
a prefix match, the way that we do for many other commands. Update the command
to do a prefix match list query for the services before making the get query.

Fixes: #18831
2023-10-23 13:17:51 -04:00
Tim Gross
8a311255a2 docs: Consul Workload Identity integration (#18685)
Documentation updates to support the new Consul integration with Nomad Workload
Identity. Included:

* Added a large section to the Consul integration docs to explain how to set up
  auth methods and binding rules (by hand, assuming we don't ship a `nomad
  setup-consul` tool for now), and how to safely migrate from the existing
  workflow to the new one.
* Move `consul` block out of `group` and onto its own page now that we have it
  available at the `task` scope, and expanded examples of its use.
* Added the `service_identity` and `task_identity` blocks to the Nomad agent
  configuration, and provided a recommended default.
* Added the `identity` block to the `service` block page.
* Added a rough compatibility matrix to the Consul integration page.
2023-10-23 09:17:22 -04:00
Tim Gross
4d9cc73ed2 sids_hook: fix check for Consul token derived from WI (#18821)
The `sids_hook` serves the legacy Connect workflow, and we want to bypass it
when using workload identities. So the hook checks that there's not already a
Consul token in the alloc hook resources derived from the Workload
Identity. This check was looking for the wrong key. This would cause the hook to
ignore the Consul token we already have and then fail to derive a SI token
unless the Nomad agent has its own token with `acl:write` permission.

Fix the lookup and add tests covering the bypass behavior.
2023-10-23 08:57:02 -04:00
Michael Schurter
a806363f6d OpenID Configuration Discovery Endpoint (#18691)
Added the [OIDC Discovery](https://openid.net/specs/openid-connect-discovery-1_0.html) `/.well-known/openid-configuration` endpoint to Nomad, but it is only enabled if the `server.oidc_issuer` parameter is set. Documented the parameter, but without a tutorial trying to actually _use_ this will be very hard.

I intentionally did *not* use https://github.com/hashicorp/cap for the OIDC configuration struct because it's built to be a *compliant* OIDC provider. Nomad is *not* trying to be compliant initially because compliance to the spec does not guarantee it will actually satisfy the requirements of third parties. I want to avoid the problem where in an attempt to be standards compliant we ship configuration parameters that lock us in to a certain behavior that we end up regretting. I want to add parameters and behaviors as there's a demonstrable need.

Users always have the escape hatch of providing their own OIDC configuration endpoint. Nomad just needs to know the Issuer so that the JWTs match the OIDC configuration. There's no reason the actual OIDC configuration JSON couldn't live in S3 and get served directly from there. Unlike JWKS the OIDC configuration should be static, or at least change very rarely.

This PR is just the endpoint extracted from #18535. The `RS256` algorithm still needs to be added in hopes of supporting third parties such as [AWS IAM OIDC Provider](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_providers_create_oidc.html).

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
2023-10-20 17:11:41 -07:00
Seth Hoenig
0020139440 core: port common code changes from ENT for numa scheduling (#18818)
Some additional changes were made in the ENT PR to the common code in
support of numa scheduling; this PR copies those changes back to CE.
2023-10-20 13:19:02 -05:00
Luiz Aoqui
6d4b62200b log: add Consul and Vault cluster name to output (#18817)
Ensure Consul and Vault loggers have the cluster name as an attribute to
help differentiate log source.
2023-10-20 14:03:56 -04:00
Phil Renaud
8902afe651 Nomad Actions (#18794)
* Scaffolding actions (#18639)

* Task-level actions for job submissions and retrieval

* FIXME: Temporary workaround to get ember dev server to pass exec through to 4646

* Update api/tasks.go

Co-authored-by: Tim Gross <tgross@hashicorp.com>

* Update command/agent/job_endpoint.go

Co-authored-by: Tim Gross <tgross@hashicorp.com>

* Diff and copy implementations

* Action structs get their own file, diff updates to behave like our other diffs

* Test to observe actions changes in a version update

* Tests migrated into structs/diff_test and modified with PR comments in mind

* APIActionToSTructsAction now returns a new value

* de-comment some plain parts, remove unused action lookup

* unused param in action converter

---------

Co-authored-by: Tim Gross <tgross@hashicorp.com>

* New endpoint: job/:id/actions (#18690)

* unused param in action converter

* backing out of parse_job level and moved toward new endpoint level

* Adds taskName and taskGroupName to actions at job level

* Unmodified job mock actions tests

* actionless job test

* actionless job test

* Multi group multi task actions test

* HTTP method check for GET, cleaner errors in job_endpoint_test

* decomment

* Actions aggregated at job model level (#18733)

* Removal of temporary fix to proxy to 4646

* Run Action websocket endpoint (#18760)

* Working demo for review purposes

* removal of cors passthru for websockets

* Remove job_endpoint-specific ws handlers and aimed at existing alloc exec handlers instead

* PR comments adressed, no need for taskGroup pass, better group and task lookups from alloc

* early return in action validate and removed jobid from req args per PR comments

* todo removal, we're checking later in the rpc

* boolean style change on tty

* Action CLI command (#18778)

* Action command init and stuck-notes

* Conditional reqpath to aim at Job action endpoint

* De-logged

* General CLI command cleanup, observe namespace, pass action as string, get random alloc w group adherence

* tab and varname cleanup

* Remove action param from Allocations().Exec calls

* changelog

* dont nil-check acl

---------

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2023-10-20 13:05:55 -04:00
Seth Hoenig
3e8ebf85f5 lang: add a helper for iterating a map in order (#18809)
In some cases it is helpful to iterate a map in the sorted order of
the maps keyset - particularly in implementations of some function for
which the tests cannot be deterministic without order.
2023-10-20 08:11:35 -05:00
James Rasell
1a0d1efb0d cli: use single dep func for opening URLs. (#18808) 2023-10-20 08:24:11 +01:00
James Rasell
ca9e08e6b5 monitor: add log include location option on monitor CLI and API (#18795) 2023-10-20 07:55:22 +01:00
Tim Gross
f5c5035fde testutil: add ACL bootstrapping to test server configuration (#18811)
Some of our `api` package tests have ACLs enabled, but none of those tests also
run clients and the "wait for the clients to be live" code reads from the Node
API. The caller can't bootstrap ACLs until `NewTestServer` returns, and this
makes for a circular dependency.

Allow developers to provide a bootstrap token to the test server config, and
if it's available, have the server bootstrap the ACL system with it before
checking for live clients.
2023-10-19 16:50:38 -04:00
Seth Hoenig
83720740f5 core: plumbing to support numa aware scheduling (#18681)
* core: plumbing to support numa aware scheduling

* core: apply node resources compatibility upon fsm rstore

Handle the case where an upgraded server dequeus an evaluation before
a client triggers a new fingerprint - which would be needed to cause
the compatibility fix to run. By running the compat fix on restore the
server will immediately have the compatible pseudo topology to use.

* lint: learn how to spell pseudo
2023-10-19 15:09:30 -05:00
Piotr Kazmierczak
0410b8acea client: remove unnecessary debugging from consul client mock (#18807) 2023-10-19 16:23:42 +02:00
Luiz Aoqui
8b9a5fde4e vault: add multi-cluster support on templates (#18790)
In Nomad Enterprise, a task may connect to a non-default Vault cluster,
requiring `consul-template` to be configured with a specific client
`vault` block.
2023-10-18 20:45:01 -04:00
Piotr Kazmierczak
16d71582f6 client: consul_hook tests (#18780)
ref https://github.com/hashicorp/team-nomad/issues/404
2023-10-18 20:02:35 +02:00
Luiz Aoqui
99e54da9a9 ui: fix websocket connections on dev proxy (#18791)
* ui: fix websocket connections on dev proxy

`ember-cli` includes a handler for websocket upgrade requests to start
proxying requests. Handling the same upgrade event by also proxying in
`api.js` causes some kind of conflict where the connection is closed
unexpectedly.

The only change necessary on upgrade is to overwrite the Origin header
so Nomad accepts the connection.

* ui: remove unused variables
2023-10-18 09:50:10 -04:00
Daniel Bennett
b027d8f771 do not embed *Server (#18786)
these structs embedding Server, then Server _also embedding them_,
confused my IDE, isn't necessary, and just feels wrong!
2023-10-17 15:15:00 -05:00
Tim Gross
d0957eb109 Consul: agent config updates for WI (#18774)
This changeset makes two changes:
* Removes the `consul.use_identity` field from the agent configuration. This behavior is properly covered by the presence of `consul.service_identity` / `consul.task_identity` blocks.
* Adds a `consul.task_auth_method` and `consul.service_auth_method` fields to the agent configuration. This allows the cluster administrator to choose specific Consul Auth Method names for their environment, with a reasonable default.
2023-10-17 14:42:14 -04:00
Tim Gross
ac56855f07 consul: add multi-cluster support to client constructors (#18624)
When agents start, they create a shared Consul client that is then wrapped as
various interfaces for testability, and used in constructing the Nomad client
and server. The interfaces that support workload services (rather than the Nomad
agent itself) need to support multiple Consul clusters for Nomad
Enterprise. Update these interfaces to be factory functions that return the
Consul client for a given cluster name. Update the `ServiceClient` to split
workload updates between clusters by creating a wrapper around all the clients
that delegates to the cluster-specific `ServiceClient`.

Ref: https://github.com/hashicorp/team-nomad/issues/404
2023-10-17 13:46:49 -04:00
Daniel Bennett
8234a422f3 only generate default workload identity once (#18776)
per alloc task. this can save a bit of cpu when
running plans for tasks that already exist,
and prevents Nomad tokens from changing,
which can cause task template{}s to restart
unnecessarily.
2023-10-17 12:31:18 -05:00
modrake
51ffe4208e workaround and fixes for MPL and copywrite bot (#18775) 2023-10-17 08:02:13 +01:00
Luiz Aoqui
349c032369 vault: update task runner vault hook to support workload identity (#18534) 2023-10-16 19:37:57 -04:00
James Rasell
1ffdd576bb agent: add config option to enable file and line log detail. (#18768) 2023-10-16 15:59:16 +01:00
James Rasell
fe0a06e4bc demo: clarify CSI host-path single client limitation. (#18770)
Co-authored-by: Tim Gross <tgross@hashicorp.com>
2023-10-16 15:58:26 +01:00
Tim Gross
cbd7248248 auth: use ACLsDisabledACL when ACLs are disabled (#18754)
The RPC handlers expect to see `nil` ACL objects whenever ACLs are disabled. By
using `nil` as a sentinel value, we have the risk of nil pointer exceptions and
improper handling of `nil` when returned from our various auth methods that can
lead to privilege escalation bugs. This is the final patch in a series to
eliminate the use of `nil` ACLs as a sentinel value for when ACLs are disabled.

This patch adds a new virtual ACL policy field for when ACLs are disabled and
updates our authentication logic to use it. Included:

* Extends auth package tests to demonstrate that nil ACLs are treated as failed
  auth and disabled ACLs succeed auth.
* Adds a new `AllowDebug` ACL check for the weird special casing we have for
  pprof debugging when ACLs are disabled.
* Removes the remaining unexported methods (and repeated tests) from the
  `nomad/acl.go` file.
* Update the semgrep rules to detect improper nil ACL checking and remove the
  old invalid ACL checks.
* Update the contributing guide for RPC authentication.

Ref: https://github.com/hashicorp/nomad-enterprise/pull/1218
Ref: https://github.com/hashicorp/nomad/pull/18703
Ref: https://github.com/hashicorp/nomad/pull/18715
Ref: https://github.com/hashicorp/nomad/pull/16799
Ref: https://github.com/hashicorp/nomad/pull/18730
Ref: https://github.com/hashicorp/nomad/pull/18744
2023-10-16 09:30:24 -04:00
Kevin Wang
6dcc402188 chore(docs): update file HCL function (#18696) 2023-10-16 09:03:50 +01:00
Piotr Kazmierczak
299f3bf74b client: use WI-issued consul tokens in the template_hook (#18752)
ref https://github.com/hashicorp/team-nomad/issues/404
2023-10-16 09:39:20 +02:00
dependabot[bot]
cb2363f2fb chore(deps): bump github.com/hashicorp/go-bexpr from 0.1.12 to 0.1.13 (#18758) 2023-10-16 08:21:57 +01:00
Piotr Kazmierczak
b697de9dda client: correct consul block validation in the consul_hook (#18751) 2023-10-13 15:15:04 +02:00
Tim Gross
0931f2ba12 csi: add test for that plugin allocs are filtered by namespace (#18753)
A CSI plugin can be made up of multiple jobs, which may not be in the same
namespace. When querying for a plugin and getting information about the
allocations that implement the plugin, we need to filter by the namespaces the
user has access to.

This test existed in the ENT code base and was never moved over to CE when we
made namespaces part of the CE product.
2023-10-13 09:06:36 -04:00
James Rasell
e02dd2a331 vault: use an importable const for Vault header string. (#18740) 2023-10-13 07:39:06 +01:00
Tim Gross
484f91b893 auth: remove "mixed auth" special casing for Variables endpoint (#18744)
The RPC handlers expect to see `nil` ACL objects whenever ACLs are disabled. By
using `nil` as a sentinel value, we have the risk of nil pointer exceptions and
improper handling of `nil` when returned from our various auth methods that can
lead to privilege escalation bugs. This is the third in a series to eliminate
the use of `nil` ACLs as a sentinel value for when ACLs are disabled.

This patch involves leveraging the refactored `auth` package to remove the weird
"mixed auth" helper functions that only support the Variables read/list RPC
handlers. Instead, pass the ACL object and claim together into the
`AllowVariableOperations` method in the usual `acl` package.

Ref: https://github.com/hashicorp/nomad-enterprise/pull/1218
Ref: https://github.com/hashicorp/nomad/pull/18703
Ref: https://github.com/hashicorp/nomad/pull/18715
Ref: https://github.com/hashicorp/nomad/pull/16799
Ref: https://github.com/hashicorp/nomad/pull/18730

Fixes: https://github.com/hashicorp/nomad/issues/15875
2023-10-12 16:43:11 -04:00
Piotr Kazmierczak
91753308b3 WI: set the right identity name for Consul tasks (#18742)
Consul tasks should only have 1 identity of the form consul/{consul_cluster_name}.
2023-10-12 20:34:15 +02:00
Tim Gross
3633ca0f8c auth: add client-only ACL (#18730)
The RPC handlers expect to see `nil` ACL objects whenever ACLs are disabled. By
using `nil` as a sentinel value, we have the risk of nil pointer exceptions and
improper handling of `nil` when returned from our various auth methods that can
lead to privilege escalation bugs. This is the third in a series to eliminate
the use of `nil` ACLs as a sentinel value for when ACLs are disabled.

This patch involves creating a new "virtual" ACL object for checking permissions
on client operations and a matching `AuthenticateClientOnly` method for
client-only RPCs that can produce that ACL.

Unlike the server ACLs PR, this also includes a special case for "legacy" client
RPCs where the client was not previously sending the secret as it
should (leaning on mTLS only). Those client RPCs were fixed in Nomad 1.6.0, but
it'll take a while before we can guarantee they'll be present during upgrades.

Ref: https://github.com/hashicorp/nomad-enterprise/pull/1218
Ref: https://github.com/hashicorp/nomad/pull/18703
Ref: https://github.com/hashicorp/nomad/pull/18715
Ref: https://github.com/hashicorp/nomad/pull/16799
2023-10-12 12:21:48 -04:00
dependabot[bot]
cecd9b0472 chore(deps): bump golang.org/x/net from 0.14.0 to 0.17.0 (#18734) 2023-10-12 07:58:59 +01:00
Tim Gross
c7f97722ef consul hook: get WIs only for own task group (#18732)
The WID manager will only sign WI tokens for the allocation's task group. We're
accidentally looping over all the task groups, which for jobs with multiple task
groups results in a failure in the `consul_hook`.
2023-10-11 17:01:28 -04:00
Tim Gross
b39632fa6f testing: fix configuration for retry tests (#18731)
The retry tests in the `api` package set up a client but don't use `NewClient`,
so the address never gets parsed into a `url.URL` and that's causing some test
failures.
2023-10-11 14:06:31 -04:00
Charlie Voiselle
7266d267b0 Add unix domain socket support to API (#16872)
- Expose internal HTTP client's Do() via Raw
- Use URL parser to identify scheme
- Align more with curl output
- Add changelog
- Fix test failure; add tests for socket envvars
- Apply review feedback for tests
- Consolidate address parsing
- Address feedback from code reviews

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2023-10-11 11:04:12 -04:00
Tim Gross
a92461cdc9 auth: add server-only ACL (#18715)
* auth: add server-only ACL

The RPC handlers expect to see `nil` ACL objects whenever ACLs are disabled. By
using `nil` as a sentinel value, we have the risk of nil pointer exceptions and
improper handling of `nil` when returned from our various auth methods that can
lead to privilege escalation bugs. This is the second in a series to eliminate
the use of `nil` ACLs as a sentinel value for when ACLs are disabled.

This patch involves creating a new "virtual" ACL object for checking permissions
on server operations and a matching `AuthenticateServerOnly` method for
server-only RPCs that can produce that ACL.

Ref: https://github.com/hashicorp/nomad-enterprise/pull/1218
Ref: https://github.com/hashicorp/nomad/pull/18703
2023-10-11 10:59:31 -04:00
Tim Gross
7ca619fe97 deps: remove Vault SDK (#18725)
Nomad imports the Vault SDK to get testing helpers, but it turns out the only
thing actually in use was a single string constant for the Vault namespace
header. Remove this dependency and hardcode the constant to reduce dependency
churn.
2023-10-11 10:42:09 -04:00
Tim Gross
e22c5b82f3 WID manager: request signed identities for services (#18650)
Includes changes to WID Manager that make it request signed identities for
services, as well as a few improvements to WIHandle introduced in #18672.

---------

Co-authored-by: Piotr Kazmierczak <470696+pkazmierczak@users.noreply.github.com>
2023-10-11 12:07:16 +02:00
Juana De La Cuesta
70b020e583 server: Rename functions and use iterator function for clarity (#18716) 2023-10-11 09:47:10 +02:00
Tim Gross
635afee376 build: bump to go 1.21.3 (#18717)
Go 1.21.3 fixes an important HTTP2 CVE (see CVE-2023-39325 and
CVE-2023-44487). Nomad does not use HTTP2 and is not vulnerable. However we
should pick up the toolchain bump if for no other reason than we don't have to
answer questions about that.
2023-10-10 16:37:24 -04:00
Luiz Aoqui
ef6814388c cli: remove default for ACL token type on update (#18689)
With a default value set to `client`, the `nomad acl token update`
command can silently downgrade a management token to client on update if
the command does not specify `-type=management` on every update.
2023-10-10 15:51:13 -04:00
Tim Gross
9c2ecbf1d3 auth: refactor Authenticate into its own package (#18703)
The RPC handlers expect to see `nil` ACL objects whenever ACLs are disabled. By
using `nil` as a sentinel value, we have the risk of nil pointer exceptions and
improper handling of `nil` when returned from our various auth methods that can
lead to privilege escalation bugs.

This patchset is the first in a series to eliminate the use of `nil` ACLs as a
sentinel value for when ACLs are disabled. This one is entirely refactoring to
reduce the burden of reviewing the final patchsets that have the functional
changes:

* Move RPC auth into a new `nomad/auth` package, injecting the dependencies
  required from the server. Expose only those public methods on `nomad/auth`
  that are intended for use in the RPC handlers.
* Keep the existing large authentication test as an integration test.
* Add unit tests covering the methods of `nomad/auth` we intend on keeping. The
  assertions for many of these will change once we have no `nil` sentinels and
  can make safe assertions about permissions on the resulting `ACL` objects.
2023-10-10 11:01:24 -04:00
James Rasell
9c57ddd838 core: add preempt to desired updates stringer function return. (#18702) 2023-10-10 09:55:18 +01:00
dependabot[bot]
9a38a9c188 chore(deps): bump github.com/docker/cli (#18565) 2023-10-10 09:12:32 +01:00