46 Commits

Author SHA1 Message Date
Piotr Kazmierczak
648bacda77 testing: migrate nomad/scheduler off of testify (#25968)
In the spirit of #25909, this PR removes testify dependencies from the scheduler
package, along with reflect.DeepEqual removal. This is again a combination of
semgrep and hx editing magic.

---------

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2025-06-04 09:29:28 +02:00
Tim Gross
cfe6349378 testing: migrate nomad/state off testify (#25909)
We've been gradually migrating from `testify` to `shoenig/test` on a
test-by-test basis. While working on a large refactoring in the state store, I
found this to create a lot of diffs incidental to the refactoring.

In this changeset, I've used a prototype collection of semgrep fix rules to
autofix most of the uses of testify in the `nomad/state` package. Then I went in
manually and fixed any resulting problems, as well as a few minor test bugs that
`shoenig/test` catches and `testify` does not because of its API. I've also
added a semgrep rule for marking a package as "testify clean", so that we don't
accidentally add it back to any package we manage to remove it from going
forward.

While I'm here, I've removed most of the uses of `reflect.DeepEqual` in the
tests as well as cleaned up some older idioms that Go has nicer syntax for now.
2025-05-22 09:18:46 -04:00
Tim Gross
b6d9424c4b semgrep: adjust forbidden package rule for regex matches (#25904)
We have several semgrep rules forbidding imports of packages we don't
want. While testing out a new rule I discovered that the rule we have is
completely ineffective. Update the rule to detect imports using the Go language
plugin, including regex matching on some packages where it's forbidden to import
the root but fine to import a subpackage or different version.

The go-set import rule is an example of one where our `go-set/v3` imports fails
the re-written check unless we use the regex syntax. If you replace the pattern
rule with `import "=~/github.com\/hashicorp\/go-set/v3$/"` it would fail.
2025-05-20 16:39:24 -04:00
James Rasell
2eb35a4678 build: Update Go to v1.24.1 (#25249) 2025-03-06 10:33:14 +00:00
James Rasell
d8841e011f semgrep: Fix invalid RPC rule and add validation GHA workflow. (#25088) 2025-02-12 09:44:27 +00:00
Charlie Voiselle
30ab8897d2 deps: Switch from mitchellh/cli to hashicorp/cli (#19321)
Co-authored-by: James Rasell <jrasell@hashicorp.com>
2024-12-19 15:41:11 +00:00
Tim Gross
ce04fe4a4e acls: reduce permissions of client agent virtual policy (#23304)
Nomad client agents run as privileged processes and require access to much of
the cluster state, secrets, etc. to operate. But we can improve upon this by
tightening up the virtual policy that use for RPC requests authenticated by the
node secret ID. This changeset removes the `node:read`, `plugin:read`, and
`plugin:list` policy, as well as namespace operations. In return, we add a
`AllowClientOp` check to the RPCs the client uses that would otherwise need
those policies.

Where possible, the update RPCs have also been changed to match on node ID so
that a client can only make the RPC that impacts itself. In future work, we may
be able to downscope further by adding node pool filtering to `AllowClientOp`.

Ref: https://github.com/hashicorp/nomad-enterprise/issues/1528
Ref: https://github.com/hashicorp/nomad-enterprise/pull/1529
Ref: https://hashicorp.atlassian.net/browse/NET-9925
2024-06-12 11:32:22 -04:00
Tim Gross
ea5f2f6748 acl: remove remaining unused nil ACL object handling (#20456)
As of #18754 which shipped in Nomad 1.7, we no longer need to nil-check the
object returned by ResolveACL if there's no error return, because in the case
where ACLs are disabled we return a special "ACLs disabled" ACL object. Checking
nil is not a bug but should be discouraged because it opens us up to future bugs
that would bypass ACLs.

We fixed a bunch of these cases in https://github.com/hashicorp/nomad/pull/20150
but I didn't update the semgrep rule, which meant we missed a few more. Update
the semgrep rule and fix the remaining cases.
2024-04-18 14:34:17 -04:00
Seth Hoenig
ae6c4c8e3f deps: purge use of old x/exp packages (#20373) 2024-04-12 08:29:00 -05:00
Luiz Aoqui
41277f823f license: fix some imports of BUSL-1.1 in MPL-2.0 (#19832)
Some packages licensed under MPL-2.0 were incorrectly importing code
from packages licensed under BUSL-1.1.

Not all imports are fixed here as they will require additional work to
untangle them. To help track progress this commit adds a Semgrep rule
that detects incorrect BUSL-1.1 imports in MPL-2.0 packages.
2024-01-29 12:04:12 -05:00
Seth Hoenig
afac9d10dd deps: purge and prohibit use of go-set/v1 (#18869) 2023-10-26 08:56:43 -05:00
Michael Schurter
a806363f6d OpenID Configuration Discovery Endpoint (#18691)
Added the [OIDC Discovery](https://openid.net/specs/openid-connect-discovery-1_0.html) `/.well-known/openid-configuration` endpoint to Nomad, but it is only enabled if the `server.oidc_issuer` parameter is set. Documented the parameter, but without a tutorial trying to actually _use_ this will be very hard.

I intentionally did *not* use https://github.com/hashicorp/cap for the OIDC configuration struct because it's built to be a *compliant* OIDC provider. Nomad is *not* trying to be compliant initially because compliance to the spec does not guarantee it will actually satisfy the requirements of third parties. I want to avoid the problem where in an attempt to be standards compliant we ship configuration parameters that lock us in to a certain behavior that we end up regretting. I want to add parameters and behaviors as there's a demonstrable need.

Users always have the escape hatch of providing their own OIDC configuration endpoint. Nomad just needs to know the Issuer so that the JWTs match the OIDC configuration. There's no reason the actual OIDC configuration JSON couldn't live in S3 and get served directly from there. Unlike JWKS the OIDC configuration should be static, or at least change very rarely.

This PR is just the endpoint extracted from #18535. The `RS256` algorithm still needs to be added in hopes of supporting third parties such as [AWS IAM OIDC Provider](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_providers_create_oidc.html).

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
2023-10-20 17:11:41 -07:00
Tim Gross
cbd7248248 auth: use ACLsDisabledACL when ACLs are disabled (#18754)
The RPC handlers expect to see `nil` ACL objects whenever ACLs are disabled. By
using `nil` as a sentinel value, we have the risk of nil pointer exceptions and
improper handling of `nil` when returned from our various auth methods that can
lead to privilege escalation bugs. This is the final patch in a series to
eliminate the use of `nil` ACLs as a sentinel value for when ACLs are disabled.

This patch adds a new virtual ACL policy field for when ACLs are disabled and
updates our authentication logic to use it. Included:

* Extends auth package tests to demonstrate that nil ACLs are treated as failed
  auth and disabled ACLs succeed auth.
* Adds a new `AllowDebug` ACL check for the weird special casing we have for
  pprof debugging when ACLs are disabled.
* Removes the remaining unexported methods (and repeated tests) from the
  `nomad/acl.go` file.
* Update the semgrep rules to detect improper nil ACL checking and remove the
  old invalid ACL checks.
* Update the contributing guide for RPC authentication.

Ref: https://github.com/hashicorp/nomad-enterprise/pull/1218
Ref: https://github.com/hashicorp/nomad/pull/18703
Ref: https://github.com/hashicorp/nomad/pull/18715
Ref: https://github.com/hashicorp/nomad/pull/16799
Ref: https://github.com/hashicorp/nomad/pull/18730
Ref: https://github.com/hashicorp/nomad/pull/18744
2023-10-16 09:30:24 -04:00
Tim Gross
3633ca0f8c auth: add client-only ACL (#18730)
The RPC handlers expect to see `nil` ACL objects whenever ACLs are disabled. By
using `nil` as a sentinel value, we have the risk of nil pointer exceptions and
improper handling of `nil` when returned from our various auth methods that can
lead to privilege escalation bugs. This is the third in a series to eliminate
the use of `nil` ACLs as a sentinel value for when ACLs are disabled.

This patch involves creating a new "virtual" ACL object for checking permissions
on client operations and a matching `AuthenticateClientOnly` method for
client-only RPCs that can produce that ACL.

Unlike the server ACLs PR, this also includes a special case for "legacy" client
RPCs where the client was not previously sending the secret as it
should (leaning on mTLS only). Those client RPCs were fixed in Nomad 1.6.0, but
it'll take a while before we can guarantee they'll be present during upgrades.

Ref: https://github.com/hashicorp/nomad-enterprise/pull/1218
Ref: https://github.com/hashicorp/nomad/pull/18703
Ref: https://github.com/hashicorp/nomad/pull/18715
Ref: https://github.com/hashicorp/nomad/pull/16799
2023-10-12 12:21:48 -04:00
Tim Gross
a92461cdc9 auth: add server-only ACL (#18715)
* auth: add server-only ACL

The RPC handlers expect to see `nil` ACL objects whenever ACLs are disabled. By
using `nil` as a sentinel value, we have the risk of nil pointer exceptions and
improper handling of `nil` when returned from our various auth methods that can
lead to privilege escalation bugs. This is the second in a series to eliminate
the use of `nil` ACLs as a sentinel value for when ACLs are disabled.

This patch involves creating a new "virtual" ACL object for checking permissions
on server operations and a matching `AuthenticateServerOnly` method for
server-only RPCs that can produce that ACL.

Ref: https://github.com/hashicorp/nomad-enterprise/pull/1218
Ref: https://github.com/hashicorp/nomad/pull/18703
2023-10-11 10:59:31 -04:00
James Rasell
c43dcb4bf8 ci: ensure semgrep tests all state store funcs for FSM time rule. (#18315) 2023-08-24 15:08:53 +01:00
Seth Hoenig
f5b0da1d55 all: swap exp packages for maps, slices (#18311) 2023-08-23 15:42:13 -05:00
Seth Hoenig
d9341f0664 update go1.21 (#18184)
* build: update to go1.21

* go: eliminate helpers in favor of min/max

* build: run go mod tidy

* build: swap depguard for semgrep

* command: fixup broken tls error check on go1.21
2023-08-14 08:43:27 -05:00
hashicorp-copywrite[bot]
f2acbdb49b Update copyright file headers to BUSL-1.1 2023-08-10 17:27:09 -05:00
Michael Schurter
d14362ec19 core: add jwks rpc and http api (#18035)
Add JWKS endpoint to HTTP API for exposing the root public signing keys used for signing workload identity JWTs.

Part 1 of N components as part of making workload identities consumable by third party services such as Consul and Vault. Identity attenuation (audience) and expiration (+renewal) are necessary to securely use workload identities with 3rd parties, so this merge does not yet document this endpoint.

---------

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2023-07-27 11:27:17 -07:00
hashicorp-copywrite[bot]
f005448366 [COMPLIANCE] Add Copyright and License Headers 2023-04-10 15:36:59 +00:00
Piotr Kazmierczak
a16d3a1126 acl: RPC endpoints for JWT auth (#15918) 2023-03-30 09:39:56 +02:00
Michael Schurter
542b23e999 Accept Workload Identities for Client RPCs (#16254)
This change resolves policies for workload identities when calling Client RPCs. Previously only ACL tokens could be used for Client RPCs.

Since the same cache is used for both bearer tokens (ACL and Workload ID), the token cache size was doubled.

---------

Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
2023-02-27 10:17:47 -08:00
Tim Gross
ce614bf30b tests: don't mutate global structs in core scheduler tests (#16120)
Some of the core scheduler tests need the maximum batch size for writes to be
smaller than the usual `structs.MaxUUIDsPerWriteRequest`. But they do so by
unsafely modifying the global struct, which creates test flakes in other tests.

Modify the functions under test to take a batch size parameter. Production code
will pass the global while the tests can inject smaller values. Turn the
`structs.MaxUUIDsPerWriteRequest` into a constant, and add a semgrep rule for
avoiding this kind of thing in the future.
2023-02-10 09:26:00 -05:00
Tim Gross
e53b591582 metrics: Add remaining server RPC rate metrics (#15901) 2023-01-27 08:29:53 -05:00
Tim Gross
4ba836bbd6 metrics: Add RPC rate metrics to endpoints that validate TLS names (#15900) 2023-01-26 15:04:25 -05:00
Tim Gross
11af1259c4 WI: allow workloads to use RPCs associated with HTTP API (#15870)
This changeset allows Workload Identities to authenticate to all the RPCs that
support HTTP API endpoints, for use with PR #15864.

* Extends the work done for pre-forwarding authentication to all RPCs that
  support a HTTP API endpoint.
* Consolidates the auth helpers used by the CSI, Service Registration, and Node
  endpoints that are currently used to support both tokens and client secrets.

Intentionally excluded from this changeset:
* The Variables endpoint still has custom handling because of the implicit
  policies. Ideally we'll figure out an efficient way to resolve those into real
  policies and then we can get rid of that custom handling.
* The RPCs that don't currently support auth tokens (i.e. those that don't
  support HTTP endpoints) have not been updated with the new pre-forwarding auth
  We'll be doing this under a separate PR to support RPC rate metrics.
2023-01-25 14:33:06 -05:00
James Rasell
54cc797894 ci: add semgrep update for known OIDC unauthenticated RPCs. 2023-01-18 10:18:35 +00:00
Tim Gross
cab35b3b1c Authenticate method improvements (#15734)
This changeset covers a sidebar discussion that @schmichael and I had around the
design for pre-forwarding auth. This includes some changes extracted out of
#15513 to make it easier to review both and leave a clean history.

* Remove fast path for NodeID. Previously-connected clients will have a NodeID
  set on the context, and because this is a large portion of the RPCs sent we
  fast-pathed it at the top of the `Authenticate` method. But the context is
  shared for all yamux streams over the same yamux session (and TCP
  connection). This lets an authenticated HTTP request to a client use the
  NodeID for authentication, which is a privilege escalation. Remove the fast
  path and annotate it so that we don't break it again.

* Add context to decisions around AuthenticatedIdentity. The `Authenticate`
  method taken on its own looks like it wants to return an `acl.ACL` that folds
  over all the various identity types (creating an ephemeral ACL on the fly if
  neccessary). But keeping these fields idependent allows RPC handlers to
  differentiate between internal and external origins so we most likely want to
  avoid this. Leave some docstrings as a warning as to why this is built the way
  it is.

* Mutate the request rather than returning. When reviewing #15513 we decided
  that forcing the request handler to call `SetIdentity` was repetitive and
  error prone. Instead, the `Authenticate` method mutates the request by setting
  its `AuthenticatedIdentity`.
2023-01-10 09:46:38 -05:00
Tim Gross
47c2d4ab34 Pre forwarding authentication (#15417)
Upcoming work to instrument the rate of RPC requests by consumer (and eventually
rate limit) require that we authenticate a RPC request before forwarding. Add a
new top-level `Authenticate` method to the server and have it return an
`AuthenticatedIdentity` struct. RPC handlers will use the relevant fields of
this identity for performing authorization.

This changeset includes:
* The main implementation of `Authenticate`
* Provide a new RPC `ACL.WhoAmI` for debugging authentication. This endpoint
  returns the same `AuthenticatedIdentity` that will be used by RPC handlers. At
  some point we might want to give this an equivalent HTTP endpoint but I didn't
  want to add that to our public API until some of the other Workload Identity
  work is solidified, especially if we don't need it yet.
* A full coverage test of the `Authenticate` method. This sets up two server
  nodes with mTLS and ACLs, some tokens, and some allocations with workload
  identities.
* Wire up an example of using `Authenticate` in the `Namespace.Upsert` RPC and
  see how authorization happens after forwarding.
* A new semgrep rule for `Authenticate`, which we'll need to update once we're
  ready to wire up more RPC endpoints with authorization steps.
2022-12-06 14:44:03 -05:00
James Rasell
faabc2b2c2 api: ensure ACL role upsert decode error returns a 400 status code. (#15253) 2022-11-18 17:47:43 +01:00
Tim Gross
f1f684400f variables: fix filter on List RPC
The List RPC correctly authorized against the prefix argument. But when
filtering results underneath the prefix, it only checked authorization for
standard ACL tokens and not Workload Identity. This results in WI tokens being
able to read List results (metadata only: variable paths and timestamps) for
variables under the `nomad/` prefix that belong to other jobs in the same
namespace.

Fixes the filtering and split the `handleMixedAuthEndpoint` function into
separate authentication and authorization steps so that we don't need to
re-verify the claim token on each filtered object.

Also includes:
* update semgrep rule for mixed auth endpoints
* variables: List returns empty set when all results are filtered
2022-10-27 13:08:05 -04:00
Tim Gross
5a9e0625b0 semgrep: add MeasureSinceWithLabels to FSM time rule (#14812)
Metrics state is local to the server and needs to use time, which is normally
forbidden in the FSM code. We have a bypass for this rule for
`metrics.MeasureSince` but needed one for `metrics.MeasureSinceWithLabels` as well.
2022-10-06 10:59:53 -04:00
Michael Schurter
1bc5c718b4 Data race fixes in tests and a new semgrep rule (#14594)
* test: don't use loop vars in goroutines

fixes a data race in the test

* test: copy objects in statestore before mutating

fixes data race in test

* test: @lgfa29's segmgrep rule for loops/goroutines

Found 2 places where we were improperly using loop variables inside
goroutines.
2022-09-15 10:35:08 -07:00
James Rasell
c67fd40084 api: use errors.New not fmt.Errorf when error doesn't have format. (#14027)
* api: use errors.New not fmt.Errorf when error doesn't have format.

* semgrep: add rule to catch fmt.Errorf use without formatting.
2022-08-05 17:05:47 +02:00
Michael Schurter
be2262eb22 Add semgrep rule to catch non-determinism in FSM (#13725)
See `message:` in rule for details.

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
2022-07-12 15:44:24 -07:00
Tim Gross
d3e9b9ac7e workload identity (#13223)
In order to support implicit ACL policies for tasks to get their own
secrets, each task would need to have its own ACL token. This would
add extra raft overhead as well as new garbage collection jobs for
cleaning up task-specific ACL tokens. Instead, Nomad will create a
workload Identity Claim for each task.

An Identity Claim is a JSON Web Token (JWT) signed by the server’s
private key and attached to an Allocation at the time a plan is
applied. The encoded JWT can be submitted as the X-Nomad-Token header
to replace ACL token secret IDs for the RPCs that support identity
claims.

Whenever a key is is added to a server’s keyring, it will use the key
as the seed for a Ed25519 public-private private keypair. That keypair
will be used for signing the JWT and for verifying the JWT.

This implementation is a ruthlessly minimal approach to support the
secure variables feature. When a JWT is verified, the allocation ID
will be checked against the Nomad state store, and non-existent or
terminal allocation IDs will cause the validation to be rejected. This
is sufficient to support the secure variables feature at launch
without requiring implementation of a background process to renew
soon-to-expire tokens.
2022-07-11 13:34:05 -04:00
Luiz Aoqui
9849ceb0bf ci: add semgrep rule to catch usage of invalid string extensions (#12509) 2022-04-08 10:58:32 -04:00
Luiz Aoqui
dfe185467e ci: fix semgrep rule for RPC authentication 2022-03-25 12:00:48 -04:00
Seth Hoenig
fec8d6e030 ci: do not exclude Parallel semgrep rule 2022-03-17 13:45:56 -05:00
Seth Hoenig
ae21af4f9b ci: semgrep rule for parallel tests
Adds a semgrep rule warning about using ci.Parallel instead of t.Parallel
2022-03-17 08:43:37 -05:00
Luiz Aoqui
c3a4abc1ac ci: disable Go test semgrep rules (#12175) 2022-03-02 20:30:27 -05:00
Luiz Aoqui
29ffa02683 fix mTLS certificate check on agent to agent RPCs (#11998)
PR #11956 implemented a new mTLS RPC check to validate the role of the
certificate used in the request, but further testing revealed two flaws:

  1. client-only endpoints did not accept server certificates so the
     request would fail when forwarded from one server to another.
  2. the certificate was being checked after the request was forwarded,
     so the check would happen over the server certificate, not the
     actual source.

This commit checks for the desired mTLS level, where the client level
accepts both, a server or a client certificate. It also validates the
cercertificate before the request is forwarded.
2022-02-04 20:35:20 -05:00
Luiz Aoqui
290bd0d521 add semgrep rule to check for potential time.After leaks (#12001) 2022-02-03 17:33:07 -05:00
Luiz Aoqui
c613dc5d2c Verify TLS certificate on endpoints that are used between agents only (#11956) 2022-02-02 15:03:18 -05:00
Luiz Aoqui
f657529831 ci: add semgrep (#11934) 2022-01-26 16:32:47 -05:00