nomad

mirror of https://github.com/kemko/nomad.git synced 2026-01-01 16:05:42 +03:00

Author	SHA1	Message	Date
Piotr Kazmierczak	648bacda77	testing: migrate nomad/scheduler off of testify (#25968 ) In the spirit of #25909, this PR removes testify dependencies from the scheduler package, along with reflect.DeepEqual removal. This is again a combination of semgrep and hx editing magic. --------- Co-authored-by: Tim Gross <tgross@hashicorp.com>	2025-06-04 09:29:28 +02:00
Tim Gross	cfe6349378	testing: migrate nomad/state off testify (#25909 ) We've been gradually migrating from `testify` to `shoenig/test` on a test-by-test basis. While working on a large refactoring in the state store, I found this to create a lot of diffs incidental to the refactoring. In this changeset, I've used a prototype collection of semgrep fix rules to autofix most of the uses of testify in the `nomad/state` package. Then I went in manually and fixed any resulting problems, as well as a few minor test bugs that `shoenig/test` catches and `testify` does not because of its API. I've also added a semgrep rule for marking a package as "testify clean", so that we don't accidentally add it back to any package we manage to remove it from going forward. While I'm here, I've removed most of the uses of `reflect.DeepEqual` in the tests as well as cleaned up some older idioms that Go has nicer syntax for now.	2025-05-22 09:18:46 -04:00
Tim Gross	b6d9424c4b	semgrep: adjust forbidden package rule for regex matches (#25904 ) We have several semgrep rules forbidding imports of packages we don't want. While testing out a new rule I discovered that the rule we have is completely ineffective. Update the rule to detect imports using the Go language plugin, including regex matching on some packages where it's forbidden to import the root but fine to import a subpackage or different version. The go-set import rule is an example of one where our `go-set/v3` imports fail the rewritten check unless we use the regex syntax. If you replace the pattern rule with `import "=~/github.com\/hashicorp\/go-set/v3$/"` it would fail.	2025-05-20 16:39:24 -04:00
James Rasell	2eb35a4678	build: Update Go to v1.24.1 (#25249 )	2025-03-06 10:33:14 +00:00
James Rasell	d8841e011f	semgrep: Fix invalid RPC rule and add validation GHA workflow. (#25088 )	2025-02-12 09:44:27 +00:00
Charlie Voiselle	30ab8897d2	deps: Switch from mitchellh/cli to hashicorp/cli (#19321 ) Co-authored-by: James Rasell <jrasell@hashicorp.com>	2024-12-19 15:41:11 +00:00
Tim Gross	ce04fe4a4e	acls: reduce permissions of client agent virtual policy (#23304 ) Nomad client agents run as privileged processes and require access to much of the cluster state, secrets, etc. to operate. But we can improve upon this by tightening up the virtual policy that we use for RPC requests authenticated by the node secret ID. This changeset removes the `node:read`, `plugin:read`, and `plugin:list` policy, as well as namespace operations. In return, we add an `AllowClientOp` check to the RPCs the client uses that would otherwise need those policies. Where possible, the update RPCs have also been changed to match on node ID so that a client can only make the RPC that impacts itself. In future work, we may be able to downscope further by adding node pool filtering to `AllowClientOp`. Ref: https://github.com/hashicorp/nomad-enterprise/issues/1528 Ref: https://github.com/hashicorp/nomad-enterprise/pull/1529 Ref: https://hashicorp.atlassian.net/browse/NET-9925	2024-06-12 11:32:22 -04:00
Tim Gross	ea5f2f6748	acl: remove remaining unused nil ACL object handling (#20456 ) As of #18754 which shipped in Nomad 1.7, we no longer need to nil-check the object returned by ResolveACL if there's no error return, because in the case where ACLs are disabled we return a special "ACLs disabled" ACL object. Checking nil is not a bug but should be discouraged because it opens us up to future bugs that would bypass ACLs. We fixed a bunch of these cases in https://github.com/hashicorp/nomad/pull/20150 but I didn't update the semgrep rule, which meant we missed a few more. Update the semgrep rule and fix the remaining cases.	2024-04-18 14:34:17 -04:00
Seth Hoenig	ae6c4c8e3f	deps: purge use of old x/exp packages (#20373 )	2024-04-12 08:29:00 -05:00
Luiz Aoqui	41277f823f	license: fix some imports of BUSL-1.1 in MPL-2.0 (#19832 ) Some packages licensed under MPL-2.0 were incorrectly importing code from packages licensed under BUSL-1.1. Not all imports are fixed here as they will require additional work to untangle them. To help track progress this commit adds a Semgrep rule that detects incorrect BUSL-1.1 imports in MPL-2.0 packages.	2024-01-29 12:04:12 -05:00
Seth Hoenig	afac9d10dd	deps: purge and prohibit use of go-set/v1 (#18869 )	2023-10-26 08:56:43 -05:00
Michael Schurter	a806363f6d	OpenID Configuration Discovery Endpoint (#18691 ) Added the [OIDC Discovery](https://openid.net/specs/openid-connect-discovery-1_0.html) `/.well-known/openid-configuration` endpoint to Nomad, but it is only enabled if the `server.oidc_issuer` parameter is set. Documented the parameter, but without a tutorial trying to actually _use_ this will be very hard. I intentionally did not use https://github.com/hashicorp/cap for the OIDC configuration struct because it's built to be a compliant OIDC provider. Nomad is not trying to be compliant initially because compliance to the spec does not guarantee it will actually satisfy the requirements of third parties. I want to avoid the problem where in an attempt to be standards compliant we ship configuration parameters that lock us in to a certain behavior that we end up regretting. I want to add parameters and behaviors as there's a demonstrable need. Users always have the escape hatch of providing their own OIDC configuration endpoint. Nomad just needs to know the Issuer so that the JWTs match the OIDC configuration. There's no reason the actual OIDC configuration JSON couldn't live in S3 and get served directly from there. Unlike JWKS the OIDC configuration should be static, or at least change very rarely. This PR is just the endpoint extracted from #18535. The `RS256` algorithm still needs to be added in hopes of supporting third parties such as [AWS IAM OIDC Provider](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_providers_create_oidc.html). Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>	2023-10-20 17:11:41 -07:00
Tim Gross	cbd7248248	auth: use `ACLsDisabledACL` when ACLs are disabled (#18754 ) The RPC handlers expect to see `nil` ACL objects whenever ACLs are disabled. By using `nil` as a sentinel value, we have the risk of nil pointer exceptions and improper handling of `nil` when returned from our various auth methods that can lead to privilege escalation bugs. This is the final patch in a series to eliminate the use of `nil` ACLs as a sentinel value for when ACLs are disabled. This patch adds a new virtual ACL policy field for when ACLs are disabled and updates our authentication logic to use it. Included: * Extends auth package tests to demonstrate that nil ACLs are treated as failed auth and disabled ACLs succeed auth. * Adds a new `AllowDebug` ACL check for the weird special casing we have for pprof debugging when ACLs are disabled. * Removes the remaining unexported methods (and repeated tests) from the `nomad/acl.go` file. * Update the semgrep rules to detect improper nil ACL checking and remove the old invalid ACL checks. * Update the contributing guide for RPC authentication. Ref: https://github.com/hashicorp/nomad-enterprise/pull/1218 Ref: https://github.com/hashicorp/nomad/pull/18703 Ref: https://github.com/hashicorp/nomad/pull/18715 Ref: https://github.com/hashicorp/nomad/pull/16799 Ref: https://github.com/hashicorp/nomad/pull/18730 Ref: https://github.com/hashicorp/nomad/pull/18744	2023-10-16 09:30:24 -04:00
Tim Gross	3633ca0f8c	auth: add client-only ACL (#18730 ) The RPC handlers expect to see `nil` ACL objects whenever ACLs are disabled. By using `nil` as a sentinel value, we have the risk of nil pointer exceptions and improper handling of `nil` when returned from our various auth methods that can lead to privilege escalation bugs. This is the third in a series to eliminate the use of `nil` ACLs as a sentinel value for when ACLs are disabled. This patch involves creating a new "virtual" ACL object for checking permissions on client operations and a matching `AuthenticateClientOnly` method for client-only RPCs that can produce that ACL. Unlike the server ACLs PR, this also includes a special case for "legacy" client RPCs where the client was not previously sending the secret as it should (leaning on mTLS only). Those client RPCs were fixed in Nomad 1.6.0, but it'll take a while before we can guarantee they'll be present during upgrades. Ref: https://github.com/hashicorp/nomad-enterprise/pull/1218 Ref: https://github.com/hashicorp/nomad/pull/18703 Ref: https://github.com/hashicorp/nomad/pull/18715 Ref: https://github.com/hashicorp/nomad/pull/16799	2023-10-12 12:21:48 -04:00
Tim Gross	a92461cdc9	auth: add server-only ACL (#18715 ) * auth: add server-only ACL The RPC handlers expect to see `nil` ACL objects whenever ACLs are disabled. By using `nil` as a sentinel value, we have the risk of nil pointer exceptions and improper handling of `nil` when returned from our various auth methods that can lead to privilege escalation bugs. This is the second in a series to eliminate the use of `nil` ACLs as a sentinel value for when ACLs are disabled. This patch involves creating a new "virtual" ACL object for checking permissions on server operations and a matching `AuthenticateServerOnly` method for server-only RPCs that can produce that ACL. Ref: https://github.com/hashicorp/nomad-enterprise/pull/1218 Ref: https://github.com/hashicorp/nomad/pull/18703	2023-10-11 10:59:31 -04:00
James Rasell	c43dcb4bf8	ci: ensure semgrep tests all state store funcs for FSM time rule. (#18315 )	2023-08-24 15:08:53 +01:00
Seth Hoenig	f5b0da1d55	all: swap exp packages for maps, slices (#18311 )	2023-08-23 15:42:13 -05:00
Seth Hoenig	d9341f0664	update go1.21 (#18184 ) * build: update to go1.21 * go: eliminate helpers in favor of min/max * build: run go mod tidy * build: swap depguard for semgrep * command: fixup broken tls error check on go1.21	2023-08-14 08:43:27 -05:00
hashicorp-copywrite[bot]	f2acbdb49b	Update copyright file headers to BUSL-1.1	2023-08-10 17:27:09 -05:00
Michael Schurter	d14362ec19	core: add jwks rpc and http api (#18035 ) Add JWKS endpoint to HTTP API for exposing the root public signing keys used for signing workload identity JWTs. Part 1 of N components as part of making workload identities consumable by third party services such as Consul and Vault. Identity attenuation (audience) and expiration (+renewal) are necessary to securely use workload identities with 3rd parties, so this merge does not yet document this endpoint. --------- Co-authored-by: Tim Gross <tgross@hashicorp.com>	2023-07-27 11:27:17 -07:00
hashicorp-copywrite[bot]	f005448366	[COMPLIANCE] Add Copyright and License Headers	2023-04-10 15:36:59 +00:00
Piotr Kazmierczak	a16d3a1126	acl: RPC endpoints for JWT auth (#15918 )	2023-03-30 09:39:56 +02:00
Michael Schurter	542b23e999	Accept Workload Identities for Client RPCs (#16254 ) This change resolves policies for workload identities when calling Client RPCs. Previously only ACL tokens could be used for Client RPCs. Since the same cache is used for both bearer tokens (ACL and Workload ID), the token cache size was doubled. --------- Co-authored-by: James Rasell <jrasell@users.noreply.github.com>	2023-02-27 10:17:47 -08:00
Tim Gross	ce614bf30b	tests: don't mutate global structs in core scheduler tests (#16120 ) Some of the core scheduler tests need the maximum batch size for writes to be smaller than the usual `structs.MaxUUIDsPerWriteRequest`. But they do so by unsafely modifying the global struct, which creates test flakes in other tests. Modify the functions under test to take a batch size parameter. Production code will pass the global while the tests can inject smaller values. Turn the `structs.MaxUUIDsPerWriteRequest` into a constant, and add a semgrep rule for avoiding this kind of thing in the future.	2023-02-10 09:26:00 -05:00
Tim Gross	e53b591582	metrics: Add remaining server RPC rate metrics (#15901 )	2023-01-27 08:29:53 -05:00
Tim Gross	4ba836bbd6	metrics: Add RPC rate metrics to endpoints that validate TLS names (#15900 )	2023-01-26 15:04:25 -05:00
Tim Gross	11af1259c4	WI: allow workloads to use RPCs associated with HTTP API (#15870 ) This changeset allows Workload Identities to authenticate to all the RPCs that support HTTP API endpoints, for use with PR #15864. * Extends the work done for pre-forwarding authentication to all RPCs that support an HTTP API endpoint. * Consolidates the auth helpers used by the CSI, Service Registration, and Node endpoints that are currently used to support both tokens and client secrets. Intentionally excluded from this changeset: * The Variables endpoint still has custom handling because of the implicit policies. Ideally we'll figure out an efficient way to resolve those into real policies and then we can get rid of that custom handling. * The RPCs that don't currently support auth tokens (i.e. those that don't support HTTP endpoints) have not been updated with the new pre-forwarding auth. We'll be doing this under a separate PR to support RPC rate metrics.	2023-01-25 14:33:06 -05:00
James Rasell	54cc797894	ci: add semgrep update for known OIDC unauthenticated RPCs.	2023-01-18 10:18:35 +00:00
Tim Gross	cab35b3b1c	Authenticate method improvements (#15734 ) This changeset covers a sidebar discussion that @schmichael and I had around the design for pre-forwarding auth. This includes some changes extracted out of #15513 to make it easier to review both and leave a clean history. * Remove fast path for NodeID. Previously-connected clients will have a NodeID set on the context, and because this is a large portion of the RPCs sent we fast-pathed it at the top of the `Authenticate` method. But the context is shared for all yamux streams over the same yamux session (and TCP connection). This lets an authenticated HTTP request to a client use the NodeID for authentication, which is a privilege escalation. Remove the fast path and annotate it so that we don't break it again. * Add context to decisions around AuthenticatedIdentity. The `Authenticate` method taken on its own looks like it wants to return an `acl.ACL` that folds over all the various identity types (creating an ephemeral ACL on the fly if necessary). But keeping these fields independent allows RPC handlers to differentiate between internal and external origins so we most likely want to avoid this. Leave some docstrings as a warning as to why this is built the way it is. * Mutate the request rather than returning. When reviewing #15513 we decided that forcing the request handler to call `SetIdentity` was repetitive and error prone. Instead, the `Authenticate` method mutates the request by setting its `AuthenticatedIdentity`.	2023-01-10 09:46:38 -05:00
Tim Gross	47c2d4ab34	Pre forwarding authentication (#15417 ) Upcoming work to instrument the rate of RPC requests by consumer (and eventually rate limit) require that we authenticate a RPC request before forwarding. Add a new top-level `Authenticate` method to the server and have it return an `AuthenticatedIdentity` struct. RPC handlers will use the relevant fields of this identity for performing authorization. This changeset includes: * The main implementation of `Authenticate` * Provide a new RPC `ACL.WhoAmI` for debugging authentication. This endpoint returns the same `AuthenticatedIdentity` that will be used by RPC handlers. At some point we might want to give this an equivalent HTTP endpoint but I didn't want to add that to our public API until some of the other Workload Identity work is solidified, especially if we don't need it yet. * A full coverage test of the `Authenticate` method. This sets up two server nodes with mTLS and ACLs, some tokens, and some allocations with workload identities. * Wire up an example of using `Authenticate` in the `Namespace.Upsert` RPC and see how authorization happens after forwarding. * A new semgrep rule for `Authenticate`, which we'll need to update once we're ready to wire up more RPC endpoints with authorization steps.	2022-12-06 14:44:03 -05:00
James Rasell	faabc2b2c2	api: ensure ACL role upsert decode error returns a 400 status code. (#15253 )	2022-11-18 17:47:43 +01:00
Tim Gross	f1f684400f	variables: fix filter on List RPC The List RPC correctly authorized against the prefix argument. But when filtering results underneath the prefix, it only checked authorization for standard ACL tokens and not Workload Identity. This results in WI tokens being able to read List results (metadata only: variable paths and timestamps) for variables under the `nomad/` prefix that belong to other jobs in the same namespace. Fixes the filtering and split the `handleMixedAuthEndpoint` function into separate authentication and authorization steps so that we don't need to re-verify the claim token on each filtered object. Also includes: * update semgrep rule for mixed auth endpoints * variables: List returns empty set when all results are filtered	2022-10-27 13:08:05 -04:00
Tim Gross	5a9e0625b0	semgrep: add MeasureSinceWithLabels to FSM time rule (#14812 ) Metrics state is local to the server and needs to use time, which is normally forbidden in the FSM code. We have a bypass for this rule for `metrics.MeasureSince` but needed one for `metrics.MeasureSinceWithLabels` as well.	2022-10-06 10:59:53 -04:00
Michael Schurter	1bc5c718b4	Data race fixes in tests and a new semgrep rule (#14594 ) * test: don't use loop vars in goroutines fixes a data race in the test * test: copy objects in statestore before mutating fixes data race in test * test: @lgfa29's semgrep rule for loops/goroutines Found 2 places where we were improperly using loop variables inside goroutines.	2022-09-15 10:35:08 -07:00
James Rasell	c67fd40084	api: use errors.New not fmt.Errorf when error doesn't have format. (#14027 ) * api: use errors.New not fmt.Errorf when error doesn't have format. * semgrep: add rule to catch fmt.Errorf use without formatting.	2022-08-05 17:05:47 +02:00
Michael Schurter	be2262eb22	Add semgrep rule to catch non-determinism in FSM (#13725 ) See `message:` in rule for details. Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>	2022-07-12 15:44:24 -07:00
Tim Gross	d3e9b9ac7e	workload identity (#13223 ) In order to support implicit ACL policies for tasks to get their own secrets, each task would need to have its own ACL token. This would add extra raft overhead as well as new garbage collection jobs for cleaning up task-specific ACL tokens. Instead, Nomad will create a workload Identity Claim for each task. An Identity Claim is a JSON Web Token (JWT) signed by the server’s private key and attached to an Allocation at the time a plan is applied. The encoded JWT can be submitted as the X-Nomad-Token header to replace ACL token secret IDs for the RPCs that support identity claims. Whenever a key is added to a server’s keyring, it will use the key as the seed for an Ed25519 public-private keypair. That keypair will be used for signing the JWT and for verifying the JWT. This implementation is a ruthlessly minimal approach to support the secure variables feature. When a JWT is verified, the allocation ID will be checked against the Nomad state store, and non-existent or terminal allocation IDs will cause the validation to be rejected. This is sufficient to support the secure variables feature at launch without requiring implementation of a background process to renew soon-to-expire tokens.	2022-07-11 13:34:05 -04:00
Luiz Aoqui	9849ceb0bf	ci: add semgrep rule to catch usage of invalid string extensions (#12509 )	2022-04-08 10:58:32 -04:00
Luiz Aoqui	dfe185467e	ci: fix semgrep rule for RPC authentication	2022-03-25 12:00:48 -04:00
Seth Hoenig	fec8d6e030	ci: do not exclude Parallel semgrep rule	2022-03-17 13:45:56 -05:00
Seth Hoenig	ae21af4f9b	ci: semgrep rule for parallel tests Adds a semgrep rule warning about using ci.Parallel instead of t.Parallel	2022-03-17 08:43:37 -05:00
Luiz Aoqui	c3a4abc1ac	ci: disable Go test semgrep rules (#12175 )	2022-03-02 20:30:27 -05:00
Luiz Aoqui	29ffa02683	fix mTLS certificate check on agent to agent RPCs (#11998 ) PR #11956 implemented a new mTLS RPC check to validate the role of the certificate used in the request, but further testing revealed two flaws: 1. client-only endpoints did not accept server certificates so the request would fail when forwarded from one server to another. 2. the certificate was being checked after the request was forwarded, so the check would happen over the server certificate, not the actual source. This commit checks for the desired mTLS level, where the client level accepts either a server or a client certificate. It also validates the certificate before the request is forwarded.	2022-02-04 20:35:20 -05:00
Luiz Aoqui	290bd0d521	add semgrep rule to check for potential time.After leaks (#12001 )	2022-02-03 17:33:07 -05:00
Luiz Aoqui	c613dc5d2c	Verify TLS certificate on endpoints that are used between agents only (#11956 )	2022-02-02 15:03:18 -05:00
Luiz Aoqui	f657529831	ci: add semgrep (#11934 )	2022-01-26 16:32:47 -05:00

46 Commits