Commit Graph

3608 Commits

Author SHA1 Message Date
Luiz Aoqui
7466496608 config: fix identity config for Consul service (#18363)
Rename the agent configuraion for workload identity to
`WorkloadIdentityConfig` to make its use more explicit and remove the
`ServiceName` field since it is never expected to be defined in a
configuration file.

Also update the job mutation to inject a service identity following
these rules:

1. Don't inject identity if `consul.use_identity` is false.
2. Don't inject identity if `consul.service_identity` is not specified.
3. Don't inject identity if service provider is not `consul`.
4. Set name and service name if the service specifies an identity.
5. Inject `consul.service_identity` if service does not specify an
   identity.
2023-08-31 11:22:48 -03:00
James Rasell
a9d5beb141 test: use correct parallel test setup func (#18326) 2023-08-25 13:51:36 +01:00
Piotr Kazmierczak
b430d21a67 agent: add consul.service_identity and consul.template_identity blocks (#18279)
This PR introduces updates to the agent config required for workload identity support.
2023-08-24 17:45:34 +02:00
Seth Hoenig
f5b0da1d55 all: swap exp packages for maps, slices (#18311) 2023-08-23 15:42:13 -05:00
Андрей Неустроев
3e61b3a37d Add multiple times in periodic jobs (#17858) 2023-08-22 15:42:31 -04:00
Piotr Kazmierczak
9fa39eb829 jobspec: add nomad_service field and identity block (#18239)
This PR introduces updates to the jobspec required for workload identity support for services.
---------

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
2023-08-21 20:07:47 +02:00
Luiz Aoqui
196213c451 jobspec: add role to vault (#18257) 2023-08-18 15:29:02 -04:00
Tim Gross
a8bad048b6 config: parsing support for multiple Consul clusters in agent config (#18255)
Add the plumbing we need to accept multiple Consul clusters in Nomad agent
configuration, to support upcoming Nomad Enterprise features. The `consul` blocks
are differentiated by a new `name` field, and if the `name` is omitted it
becomes the "default" Consul configuration. All blocks with the same name are
merged together, as with the existing behavior.

As with the `vault` block, we're still using HCL1 for parsing configuration and
the `Decode` method doesn't parse multiple blocks differentiated only by a field
name without a label. So we've had to add an extra parsing pass, similar to what
we've done for HCL1 jobspecs. This also revealed a subtle bug in the `vault`
block handling of extra keys when there are multiple `vault` blocks, which I've
fixed here.

For now, all existing consumers will use the "default" Consul configuration, so
there's no user-facing behavior change in this changeset other than the contents
of the agent self API.

Ref: https://github.com/hashicorp/team-nomad/issues/404
2023-08-18 15:25:16 -04:00
James Rasell
6108f5c4c3 admin: rename _oss files to _ce (#18209) 2023-08-18 07:47:24 +01:00
Tim Gross
74b796e6d0 config: parsing support for multiple Vault clusters in agent config (#18224)
Add the plumbing we need to accept multiple Vault clusters in Nomad agent
configuration, to support upcoming Nomad Enterprise features. The `vault` blocks
are differentiated by a new `name` field, and if the `name` is omitted it
becomes the "default" Vault configuration. All blocks with the same name are
merged together, as with the existing behavior.

Unfortunately we're still using HCL1 for parsing configuration and the `Decode`
method doesn't parse multiple blocks differentiated only by a field name without
a label. So we've had to add an extra parsing pass, similar to what we've done
for HCL1 jobspecs.

For now, all existing consumers will use the "default" Vault configuration, so
there's no user-facing behavior change in this changeset other than the contents
of the agent self API.

Ref: https://github.com/hashicorp/team-nomad/issues/404
2023-08-17 14:10:32 -04:00
Tim Gross
f00bff09f1 fix multiple overflow errors in exponential backoff (#18200)
We use capped exponential backoff in several places in the code when handling
failures. The code we've copy-and-pasted all over has a check to see if the
backoff is greater than the limit, but this check happens after the bitshift and
we always increment the number of attempts. This causes an overflow with a
fairly small number of failures (ex. at one place I tested it occurs after only
24 iterations), resulting in a negative backoff which then never recovers. The
backoff becomes a tight loop consuming resources and/or DoS'ing a Nomad RPC
handler or an external API such as Vault. Note this doesn't occur in places
where we cap the number of iterations so the loop breaks (usually to return an
error), so long as the number of iterations is reasonable.

Introduce a helper with a check on the cap before the bitshift to avoid overflow in all 
places this can occur.

Fixes: #18199
Co-authored-by: stswidwinski <stan.swidwinski@gmail.com>
2023-08-15 14:38:18 -04:00
Michael Schurter
0e22fc1a0b identity: add support for multiple identities + audiences (#18123)
Allows for multiple `identity{}` blocks for tasks along with user-specified audiences. This is a building block to allow workload identities to be used with Consul, Vault and 3rd party JWT based auth methods.

Expiration is still unimplemented and is necessary for JWTs to be used securely, so that's up next.

---------

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2023-08-15 09:11:53 -07:00
Esteban Barrios
65d562b760 config: add configurable content security policy (#18085) 2023-08-14 14:23:03 -04:00
Seth Hoenig
d9341f0664 update go1.21 (#18184)
* build: update to go1.21

* go: eliminate helpers in favor of min/max

* build: run go mod tidy

* build: swap depguard for semgrep

* command: fixup broken tls error check on go1.21
2023-08-14 08:43:27 -05:00
hashicorp-copywrite[bot]
a9d61ea3fd Update copyright file headers to BUSL-1.1 2023-08-10 17:27:29 -05:00
Seth Hoenig
a4cc76bd3e numa: enable numa topology detection (#18146)
* client: refactor cgroups management in client

* client: fingerprint numa topology

* client: plumb numa and cgroups changes to drivers

* client: cleanup task resource accounting

* client: numa client and config plumbing

* lib: add a stack implementation

* tools: remove ec2info tool

* plugins: fixup testing for cgroups / numa changes

* build: update makefile and package tests and cl
2023-08-10 17:05:30 -05:00
Devashish Taneja
472693d642 server: add config to tune job versions retention. #17635 (#17939) 2023-08-07 14:47:40 -04:00
Abbas Yazdanpanah
388198abef CLI: make snapshot name requiered in creating volume snapshots (#17958)
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
2023-08-04 10:36:07 +01:00
Luiz Aoqui
768978883d cli: search all namespaces for node volumes (#17925)
When looking for CSI volumes to display in the `node status` command the
CLI needs to search all namespaces.
2023-08-01 09:55:39 -04:00
Tim Gross
4fb5bf9a16 cli: support wildcard namespace in alloc subcommands (#18095)
The alloc exec and filesystem/logs commands allow passing the `-job` flag to
select a random allocation. If the namespace for the command is set to `*`, the
RPC handler doesn't handle this correctly as it's expecting to query for a
specific job. Most commands handle this ambiguity by first verifying that only a
single object of the type in question exists (ex. a single node or job).

Update these commands so that when the `-job` flag is set we first verify
there's a single job that matches. This also allows us to extend the
functionality to allow for the `-job` flag to support prefix matching.

Fixes: #12097
2023-07-31 13:15:15 -04:00
Gerard Nguyen
9e98d694a6 feature: Add new field render_templates on restart block (#18054)
This feature is necessary when user want to explicitly re-render all templates on task restart.
E.g. to fetch all new secrets from Vault, even if the lease on the existing secrets has not been expired.
2023-07-28 11:53:32 -07:00
Luiz Aoqui
ee31916c3b cli: add help message for -consul-namespace (#18081)
Add missing help entry for the `-consul-namespace` flag in `nomad job
run`.
2023-07-28 10:22:59 -04:00
Michael Schurter
d14362ec19 core: add jwks rpc and http api (#18035)
Add JWKS endpoint to HTTP API for exposing the root public signing keys used for signing workload identity JWTs.

Part 1 of N components as part of making workload identities consumable by third party services such as Consul and Vault. Identity attenuation (audience) and expiration (+renewal) are necessary to securely use workload identities with 3rd parties, so this merge does not yet document this endpoint.

---------

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2023-07-27 11:27:17 -07:00
Ville Vesilehto
2c463bb038 chore(lint): use Go stdlib variables for HTTP methods and status codes (#17968) 2023-07-26 15:28:09 +01:00
Ville Vesilehto
5c9cd35055 chore(variable): Go stdlib vars for HTTP methods and status codes (#18062) 2023-07-26 14:30:11 +01:00
Ville Vesilehto
a8fd803176 chore(nodepool): Go stdlib vars for HTTP methods and status codes (#18061) 2023-07-26 14:23:28 +01:00
Kevin Mulvey
ea37488e54 check in stderrFrame is nil before logging stderrFrame.Data (#17815) 2023-07-24 09:33:14 +01:00
hc-github-team-nomad-core
583f8773fa Generate files for 1.6.1 release 2023-07-21 11:09:15 -04:00
Nando
ca26673781 volume-status : show namespace the volume belongs to (#17911)
* volume-status : show namespace the volume belongs to
2023-07-19 16:36:51 -04:00
hc-github-team-nomad-core
573cab2b1d Generate files for 1.6.0 release 2023-07-19 10:38:08 -04:00
hc-github-team-nomad-core
335bb8b9e1 Generate files for 1.6.0-rc.1 release 2023-07-12 09:54:14 -04:00
Luiz Aoqui
99fb36e119 np: update docs and add test for nil lists (#17899)
Document and test that if a namespace does not provide an `allow` or
`deny` list than those are treated as `nil` and have a different
behaviour from an empty list (`[]string{}`).
2023-07-11 10:59:45 -04:00
Lance Haig
1541358ef3 Add the ability to customise the details of the CA (#17309)
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
2023-07-11 08:53:09 +01:00
hc-github-team-nomad-core
2baba5821a Generate files for 1.6.0-beta.1 release 2023-06-28 11:06:20 -04:00
Tim Gross
555214199a cli: fix broken node pool jobs test (#17715)
In #17705 we fixed a bug in the treatment of the "all" node pool for the `node
pool jobs` command but missed a test in the CLI.
2023-06-23 14:10:45 -07:00
grembo
6f04b91912 Add disable_file parameter to job's vault stanza (#13343)
This complements the `env` parameter, so that the operator can author
tasks that don't share their Vault token with the workload when using 
`image` filesystem isolation. As a result, more powerful tokens can be used 
in a job definition, allowing it to use template stanzas to issue all kinds of 
secrets (database secrets, Vault tokens with very specific policies, etc.), 
without sharing that issuing power with the task itself.

This is accomplished by creating a directory called `private` within
the task's working directory, which shares many properties of
the `secrets` directory (tmpfs where possible, not accessible by
`nomad alloc fs` or Nomad's web UI), but isn't mounted into/bound to the
container.

If the `disable_file` parameter is set to `false` (its default), the Vault token
is also written to the NOMAD_SECRETS_DIR, so the default behavior is
backwards compatible. Even if the operator never changes the default,
they will still benefit from the improved behavior of Nomad never reading
the token back in from that - potentially altered - location.
2023-06-23 15:15:04 -04:00
Phil Renaud
fe49f22247 Moves to the current LTS release of Node for our build and release workflows (#17639) 2023-06-21 15:17:24 -04:00
Luiz Aoqui
6c64847e1b np: scheduler configuration updates (#17575)
* jobspec: rename node pool scheduler_configuration

In HCL specifications we usually call configuration blocks `config`
instead of `configuration`.

* np: add memory oversubscription config

* np: make scheduler config ENT
2023-06-19 11:41:46 -04:00
Luiz Aoqui
80e1ad68ba cli: prevent panic if job node pool is nil (#17571)
If the `nomad` CLI is used to access a cluster running a version that
does not include node pools the command will `nil` panic when trying to
resolve the job's node pool.
2023-06-16 17:08:36 -04:00
Luiz Aoqui
4f7c38b2a7 node pools: namespace integration (#17562)
Add structs and fields to support the Nomad Pools Governance Enterprise
feature of controlling node pool access via namespaces.

Nomad Enterprise allows users to specify a default node pool to be used
by jobs that don't specify one. In order to accomplish this, it's
necessary to distinguish between a job that explicitly uses the
`default` node pool and one that did not specify any.

If the `default` node pool is set during job canonicalization it's
impossible to do this, so this commit allows a job to have an empty node
pool value during registration but sets to `default` at the admission
controller mutator.

In order to guarantee state consistency the state store validates that
the job node pool is set and exists before inserting it.
2023-06-16 16:30:22 -04:00
Tim Gross
eee2315d5d docs: clarify node pool apply/delete behavior (#17529) 2023-06-14 15:58:53 -04:00
Tim Gross
0ac85db680 cli: fix missing -quiet flag for var init (#17526)
The `var init` command was intended to have support for a `-quiet` flag but it
was not documented and never parsed.
2023-06-14 14:52:46 -04:00
Tim Gross
6bd1ebed29 docs: note namespace apply/delete behaviors, fix metric (#17527)
This changeset includes some fixes to documentation discovered while working on
node pools, but we didn't want to include in the node pool PRs so they can get
backported easily:

* namespace apply/delete commands are forwarded to the authoritative region
* deleting a namespace requires there are no non-terminal jobs in any of the
  federated regions
* fixed a typo in the name of the `nomad.client.allocated.disk` metric
2023-06-14 14:52:06 -04:00
Tim Gross
0aeeaf1083 node pools: implement node pool init command (#17479)
Implement a `nomad node pool init` command that generates an example spec file
in either HCL or JSON format.
2023-06-13 14:51:29 -04:00
Luiz Aoqui
5db9e64cdd node pool: node pool upsert on multiregion node register (#17503)
When registering a node with a new node pool in a non-authoritative
region we can't create the node pool because this new pool will not be
replicated to other regions.

This commit modifies the node registration logic to only allow automatic
node pool creation in the authoritative region.

In non-authoritative regions, the client is registered, but the node
pool is not created. The client is kept in the `initialing` status until
its node pool is created in the authoritative region and replicated to
the client's region.
2023-06-13 11:28:28 -04:00
stswidwinski
887d3060c4 conf: Add preemption_config to the server extra HCL keys which should be removed (#17481)
Add preemption_config to the set of keys which should be pruned from the server
config as described in #17480.
2023-06-13 10:48:19 +02:00
Tim Gross
5bb6e5758d node pools: replicate from authoritative region (#17456)
Upserts and deletes of node pools are forwarded to the authoritative region,
just like we do for namespaces, quotas, ACL policies, etc. Replicate node pools
from the authoritative region.
2023-06-12 13:24:24 -04:00
Phil Renaud
408ab828f7 [ui] Parallelize ember tests (#17442)
* Exam to parallelize tests

* Logging to try to solve test flakiness

* Logging in another failure

* Hardening for one test and snapshot for another

* Explicitly set the first one as the servicedAlloc instead of randomly picking

* A wild CircleCI test failure appears

* de-log
2023-06-07 17:01:35 -04:00
Tim Gross
9a6078a2ae node pools: implement support in scheduler (#17443)
Implement scheduler support for node pool:

* When a scheduler is invoked, we get a set of the ready nodes in the DCs that
  are allowed for that job. Extend the filter to include the node pool.
* Ensure that changes to a job's node pool are picked up as destructive
  allocation updates.
* Add `NodesInPool` as a metric to all reporting done by the scheduler.
* Add the node-in-pool the filter to the `Node.Register` RPC so that we don't
  generate spurious evals for nodes in the wrong pool.
2023-06-07 10:39:03 -04:00
Luiz Aoqui
354d741c95 node pool: implement nomad node pool nodes CLI (#17444) 2023-06-07 10:37:27 -04:00