Commit Graph

264 Commits

Author SHA1 Message Date
Tim Gross
e8d203e7ce transparent proxy: add jobspec support (#20144)
Add a transparent proxy block to the existing Connect sidecar service proxy
block. This changeset is plumbing required to support transparent proxy
configuration on the client.

Ref: https://github.com/hashicorp/nomad/issues/10628
2024-04-04 17:01:07 -04:00
Tim Gross
bdf3ff301e jobspec: add support for destination partition to upstream block (#20167)
Adds support for specifying a destination Consul admin partition in the
`upstream` block.

Fixes: https://github.com/hashicorp/nomad/issues/19785
2024-03-22 16:15:22 -04:00
Tim Gross
10dd738a03 jobspec: update gateway.ingress.service Consul API fields (#20176)
Add support for further configuring `gateway.ingress.service` blocks to bring
this block up-to-date with currently available Consul API fields (except for
namespace and admin partition, which will need be handled under a different
PR). These fields are sent to Consul as part of the job endpoint submission hook
for Connect gateways.

Co-authored-by: Horacio Monsalvo <horacio.monsalvo@southworks.com>
2024-03-22 13:50:48 -04:00
Amir Abbas
40b8f17717 Support insecure flag on artifact (#20126) 2024-03-14 10:59:20 -05:00
Juana De La Cuesta
20cfbc82d3 Introduces Disconnect block into the TaskGroup configuration (#19886)
This PR is the first on two that will implement the new Disconnect block. In this PR the new block is introduced to be backwards compatible with the fields it will replace. For more information refer to this RFC and this ticket.
2024-02-19 16:41:35 +01:00
Tim Gross
e986c298ac alloc exec: fix panics after stream close (#19932)
In #19172 we added a check on websocket errors to see if they were one of
several benign "close" messages. This change inadvertently assumed that other
messages used for close would not implement `HTTPCodedError`. When errors like
the following are received:

> msgpack decode error [pos 0]: io: read/write on closed pipe"

they are sent from the inner loop as though they were a "real" error, but the
channel is already being closed with a "close" message.

This allowed many more attempts to pass thru a previously-undiscovered race
condition in the two goroutines that stream RPC responses to the websocket. When
the input stream returns an error for any reason (for example, the command we're
executing has exited), it will unblock the "outer" goroutine and cause a write
to the websocket. If we're concurrently writing the "close error" discussed
above, this results in a panic from the websocket library.

This changeset includes two fixes:
* Catch "closed pipe" error correctly so that we're not sending unnecessary
  error messages.
* Move all writes to the websocket into the same response streaming
  goroutine. The main handler goroutine will block on a results channel, and the
  response streaming goroutine will send on that channel with the final error when
  it's done so it can be reported to the user.
2024-02-12 09:43:34 -05:00
Juana De La Cuesta
120c3ca3c9 Add granular control of SELinux labels for host mounts (#19839)
Add new configuration option on task's volume_mounts, to give a fine grained control over SELinux "z" label

* Update website/content/docs/job-specification/volume_mount.mdx

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>

* fix: typo

* func: make volume mount verification happen even on  mounts with no volume

---------

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
Co-authored-by: Tim Gross <tgross@hashicorp.com>
2024-02-05 10:05:33 +01:00
Tim Gross
0935f443dc vault: support allowing tokens to expire without refresh (#19691)
Some users with batch workloads or short-lived prestart tasks want to derive a
Vaul token, use it, and then allow it to expire without requiring a constant
refresh. Add the `vault.allow_token_expiration` field, which works only with the
Workload Identity workflow and not the legacy workflow.

When set to true, this disables the client's renewal loop in the
`vault_hook`. When Vault revokes the token lease, the token will no longer be
valid. The client will also now automatically detect if the Vault auth
configuration does not allow renewals and will disable the renewal loop
automatically.

Note this should only be used when a secret is requested from Vault once at the
start of a task or in a short-lived prestart task. Long-running tasks should
never set `allow_token_expiration=true` if they obtain Vault secrets via
`template` blocks, as the Vault token will expire and the template runner will
continue to make failing requests to Vault until the `vault_retry` attempts are
exhausted.

Fixes: https://github.com/hashicorp/nomad/issues/8690
2024-01-10 14:49:02 -05:00
Tim Gross
d3e5cae1eb consul: support admin partitions (#19665)
Add support for Consul Enterprise admin partitions. We added fingerprinting in
https://github.com/hashicorp/nomad/pull/19485. This PR adds a `consul.partition`
field. The expectation is that most users will create a mapping of Nomad node
pool to Consul admin partition. But we'll also create an implicit constraint for
the fingerprinted value.

Fixes: https://github.com/hashicorp/nomad/issues/13139
2024-01-10 10:41:29 -05:00
Mike Nomitch
31f4296826 Adds support for failures before warning to Consul service checks (#19336)
Adds support for failures before warning and failures before critical
to the automatically created Nomad client and server services in Consul
2023-12-14 11:33:31 -08:00
Juana De La Cuesta
cf539c405e Add a new parameter to avoid starting a replacement for lost allocs (#19101)
This commit introduces the parameter preventRescheduleOnLost which indicates that the task group can't afford to have multiple instances running at the same time. In the case of a node going down, its allocations will be registered as unknown but no replacements will be rescheduled. If the lost node comes back up, the allocs will reconnect and continue to run.

In case of max_client_disconnect also being enabled, if there is a reschedule policy, an error will be returned.
Implements issue #10366

Co-authored-by: Dom Lavery <dom@circleci.com>
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
2023-12-06 12:28:42 +01:00
James Rasell
532402aa2d actions: use specific RPC request object and tighten naming. (#19149) 2023-11-23 07:42:37 +00:00
Michael Schurter
c4ae91f8be Fix WorkloadIdentity.TTL handling, jobspec2 testing, and hcl1 vs 2 parsing (#19024)
* make the little dots consistent
* don't trim delimiter as that over matches
* test jobspec2 package
* copy api/WorkloadIdentity.TTL -> structs
* test ttl parsing
* fix hcl1 v 2 parsing mismatch
* make jobspec(1) tests match jobspec2 tests
2023-11-08 09:01:16 -08:00
Michael Schurter
e49ca3c431 identity: Implement change_mode (#18943)
* identity: support change_mode and change_signal

wip - just jobspec portion

* test struct

* cleanup some insignificant boogs

* actually implement change mode

* docs tweaks

* add changelog

* test identity.change_mode operations

* use more words in changelog

* job endpoint tests

* address comments from code review

---------

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2023-11-01 09:41:11 -05:00
Luiz Aoqui
3ddf1ecf1d actions: minor bug fixes and improvements (#18904) 2023-10-31 17:06:02 -04:00
Phil Renaud
8902afe651 Nomad Actions (#18794)
* Scaffolding actions (#18639)

* Task-level actions for job submissions and retrieval

* FIXME: Temporary workaround to get ember dev server to pass exec through to 4646

* Update api/tasks.go

Co-authored-by: Tim Gross <tgross@hashicorp.com>

* Update command/agent/job_endpoint.go

Co-authored-by: Tim Gross <tgross@hashicorp.com>

* Diff and copy implementations

* Action structs get their own file, diff updates to behave like our other diffs

* Test to observe actions changes in a version update

* Tests migrated into structs/diff_test and modified with PR comments in mind

* APIActionToSTructsAction now returns a new value

* de-comment some plain parts, remove unused action lookup

* unused param in action converter

---------

Co-authored-by: Tim Gross <tgross@hashicorp.com>

* New endpoint: job/:id/actions (#18690)

* unused param in action converter

* backing out of parse_job level and moved toward new endpoint level

* Adds taskName and taskGroupName to actions at job level

* Unmodified job mock actions tests

* actionless job test

* actionless job test

* Multi group multi task actions test

* HTTP method check for GET, cleaner errors in job_endpoint_test

* decomment

* Actions aggregated at job model level (#18733)

* Removal of temporary fix to proxy to 4646

* Run Action websocket endpoint (#18760)

* Working demo for review purposes

* removal of cors passthru for websockets

* Remove job_endpoint-specific ws handlers and aimed at existing alloc exec handlers instead

* PR comments adressed, no need for taskGroup pass, better group and task lookups from alloc

* early return in action validate and removed jobid from req args per PR comments

* todo removal, we're checking later in the rpc

* boolean style change on tty

* Action CLI command (#18778)

* Action command init and stuck-notes

* Conditional reqpath to aim at Job action endpoint

* De-logged

* General CLI command cleanup, observe namespace, pass action as string, get random alloc w group adherence

* tab and varname cleanup

* Remove action param from Allocations().Exec calls

* changelog

* dont nil-check acl

---------

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2023-10-20 13:05:55 -04:00
Seth Hoenig
83720740f5 core: plumbing to support numa aware scheduling (#18681)
* core: plumbing to support numa aware scheduling

* core: apply node resources compatibility upon fsm rstore

Handle the case where an upgraded server dequeus an evaluation before
a client triggers a new fingerprint - which would be needed to cause
the compatibility fix to run. By running the compat fix on restore the
server will immediately have the compatible pseudo topology to use.

* lint: learn how to spell pseudo
2023-10-19 15:09:30 -05:00
Tim Gross
cbd7248248 auth: use ACLsDisabledACL when ACLs are disabled (#18754)
The RPC handlers expect to see `nil` ACL objects whenever ACLs are disabled. By
using `nil` as a sentinel value, we have the risk of nil pointer exceptions and
improper handling of `nil` when returned from our various auth methods that can
lead to privilege escalation bugs. This is the final patch in a series to
eliminate the use of `nil` ACLs as a sentinel value for when ACLs are disabled.

This patch adds a new virtual ACL policy field for when ACLs are disabled and
updates our authentication logic to use it. Included:

* Extends auth package tests to demonstrate that nil ACLs are treated as failed
  auth and disabled ACLs succeed auth.
* Adds a new `AllowDebug` ACL check for the weird special casing we have for
  pprof debugging when ACLs are disabled.
* Removes the remaining unexported methods (and repeated tests) from the
  `nomad/acl.go` file.
* Update the semgrep rules to detect improper nil ACL checking and remove the
  old invalid ACL checks.
* Update the contributing guide for RPC authentication.

Ref: https://github.com/hashicorp/nomad-enterprise/pull/1218
Ref: https://github.com/hashicorp/nomad/pull/18703
Ref: https://github.com/hashicorp/nomad/pull/18715
Ref: https://github.com/hashicorp/nomad/pull/16799
Ref: https://github.com/hashicorp/nomad/pull/18730
Ref: https://github.com/hashicorp/nomad/pull/18744
2023-10-16 09:30:24 -04:00
Tim Gross
aaee3076c2 consul: allow consul block in task scope (#18597)
To support Workload Identity with Consul for templates, we want templates to be
able to use the WI created at the task scope (either implicitly or set by the
user). But to allow different tasks within a group to be assigned to different
clusters as we're doing for Vault, we need to be able to set the `consul` block
with its `cluster` field at the task level to override the group.
2023-09-29 15:03:48 -04:00
James Rasell
d923fc554d consul/connect: add new fields to Consul Connect upstream block (#18430)
Co-authored-by: Horacio Monsalvo <horacio.monsalvo@southworks.com>
2023-09-11 16:02:52 +01:00
Michael Schurter
ef24e40b39 identity: support jwt expiration and rotation (#18262)
Implements expirations and renewals for alternate workload identity tokens.
2023-09-08 14:50:34 -07:00
Tim Gross
7cdd592809 jobspec: support cluster field for Vault block (#18408)
This field supports the upcoming ENT-only multiple Vault clusters feature. The
job validation and mutation hooks will come in a separate PR.

Ref: https://github.com/hashicorp/team-nomad/issues/404
2023-09-07 10:15:28 -04:00
Tim Gross
7863d7bcbb jobspec: support cluster field for Consul and Service blocks (#18409)
This field supports the upcoming ENT-only multiple Consul clusters feature. The
job validation and mutation hooks will come in a separate PR.

Ref: https://github.com/hashicorp/team-nomad/issues/404
2023-09-07 09:48:49 -04:00
Seth Hoenig
f5b0da1d55 all: swap exp packages for maps, slices (#18311) 2023-08-23 15:42:13 -05:00
Андрей Неустроев
3e61b3a37d Add multiple times in periodic jobs (#17858) 2023-08-22 15:42:31 -04:00
Piotr Kazmierczak
9fa39eb829 jobspec: add nomad_service field and identity block (#18239)
This PR introduces updates to the jobspec required for workload identity support for services.
---------

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
2023-08-21 20:07:47 +02:00
Luiz Aoqui
196213c451 jobspec: add role to vault (#18257) 2023-08-18 15:29:02 -04:00
Michael Schurter
0e22fc1a0b identity: add support for multiple identities + audiences (#18123)
Allows for multiple `identity{}` blocks for tasks along with user-specified audiences. This is a building block to allow workload identities to be used with Consul, Vault and 3rd party JWT based auth methods.

Expiration is still unimplemented and is necessary for JWTs to be used securely, so that's up next.

---------

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2023-08-15 09:11:53 -07:00
hashicorp-copywrite[bot]
a9d61ea3fd Update copyright file headers to BUSL-1.1 2023-08-10 17:27:29 -05:00
Gerard Nguyen
9e98d694a6 feature: Add new field render_templates on restart block (#18054)
This feature is necessary when user want to explicitly re-render all templates on task restart.
E.g. to fetch all new secrets from Vault, even if the lease on the existing secrets has not been expired.
2023-07-28 11:53:32 -07:00
Ville Vesilehto
2c463bb038 chore(lint): use Go stdlib variables for HTTP methods and status codes (#17968) 2023-07-26 15:28:09 +01:00
grembo
6f04b91912 Add disable_file parameter to job's vault stanza (#13343)
This complements the `env` parameter, so that the operator can author
tasks that don't share their Vault token with the workload when using 
`image` filesystem isolation. As a result, more powerful tokens can be used 
in a job definition, allowing it to use template stanzas to issue all kinds of 
secrets (database secrets, Vault tokens with very specific policies, etc.), 
without sharing that issuing power with the task itself.

This is accomplished by creating a directory called `private` within
the task's working directory, which shares many properties of
the `secrets` directory (tmpfs where possible, not accessible by
`nomad alloc fs` or Nomad's web UI), but isn't mounted into/bound to the
container.

If the `disable_file` parameter is set to `false` (its default), the Vault token
is also written to the NOMAD_SECRETS_DIR, so the default behavior is
backwards compatible. Even if the operator never changes the default,
they will still benefit from the improved behavior of Nomad never reading
the token back in from that - potentially altered - location.
2023-06-23 15:15:04 -04:00
Samantha
7ef1905333 check: Add support for Consul field tls_server_name (#17334) 2023-06-02 10:19:12 -04:00
Tim Gross
2d059bbf22 node pools: add node_pool field to job spec (#17379)
This changeset only adds the `node_pool` field to the jobspec, and ensures that
it gets picked up correctly as a change. Without the rest of the implementation
landed yet, the field will be ignored.
2023-06-01 16:08:55 -04:00
Tim Gross
2aa3c746c4 logs: fix missing allocation logs after update to Nomad 1.5.4 (#17087)
When the server restarts for the upgrade, it loads the `structs.Job` from the
Raft snapshot/logs. The jobspec has long since been parsed, so none of the
guards around the default value are in play. The empty field value for `Enabled`
is the zero value, which is false.

This doesn't impact any running allocation because we don't replace running
allocations when either the client or server restart. But as soon as any
allocation gets rescheduled (ex. you drain all your clients during upgrades),
it'll be using the `structs.Job` that the server has, which has `Enabled =
false`, and logs will not be collected.

This changeset fixes the bug by adding a new field `Disabled` which defaults to
false (so that the zero value works), and deprecates the old field.

Fixes #17076
2023-05-04 16:01:18 -04:00
Tim Gross
30bc456f03 logs: allow disabling log collection in jobspec (#16962)
Some Nomad users ship application logs out-of-band via syslog. For these users
having `logmon` (and `docker_logger`) running is unnecessary overhead. Allow
disabling the logmon and pointing the task's stdout/stderr to /dev/null.

This changeset is the first of several incremental improvements to log
collection short of full-on logging plugins. The next step will likely be to
extend the internal-only task driver configuration so that cluster
administrators can turn off log collection for the entire driver.

---

Fixes: #11175

Co-authored-by: Thomas Weber <towe75@googlemail.com>
2023-04-24 10:00:27 -04:00
Seth Hoenig
2c44cbb001 api: enable support for setting original job source (#16763)
* api: enable support for setting original source alongside job

This PR adds support for setting job source material along with
the registration of a job.

This includes a new HTTP endpoint and a new RPC endpoint for
making queries for the original source of a job. The
HTTP endpoint is /v1/job/<id>/submission?version=<version> and
the RPC method is Job.GetJobSubmission.

The job source (if submitted, and doing so is always optional), is
stored in the job_submission memdb table, separately from the
actual job. This way we do not incur overhead of reading the large
string field throughout normal job operations.

The server config now includes job_max_source_size for configuring
the maximum size the job source may be, before the server simply
drops the source material. This should help prevent Bad Things from
happening when huge jobs are submitted. If the value is set to 0,
all job source material will be dropped.

* api: avoid writing var content to disk for parsing

* api: move submission validation into RPC layer

* api: return an error if updating a job submission without namespace or job id

* api: be exact about the job index we associate a submission with (modify)

* api: reword api docs scheduling

* api: prune all but the last 6 job submissions

* api: protect against nil job submission in job validation

* api: set max job source size in test server

* api: fixups from pr
2023-04-11 08:45:08 -05:00
hashicorp-copywrite[bot]
f005448366 [COMPLIANCE] Add Copyright and License Headers 2023-04-10 15:36:59 +00:00
Seth Hoenig
e324a04613 cleanup: use jobID name rather than jobName in job endpoints (#16777)
These endpoints all refer to JobID by the time you get to the RPC request
layer, but the HTTP handler functions call the field JobName, which is a different
field (... though often with the same value).
2023-04-04 09:11:58 -05:00
Horacio Monsalvo
5957880112 connect: add meta on ConsulSidecarService (#16705)
Co-authored-by: Sol-Stiep <sol.stiep@southworks.com>
2023-03-30 16:09:28 -04:00
Alessio Perugini
365ccf4377 Allow configurable range of Job priorities (#16084) 2023-02-17 09:23:13 -05:00
Michael Schurter
9bab96ebd3 Task API via Unix Domain Socket (#15864)
This change introduces the Task API: a portable way for tasks to access Nomad's HTTP API. This particular implementation uses a Unix Domain Socket and, unlike the agent's HTTP API, always requires authentication even if ACLs are disabled.

This PR contains the core feature and tests but followup work is required for the following TODO items:

- Docs - might do in a followup since dynamic node metadata / task api / workload id all need to interlink
- Unit tests for auth middleware
- Caching for auth middleware
- Rate limiting on negative lookups for auth middleware

---------

Co-authored-by: Seth Hoenig <shoenig@duck.com>
2023-02-06 11:31:22 -08:00
Charlie Voiselle
fe4ff5be2a Add option to expose workload token to task (#15755)
Add `identity` jobspec block to expose workload identity tokens to tasks.

---------

Co-authored-by: Anders <mail@anars.dk>
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
2023-02-02 10:59:14 -08:00
Jorge Marey
340ad2db58 Rename fields on proxyConfig (#15541)
* Change api Fields for expose and paths

* Add changelog entry

* changelog: add deprecation notes about connect fields

* api: minor style tweaks

---------

Co-authored-by: Seth Hoenig <shoenig@duck.com>
2023-01-30 09:31:16 -06:00
Piotr Kazmierczak
949a6f60c7 renamed stanza to block for consistency with other projects (#15941) 2023-01-30 15:48:43 +01:00
Seth Hoenig
4698d8da79 consul/connect: support for proxy upstreams opaque config (#15761)
This PR adds support for configuring `proxy.upstreams[].config` for
Consul Connect upstreams. This is an opaque config value to Nomad -
the data is passed directly to Consul and is unknown to Nomad.
2023-01-12 08:20:54 -06:00
Charlie Voiselle
52a254ba22 template: error on missing key (#15141)
* Support error_on_missing_value for templates
* Update docs for template stanza
2022-11-04 13:23:01 -04:00
Phil Renaud
1a29e72f7f [ui] Adds meta to job list stub and displays a pack logo on the jobs index (#14833)
* Adds meta to job list stub and displays a pack logo on the jobs index

* Changelog

* Modifying struct for optional meta param

* Explicitly ask for meta anytime I look up a job from index or job page

* Test case for the endpoint

* adding meta field to API struct and ommitting from response if empty

* passthru method added to api/jobs.list

* Meta param listed in docs for jobs list

* Update api/jobs.go

Co-authored-by: Tim Gross <tgross@hashicorp.com>

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2022-11-02 16:58:24 -04:00
Seth Hoenig
ff1a30fe8d cleanup more helper updates (#14638)
* cleanup: refactor MapStringStringSliceValueSet to be cleaner

* cleanup: replace SliceStringToSet with actual set

* cleanup: replace SliceStringSubset with real set

* cleanup: replace SliceStringContains with slices.Contains

* cleanup: remove unused function SliceStringHasPrefix

* cleanup: fixup StringHasPrefixInSlice doc string

* cleanup: refactor SliceSetDisjoint to use real set

* cleanup: replace CompareSliceSetString with SliceSetEq

* cleanup: replace CompareMapStringString with maps.Equal

* cleanup: replace CopyMapStringString with CopyMap

* cleanup: replace CopyMapStringInterface with CopyMap

* cleanup: fixup more CopyMapStringString and CopyMapStringInt

* cleanup: replace CopySliceString with slices.Clone

* cleanup: remove unused CopySliceInt

* cleanup: refactor CopyMapStringSliceString to be generic as CopyMapOfSlice

* cleanup: replace CopyMap with maps.Clone

* cleanup: run go mod tidy
2022-09-21 14:53:25 -05:00
Piotr Kazmierczak
34e4b080f6 template: custom change_mode scripts (#13972)
This PR adds the functionality of allowing custom scripts to be executed on template change. Resolves #2707
2022-08-24 17:43:01 +02:00