Commit Graph

2297 Commits

Author SHA1 Message Date
nicoche
ffcb72bfe3 api: Add Notes field to service checks (#22397)
Co-authored-by: Piotr Kazmierczak <470696+pkazmierczak@users.noreply.github.com>
2024-06-10 16:59:49 +02:00
Piotr Kazmierczak
2a09abc477 metrics: quota utilization configuration and documentation (#22912)
Introduces support for (optional) quota utilization metrics

CE part of the hashicorp/nomad-enterprise#1488 change
2024-06-03 21:06:19 +02:00
Phil Renaud
014f5145dc Lockfile and bindata_assetfs recompiled on latest main (#22434) 2024-05-31 13:23:59 -04:00
Phil Renaud
86ee56b8c5 [ui] Jobs index page badge for when a job has a paused task (#22392)
* Adds a badge on the jobs index page if any task within any allocation of a running job is currently paused

* Snapshot and acceptance tests for paused states

* Cleared yarn cache

* Remove MirageScenario from the test dependency chain

* Logging before toString

* Cardinal sin of time-based test execution

* Maybe weve been lucky for years and the clientStatus has always been running for this test by happenstance

* Back away from the time-based and toward the settled() approach
2024-05-30 21:18:35 -04:00
Tim Gross
de38ff4189 consul: set partition for gateway config entries (#22228)
When we write Connect gateway configuation entries from the server, we're not
passing in the intended partition. This means we're using the server's own
partition to submit the configuration entries and this may not match. Note this
requires the Nomad server's token has permission to that partition.

Also, move the config entry write after we check Sentinel policies. This allows
us to return early if we hit a Sentinel error without making Consul RPCs first.
2024-05-29 16:31:02 -04:00
hc-github-team-nomad-core
32d820644a Generate files for 1.8.0 release 2024-05-29 11:48:55 -04:00
hc-github-team-nomad-core
c374bd375b Generate files for 1.8.0-rc.1 release 2024-05-23 16:55:05 -04:00
Daniel Bennett
4415fabe7d jobspec: time based task execution (#22201)
this is the CE side of an Enterprise-only feature.
a job trying to use this in CE will fail to validate.

to enable daily-scheduled execution entirely client-side,
a job may now contain:

task "name" {
  schedule {
    cron {
      start    = "0 12 * * * *" # may not include "," or "/"
      end      = "0 16"         # partial cron, with only {minute} {hour}
      timezone = "EST"          # anything in your tzdb
    }
  }
...

and everything about the allocation will be placed as usual,
but if outside the specified schedule, the taskrunner will block
on the client, waiting on the schedule start, before proceeding
with the task driver execution, etc.

this includes a taksrunner hook, which watches for the end of
the schedule, at which point it will kill the task.

then, restarts-allowing, a new task will start and again block
waiting for start, and so on.

this also includes all the plumbing required to pipe API calls
through from command->api->agent->server->client, so that
tasks can be force-run, force-paused, or resume the schedule
on demand.
2024-05-22 15:40:25 -05:00
Phil Renaud
e8b77fcfa0 [ui] Jobspec UI block: Descriptions and Links (#18292)
* Hacky but shows links and desc

* markdown

* Small pre-test cleanup

* Test for UI description and link rendering

* JSON jobspec docs and variable example job get UI block

* Jobspec documentation for UI block

* Description and links moved into the Title component and made into Helios components

* Marked version upgrade

* Allow links without a description and max description to 1000 chars

* Node 18 for setup-js

* markdown sanitization

* Ui to UI and docs change

* Canonicalize, copy and diff for job.ui

* UI block added to testJob for structs testing

* diff test

* Remove redundant reset

* For readability, changing the receiving pointer of copied job variables

* TestUI endpiont conversion tests

* -require +must

* Nil check on Links

* JobUIConfig.Links as pointer

---------

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2024-05-22 15:00:45 -04:00
Seth Hoenig
09bd11383c client: alloc_mounts directory must be sibling of data directory (#22199)
This PR adjusts the default location of -alloc-mounts-dir path to be a
sibling of the -data-dir path rather than a child. This is because on a
production-hardened systems the data dir is supposed to be chmod 0700
owned by root - preventing the exec2 task driver (and others using
unveil file system isolation features) from working properly.

For reference the directory structure from -data-dir now looks like this
after running an example job. Under the alloc_mounts directory, task
specific directories are mode 0710 and owned by the task user (which
may be a dynamic user UID/GID).

➜ sudo tree -p -d -u /tmp/mynomad
[drwxrwxr-x shoenig ]  /tmp/mynomad
├── [drwx--x--x root    ]  alloc_mounts
│   └── [drwx--x--- 80552   ]  c753b71d-c6a1-3370-1f59-47ab838fd8a6-mytask
│       ├── [drwxrwxrwx nobody  ]  alloc
│       │   ├── [drwxrwxrwx nobody  ]  data
│       │   ├── [drwxrwxrwx nobody  ]  logs
│       │   └── [drwxrwxrwx nobody  ]  tmp
│       ├── [drwxrwxrwx nobody  ]  local
│       ├── [drwxr-xr-x root    ]  private
│       ├── [drwx--x--- 80552   ]  secrets
│       └── [drwxrwxrwt nobody  ]  tmp
└── [drwx------ root    ]  data
    ├── [drwx--x--x root    ]  alloc
    │   └── [drwxr-xr-x root    ]  c753b71d-c6a1-3370-1f59-47ab838fd8a6
    │       ├── [drwxrwxrwx nobody  ]  alloc
    │       │   ├── [drwxrwxrwx nobody  ]  data
    │       │   ├── [drwxrwxrwx nobody  ]  logs
    │       │   └── [drwxrwxrwx nobody  ]  tmp
    │       └── [drwx--x--- 80552   ]  mytask
    │           ├── [drwxrwxrwx nobody  ]  alloc
    │           │   ├── [drwxrwxrwx nobody  ]  data
    │           │   ├── [drwxrwxrwx nobody  ]  logs
    │           │   └── [drwxrwxrwx nobody  ]  tmp
    │           ├── [drwxrwxrwx nobody  ]  local
    │           ├── [drwxrwxrwx nobody  ]  private
    │           ├── [drwx--x--- 80552   ]  secrets
    │           └── [drwxrwxrwt nobody  ]  tmp
    ├── [drwx------ root    ]  client
    └── [drwxr-xr-x root    ]  server
        ├── [drwx------ root    ]  keystore
        ├── [drwxr-xr-x root    ]  raft
        │   └── [drwxr-xr-x root    ]  snapshots
        └── [drwxr-xr-x root    ]  serf

32 directories
2024-05-22 13:14:34 -05:00
Deniz Onur Duzgun
1cc99cc1b4 bug: resolve type conversion alerts (#20553) 2024-05-15 13:22:10 -04:00
Tim Gross
c9fd93c772 connect: support volume_mount blocks for sidecar task overrides (#20575)
Users can override the default sidecar task for Connect workloads. This sidecar
task might need access to certificate stores on the host. Allow adding the
`volume_mount` block to the sidecar task override.

Also fixes a bug where `volume_mount` blocks would not appear in plan diff
outputs.

Fixes: https://github.com/hashicorp/nomad/issues/19786
2024-05-14 12:49:37 -04:00
hc-github-team-nomad-core
e1a176c120 Generate files for 1.8.0-beta.1 release 2024-05-07 07:06:07 +00:00
Daniel Bennett
cf87a556b3 api: new /v1/jobs/statuses endpoint for /ui/jobs page (#20130)
introduce a new API /v1/jobs/statuses, primarily for use in the UI,
which collates info about jobs, their allocations, and latest deployment.

currently the UI gets *all* of /v1/jobs and sorts and paginates them client-side
in the browser, and its "summary" column is based on historical summary data
(which can be visually misleading, and sometimes scary when a job has failed
at some point in the not-yet-garbage-collected past).

this does pagination and filtering and such, and returns jobs sorted by ModifyIndex,
so latest-changed jobs still come first. it pulls allocs and latest deployment
straight out of current state for more a more robust, holistic view of the job status.
it is less efficient per-job, due to the extra state lookups, but should be more efficient
per-page (excepting perhaps for job(s) with very-many allocs).

if a POST body is sent like `{"jobs": [{"namespace": "cool-ns", "id": "cool-job"}]}`,
then the response will be limited to that subset of jobs. the main goal here is to
prevent "jostling" the user in the UI when jobs come into and out of existence.

and if a blocking query is started with `?index=N`, then the query should only
unblock if jobs "on page" change, rather than any change to any of the state
tables being queried ("jobs", "allocs", and "deployment"), to save unnecessary
HTTP round trips.
2024-05-03 15:01:40 -05:00
Tim Gross
54fc146432 agent: add support for sdnotify protocol (#20528)
Nomad agents expect to receive `SIGHUP` to reload their configuration. The
signal handler for this is installed fairly late in agent startup, after the
client or server components are up and running. This means that configuration
management tools can potentially reload the configuration before the agent can
handle it, causing the agent to crash.

We don't want to allow configuration reload during client or server component
startup, because it would significantly complicate initialization. Instead,
we'll implement the systemd notify protocol. This causes systemd to block
sending configuration reload signals until the agent is actually ready. Users
can still bypass this by sending signals directly.

Note that there are several Go libraries that implement the sdnotify protocol,
but most are part of much larger projects which would create a lot of dependabot
burden. The bits of the protocol we need are extremely simple to implement in a
just a couple of functions.

For non-Linux or non-systemd Linux systems, this feature is a no-op. In future
work we could potentially implement service notification for Windows as well.

Fixes: https://github.com/hashicorp/nomad/issues/3885
2024-05-03 13:42:07 -04:00
Juana De La Cuesta
64978662b6 Post 1.7.7 release (#20421)
Generate files for 1.7.7 release, prepare for next release and merge release 1.7.7 files
2024-04-17 10:44:32 +02:00
Seth Hoenig
ae6c4c8e3f deps: purge use of old x/exp packages (#20373) 2024-04-12 08:29:00 -05:00
astudentofblake
7b7ed12326 func: Allow custom paths to be added the the getter landlock (#20349)
* func: Allow custom paths to be added the the getter landlock

Fixes: 20315

* fix: slices imports
fix: more meaningful examples
fix: improve documentation
fix: quote error output
2024-04-11 15:17:33 -05:00
Tim Gross
e8d203e7ce transparent proxy: add jobspec support (#20144)
Add a transparent proxy block to the existing Connect sidecar service proxy
block. This changeset is plumbing required to support transparent proxy
configuration on the client.

Ref: https://github.com/hashicorp/nomad/issues/10628
2024-04-04 17:01:07 -04:00
James Rasell
facc3e8013 agent: allow configuration of in-memory telemetry sink. (#20166)
This change adds configuration options for setting the in-memory
telemetry sink collection and retention durations. This sink backs
the metrics JSON API and previously had hard-coded default values.

The new options are particularly useful when running development or
debug environments, where metrics collection is desired at a fast
and granular rate.
2024-03-25 15:00:18 +00:00
Tim Gross
bdf3ff301e jobspec: add support for destination partition to upstream block (#20167)
Adds support for specifying a destination Consul admin partition in the
`upstream` block.

Fixes: https://github.com/hashicorp/nomad/issues/19785
2024-03-22 16:15:22 -04:00
Tim Gross
10dd738a03 jobspec: update gateway.ingress.service Consul API fields (#20176)
Add support for further configuring `gateway.ingress.service` blocks to bring
this block up-to-date with currently available Consul API fields (except for
namespace and admin partition, which will need be handled under a different
PR). These fields are sent to Consul as part of the job endpoint submission hook
for Connect gateways.

Co-authored-by: Horacio Monsalvo <horacio.monsalvo@southworks.com>
2024-03-22 13:50:48 -04:00
Michael Schurter
23e4b7c9d2 Upgrade go-msgpack to v2 (#20173)
Replaces #18812

Upgraded with:
```
find . -name '*.go' -exec sed -i s/"github.com\/hashicorp\/go-msgpack\/codec"/"github.com\/hashicorp\/go-msgpack\/v2\/codec/" '{}' ';'
find . -name '*.go' -exec sed -i s/"github.com\/hashicorp\/net-rpc-msgpackrpc"/"github.com\/hashicorp\/net-rpc-msgpackrpc\/v2/" '{}' ';'
go get
go get -v -u github.com/hashicorp/raft-boltdb/v2
go get -v github.com/hashicorp/serf@5d32001edfaa18d1c010af65db707cdb38141e80
```

see https://github.com/hashicorp/go-msgpack/releases/tag/v2.1.0
for details
2024-03-21 11:44:23 -07:00
Tim Gross
7b9bce2d08 config: fix client.template config merging with defaults (#20165)
When loading the client configuration, the user-specified `client.template`
block was not properly merged with the default values. As a result, if the user
set any `client.template` field, all the other field defaulted to their zero
values instead of the documented defaults.

This changeset:
* Adds the missing `Merge` method for the client template config and ensures
  it's called.
* Makes a single source of truth for the default template configuration,
  instead of two different constructors.
* Extends the tests to cover the merge of a partial block better.

Fixes: https://github.com/hashicorp/nomad/issues/20164
2024-03-20 10:18:56 -04:00
Tim Gross
5138c1c82f autopilot: add Enterprise health information to API endpoint (#20153)
Add information about autopilot health to the `/operator/autopilot/health` API
in Nomad Enterprise.

I've pulled the CE changes required for this feature out of @lindleywhite's PR
in the Enterprise repo. A separate PR will include a new `operator autopilot
health` command that can present this information at the command line.

Ref: https://github.com/hashicorp/nomad-enterprise/pull/1394
Co-authored-by: Lindley <lindley@hashicorp.com>
2024-03-18 11:38:17 -04:00
Amir Abbas
40b8f17717 Support insecure flag on artifact (#20126) 2024-03-14 10:59:20 -05:00
Seth Hoenig
05937ab75b exec2: add client support for unveil filesystem isolation mode (#20115)
* exec2: add client support for unveil filesystem isolation mode

This PR adds support for a new filesystem isolation mode, "Unveil". The
mode introduces a "alloc_mounts" directory where tasks have user-owned
directory structure which are bind mounts into the real alloc directory
structure. This enables a task driver to use landlock (and maybe the
real unveil on openbsd one day) to isolate a task to the task owned
directory structure, providing sandboxing.

* actually create alloc-mounts-dir directory

* fix doc strings about alloc mount dir paths
2024-03-13 08:24:17 -05:00
hc-github-team-nomad-core
46182c2a83 Generate files for 1.7.6 release 2024-03-12 12:04:04 +01:00
Seth Hoenig
286dce7a2a exec2: add a client.users configuration block (#20093)
* exec: add a client.users configuration block

For now just add min/max dynamic user values; soon we can also absorb
the "user.denylist" and "user.checked_drivers" options from the
deprecated client.options map.

* give the no-op pool implementation a better name

* use explicit error types to make referencing them cleaner in tests

* use import alias to not shadow package name
2024-03-08 16:02:32 -06:00
Seth Hoenig
4d83733909 tests: swap testify for test in more places (#20028)
* tests: swap testify for test in plugins/csi/client_test.go

* tests: swap testify for test in testutil/

* tests: swap testify for test in host_test.go

* tests: swap testify for test in plugin_test.go

* tests: swap testify for test in utils_test.go

* tests: swap testify for test in scheduler/

* tests: swap testify for test in parse_test.go

* tests: swap testify for test in attribute_test.go

* tests: swap testify for test in plugins/drivers/

* tests: swap testify for test in command/

* tests: fixup some test usages

* go: run go mod tidy

* windows: cpuset test only on linux
2024-02-29 12:11:35 -06:00
Juana De La Cuesta
20cfbc82d3 Introduces Disconnect block into the TaskGroup configuration (#19886)
This PR is the first on two that will implement the new Disconnect block. In this PR the new block is introduced to be backwards compatible with the fields it will replace. For more information refer to this RFC and this ticket.
2024-02-19 16:41:35 +01:00
hc-github-team-nomad-core
6e08d9ffff Generate files for 1.7.5 release 2024-02-13 11:32:59 -05:00
Tim Gross
e986c298ac alloc exec: fix panics after stream close (#19932)
In #19172 we added a check on websocket errors to see if they were one of
several benign "close" messages. This change inadvertently assumed that other
messages used for close would not implement `HTTPCodedError`. When errors like
the following are received:

> msgpack decode error [pos 0]: io: read/write on closed pipe"

they are sent from the inner loop as though they were a "real" error, but the
channel is already being closed with a "close" message.

This allowed many more attempts to pass thru a previously-undiscovered race
condition in the two goroutines that stream RPC responses to the websocket. When
the input stream returns an error for any reason (for example, the command we're
executing has exited), it will unblock the "outer" goroutine and cause a write
to the websocket. If we're concurrently writing the "close error" discussed
above, this results in a panic from the websocket library.

This changeset includes two fixes:
* Catch "closed pipe" error correctly so that we're not sending unnecessary
  error messages.
* Move all writes to the websocket into the same response streaming
  goroutine. The main handler goroutine will block on a results channel, and the
  response streaming goroutine will send on that channel with the final error when
  it's done so it can be reported to the user.
2024-02-12 09:43:34 -05:00
Tim Gross
110d93ab25 windows: remove LazyDLL calls for system modules (#19925)
On Windows, Nomad uses `syscall.NewLazyDLL` and `syscall.LoadDLL` functions to
load a few system DLL files, which does not prevent DLL hijacking
attacks. Hypothetically a local attacker on the client host that can place an
abusive library in a specific location could use this to escalate privileges to
the Nomad process. Although this attack does not fall within the Nomad security
model, it doesn't hurt to follow good practices here.

We can remove two of these DLL loads by using wrapper functions provided by the
stdlib in `x/sys/windows`

Co-authored-by: dduzgun-security <deniz.duzgun@hashicorp.com>
2024-02-09 08:47:48 -05:00
hc-github-team-nomad-core
875e96cccc Generate files for 1.7.4 release 2024-02-08 10:40:24 -05:00
Juana De La Cuesta
120c3ca3c9 Add granular control of SELinux labels for host mounts (#19839)
Add new configuration option on task's volume_mounts, to give a fine grained control over SELinux "z" label

* Update website/content/docs/job-specification/volume_mount.mdx

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>

* fix: typo

* func: make volume mount verification happen even on  mounts with no volume

---------

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
Co-authored-by: Tim Gross <tgross@hashicorp.com>
2024-02-05 10:05:33 +01:00
James Rasell
41555b6370 cli: Fix minor help formatting issue in agent command. (#19743) 2024-01-17 12:18:00 +00:00
hc-github-team-nomad-core
ddfc157c0a Generate files for 1.7.3 release 2024-01-15 15:58:41 -05:00
Luiz Aoqui
e1e80f383e vault: add new nomad setup vault -check commmand (#19720)
The new `nomad setup vault -check` commmand can be used to retrieve
information about the changes required before a cluster is migrated from
the deprecated legacy authentication flow with Vault to use only
workload identities.
2024-01-12 15:48:30 -05:00
Tim Gross
0935f443dc vault: support allowing tokens to expire without refresh (#19691)
Some users with batch workloads or short-lived prestart tasks want to derive a
Vaul token, use it, and then allow it to expire without requiring a constant
refresh. Add the `vault.allow_token_expiration` field, which works only with the
Workload Identity workflow and not the legacy workflow.

When set to true, this disables the client's renewal loop in the
`vault_hook`. When Vault revokes the token lease, the token will no longer be
valid. The client will also now automatically detect if the Vault auth
configuration does not allow renewals and will disable the renewal loop
automatically.

Note this should only be used when a secret is requested from Vault once at the
start of a task or in a short-lived prestart task. Long-running tasks should
never set `allow_token_expiration=true` if they obtain Vault secrets via
`template` blocks, as the Vault token will expire and the template runner will
continue to make failing requests to Vault until the `vault_retry` attempts are
exhausted.

Fixes: https://github.com/hashicorp/nomad/issues/8690
2024-01-10 14:49:02 -05:00
Tim Gross
d3e5cae1eb consul: support admin partitions (#19665)
Add support for Consul Enterprise admin partitions. We added fingerprinting in
https://github.com/hashicorp/nomad/pull/19485. This PR adds a `consul.partition`
field. The expectation is that most users will create a mapping of Nomad node
pool to Consul admin partition. But we'll also create an implicit constraint for
the fingerprinted value.

Fixes: https://github.com/hashicorp/nomad/issues/13139
2024-01-10 10:41:29 -05:00
Mike Nomitch
31f4296826 Adds support for failures before warning to Consul service checks (#19336)
Adds support for failures before warning and failures before critical
to the automatically created Nomad client and server services in Consul
2023-12-14 11:33:31 -08:00
hc-github-team-nomad-core
b777013ff9 Generate files for 1.7.2 release 2023-12-14 11:23:55 +01:00
hc-github-team-nomad-core
180fd54918 Generate files for 1.7.1 release 2023-12-08 14:39:09 -05:00
Luiz Aoqui
099ee06a60 Revert "deps: update go-metrics to v0.5.3 (#19190)" (#19374)
* Revert "deps: update go-metrics to v0.5.3 (#19190)"

This reverts commit ddb060d8b3.

* changelog: add entry for #19374
2023-12-08 08:46:55 -05:00
Luiz Aoqui
c624dc2121 config: fix loading Vault token from env var (#19349)
The `defaultVault` variable is a pointer to the Vault configuration
named `default`. Initially, this variable points to the Vault
configuration that is used to load CLI flag values, but after those are
merged with the default and config file values the pointer reference
must be updated before mutating the config with environment variable
values.
2023-12-07 11:56:53 -05:00
Luiz Aoqui
27d2ad1baf cli: add -dev-consul and -dev-vault agent mode (#19327)
The `-dev-consul` and `-dev-vault` flags add default identities and
configuration to the Nomad agent to connect and use the workload
identity integration with Consul and Vault.
2023-12-07 11:51:20 -05:00
hc-github-team-nomad-core
e799b06f02 Generate files for 1.7.0 release 2023-12-07 16:43:02 +01:00
Tim Gross
3c4e2009f5 connect: deployments should wait for Connect sidecar checks (#19334)
When a Connect service is registered with Consul, Nomad includes the nested
`Connect.SidecarService` field that includes health checks for the Envoy
proxy. Because these are not part of the job spec, the alloc health tracker
created by `health_hook` doesn't know to read the value of these checks.

In many circumstances this won't be noticed, but if the Envoy health check
happens to take longer than the `update.min_healthy_time` (perhaps because it's
been set low), it's possible for a deployment to progress too early such that
there will briefly be no healthy instances of the service available in Consul.

Update the Consul service client to find the nested sidecar service in the
service catalog and attach it to the results provided to the tracker. The
tracker can then check the sidecar health checks.

Fixes: https://github.com/hashicorp/nomad/issues/19269
2023-12-06 16:59:51 -05:00
Juana De La Cuesta
cf539c405e Add a new parameter to avoid starting a replacement for lost allocs (#19101)
This commit introduces the parameter preventRescheduleOnLost which indicates that the task group can't afford to have multiple instances running at the same time. In the case of a node going down, its allocations will be registered as unknown but no replacements will be rescheduled. If the lost node comes back up, the allocs will reconnect and continue to run.

In case of max_client_disconnect also being enabled, if there is a reschedule policy, an error will be returned.
Implements issue #10366

Co-authored-by: Dom Lavery <dom@circleci.com>
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
2023-12-06 12:28:42 +01:00