Commit Graph

25795 Commits

Author SHA1 Message Date
Tim Gross
b1657dd1fa CSI: track node claim before staging to prevent interleaved unstage (#20550)
The CSI hook for each allocation that claims a volume runs concurrently. If a
call to `MountVolume` happens at the same time as a call to `UnmountVolume` for
the same volume, it's possible for the second alloc to detect the volume has
already been staged, then for the original alloc to unpublish and unstage it,
only for the second alloc to then attempt to publish a volume that's been
unstaged.

The usage tracker on the volume manager was intended to prevent this behavior,
but the call to claim the volume was made only after staging and publishing were
complete. Move the call to claim the volume for the usage tracker to the top of
the `MountVolume` workflow to prevent it from being unstaged until all consuming
allocations have called `UnmountVolume`.
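
A minimal Go sketch of the ordering change. It assumes a simplified, hypothetical `usageTracker` type and is not Nomad's actual volume manager. The claim is recorded before any staging work, so a concurrent `UnmountVolume` from another allocation only unstages once the last claim is released.

```go
package main

import (
	"fmt"
	"sync"
)

// usageTracker is a hypothetical, simplified tracker of which allocations
// claim which volumes.
type usageTracker struct {
	mu     sync.Mutex
	claims map[string]map[string]struct{} // volume ID -> set of alloc IDs
}

func newUsageTracker() *usageTracker {
	return &usageTracker{claims: make(map[string]map[string]struct{})}
}

// Claim records that allocID is using volID.
func (u *usageTracker) Claim(allocID, volID string) {
	u.mu.Lock()
	defer u.mu.Unlock()
	if u.claims[volID] == nil {
		u.claims[volID] = make(map[string]struct{})
	}
	u.claims[volID][allocID] = struct{}{}
}

// Free releases allocID's claim and reports whether no claims remain, i.e.
// whether the volume may be unstaged.
func (u *usageTracker) Free(allocID, volID string) bool {
	u.mu.Lock()
	defer u.mu.Unlock()
	delete(u.claims[volID], allocID) // no-op if volID has no claims
	if len(u.claims[volID]) == 0 {
		delete(u.claims, volID)
		return true
	}
	return false
}

func mountVolume(u *usageTracker, allocID, volID string) {
	// Claim first, before staging or publishing, so a concurrent unmount by
	// another allocation sees this claim and leaves the volume staged.
	u.Claim(allocID, volID)
	// ... stage the volume if needed, then publish it for this allocation ...
}

// unmountVolume reports whether the volume was unstaged.
func unmountVolume(u *usageTracker, allocID, volID string) bool {
	// ... unpublish the volume for this allocation ...
	if u.Free(allocID, volID) {
		// ... unstage: no other allocation holds a claim ...
		return true
	}
	return false
}

func main() {
	u := newUsageTracker()
	mountVolume(u, "alloc-1", "vol")
	mountVolume(u, "alloc-2", "vol")
	fmt.Println(unmountVolume(u, "alloc-1", "vol")) // false: alloc-2 still claims it
	fmt.Println(unmountVolume(u, "alloc-2", "vol")) // true: last claim released
}
```

The sketch only shows the claim-before-stage ordering. It does not serialize an in-progress unstage against a claim that arrives afterwards.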

Fixes: https://github.com/hashicorp/nomad/issues/20424
2024-05-16 09:45:07 -04:00
Tim Gross
953bfcc31e services: retry failed Nomad service deregistrations from client (#20596)
When the allocation is stopped, we deregister the service in the alloc runner's
`PreKill` hook. This ensures we delete the service registration and wait for the
shutdown delay before shutting down the tasks, so that workloads can drain their
connections. However, the call to remove the workload only logs errors and never
retries them.

Add a short retry loop to the `RemoveWorkload` method for Nomad services, so
that transient errors give us an extra opportunity to deregister the service
before the tasks are stopped, instead of leaving cleanup to the server-side
data integrity improvements implemented in #20590.

Ref: https://github.com/hashicorp/nomad/issues/16616
2024-05-16 08:59:54 -04:00
Dianne Laguerta
cabdd7eddb migrate GHA workflows to using single runner labels (#20581) 2024-05-16 13:35:10 +01:00
Szymon Nowicki-Korgol
898dddc5db structs: Fix job canonicalization for array type fields (#20522)
Co-authored-by: Piotr Kazmierczak <470696+pkazmierczak@users.noreply.github.com>
2024-05-16 14:05:12 +02:00
Phil Renaud
6886edf033 Makes it so an empty state query blocks and changes the style to be more Nomadic (#20588) 2024-05-15 13:57:48 -04:00
Deniz Onur Duzgun
1cc99cc1b4 bug: resolve type conversion alerts (#20553) 2024-05-15 13:22:10 -04:00
Tim Gross
6d806a9934 services: fix data integrity errors for Nomad native services (#20590)
This changeset fixes three potential data integrity issues between allocations
and their Nomad native service registrations.

* When a node is marked down because it missed heartbeats, we remove Vault and
  Consul tokens (for the pre-Workload Identity workflows) after we've written
  the node update to Raft. This is unavoidably non-transactional because the
  Consul and Vault servers aren't in the same Raft cluster as Nomad itself. But
  we've unnecessarily mirrored this same behavior to deregister Nomad
  services. This makes it possible for the leader to successfully write the node
  update to Raft without removing services.

  To address this, move the delete into the same Raft transaction. One minor
  caveat with this approach is the upgrade path: if the leader is upgraded first
  and a node is marked down during this window, older followers will have stale
  information until they are also upgraded. This is unavoidable without
  requiring the leader to unconditionally make an extra Raft write for every
  down node until 2 LTS versions after Nomad 1.8.0. This temporary reduction in
  data integrity for stale reads seems like a reasonable tradeoff.

* When an allocation is marked client-terminal from the client in
  `UpdateAllocsFromClient`, we have an opportunity to ensure data integrity by
  deregistering services for that allocation.

* When an allocation is deleted during eval garbage collection, we have an
  opportunity to ensure data integrity by deregistering services for that
  allocation. This is a cheap no-op if the allocation has been previously marked
  client-terminal.
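
A toy Go sketch of why the first bullet's single write matters. It assumes a hypothetical in-memory store that publishes one snapshot per write; Nomad's real state store is far more involved. Marking the node down and deleting its services happen in the same write, so a reader can never see one without the other.

```go
package main

import (
	"fmt"
	"sync"
)

// snapshot is an immutable view of a toy state store.
type snapshot struct {
	nodeStatus map[string]string   // node ID -> status
	services   map[string][]string // node ID -> Nomad service registration IDs
}

// store publishes a new snapshot per write. Writes are assumed to be applied
// one at a time, as a Raft FSM applies log entries.
type store struct {
	mu   sync.RWMutex
	snap snapshot
}

func (s *store) read() snapshot {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.snap
}

// markNodeDown updates the node status and deletes the node's services in
// one write, leaving previously read snapshots untouched.
func (s *store) markNodeDown(nodeID string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	next := snapshot{
		nodeStatus: make(map[string]string, len(s.snap.nodeStatus)),
		services:   make(map[string][]string, len(s.snap.services)),
	}
	for k, v := range s.snap.nodeStatus {
		next.nodeStatus[k] = v
	}
	for k, v := range s.snap.services {
		next.services[k] = v
	}
	next.nodeStatus[nodeID] = "down"
	delete(next.services, nodeID) // same write as the status change
	s.snap = next
}

func main() {
	s := &store{snap: snapshot{
		nodeStatus: map[string]string{"n1": "ready"},
		services:   map[string][]string{"n1": {"web-1", "web-2"}},
	}}
	s.markNodeDown("n1")
	snap := s.read()
	fmt.Println(snap.nodeStatus["n1"], len(snap.services["n1"])) // down 0
}
```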

This changeset does not address client-side retries for the originally reported
issue, which will be done in a separate PR.

Ref: https://github.com/hashicorp/nomad/issues/16616
2024-05-15 11:56:07 -04:00
Seth Hoenig
4148ca1769 client: mount shared alloc dir as nobody (#20589)
In the Unveil filesystem isolation mode we were mounting the shared
alloc dir with the UID/GID of the user of the task dir being mounted
and 0710 filesystem permissions. This was causing the actual task dir
to become inaccessible to other tasks in the allocation (a race where
the last mounter wins). Instead mount the shared alloc dir as nobody
with 0777 filesystem permissions.
2024-05-15 10:43:30 -05:00
Tim Gross
c9fd93c772 connect: support volume_mount blocks for sidecar task overrides (#20575)
Users can override the default sidecar task for Connect workloads. This sidecar
task might need access to certificate stores on the host. Allow adding the
`volume_mount` block to the sidecar task override.

Also fixes a bug where `volume_mount` blocks would not appear in plan diff
outputs.

Fixes: https://github.com/hashicorp/nomad/issues/19786
2024-05-14 12:49:37 -04:00
James Rasell
04ba358266 client: expose network namespace CNI config as task env vars. (#11810)
This change exposes CNI configuration details of a network
namespace as environment variables. This allows a task to use
these values to configure itself; a potential use case is to run
a Raft application binding to IP and Port details configured using
the bridge network mode.
2024-05-14 09:02:06 +01:00
Juana De La Cuesta
169818b1bd [gh-6980] Client: clean up old allocs before running new ones using the exec task driver. (#20500)
Whenever the "exec" task driver is used, Nomad runs a plugin that in turn runs the task in a container under the hood. If for any reason the executor is killed, the task is reparented to the init service and won't be stopped by Nomad when the job is updated or stopped.

This commit introduces two mechanisms to avoid this behaviour:

* Adds signal catching and handling to the executor, so that on SIGTERM the signal is also passed on to the task.
* Adds a pre-start cleanup of the processes in the container, ensuring only the ones the executor runs are present at any given time.
2024-05-14 09:51:27 +02:00
Tim Gross
5b328d9adc CSI: add support for wildcard namespaces on plugin status (#20551)
The `nomad plugin status :plugin_id` command lists allocations that implement
the plugin being queried. This list is filtered by the `-namespace` flag as
usual. Cluster admins will likely deploy plugins to a single namespace, but for
convenience they may want to have the wildcard namespace set in their command
environment.

Add support for handling the wildcard namespace to the CSI plugin RPC handler.
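
A Go sketch of wildcard handling for a listing like this, using a hypothetical `allocStub` type. It assumes the handler must still drop allocations in namespaces the caller cannot read. A `canRead` callback models that check.

```go
package main

import "fmt"

// allocStub is a hypothetical, trimmed-down allocation record.
type allocStub struct {
	ID        string
	Namespace string
}

// wildcardNamespace matches every namespace.
const wildcardNamespace = "*"

// filterByNamespace keeps allocations in ns (or in any namespace when ns is
// the wildcard) that the caller is allowed to read.
func filterByNamespace(allocs []allocStub, ns string, canRead func(ns string) bool) []allocStub {
	var out []allocStub
	for _, a := range allocs {
		if ns != wildcardNamespace && a.Namespace != ns {
			continue
		}
		if !canRead(a.Namespace) {
			continue // wildcard must not bypass per-namespace ACLs
		}
		out = append(out, a)
	}
	return out
}

func main() {
	allocs := []allocStub{{"a1", "default"}, {"a2", "storage"}, {"a3", "secret"}}
	canRead := func(ns string) bool { return ns != "secret" }
	fmt.Println(len(filterByNamespace(allocs, "storage", canRead))) // 1
	fmt.Println(len(filterByNamespace(allocs, "*", canRead)))       // 2
}
```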

Fixes: https://github.com/hashicorp/nomad/issues/20537
2024-05-13 15:42:35 -04:00
Tim Gross
0fb22eeab3 docs: fix broken markdown in alloc exec (#20576) 2024-05-13 15:34:37 -04:00
Tim Gross
65ae61249c CSI: include volume namespace in staging path (#20532)
CSI volumes are namespaced. But the client does not include the namespace in the
staging mount path. This causes CSI volumes with the same volume ID but
different namespace to collide if they happen to be placed on the same host. The
per-allocation paths don't need to be namespaced, because an allocation can only
mount volumes from its job's own namespace.

Rework the CSI hook tests to have more fine-grained control over the mock
on-disk state. Add tests covering upgrades from staging paths missing
namespaces.
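
A Go sketch of why the namespace matters in the staging path. The `root/staging/<namespace>/<volume ID>` layout is an assumption for illustration and may not match the layout the client actually uses.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// legacyStagingPath omits the namespace, so equal volume IDs collide.
func legacyStagingPath(root, volID string) string {
	return filepath.Join(root, "staging", volID)
}

// stagingPath includes the namespace, so equal volume IDs from different
// namespaces get distinct staging directories on the same host.
func stagingPath(root, namespace, volID string) string {
	return filepath.Join(root, "staging", namespace, volID)
}

func main() {
	fmt.Println(legacyStagingPath("/csi", "db")) // /csi/staging/db (shared by every namespace)
	fmt.Println(stagingPath("/csi", "prod", "db")) // /csi/staging/prod/db
	fmt.Println(stagingPath("/csi", "dev", "db"))  // /csi/staging/dev/db
}
```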

Fixes: https://github.com/hashicorp/nomad/issues/18741
2024-05-13 11:24:09 -04:00
Tim Gross
623486b302 deps: vendor containernetworking/plugins functions for net NS utils (#20556)
We bring in `containernetworking/plugins` for the contents of a single file,
which we use in a few places for running a goroutine in a specific network
namespace. This code hasn't needed an update in a couple of years, and a good
chunk of what we need was previously vendored into `client/lib/nsutil`.

Updating the library via dependabot is causing errors in Docker driver tests
because it updates a lot of transitive dependencies, and it's bringing in a pile
of new transitive dependencies like opentelemetry. Avoid this problem going
forward by vendoring the remaining code we hadn't already.

Ref: https://github.com/hashicorp/nomad/pull/20146
2024-05-13 09:10:16 -04:00
Tim Gross
baee2a0f38 docs: correct ACL requirements for CSI plugins (#20552)
CSI plugins are not namespaced, and there's no "list plugin" ACL. Instead,
listing and reading plugins require the `plugin:read` ACL.
2024-05-13 09:10:02 -04:00
James Rasell
65d86cbccc github: fix lint action check with install-vault descriptions. (#20547) 2024-05-13 09:54:41 +01:00
Tim Gross
1251c1ded9 docs: note that plugin policy is required in the UI for CSI volumes (#20557)
The ACL docs have a section explaining that some parts of the UI need slightly
wider read permissions than expected. These docs should include that you need
`plugin:read` to look at CSI volume pages in the UI.

Fixes: https://github.com/hashicorp/nomad/issues/18527
2024-05-10 16:42:10 -04:00
James Rasell
7e42ad869a client: fix unallocated CPU metric when reserved cpu is set. (#20543) 2024-05-09 10:55:22 +01:00
Phil Renaud
8620fdca85 Adds the token name to the Profile link in the top nav (#20539) 2024-05-08 12:33:58 -04:00
Piotr Kazmierczak
fe1533e638 Merge pull request #20536 from hashicorp/release/1.8.0-beta.1
Release/1.8.0 beta.1
2024-05-08 08:29:04 +02:00
Seth Hoenig
14a022cbc0 drivers/raw_exec: enable setting cgroup override values (#20481)
* drivers/raw_exec: enable setting cgroup override values

This PR enables configuration of cgroup override values on the `raw_exec`
task driver. WARNING: setting cgroup override values eliminates any
guarantee Nomad can make about resource availability for *any* task on
the client node.

For cgroup v2 systems, set a single unified cgroup path using `cgroup_v2_override`.
The path may be either absolute or relative to the cgroup root.

config {
  cgroup_v2_override = "custom.slice/app.scope"
}

or

config {
  cgroup_v2_override = "/sys/fs/cgroup/custom.slice/app.scope"
}

For cgroup v1 systems, set a per-controller path for each controller using
`cgroup_v1_override`. The path(s) may be either absolute or relative to
the controller root.

config {
  cgroup_v1_override = {
    "pids": "custom/app",
    "cpuset": "custom/app",
  }
}

or

config {
  cgroup_v1_override = {
    "pids": "/sys/fs/cgroup/pids/custom/app",
    "cpuset": "/sys/fs/cgroup/cpuset/custom/app",
  }
}
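
A Go sketch of how a relative or absolute override might be resolved. It assumes the cgroup filesystem is mounted at `/sys/fs/cgroup`, as in the examples above. `resolveV1` and `resolveV2` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// cgroupRoot is assumed to be the mount point of the cgroup filesystem.
const cgroupRoot = "/sys/fs/cgroup"

// resolveV2 returns the unified cgroup path for a cgroup_v2_override value,
// which may be absolute or relative to the cgroup root.
func resolveV2(override string) string {
	if filepath.IsAbs(override) {
		return filepath.Clean(override)
	}
	return filepath.Join(cgroupRoot, override)
}

// resolveV1 returns the path for one controller from a cgroup_v1_override
// entry, which may be absolute or relative to that controller's root.
func resolveV1(controller, override string) string {
	if filepath.IsAbs(override) {
		return filepath.Clean(override)
	}
	return filepath.Join(cgroupRoot, controller, override)
}

func main() {
	fmt.Println(resolveV2("custom.slice/app.scope")) // /sys/fs/cgroup/custom.slice/app.scope
	fmt.Println(resolveV1("pids", "custom/app"))     // /sys/fs/cgroup/pids/custom/app
}
```

The sketch does not reject relative paths that use `..` to escape the root.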

* drivers/rawexec: ensure only one of v1/v2 cgroup override is set

* drivers/raw_exec: executor should error if setting cgroup does not work

* drivers/raw_exec: create cgroups in raw_exec tests

* drivers/raw_exec: ensure we fail to start if custom cgroup set and non-root

* move custom cgroup func into shared file
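
A Go sketch of the validation described by the first and fourth bullets, using a hypothetical `rawExecCgroupConfig` struct. The real driver config and error messages may differ.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// rawExecCgroupConfig is a hypothetical mirror of the two override options.
type rawExecCgroupConfig struct {
	V1Override map[string]string
	V2Override string
}

// validate rejects configs that set both overrides, and configs that set
// either one when not running as root (euid is a parameter for testability).
func (c rawExecCgroupConfig) validate(euid int) error {
	hasV1 := len(c.V1Override) > 0
	hasV2 := c.V2Override != ""
	if hasV1 && hasV2 {
		return errors.New("only one of cgroup_v1_override or cgroup_v2_override may be set")
	}
	if (hasV1 || hasV2) && euid != 0 {
		return errors.New("custom cgroup overrides require running as root")
	}
	return nil
}

func main() {
	cfg := rawExecCgroupConfig{
		V1Override: map[string]string{"pids": "custom/app"},
		V2Override: "custom.slice/app.scope",
	}
	fmt.Println(cfg.validate(os.Geteuid())) // both set: rejected regardless of euid
}
```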

---------

Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
2024-05-07 16:46:27 -07:00
hc-github-team-nomad-core
e1333eb9f6 Prepare for next release 2024-05-07 07:06:12 +00:00
hc-github-team-nomad-core
e1a176c120 Generate files for 1.8.0-beta.1 release 2024-05-07 07:06:07 +00:00
Piotr Kazmierczak
d68d9c27e1 Prepare release 1.8.0-beta.1 2024-05-07 09:01:15 +02:00
James Rasell
5041460043 core: do not create evaluations within batch deregister endpoint. (#20510)
The batch deregister RPC endpoint is only used by the internal
garbage collection process, it is not exposed via the HTTP API or
used anywhere else.

The GC process ensures that a job can only be removed from state
if all related evaluations and allocations are in a state that
means they can also be removed from state. This means that we do
not need to create evaluations when jobs are being deregistered
via this endpoint.
2024-05-07 07:39:13 +01:00
Phil Renaud
16479af38d Jobs Index Page: Live Updates + Pagination (#20452)
* Hook and latch on the initial index

* Serialization and restart of controller and table

* de-log

* allocBlocks reimplemented at job model level

* totalAllocs doesn't mean on jobmodel what it did in steady.js

* Hamburgers to sausages

* Hacky way to bring new jobs back around and parent job handling in list view

* Getting closer to hook/latch

* Latch from update on hook from initialize, but fickle

* Note on multiple-watch problem

* Sensible monday morning comment removal

* use of abortController to handle transition and reset events

* Next token will now update when there's an on-page shift

* Very rough anti-jostle technique

* Demoable, now to move things out of route and into controller

* Into the controller, generally

* Smarter cancellations

* Reset abortController on index models run, and system/sysbatch jobs now have an improved groupCountSum computed property

* Prev Page reverse querying

* n+1th jobs existing will trigger nextToken/pagination display

* Start of a GET/POST statuses return

* Namespace fix

* Unblock tests

* Realizing to my small horror that this skipURLModification flag may be too heavy handed

* Lintfix

* Default liveupdates localStorage setting to true

* Pagination and index rethink

* Big uncoupling of watchable and url-append stuff

* Testfixes for region, search, and keyboard

* Job row class for test purposes

* Allocations in test now contain events

* Starting on the jobs list tests in earnest

* Forbidden state de-bubbling cleanup

* Job list page size fixes

* Facet/Search/Filter jobs list tests skipped

* Maybe it's the automatic mirage logging

* Unbreak task unit test

* Pre-sort sort

* styling for jobs list pagination and general PR cleanup

* moving from Job.ActiveDeploymentID to Job.LatestDeployment.ID

* modifyIndex-based pagination (#20350)

* modifyIndex-based pagination

* modifyIndex gets its own column and pagination compacted with icons

* A generic withPagination handler for mirage

* Some live-PR changes

* Pagination and button disabled tests

* Job update handling tests for jobs index

* assertion timeout in case of long setTimeouts

* assert.timeouts down to 500ms

* de-to-do

* Clarifying comment and test descriptions

* Bugfix: resizing your browser on the new jobs index page would make the viz grow forever (#20458)

* [ui] Searching and filtering options (#20459)

* Beginnings of a search box for filter expressions

* jobSearchBox integration test

* jobs list updateFilter initial test

* Basic jobs list filtering tests

* First attempt at side-by-side facets and search with a computed filter

* Weirdly close to an iterative approach but checked isn't tracked properly

* Big rework to make filter composition and decomposition work nicely with the url

* Namespace facet dropdown added

* NodePool facet dropdown added

* hdsFacet for future testing and basic namespace filtering test

* Namespace filter existence test

* Status filtering

* Node pool/dynamic facet test

* Test patchups

* Attempt at optimize test fix

* Allocation re-load on optimize page explainer

* The Big Un-Skip

* Post-PR-review cleanup

* todo-squashing

* [ui] Handle parent/child jobs with the paginated Jobs Index route (#20493)

* First pass at a non-watchQuery version

* Parameterized jobs get child fetching and jobs index status style for parent jobs

* Completed allocs vs Running allocs in a child-job context, and fix an issue where moving from parent to parent would not reset index

* Testfix and better handling empty-child-statuses-list

* Parent/child test case

* Don't show empty allocation-status bars for parent jobs with no children

* Splits Settings into 2 sections, sign-in/profile and user settings (#20535)

* Changelog
2024-05-06 17:09:37 -04:00
Phil Renaud
890c2ce713 Remove json linting while editing variables (#20529) 2024-05-03 16:33:33 -04:00
Daniel Bennett
cf87a556b3 api: new /v1/jobs/statuses endpoint for /ui/jobs page (#20130)
introduce a new API /v1/jobs/statuses, primarily for use in the UI,
which collates info about jobs, their allocations, and latest deployment.

currently the UI gets *all* of /v1/jobs and sorts and paginates them client-side
in the browser, and its "summary" column is based on historical summary data
(which can be visually misleading, and sometimes scary when a job has failed
at some point in the not-yet-garbage-collected past).

this does pagination and filtering and such, and returns jobs sorted by ModifyIndex,
so latest-changed jobs still come first. it pulls allocs and latest deployment
straight out of current state for a more robust, holistic view of the job status.
it is less efficient per-job, due to the extra state lookups, but should be more efficient
per-page (excepting perhaps for job(s) with very-many allocs).

if a POST body is sent like `{"jobs": [{"namespace": "cool-ns", "id": "cool-job"}]}`,
then the response will be limited to that subset of jobs. the main goal here is to
prevent "jostling" the user in the UI when jobs come into and out of existence.

and if a blocking query is started with `?index=N`, then the query should only
unblock if jobs "on page" change, rather than any change to any of the state
tables being queried ("jobs", "allocs", and "deployment"), to save unnecessary
HTTP round trips.
2024-05-03 15:01:40 -05:00
Tim Gross
54fc146432 agent: add support for sdnotify protocol (#20528)
Nomad agents expect to receive `SIGHUP` to reload their configuration. The
signal handler for this is installed fairly late in agent startup, after the
client or server components are up and running. This means that configuration
management tools can potentially reload the configuration before the agent can
handle it, causing the agent to crash.

We don't want to allow configuration reload during client or server component
startup, because it would significantly complicate initialization. Instead,
we'll implement the systemd notify protocol. This causes systemd to block
sending configuration reload signals until the agent is actually ready. Users
can still bypass this by sending signals directly.

Note that there are several Go libraries that implement the sdnotify protocol,
but most are part of much larger projects which would create a lot of dependabot
burden. The bits of the protocol we need are extremely simple to implement in
just a couple of functions.

For non-Linux or non-systemd Linux systems, this feature is a no-op. In future
work we could potentially implement service notification for Windows as well.

Fixes: https://github.com/hashicorp/nomad/issues/3885
2024-05-03 13:42:07 -04:00
Tim Gross
f41bc468eb consul: provide CONSUL_HTTP_TOKEN env var to tasks (#20519)
When available, we provide an environment variable `CONSUL_TOKEN` to tasks, but
this isn't the environment variable expected by the Consul CLI. Job
specifications like deploying an API Gateway become noticeably nicer if we can
instead provide the expected env var.
2024-05-03 11:30:33 -04:00
James Rasell
cd9e032855 deps: upgrade hashicorp/cap to v0.6.0 (#20517) 2024-05-03 15:30:48 +01:00
Tim Gross
f9dd120d29 cli: add -jwks-ca-file to Vault/Consul setup commands (#20518)
When setting up auth methods for Consul and Vault in production environments, we
can typically assume that the CA certificate for the JWKS endpoint will be in
the host certificate store (as part of the usual configuration management
cluster admins need to do). But for quick demos with `-dev` agents, this won't
be the case.

Add a `-jwks-ca-file` parameter to the setup commands so that we can use this
tool to quickly set up Workload Identity with `-dev` agents running TLS.
2024-05-03 08:26:29 -04:00
Seth Hoenig
422d62df89 checklist: remove steps for openapi for rpc (#20515) 2024-05-02 08:53:45 -05:00
James Rasell
3f866a7e82 test: regenerate test TLS certificates. (#20511) 2024-05-02 13:58:32 +01:00
Michael Schurter
3aefc010d7 test: remove spurious print statements (#20503) 2024-05-01 09:47:56 -07:00
Tim Gross
77dc74a301 quota: ensure quota usage is freed when jobs are purged (#20492)
When a job is purged, we delete all its allocations and the client detects the
absence of the allocations to clean up its resources locally. But in this case
the client won't be able to send the allocation status update that normally
frees the quota used by that allocation. Instead, we need to free the quota usage
inside the state store immediately. To do so, we check if the allocation is
already client-terminal before copying it and passing it into the Enterprise
code for cleanup.

This commit also refactors the job delete to make it clear there's a single
caller of this alloc deletion path. This refactoring eliminates some wasteful
logic that queries the "allocs" table, allocates a slice of strings for their
IDs, and then queries the "allocs" table one-by-one for each of them for
deletion anyways.
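
A Go sketch of the freeing rule with hypothetical `alloc` and `quotaUsage` types; the real quota logic lives in Enterprise code. Allocations that were already client-terminal have released their quota, so only the others are subtracted when the job is purged. The job's allocations are walked once.

```go
package main

import "fmt"

// alloc and quotaUsage are hypothetical, trimmed-down types.
type alloc struct {
	ID             string
	ClientTerminal bool
	CPU            int // MHz counted against the quota while running
}

type quotaUsage struct{ CPU int }

// deleteJobAllocs removes a purged job's allocations and frees quota for any
// that were not already client-terminal, since no client status update will
// arrive to free it.
func deleteJobAllocs(allocs map[string]*alloc, jobAllocIDs []string, q *quotaUsage) {
	for _, id := range jobAllocIDs {
		a, ok := allocs[id]
		if !ok {
			continue
		}
		if !a.ClientTerminal {
			q.CPU -= a.CPU
		}
		delete(allocs, id)
	}
}

func main() {
	allocs := map[string]*alloc{
		"a1": {ID: "a1", CPU: 200},                       // running: still counted
		"a2": {ID: "a2", ClientTerminal: true, CPU: 100}, // already freed
	}
	q := &quotaUsage{CPU: 200}
	deleteJobAllocs(allocs, []string{"a1", "a2"}, q)
	fmt.Println(q.CPU, len(allocs)) // 0 0
}
```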

Tests for this code can be found in the linked ENT repo PR.

Fixes: https://github.com/hashicorp/nomad-enterprise/issues/1422
Ref: https://hashicorp.atlassian.net/browse/NOMAD-620
Ref: https://github.com/hashicorp/nomad-enterprise/pull/1432
2024-05-01 08:44:22 -04:00
Piotr Kazmierczak
abe9c0803a e2e: unflake TestWorkloadIdentity/testNobody (#20499)
sometimes the container quits too fast
2024-04-30 18:17:14 +02:00
James Rasell
05a7bb53d3 cli: fix handling of scaling jobs which don't generate evals. (#20479)
In some cases, such as parameterized jobs, scaling a Nomad job will
not generate evaluations. This change fixes the CLI behaviour in
this case, and matches the job run command's handling for consistency.
2024-04-30 10:32:31 +01:00
Tim Gross
ff2d9de592 Revert "E2E: skip Vault 1.16.1 for JWT compatibility test (#20301)" (#20484)
This reverts commit 45b36371a12ffae5b5bfaaeadb08f801fb6bc98d. Now that Vault
1.16.2 has shipped, the E2E test will pick up only a working version.

Closes: https://github.com/hashicorp/nomad/issues/20298
2024-04-26 09:36:09 -04:00
Seth Hoenig
5f64e42d73 client: fixup how alloc mounts directory are setup (#20463) 2024-04-26 07:29:52 -05:00
Seth Hoenig
7874d21881 docs: add exec2 task driver page (#20480) 2024-04-24 07:26:54 -05:00
Seth Hoenig
8ae1a0e356 docs: add docs around dynamic workload users (#20477) 2024-04-23 07:57:40 -05:00
Seth Hoenig
1dfc715721 docs: add docs for fsisolation.Unveil fs isolation mode (#20475) 2024-04-23 07:55:54 -05:00
Daniel Bennett
3ac3bc1cfe acl: token global mode can not be changed (#20464)
true up CLI and docs with API reality
2024-04-22 11:58:47 -05:00
Tim Gross
ea5f2f6748 acl: remove remaining unused nil ACL object handling (#20456)
As of #18754 which shipped in Nomad 1.7, we no longer need to nil-check the
object returned by ResolveACL if there's no error return, because in the case
where ACLs are disabled we return a special "ACLs disabled" ACL object. Checking
nil is not a bug but should be discouraged because it opens us up to future bugs
that would bypass ACLs.

We fixed a bunch of these cases in https://github.com/hashicorp/nomad/pull/20150
but I didn't update the semgrep rule, which meant we missed a few more. Update
the semgrep rule and fix the remaining cases.
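
A Go sketch of the pattern, using a hypothetical `ACL` type. When ACLs are disabled, the resolver returns a permissive sentinel object instead of nil. Callers therefore enforce permissions through a single code path and never need a nil check.

```go
package main

import (
	"errors"
	"fmt"
)

// ACL is a hypothetical, minimal stand-in for a resolved ACL object.
type ACL struct {
	management bool
	readJobs   bool
}

func (a *ACL) AllowJobRead() bool { return a.management || a.readJobs }

// aclsDisabled is returned instead of nil when ACLs are off.
var aclsDisabled = &ACL{management: true}

func resolveACL(aclEnabled bool, token string) (*ACL, error) {
	if !aclEnabled {
		return aclsDisabled, nil
	}
	switch token {
	case "":
		return &ACL{}, nil // anonymous: no permissions in this sketch
	case "reader":
		return &ACL{readJobs: true}, nil
	}
	return nil, errors.New("ACL token not found")
}

func readJobs(aclEnabled bool, token string) error {
	acl, err := resolveACL(aclEnabled, token)
	if err != nil {
		return err
	}
	// No "acl == nil" branch: a nil check that skipped enforcement here is
	// exactly the kind of bypass the semgrep rule guards against.
	if !acl.AllowJobRead() {
		return errors.New("permission denied")
	}
	return nil
}

func main() {
	fmt.Println(readJobs(false, ""))      // <nil>
	fmt.Println(readJobs(true, ""))       // permission denied
	fmt.Println(readJobs(true, "reader")) // <nil>
}
```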
2024-04-18 14:34:17 -04:00
Piotr Kazmierczak
048f4511e2 docs: correct nanoseconds to milliseconds for MeasureSince metrics (#20446) 2024-04-18 18:16:58 +02:00
dependabot[bot]
b25de662a1 chore(deps): bump github.com/docker/docker from 25.0.2+incompatible to 26.0.1+incompatible (#20389)
* chore(deps): bump github.com/docker/docker

Bumps [github.com/docker/docker](https://github.com/docker/docker) from 25.0.2+incompatible to 26.0.1+incompatible.
- [Release notes](https://github.com/docker/docker/releases)
- [Commits](https://github.com/docker/docker/compare/v25.0.2...v26.0.1)

---
updated-dependencies:
- dependency-name: github.com/docker/docker
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

* include changelog

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Tim Gross <tgross@hashicorp.com>
2024-04-18 11:35:09 -04:00
Tim Gross
e4fe564bba deps: update golang.org/x/net (#20434)
Although Nomad does not use HTTP2, vulnerability scans detect our version of
`golang.org/x/net` as having an HPACK DoS vuln (GHSA-4v7x-pqxf-cx7m). Upgrade
the library so as to quiet the alerts.

Fixes: https://github.com/hashicorp/nomad-enterprise/issues/1423
2024-04-18 10:34:35 -04:00
Tim Gross
b662f1e6e5 docs: fix incorrect dispatch payload limit in API docs (#20433)
The dispatch payload is limited to 16KiB, not 64KiB. The limit is correct in the
command docs but incorrect in the API docs.

Ref: https://github.com/hashicorp/nomad/blob/v1.7.7/nomad/job_endpoint.go#L36-L38
Fixes: https://github.com/hashicorp/nomad/issues/20432
2024-04-18 10:20:15 -04:00