Commit Graph

25881 Commits

Author SHA1 Message Date
Piotr Kazmierczak
0e8a67f0e1 docker: oom_score_adj support (#23297) 2024-06-12 10:49:20 +02:00
Matt McQuillan
7f1665d326 Merge pull request #23286 from hashicorp/mmcquillan/jiraworkflow
Adding GHA workflow to sync with Jira
2024-06-11 12:52:57 -04:00
Tim Gross
44078d4786 docs: update configuration docs to include trace-level logging (#23285) 2024-06-11 09:19:52 -04:00
Tim Gross
7d73065066 numa: fix scheduler panic due to topology serialization bug (#23284)
The NUMA topology struct field `NodeIDs` is an `idset.Set`, which has no public
members. As a result, this field is never serialized via msgpack and persisted
in state. When `numa.affinity = "prefer"`, the scheduler dereferences this nil
field and panics the scheduler worker.

Ideally we would fix this by adding a msgpack serialization extension, but
because the field already exists and is just always empty, this breaks RPC wire
compatibility across upgrades. Instead, create a new field that's populated at
the same time we populate the more useful `idset.Set`, and repopulate the set on
demand.

Fixes: https://hashicorp.atlassian.net/browse/NET-9924
2024-06-11 08:55:00 -04:00
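The serialization problem described in this commit can be sketched in isolation. The following is a minimal illustration, not the actual Nomad change: it uses `encoding/json` from the standard library as a stand-in for msgpack (both skip unexported fields), and the names `idSet`, `Topology`, `Nodes`, and `roundTrip` are hypothetical. A set with no exported members is lost in a round trip, so a plain exported slice carries the data and the set is rebuilt on demand.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// idSet stands in for a set type that has no exported members
// (hypothetical; not the real idset.Set).
type idSet struct {
	items map[int]struct{}
}

func newIDSet(ids ...int) *idSet {
	s := &idSet{items: make(map[int]struct{})}
	for _, id := range ids {
		s.items[id] = struct{}{}
	}
	return s
}

func (s *idSet) Contains(id int) bool {
	_, ok := s.items[id]
	return ok
}

// Topology keeps the set for lookups and a plain slice for the wire.
type Topology struct {
	nodeIDs *idSet // unexported: skipped by the encoder
	Nodes   []int  // exported: survives encoding
}

func NewTopology(ids ...int) *Topology {
	return &Topology{nodeIDs: newIDSet(ids...), Nodes: ids}
}

// NodeIDs rebuilds the set from Nodes if it was lost in a round trip,
// so callers never dereference a nil set.
func (t *Topology) NodeIDs() *idSet {
	if t.nodeIDs == nil {
		t.nodeIDs = newIDSet(t.Nodes...)
	}
	return t.nodeIDs
}

// roundTrip encodes and decodes a Topology, as persisting it would.
func roundTrip(t *Topology) (*Topology, error) {
	raw, err := json.Marshal(t)
	if err != nil {
		return nil, err
	}
	out := new(Topology)
	if err := json.Unmarshal(raw, out); err != nil {
		return nil, err
	}
	return out, nil
}

func main() {
	restored, err := roundTrip(NewTopology(0, 1))
	if err != nil {
		panic(err)
	}
	fmt.Println(restored.NodeIDs().Contains(1)) // true
}
```

Adding a new exported field, rather than changing how the existing one is encoded, leaves the old field's wire shape untouched, which is the compatibility concern the commit message raises.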
Tim Gross
288a048a2e e2e: add prerelease builds to Consul/Vault compatibility tests (#23287)
Update the Consul/Vault build downloader functions so that we include the
current prerelease build (if any) in the E2E compatibility testing we do on each
PR. This will automatically cycle out when the GA build is released, because
that build is "higher" in the sorted set.
2024-06-11 08:54:27 -04:00
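The "higher in the sorted set" behavior can be illustrated with a small sketch. This assumes a simplified semantic-version ordering in which a prerelease sorts below the GA build with the same numeric parts. The function names and the version strings are hypothetical and are not taken from the E2E downloader code.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// version is a simplified semantic version: three numeric parts plus an
// optional prerelease label such as "rc1".
type version struct {
	core [3]int
	pre  string
}

func parseVersion(s string) (version, error) {
	var v version
	core, pre, _ := strings.Cut(s, "-")
	parts := strings.Split(core, ".")
	if len(parts) != 3 {
		return v, fmt.Errorf("invalid version %q", s)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return v, fmt.Errorf("invalid version %q: %w", s, err)
		}
		v.core[i] = n
	}
	v.pre = pre
	return v, nil
}

// less reports whether a sorts below b. With equal numeric parts, a
// prerelease sorts below the GA build.
func less(a, b version) bool {
	for i := range a.core {
		if a.core[i] != b.core[i] {
			return a.core[i] < b.core[i]
		}
	}
	if a.pre == "" || b.pre == "" {
		return a.pre != "" && b.pre == ""
	}
	return a.pre < b.pre // simplification: plain string comparison
}

// latest returns the highest version string in the input.
func latest(versions []string) (string, error) {
	if len(versions) == 0 {
		return "", fmt.Errorf("no versions given")
	}
	best := versions[0]
	bestV, err := parseVersion(best)
	if err != nil {
		return "", err
	}
	for _, s := range versions[1:] {
		v, err := parseVersion(s)
		if err != nil {
			return "", err
		}
		if less(bestV, v) {
			best, bestV = s, v
		}
	}
	return best, nil
}

func main() {
	withPre, _ := latest([]string{"1.18.2", "1.19.0-rc1"})
	withGA, _ := latest([]string{"1.18.2", "1.19.0-rc1", "1.19.0"})
	fmt.Println(withPre, withGA) // 1.19.0-rc1 1.19.0
}
```

Once the GA build appears in the list it wins the comparison, so the prerelease drops out without any special-case cleanup.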
Tim Gross
61608e43cb test: move NUMA platform scan out of testing global (#23289)
The `testing.go` test helpers file for the driver manager initializes the NUMA
scan as a package-global variable. This causes it to be pulled in even in
production builds, so even running commands like `nomad version` will cause the
NUMA scan to happen. Move the scan into the test helper setup.
2024-06-11 08:52:51 -04:00
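The cost of a package-global initializer, and one way to defer it, can be shown with a short sketch. It assumes a hypothetical `scanTopology` function standing in for the NUMA scan. Using `sync.Once` is an illustrative choice here, not necessarily what the commit does.

```go
package main

import (
	"fmt"
	"sync"
)

// scanCount records how many times the pretend scan has run.
var scanCount int

// scanTopology is a hypothetical stand-in for an expensive platform scan.
func scanTopology() []int {
	scanCount++
	return []int{0, 1}
}

// A package-level initializer such as
//
//	var topology = scanTopology()
//
// runs during program startup in every binary that links the package,
// whether or not any test helper is ever called.

var (
	topologyOnce sync.Once
	topology     []int
)

// testTopology runs the scan the first time a test helper asks for it
// and reuses the result afterwards.
func testTopology() []int {
	topologyOnce.Do(func() { topology = scanTopology() })
	return topology
}

func main() {
	fmt.Println(scanCount) // 0: nothing ran at startup
	testTopology()
	testTopology()
	fmt.Println(scanCount) // 1: the scan ran once, on first use
}
```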
James Rasell
00570d221b docs: update ACL policy example spec to remove plugin write cap. (#23277) 2024-06-11 07:44:27 +01:00
Matt McQuillan
55edc0289a Update .github/workflows/jira-sync.yml
From linting, quoting the env var

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2024-06-10 15:16:58 -04:00
Matt McQuillan
6af76c02d1 Adding GHA workflow to sync with Jira 2024-06-10 13:44:49 -04:00
James Rasell
d2a03ded78 acl: fix validation of ACL plugin policy entries. (#23274) 2024-06-10 16:17:51 +01:00
Tim Gross
fa70267787 scheduler: RescheduleTracker dropped if follow-up fails placements (#12319)
When an allocation fails it triggers an evaluation. The evaluation is processed
and the scheduler sees it needs to reschedule, which triggers a follow-up
eval. The follow-up eval creates a plan to `(stop 1) (place 1)`. The replacement
alloc has a `RescheduleTracker` (or gets its `RescheduleTracker` updated).

But in the case where the follow-up eval can't place all allocs (there aren't
enough resources), it can create a partial plan to `(stop 1) (place 0)`. It then
creates a blocked eval. The plan applier stops the failed alloc. Then when the
blocked eval is processed, the job is missing an allocation, so the scheduler
creates a new allocation. This allocation is _not_ a replacement from the
perspective of the scheduler, so it's not handed off a `RescheduleTracker`.

This changeset fixes this by annotating the reschedule tracker whenever the
scheduler can't place a replacement allocation. We check this annotation for
allocations that have the `stop` desired status when filtering out allocations
to pass to the reschedule tracker. I've also included tests that cover this case
and expand coverage of the relevant area of the code.

Fixes: https://github.com/hashicorp/nomad/issues/12147
Fixes: https://github.com/hashicorp/nomad/issues/17072
2024-06-10 11:15:40 -04:00
nicoche
ffcb72bfe3 api: Add Notes field to service checks (#22397)
Co-authored-by: Piotr Kazmierczak <470696+pkazmierczak@users.noreply.github.com>
2024-06-10 16:59:49 +02:00
James Rasell
1c976d126e docs: update snapshot inspect CLI detail to mirror recent changes. (#23276) 2024-06-10 14:30:13 +01:00
Phil Renaud
a933292897 Sanitize params input to alert a user when their scenario is invalid (#23261) 2024-06-07 11:12:21 -04:00
Seth Hoenig
45da80bde2 client: cleanup empty task directory when using unveil filesystem isolation (#23237)
This PR fixes a bug where Nomad client would leave behind an empty directory
created on behalf of tasks making use of the unveil filesystem isolation
mode (i.e. using exec2 task driver). Once unmounting is complete, we should
remember to also delete the directory.

Fixes #22433
2024-06-06 10:47:23 -05:00
Tim Gross
71fd5c2474 testing: pull Docker images from mirror (#23190)
In https://github.com/hashicorp/nomad/pull/17401 we added test helpers that
would allow `docker` driver tests to pull from a mirror of the Docker Hub
registry. Extend the use of this helper to a test that recently hit
rate-limiting.

Fixes: https://github.com/hashicorp/nomad/issues/23174
2024-06-06 11:21:45 -04:00
Gerard Nguyen
c3c2240304 Update nomad operator snapshot inspect with more detail (#20062)
Co-authored-by: Michael Schurter <michael.schurter@gmail.com>
Co-authored-by: James Rasell <jrasell@hashicorp.com>
2024-06-06 06:57:10 +01:00
Tim Gross
34f34440ac build: remove 32-bit ARM builds (#23189)
We no longer intend to release 32-bit builds for any platform. We'd previously
removed the builds for i386 on both Linux and Windows, but never got around to
removing the ARM builds. Add a note about this deprecation in the release notes
for 1.8.x.
2024-06-05 15:47:20 -04:00
Tim Gross
17093d62f0 docs: describe omitted spread behavior and perf impact (#23184)
Update the documentation for the `spread` block:
* Make it clear that the default behavior within a given job when the `spread`
  block is omitted is to spread out allocs among feasible nodes.
* Describe the difference between the `spread` block and `spread` scheduler
  algorithm.
* Add warnings about the performance impact of using `spread` and how to
  mitigate it.
2024-06-05 13:28:09 -04:00
Piotr Kazmierczak
abc6fe325d docs: fix typo in nomad quota utilization metrics (#23185) 2024-06-05 16:20:44 +02:00
Tim Gross
c99428d553 build: update to go1.22.4 (#23172)
Update Go toolchain to 1.22.4, which addresses two vulnerabilities in the Go
stdlib.

* CVE-2024-24789: impacts handling of certain types of invalid zip files, which
  could be exploited to create a zip file with unexpected contents. This could
  potentially impact Nomad users of `artifact` blocks who download untrusted
  artifacts.
* CVE-2024-24790: impacts parsing of IPv4-mapped IPv6 addresses.
2024-06-05 09:03:15 -04:00
Will Owens
e6bf43e825 jobspec2: add test for parsing constraint alternates (#23175) 2024-06-05 09:02:39 -04:00
Tim Gross
39dee90ad4 docs: clarify node drain behavior for batch workloads (#23170)
Our documentation for the `node drain` command doesn't include a treatment of
batch jobs, which are not migrated. The user is left to piece this behavior
together from the `migrate` documentation and the tutorial. Instead, let's
explicitly list the behaviors per job type.

Fixes: https://github.com/hashicorp/nomad/issues/17563
2024-06-05 08:47:37 -04:00
Tim Gross
67967c99a7 scheduler: stack test should use job.ID and not job.Name (#23169)
Some of our scheduler tests use the `AllocName` function from the structs
package incorrectly. This function should always receive the `Job.ID` and not
the `Job.Name`. Fix this to prevent future bugs from copy-pasting usage around.
2024-06-05 08:34:04 -04:00
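Why the distinction matters shows up as soon as a job's ID and name differ. The sketch below is illustrative only: the `Job` type, the `allocName` helper, and the `<job>.<group>[<index>]` format are assumptions made for the example rather than a copy of the structs package.

```go
package main

import "fmt"

// Job is a pared-down, hypothetical job with distinct ID and Name.
type Job struct {
	ID   string
	Name string
}

// allocName builds a name of the form "<job>.<group>[<index>]".
// The format is an assumption made for this illustration.
func allocName(jobID, group string, idx uint) string {
	return fmt.Sprintf("%s.%s[%d]", jobID, group, idx)
}

func main() {
	job := Job{ID: "web-7f3a", Name: "web"}

	fmt.Println(allocName(job.ID, "frontend", 0))   // web-7f3a.frontend[0]
	fmt.Println(allocName(job.Name, "frontend", 0)) // web.frontend[0]
}
```

Tests in which ID and name happen to be equal hide the mistake, which is how the incorrect usage can spread by copy-paste.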
Charlie Voiselle
74d8bc5d01 Updating hashicorp/vault-action to v3.0.0 (#23171)
Removes a dependency on a Node 16 action; Node 16 is EOL.
2024-06-04 15:40:18 -04:00
Seth Hoenig
d9416afee5 testing: fix the value of NOMAD_SECRETS_DIR in test harness (#23166)
This PR fixes the value of NOMAD_SECRETS_DIR to be the alloc_mounts
secrets directory instead of the real secrets directory, which is protected
by root 0700 even when running tests.

Needed for https://github.com/hashicorp/nomad-driver-exec2/issues/29
2024-06-04 10:58:10 -05:00
James Rasell
e73d8bb114 docs: update exec2 install apt/yum commands for pre-release. (#22428) 2024-06-04 14:41:57 +01:00
Ryan R Sundberg
096c72a2f4 Consul Connect: Fix validation with multiple local_bind_socket_paths (#22312)
When a consul connect sidecar service is defined with multiple
local_bind_socket_path upstreams, validation would fail due to duplicate
socket address bindings on `:0` being detected.

Validate local_bind_socket_path sockets separately from IP address
sockets.
2024-06-04 08:46:24 -04:00
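A rough sketch of the validation split described above follows. The `Upstream` type and `validateUpstreams` function are hypothetical and much simpler than the real job validation. The point is that upstreams bound to a socket path leave address and port empty, so checking them as IP bindings makes every one of them look like `:0`.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// Upstream is a hypothetical, pared-down sidecar upstream definition.
type Upstream struct {
	Name                string
	LocalBindAddress    string
	LocalBindPort       int
	LocalBindSocketPath string
}

// validateUpstreams rejects duplicate bindings, tracking socket paths
// separately from IP address and port pairs.
func validateUpstreams(ups []Upstream) error {
	seenAddrs := make(map[string]string)
	seenPaths := make(map[string]string)

	for _, up := range ups {
		if up.LocalBindSocketPath != "" {
			if prev, ok := seenPaths[up.LocalBindSocketPath]; ok {
				return fmt.Errorf("upstreams %q and %q share socket path %q",
					prev, up.Name, up.LocalBindSocketPath)
			}
			seenPaths[up.LocalBindSocketPath] = up.Name
			continue // do not also treat this upstream as an IP binding
		}

		// With an empty address and zero port this yields ":0".
		addr := net.JoinHostPort(up.LocalBindAddress, strconv.Itoa(up.LocalBindPort))
		if prev, ok := seenAddrs[addr]; ok {
			return fmt.Errorf("upstreams %q and %q share address %q",
				prev, up.Name, addr)
		}
		seenAddrs[addr] = up.Name
	}
	return nil
}

func main() {
	err := validateUpstreams([]Upstream{
		{Name: "db", LocalBindSocketPath: "/tmp/db.sock"},
		{Name: "cache", LocalBindSocketPath: "/tmp/cache.sock"},
	})
	fmt.Println(err) // <nil>
}
```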
dependabot[bot]
13e1a72325 build(deps): bump minimatch in /scripts/screenshots/src (#15353)
Bumps [minimatch](https://github.com/isaacs/minimatch) from 3.0.4 to 3.1.2.
- [Release notes](https://github.com/isaacs/minimatch/releases)
- [Commits](https://github.com/isaacs/minimatch/compare/v3.0.4...v3.1.2)

---
updated-dependencies:
- dependency-name: minimatch
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-06-03 16:39:39 -04:00
Piotr Kazmierczak
2a09abc477 metrics: quota utilization configuration and documentation (#22912)
Introduces support for (optional) quota utilization metrics

CE part of the hashicorp/nomad-enterprise#1488 change
2024-06-03 21:06:19 +02:00
Charlie Voiselle
180bab892d Update hcl/v2 to latest patched version v2.20.2-0.20240517235513-55d9c02d147d (#22439) 2024-05-31 15:42:17 -04:00
Phil Renaud
784ec507b8 Omit the current-time-displaying components during our visual diff tests (#22435) 2024-05-31 13:41:26 -04:00
Phil Renaud
ddfadca618 Checking for the type of event param before executing a lazy click (#22429) 2024-05-31 13:24:22 -04:00
Phil Renaud
014f5145dc Lockfile and bindata_assetfs recompiled on latest main (#22434) 2024-05-31 13:23:59 -04:00
Phil Renaud
36c2439503 [ui] Tests for Sentinel Policies (#22398)
* Tests for Sentinel Policies UI

* Further sentinel tests

* job allocations test reinstated
2024-05-31 10:38:54 -04:00
Seth Hoenig
2054e87158 e2e: add tests for exec2 task driver (#22406)
* e2e: add tests for exec2 task driver

* e2e: use envoy 1.29.4 because consul

* e2e: add a bridge networking http test for exec driver

* e2e: split up http test so curl always starts after the server
2024-05-31 09:22:39 -05:00
Phil Renaud
86ee56b8c5 [ui] Jobs index page badge for when a job has a paused task (#22392)
* Adds a badge on the jobs index page if any task within any allocation of a running job is currently paused

* Snapshot and acceptance tests for paused states

* Cleared yarn cache

* Remove MirageScenario from the test dependency chain

* Logging before toString

* Cardinal sin of time-based test execution

* Maybe we've been lucky for years and the clientStatus has always been running for this test by happenstance

* Back away from the time-based and toward the settled() approach
2024-05-30 21:18:35 -04:00
Piotr Kazmierczak
307fd590d7 docker: new container_exists_attempts configuration field (#22419)
This allows users to set a custom value of attempts that will be made to purge
an existing (not running) container if one is found during task creation.

---------

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2024-05-30 19:22:14 +02:00
Piotr Kazmierczak
bf11e39ac8 docker: add a unit test for "container already exists" error when creating containers (#22238) 2024-05-30 11:24:28 +02:00
James Rasell
6cb9bed236 docs: add operations benchmarking page with nomad-bench link. (#22393) 2024-05-30 07:34:10 +01:00
Phil Renaud
1412e65bbd [ui] Dropdowns on the jobs index page get a max-height and filtering (#20626)
* Adds a max-height to dropdowns lest they get any funny ideas

* Filter filtering
2024-05-29 21:01:57 -04:00
David Yu
5f0dea189e Merge pull request #22411 from hashicorp/docs-tbte
docs: add docs for time based task execution
2024-05-29 16:23:57 -07:00
Michael Schurter
7048d3a482 link release notes to schedule block 2024-05-29 15:53:15 -07:00
Michael Schurter
a2fe43030c rap 2024-05-29 15:50:33 -07:00
Michael Schurter
5a0c74d1f9 Apply suggestions from code review
Co-authored-by: David Yu <dyu@hashicorp.com>
2024-05-29 15:50:33 -07:00
Michael Schurter
fe0bda9c34 speling 2024-05-29 15:50:33 -07:00
Michael Schurter
690abefc4a docs: add docs for time based task execution 2024-05-29 15:50:33 -07:00
Phil Renaud
e09b29113c [ui] Helios and Power Select upgrades (#22328)
* Helios and Power Select upgrades

* Renamed namespaced contextual components
2024-05-29 17:00:56 -04:00
Phil Renaud
8a9d58ae8f Storybook scripts and references removed (#22232) 2024-05-29 16:34:26 -04:00
Tim Gross
140747240f consul: include admin partition in JWT login requests (#22226)
When logging into a JWT auth method, we need to explicitly supply the Consul
admin partition if the local Consul agent is in a partition. We can't derive
this from agent configuration because the Consul agent's configuration is
canonical, so instead we get the partition from the fingerprint (if
available). This changeset updates the Consul client constructor so that we
close over the partition from the fingerprint.

Ref: https://hashicorp.atlassian.net/browse/NET-9451
2024-05-29 16:31:09 -04:00