Commit Graph

27077 Commits

Author SHA1 Message Date
Allison Larson
fd16f80b5a Only error on constraints if no allocs are running (#25850)
* Only error on constraints if no allocs are running

When running `nomad job run <JOB>` multiple times with constraints
defined, there should be no error as a result of filtering out nodes
that do not satisfy, or have never satisfied, the constraints.

When running a system job with a constraint, any run after the initial
startup returns an exit(2) and a warning about unplaced allocations due
to constraints. This error is not encountered on the initial run,
even though the constraint stays the same.

This is because the node that satisfies the condition is already running
the allocation, and the placement is ignored. Another placement is
attempted, but the only node(s) left are the ones that do not satisfy
the constraint. Nomad views this case (no allocations that it
attempted to place could be placed successfully) as an error, and
reports it as such. In reality, no allocations should be placed or
updated in this case, but it should not be treated as an error.

This change uses the `ignored` placements from diffSystemAlloc to attempt to
determine if the case encountered is an error (no ignored placements
means that nothing is already running, and is an error), or is not one
(an ignored placement means that the task is already running somewhere
on a node). It does this at the point where `failedTGAlloc` is
populated, so placement functionality isn't changed, just the field that
populates error.
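
The decision described above can be sketched roughly as follows. This is a simplified illustration, not the scheduler's actual code: the `placementResult` type, its fields, and `shouldReportFailure` are hypothetical names, and only the idea of counting `ignored` placements comes from this commit message.

```go
package main

import "fmt"

// placementResult is a hypothetical, simplified summary of one scheduling
// pass for a task group. It is not a real Nomad type.
type placementResult struct {
	Ignored  int // allocations already running that need no change
	Placed   int // allocations newly placed in this pass
	Filtered int // nodes filtered out by constraints
}

// shouldReportFailure returns true only when constraints filtered out nodes
// and nothing is already running or newly placed anywhere.
func shouldReportFailure(r placementResult) bool {
	if r.Filtered == 0 {
		return false
	}
	return r.Ignored == 0 && r.Placed == 0
}

func main() {
	firstRun := placementResult{Placed: 1, Filtered: 2}
	rerun := placementResult{Ignored: 1, Filtered: 2}
	impossible := placementResult{Filtered: 3}

	fmt.Println(shouldReportFailure(firstRun))   // false
	fmt.Println(shouldReportFailure(rerun))      // false
	fmt.Println(shouldReportFailure(impossible)) // true
}
```

In this sketch, a rerun with an ignored placement is not an error, while a job that no node can ever run still is.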

There is functionality that should be preserved which (correctly)
notifies a user if a job is attempted that cannot be run on any node due
to the constraints filtering out all available nodes. This should still
behave as expected.

* Add changelog entry

* Handle in-place updates for constrained system jobs

* Update .changelog/25850.txt

Co-authored-by: Piotr Kazmierczak <470696+pkazmierczak@users.noreply.github.com>

* Remove conditionals

---------

Co-authored-by: Piotr Kazmierczak <470696+pkazmierczak@users.noreply.github.com>
2025-05-15 15:14:03 -07:00
Tim Gross
9ee2582379 upgrade test: remove change mode from Vault workload (#25861)
During the upgrade test we can trigger a re-render of the Vault secret due to
client restart before the allocrunner has marked the task as running, which
triggers the change mode on the template and restarts the task. This results in
a race where the alloc is still "pending" when we go to check it. We never
change the value of this secret in upgrade testing, so paper over this race
condition by setting a "noop" change mode.
2025-05-15 10:10:58 -04:00
James Rasell
be84613dc3 test: Only run and lint Linux network hook test on Linux. (#25858) 2025-05-15 13:33:37 +01:00
Martina Santangelo
18eddf53a4 commands: adds job start command to start stopped jobs (#24150)
---------

Co-authored-by: Michael Smithhisler <michael.smithhisler@hashicorp.com>
Co-authored-by: Tim Gross <tgross@hashicorp.com>
2025-05-14 15:17:44 -04:00
Tim Gross
8a87c33594 build: pin actionlint workflow (#25855)
We're required to pin Docker images for Actions to a specific SHA now and this
is tripping scans in the Enterprise repo. Update the actionlint image.

Ref: https://go.hashi.co/memo/sec-032
2025-05-14 14:25:37 -04:00
James Rasell
ef25c3d55a cli: Fix help indentation format on node meta commands. (#25851) 2025-05-14 14:53:48 +01:00
Tim Gross
8a5a057d88 offline license utilization reporting (#25844)
Nomad Enterprise users operating in air-gapped or otherwise secured environments
don't want to send license reporting metrics directly from their
servers. Implement manual/offline reporting by periodically recording usage
metrics snapshots in the state store, and providing an API and CLI by which
cluster administrators can download the snapshot for review and out-of-band
transmission to HashiCorp.

This is the CE portion of the work required for implementation in the Enterprise
product. Nomad CE does not perform utilization reporting.

Ref: https://github.com/hashicorp/nomad-enterprise/pull/2673
Ref: https://hashicorp.atlassian.net/browse/NMD-68
Ref: https://go.hashi.co/rfc/nmd-210
2025-05-14 09:51:13 -04:00
Aimee Ukasick
79d35f072a Move environment section; CE-712 (#25845) 2025-05-13 12:31:08 -05:00
Piotr Kazmierczak
57cd7d7bca admin: Post 1.10.1 release (#25842) 2025-05-13 14:46:47 +02:00
Tim Gross
6c9f2fdd29 reduce upgrade testing flakes (#25839)
This changeset includes several adjustments to the upgrade testing scripts to
reduce flakes and make problems more understandable:

* When a node is drained prior to the 3rd client upgrade, it's entirely
  possible the 3rd client to be upgraded is the drained node. This results in
  miscounting the expected number of allocations because many of them will be
  "complete" (service/batch) or "pending" (system). Leave the system jobs running
  during drains and only count the running allocations at that point as the
  expected set. Move the inline script that gets this count into a script file for
  legibility.

* When the last initial workload is deployed, it's possible for it to be
  briefly still in "pending" when we move to the next step. Poll for a short
  window for the expected count of jobs.

* Make sure that any scripts that are being run right after a server or client
  is coming back up can handle temporary unavailability gracefully.

* Change the debugging output of several scripts to avoid having the debug
  output run into the error message (Ex. "some allocs are not running" looked like
  the first allocation running was the missing allocation).

* Add some notes to the README about running locally with `-dev` builds and
  tagging a cluster with your own name.
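
The "poll for a short window" idea from the list above can be sketched as below. The upgrade test scripts themselves are shell scripts; this Go version only illustrates the pattern, and `pollUntil`, `demo`, and the simulated counter are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// pollUntil calls check every interval until it returns true or the
// polling window elapses.
func pollUntil(window, interval time.Duration, check func() bool) error {
	deadline := time.Now().Add(window)
	for {
		if check() {
			return nil
		}
		// Stop if another sleep would take us past the deadline.
		if time.Now().Add(interval).After(deadline) {
			return errors.New("condition not met within polling window")
		}
		time.Sleep(interval)
	}
}

// demo simulates a job count that reaches the expected value on the
// third check, as if a workload were briefly still "pending".
func demo() error {
	running, expected := 0, 3
	return pollUntil(time.Second, 5*time.Millisecond, func() bool {
		running++ // stand-in for querying the cluster
		return running >= expected
	})
}

func main() {
	fmt.Println(demo()) // <nil>
}
```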

Ref: https://hashicorp.atlassian.net/browse/NMD-162
2025-05-13 08:40:22 -04:00
Piotr Kazmierczak
1bbd9eb4b0 sentinel 2025-05-13 14:38:59 +02:00
Piotr Kazmierczak
c590c4dd3c Merge release 1.10.1 files 2025-05-13 14:28:42 +02:00
hc-github-team-nomad-core
31b7a94a88 Prepare for next release 2025-05-13 14:26:48 +02:00
hc-github-team-nomad-core
9ef42e9807 Generate files for 1.10.1 release 2025-05-13 14:26:48 +02:00
Juana De La Cuesta
695ba2c159 Fix the verify alloc script (#25837)
* fix: use the raw option on jq to avoid treating the " like a char

* Update verify_allocs.sh
2025-05-12 14:53:28 +02:00
dependabot[bot]
120c7bd6e0 chore(deps): bump golang.org/x/sync from 0.13.0 to 0.14.0 (#25828)
Bumps [golang.org/x/sync](https://github.com/golang/sync) from 0.13.0 to 0.14.0.
- [Commits](https://github.com/golang/sync/compare/v0.13.0...v0.14.0)

---
updated-dependencies:
- dependency-name: golang.org/x/sync
  dependency-version: 0.14.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-05-12 11:13:10 +02:00
dependabot[bot]
6de7523de3 chore(deps): bump google.golang.org/grpc from 1.71.1 to 1.72.0 (#25767)
Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.71.1 to 1.72.0.
- [Release notes](https://github.com/grpc/grpc-go/releases)
- [Commits](https://github.com/grpc/grpc-go/compare/v1.71.1...v1.72.0)

---
updated-dependencies:
- dependency-name: google.golang.org/grpc
  dependency-version: 1.72.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-05-12 10:51:22 +02:00
dependabot[bot]
a7ad560285 chore(deps): bump github.com/miekg/dns from 1.1.65 to 1.1.66 (#25829)
Bumps [github.com/miekg/dns](https://github.com/miekg/dns) from 1.1.65 to 1.1.66.
- [Changelog](https://github.com/miekg/dns/blob/master/Makefile.release)
- [Commits](https://github.com/miekg/dns/compare/v1.1.65...v1.1.66)

---
updated-dependencies:
- dependency-name: github.com/miekg/dns
  dependency-version: 1.1.66
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-05-12 10:50:43 +02:00
dependabot[bot]
1d8b9c72a3 chore(deps): bump golang.org/x/sys from 0.32.0 to 0.33.0 (#25830)
Bumps [golang.org/x/sys](https://github.com/golang/sys) from 0.32.0 to 0.33.0.
- [Commits](https://github.com/golang/sys/compare/v0.32.0...v0.33.0)

---
updated-dependencies:
- dependency-name: golang.org/x/sys
  dependency-version: 0.33.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-05-12 10:13:43 +02:00
dependabot[bot]
e6d90104c5 chore(deps): bump github.com/hashicorp/consul/api from 1.30.0 to 1.32.1 (#25831)
Bumps [github.com/hashicorp/consul/api](https://github.com/hashicorp/consul) from 1.30.0 to 1.32.1.
- [Release notes](https://github.com/hashicorp/consul/releases)
- [Changelog](https://github.com/hashicorp/consul/blob/main/CHANGELOG.md)
- [Commits](https://github.com/hashicorp/consul/compare/api/v1.30.0...api/v1.32.1)

---
updated-dependencies:
- dependency-name: github.com/hashicorp/consul/api
  dependency-version: 1.32.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-05-12 10:13:07 +02:00
James Rasell
0b265d2417 encrypter: Track initial tasks for is ready calculation. (#25803)
From an operator's point of view, the server startup could appear
to "hang" if a key loaded from the FSM at startup could not be
decrypted or replicated.

In order to prevent this happening, the server startup function
will now use a timeout to wait for the encrypter to be ready. If
the timeout is reached, the error is sent back to the caller which
fails the CLI command. This bubbling of error message will also
flush to logs which will provide additional operator feedback.

The server only cares about keys loaded from the FSM snapshot and
trailing logs before the encrypter should be classed as ready. So
that the encrypter ready function does not get blocked by keys
added outside of the initial Raft load, we take a snapshot of the
decryption tasks as we enter the blocking call, and class these as
our barrier.
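
A minimal sketch of the barrier-with-timeout idea, assuming a simplified tracker: the `tracker` type, `add`, `waitReady`, and `demo` are hypothetical names for illustration and are not the encrypter's real API.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
	"time"
)

// tracker is a hypothetical stand-in for the encrypter's decrypt task tracking.
type tracker struct {
	mu    sync.Mutex
	tasks map[string]chan struct{} // keyID -> closed when the task finishes
}

func newTracker() *tracker {
	return &tracker{tasks: make(map[string]chan struct{})}
}

// add registers a decrypt task and returns a function that marks it done.
func (t *tracker) add(keyID string) (done func()) {
	ch := make(chan struct{})
	t.mu.Lock()
	t.tasks[keyID] = ch
	t.mu.Unlock()
	return func() { close(ch) }
}

// waitReady snapshots the tasks known at call time and waits only for
// those, so tasks added later cannot block readiness.
func (t *tracker) waitReady(ctx context.Context) error {
	t.mu.Lock()
	barrier := make([]chan struct{}, 0, len(t.tasks))
	for _, ch := range t.tasks {
		barrier = append(barrier, ch)
	}
	t.mu.Unlock()

	for _, ch := range barrier {
		select {
		case <-ch:
		case <-ctx.Done():
			return fmt.Errorf("encrypter not ready: %w", ctx.Err())
		}
	}
	return nil
}

// demo returns the result of waiting on a task that finishes in time, and
// whether waiting on a task that never finishes ended with a timeout.
func demo() (readyErr error, timedOut bool) {
	t := newTracker()
	done := t.add("key-from-fsm")
	go func() {
		time.Sleep(10 * time.Millisecond)
		done()
	}()
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	readyErr = t.waitReady(ctx)

	// A key that never decrypts makes the wait fail instead of hanging.
	t.add("stuck-key")
	ctx2, cancel2 := context.WithTimeout(context.Background(), 20*time.Millisecond)
	defer cancel2()
	timedOut = errors.Is(t.waitReady(ctx2), context.DeadlineExceeded)
	return readyErr, timedOut
}

func main() {
	fmt.Println(demo()) // <nil> true
}
```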
2025-05-07 15:38:16 +01:00
Tim Gross
3690a0118e build: update go toolchain to 1.24.3 (#25818) 2025-05-07 09:57:31 -04:00
James Rasell
296d03d9dd encrypter: Remove tracking of cancelation for decrypt tasks. (#25795)
New wrapped keys were added to the encrypter and tracked using
their keyID with the context cancelation function. This tracking
was performed primarily so the FSM could load its known key
objects and logs with entries for the same ID superseding existing
decryption tasks. This approach is hard to reason about and in
theory can cause timing problems in conjunction with the locking.

The new approach still tracks decryption tasks but does not store
the cancelation context. This context is now controlled within a
single function in an attempt to provide a clearer workflow. If
two calls for the same key are made in close succession, so that
there is no entry in the keyring for the key yet, all tasks will
be launched. The first task to finish will write the cipher to
encrypter state; the second task will complete but not write the
cipher.
2025-05-07 14:35:24 +01:00
Juana De La Cuesta
cb09696b1c Nojira upgrade3 (#25817)
* fix: typo

* fix: correct the script for unbound var

* fix: typo

* fix: typo
2025-05-06 18:21:33 +02:00
Juana De La Cuesta
f68203549b Fix the verify allocs, missing echo (#25816)
* fix: typo

* fix: correct the script for unbound var

* fix: typo
2025-05-06 17:16:56 +02:00
Juana De La Cuesta
42d4067d55 Nojira upgrade3 (#25815)
* fix: typo

* fix: correct the script for unbound var
2025-05-06 16:57:44 +02:00
Juana De La Cuesta
da0ea9935d fix: typo (#25814) 2025-05-06 16:44:25 +02:00
Juana De La Cuesta
22921418b6 Check for allocs running before checking for IDs after a client upgrade (#25790)
* fix: wait for all allocs to be running before checking for their IDs after client upgrade

* style: linter fix

* fix: filter running allocs per client ID when checking for allocs after upgrade
2025-05-06 16:22:45 +02:00
dependabot[bot]
242ee16c81 chore(deps): bump github.com/docker/docker (#25810)
Bumps [github.com/docker/docker](https://github.com/docker/docker) from 28.0.4+incompatible to 28.1.1+incompatible.
- [Release notes](https://github.com/docker/docker/releases)
- [Commits](https://github.com/docker/docker/compare/v28.0.4...v28.1.1)

---
updated-dependencies:
- dependency-name: github.com/docker/docker
  dependency-version: 28.1.1+incompatible
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-05-05 09:04:32 -04:00
Tim Gross
da592ab1b7 testing: fix vault setup test's reliance on specific Raft index (#25806)
The test for `nomad setup vault` command expects a specific `CreateIndex` for the
job it creates. Any Raft write when a server comes up or establishes leadership
can cause this test to break. Interpolate the expected index as we've done for
other indexes on the job to make this test less brittle.

Ref: https://github.com/hashicorp/nomad-enterprise/pull/2673#issuecomment-2847619747
2025-05-02 14:30:10 -04:00
James Rasell
21fd0bbb8a ci: Regenerate TLS certificates used for testing. (#25804) 2025-05-02 13:51:02 +01:00
James Rasell
449da5bc11 deps: Update mitchellh/colorstring to d06e56a500db (#25801) 2025-05-02 11:30:41 +01:00
James Rasell
01cd762d27 encrypter: Ignore wrapped key additions with zero wrapped keys. (#25791)
When a Nomad server restores its state via a snapshot and logs, it
is possible a legacy wrapped key object/log is found. This key
will not contain any wrapped keys and therefore should be ignored
within the encrypter.

It is theoretically possible without this change that a key which
generates zero decrypt tasks supersedes a running task and will
place itself in the tracked decrypt task tracker. This decrypt
task has no running work to remove its entry.
2025-05-01 14:54:41 +01:00
Juana De La Cuesta
dfc1412e22 Merge pull request #25721 from hashicorp/NMD-321-reload
Force an agent return if there is an error on reload
2025-05-01 14:43:08 +02:00
dependabot[bot]
f54804c16b chore(deps): bump github.com/miekg/dns from 1.1.64 to 1.1.65 (#25766) 2025-05-01 07:40:09 +01:00
Chris Roberts
a69baeea8c Merge pull request #25792 from hashicorp/b-pagination-tkn
paginator: fix tokenizer comparison of composite index and ID
2025-04-30 13:14:11 -07:00
Chris Roberts
ba1683f40e Update wording in the changelog entry 2025-04-30 11:17:19 -07:00
Chris Roberts
db360fc085 paginator: fix tokenizer comparison of composite index and ID
The `CreateIndexAndIDTokenizer` creates a composite token by
combining the create index value and ID from the object with
a `.`. Tokens are then compared lexicographically. The comparison
is appropriate for the ID segment of the token, but it is not for
the create index segment. Since the create index values are stored
with numeric ordering, using a lexicographical comparison can cause
unexpected results.

For example, when comparing the token `12.object-id` to `102.object-id`
the result will show `12.object-id` being greater. This is the
correct lexicographical result, but it is incorrect for the intention of the token.
With the knowledge of the composition of the token, the response
should be that `12.object-id` is less.

The unexpected behavior can be seen when performing lists (like listing
allocations). The behavior is encountered inconsistently due to
two requirements which must be met:

1. Create index values with a large enough span (ex: 12 and 102)
2. Correct per page value to get a "bad" next token (ex: prefix with 102)

To prevent the unexpected behavior, the target token is split
and the components are used individually to compare against the
object.
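
The difference between the two comparisons can be reproduced with a short sketch. The helper names (`splitToken`, `compareLexical`, `compareComposite`) are hypothetical and are not the paginator's actual functions; the sketch assumes tokens of the form `<createIndex>.<id>` as described above.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// splitToken breaks a "<createIndex>.<id>" token into its two parts.
func splitToken(tok string) (uint64, string, error) {
	idx, id, ok := strings.Cut(tok, ".")
	if !ok {
		return 0, "", fmt.Errorf("token %q has no separator", tok)
	}
	n, err := strconv.ParseUint(idx, 10, 64)
	if err != nil {
		return 0, "", fmt.Errorf("token %q has a non-numeric index: %w", tok, err)
	}
	return n, id, nil
}

// compareLexical compares whole tokens as plain strings.
func compareLexical(a, b string) int {
	return strings.Compare(a, b)
}

// compareComposite orders tokens by numeric create index first, then by ID.
func compareComposite(a, b string) (int, error) {
	ai, aid, err := splitToken(a)
	if err != nil {
		return 0, err
	}
	bi, bid, err := splitToken(b)
	if err != nil {
		return 0, err
	}
	switch {
	case ai < bi:
		return -1, nil
	case ai > bi:
		return 1, nil
	}
	return strings.Compare(aid, bid), nil
}

func main() {
	a, b := "12.object-id", "102.object-id"
	fmt.Println(compareLexical(a, b)) // 1: string comparison says a > b
	c, err := compareComposite(a, b)
	fmt.Println(c, err) // -1 <nil>: numeric index comparison says a < b
}
```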

Fixes #25435
2025-04-30 09:51:24 -07:00
Juana De La Cuesta
dcaa96f0e5 Update website/content/docs/upgrade/upgrade-specific.mdx
Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>
2025-04-30 15:03:49 +02:00
zouyu1026
18e508ff05 Update index.mdx (#25755)
The old link https://caravanproject.io/ points to a gambling website.
Update the link to the GitHub wiki.
2025-04-30 06:22:29 -05:00
Juana De La Cuesta
e8fb36f4d3 Style: typo 2025-04-30 13:01:57 +02:00
Juanadelacuesta
9288a3141a func and docs: Use the config from the client and not from the agent that is already parsed. Add the breaking change to the release notes 2025-04-30 10:53:02 +02:00
Tu Nguyen
bee2400958 update iframe to videoembed (#25783) 2025-04-29 10:58:04 -05:00
Aimee Ukasick
4075b0b8ba Docs: Add garbage collection page (#25715)
* add garbage collection page

* finish client; add resources section

* finish server section; task driver section

* add front matter description

* fix typos

* Address Tim's feedback
2025-04-28 08:37:23 -05:00
Adrian Todorov
a4dd1c962e docs: Update Nvidia device driver docs to link to list of supported cards and newer versions (#25531)
Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>
2025-04-28 08:32:58 +01:00
dependabot[bot]
4be69dddd4 chore(deps): bump github.com/hashicorp/vault/api from 1.15.0 to 1.16.0 (#25763) 2025-04-28 07:49:49 +01:00
dependabot[bot]
71065af720 chore(deps): bump github.com/hashicorp/go-discover (#25764) 2025-04-28 07:14:28 +01:00
Piotr Kazmierczak
3e688cf928 acl: add missing JWT auth method validation (#25757) 2025-04-25 14:53:25 +02:00
Piotr Kazmierczak
32ca833c70 client: unflake TestClient_ACL_ResolveToken_InvalidClaims (#25758) 2025-04-25 14:53:09 +02:00
James Rasell
e928131482 ui: Only show paused icon when allocs in pending state are paused. (#25742)
Jobs were being marked incorrectly as having paused allocations
when terminal allocations were marked with the paused boolean. The
UI should only mark a job as including paused allocations when
these paused allocations are in the correct client state, which is
pending.

---------

Co-authored-by: Phil Renaud <phil@riotindustries.com>
2025-04-25 07:45:45 +01:00