Commit Graph

27056 Commits

Author SHA1 Message Date
Tim Gross
3690a0118e build: update go toolchain to 1.24.3 (#25818) 2025-05-07 09:57:31 -04:00
James Rasell
296d03d9dd encrypter: Remove tracking of cancelation for decrypt tasks. (#25795)
New wrapped keys were added to the encrypter and tracked using
their keyID with the context cancelation function. This tracking
was performed primarily so the FSM could load its known key
objects and logs with entries for the same ID superseding existing
decryption tasks. This approach is hard to reason about and in
theory can cause timing problems in conjunction with the locking.

The new approach still tracks decryption tasks but does not store
the cancelation context. This context is now controlled within a
single function in an attempt to provide a clearer workflow. In
the event two calls for the same key are made in close succession
meaning there is no entry in the keyring for the key yet, all
tasks will be launched. The first-past-the-post will write the
cipher to encrypter state, the second task will complete but not
write the cipher.
2025-05-07 14:35:24 +01:00
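The "first-past-the-post" behaviour described in this commit can be pictured with a minimal Go sketch. It is not the encrypter's actual code: the `keyring` type, the `storeIfAbsent` method, and the string ciphers are hypothetical stand-ins, assumed only to show how two tasks for the same key can both finish while only the first one writes.

```go
package main

import (
	"fmt"
	"sync"
)

// keyring is a hypothetical stand-in for the encrypter state: a map of
// key ID to cipher, guarded by a mutex.
type keyring struct {
	mu      sync.Mutex
	ciphers map[string]string
}

// storeIfAbsent writes the cipher only when no entry exists for keyID.
// It reports whether this call performed the write.
func (k *keyring) storeIfAbsent(keyID, cipher string) bool {
	k.mu.Lock()
	defer k.mu.Unlock()
	if _, ok := k.ciphers[keyID]; ok {
		return false // another task already won; complete without writing
	}
	k.ciphers[keyID] = cipher
	return true
}

func main() {
	k := &keyring{ciphers: map[string]string{}}
	results := make(chan bool, 2)

	// Launch two tasks for the same key in close succession.
	var wg sync.WaitGroup
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			results <- k.storeIfAbsent("key-1", fmt.Sprintf("cipher-from-task-%d", n))
		}(i)
	}
	wg.Wait()
	close(results)

	wrote := 0
	for ok := range results {
		if ok {
			wrote++
		}
	}
	fmt.Println("tasks that wrote the cipher:", wrote)
}
```

Whichever goroutine takes the lock first stores its cipher; the other still runs to completion but leaves the state untouched.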
Juana De La Cuesta
cb09696b1c Nojira upgrade3 (#25817)
* fix: typo

* fix: correct the script for unbound var

* fix: typo

* fix: typo
2025-05-06 18:21:33 +02:00
Juana De La Cuesta
f68203549b Fix the verify allocs, missing echo (#25816)
* fix: typo

* fix: correct the script for unbound var

* fix: typo
2025-05-06 17:16:56 +02:00
Juana De La Cuesta
42d4067d55 Nojira upgrade3 (#25815)
* fix: typo

* fix: correct the script for unbound var
2025-05-06 16:57:44 +02:00
Juana De La Cuesta
da0ea9935d fix: typo (#25814) 2025-05-06 16:44:25 +02:00
Juana De La Cuesta
22921418b6 Check for allocs running before checking for IDs after a client upgrade (#25790)
* fix: wait for all allocs to be running before checking for their IDs after client upgrade

* style: linter fix

* fix: filter running allocs per client ID when checking for allocs after upgrade
2025-05-06 16:22:45 +02:00
dependabot[bot]
242ee16c81 chore(deps): bump github.com/docker/docker (#25810)
Bumps [github.com/docker/docker](https://github.com/docker/docker) from 28.0.4+incompatible to 28.1.1+incompatible.
- [Release notes](https://github.com/docker/docker/releases)
- [Commits](https://github.com/docker/docker/compare/v28.0.4...v28.1.1)

---
updated-dependencies:
- dependency-name: github.com/docker/docker
  dependency-version: 28.1.1+incompatible
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-05-05 09:04:32 -04:00
Tim Gross
da592ab1b7 testing: fix vault setup test's reliance on specific Raft index (#25806)
The test for `nomad setup vault` command expects a specific `CreateIndex` for the
job it creates. Any Raft write when a server comes up or establishes leadership
can cause this test to break. Interpolate the expected index as we've done for
other indexes on the job to make this test less brittle.

Ref: https://github.com/hashicorp/nomad-enterprise/pull/2673#issuecomment-2847619747
2025-05-02 14:30:10 -04:00
James Rasell
21fd0bbb8a ci: Regenerate TLS certificates used for testing. (#25804) 2025-05-02 13:51:02 +01:00
James Rasell
449da5bc11 deps: Update mitchellh/colorstring to d06e56a500db (#25801) 2025-05-02 11:30:41 +01:00
James Rasell
01cd762d27 encrypter: Ignore wrapped key additions with zero wrapped keys. (#25791)
When a Nomad server restores its state via a snapshot and logs, it
is possible a legacy wrapped key object/log is found. This key
will not contain any wrapped keys and therefore should be ignored
within the encrypter.

It is theoretically possible without this change that a key which
generates zero decrypt tasks supersedes a running task and will
place itself in the tracked decrypt task tracker. This decrypt
task has no running work to remove its entry.
2025-05-01 14:54:41 +01:00
Juana De La Cuesta
dfc1412e22 Merge pull request #25721 from hashicorp/NMD-321-reload
Force an agent return if there is an error on reload
2025-05-01 14:43:08 +02:00
dependabot[bot]
f54804c16b chore(deps): bump github.com/miekg/dns from 1.1.64 to 1.1.65 (#25766) 2025-05-01 07:40:09 +01:00
Chris Roberts
a69baeea8c Merge pull request #25792 from hashicorp/b-pagination-tkn
paginator: fix tokenizer comparison of composite index and ID
2025-04-30 13:14:11 -07:00
Chris Roberts
ba1683f40e Update wording in the changelog entry 2025-04-30 11:17:19 -07:00
Chris Roberts
db360fc085 paginator: fix tokenizer comparison of composite index and ID
The `CreateIndexAndIDTokenizer` creates a composite token by
combining the create index value and ID from the object with
a `.`. Tokens are then compared lexicographically. The comparison
is appropriate for the ID segment of the token, but it is not for
the create index segment. Since the create index values are stored
with numeric ordering, using a lexicographical comparison can cause
unexpected results.

For example, when comparing the token `12.object-id` to `102.object-id`
the result will show `12.object-id` being greater. This is the
correct lexicographic result, but it does not match the intent of the token.
With the knowledge of the composition of the token, the response
should be that `12.object-id` is less.

The unexpected behavior can be seen when performing lists (like listing
allocations). The behavior is encountered inconsistently due to
two requirements which must be met:

1. Create index values with a large enough span (ex: 12 and 102)
2. Correct per page value to get a "bad" next token (ex: prefix with 102)

To prevent the unexpected behavior, the target token is split
and the components are used individually to compare against the
object.

Fixes #25435
2025-04-30 09:51:24 -07:00
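The difference between the two comparisons in this commit can be sketched in Go. This is an illustration only, assuming the `<create index>.<ID>` token layout from the message above; `splitToken` and `compareTokens` are hypothetical helpers, not the paginator's real functions.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// splitToken breaks a "<createIndex>.<id>" token into a numeric index and
// an ID. Hypothetical helper; the layout follows the commit message.
func splitToken(token string) (uint64, string, error) {
	indexPart, id, found := strings.Cut(token, ".")
	if !found {
		return 0, "", fmt.Errorf("token %q has no '.' separator", token)
	}
	index, err := strconv.ParseUint(indexPart, 10, 64)
	if err != nil {
		return 0, "", fmt.Errorf("token %q has a non-numeric index: %w", token, err)
	}
	return index, id, nil
}

// compareTokens orders tokens by numeric create index first and then by ID
// lexicographically. It returns -1, 0, or 1.
func compareTokens(a, b string) (int, error) {
	aIndex, aID, err := splitToken(a)
	if err != nil {
		return 0, err
	}
	bIndex, bID, err := splitToken(b)
	if err != nil {
		return 0, err
	}
	switch {
	case aIndex < bIndex:
		return -1, nil
	case aIndex > bIndex:
		return 1, nil
	}
	return strings.Compare(aID, bID), nil
}

func main() {
	a, b := "12.object-id", "102.object-id"

	// Whole-string comparison: '2' sorts after '0', so a looks greater.
	fmt.Println("lexicographic:", strings.Compare(a, b)) // prints 1

	// Component-wise comparison: 12 < 102, so a is less.
	cmp, err := compareTokens(a, b)
	if err != nil {
		panic(err)
	}
	fmt.Println("component-wise:", cmp) // prints -1
}
```

Splitting on the first `.` keeps any later dots inside the ID segment, which is one reasonable choice when the index segment is known to be purely numeric.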
Juana De La Cuesta
dcaa96f0e5 Update website/content/docs/upgrade/upgrade-specific.mdx
Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>
2025-04-30 15:03:49 +02:00
zouyu1026
18e508ff05 Update index.mdx (#25755)
The old link https://caravanproject.io/ points to a gambling website.
Update it to the GitHub wiki.
2025-04-30 06:22:29 -05:00
Juana De La Cuesta
e8fb36f4d3 Style: typo 2025-04-30 13:01:57 +02:00
Juanadelacuesta
9288a3141a func and docs: Use the config from the client and not from the agent that is already parsed. Add the breaking change to the release notes 2025-04-30 10:53:02 +02:00
Tu Nguyen
bee2400958 update iframe to videoembed (#25783) 2025-04-29 10:58:04 -05:00
Aimee Ukasick
4075b0b8ba Docs: Add garbage collection page (#25715)
* add garbage collection page

* finish client; add resources section

* finish server section; task driver section

* add front matter description

* fix typos

* Address Tim's feedback
2025-04-28 08:37:23 -05:00
Adrian Todorov
a4dd1c962e docs: Update Nvidia device driver docs to link to list of supported cards and newer versions (#25531)
Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>
2025-04-28 08:32:58 +01:00
dependabot[bot]
4be69dddd4 chore(deps): bump github.com/hashicorp/vault/api from 1.15.0 to 1.16.0 (#25763) 2025-04-28 07:49:49 +01:00
dependabot[bot]
71065af720 chore(deps): bump github.com/hashicorp/go-discover (#25764) 2025-04-28 07:14:28 +01:00
Piotr Kazmierczak
3e688cf928 acl: add missing JWT auth method validation (#25757) 2025-04-25 14:53:25 +02:00
Piotr Kazmierczak
32ca833c70 client: unflake TestClient_ACL_ResolveToken_InvalidClaims (#25758) 2025-04-25 14:53:09 +02:00
James Rasell
e928131482 ui: Only show paused icon when allocs in pending state are paused. (#25742)
Jobs were being marked incorrectly as having paused allocations
when terminal allocations were marked with the paused boolean. The
UI should only mark a job as including paused allocations when
these paused allocations are in the correct client state, which is
pending.

---------

Co-authored-by: Phil Renaud <phil@riotindustries.com>
2025-04-25 07:45:45 +01:00
Tim Gross
374e987b9b metrics: emit cache and rss stats on cgroup v2 (#25751)
In cgroups v2, a different map of memory stats is available from the kernel than
in v1. The Docker API reflects this change. But there are equivalent values in
the map for RSS (anonymously mapped memory) and cache (filesystem cache and
tmpfs), which the Docker driver is not currently emitting.

Fall back to these alternate values when the cgroups v1 values are not
available. Include the anonymous mapping in the "measured" allocation stats as
"RSS" so that they both show up in allocation metrics. We can do this on both
the `docker` driver and the Linux executor for `exec` and `java` drivers.

Fixes: https://github.com/hashicorp/nomad/issues/19185
Ref: https://hashicorp.atlassian.net/browse/NMD-437
Ref: https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#memory-interface-files
Ref: https://www.kernel.org/doc/Documentation/cgroup-v1/memory.txt
2025-04-24 12:48:18 -04:00
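The fallback described in this commit can be sketched as a lookup that tries the cgroups v1 key first and then the v2 equivalent. The key names `rss`/`cache` (v1) and `anon`/`file` (v2) come from the kernel `memory.stat` documentation linked above; the flat `map[string]uint64` and the helper names are assumptions for illustration, not the driver's actual types.

```go
package main

import "fmt"

// firstPresent returns the value of the first key found in stats and
// whether any of the keys was present.
func firstPresent(stats map[string]uint64, keys ...string) (uint64, bool) {
	for _, key := range keys {
		if value, ok := stats[key]; ok {
			return value, true
		}
	}
	return 0, false
}

// rssAndCache prefers the cgroups v1 names and falls back to the v2 names.
// Missing values are reported as zero.
func rssAndCache(stats map[string]uint64) (rss, cache uint64) {
	rss, _ = firstPresent(stats, "rss", "anon")
	cache, _ = firstPresent(stats, "cache", "file")
	return rss, cache
}

func main() {
	v1 := map[string]uint64{"rss": 100, "cache": 20}
	v2 := map[string]uint64{"anon": 300, "file": 40}

	rss, cache := rssAndCache(v1)
	fmt.Println("v1:", rss, cache)

	rss, cache = rssAndCache(v2)
	fmt.Println("v2:", rss, cache)
}
```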
Matt McQuillan
bd12d55eae Jira - GH Sync Updates
updating fields for jira move/sync
2025-04-24 11:47:31 -04:00
Matt McQuillan
74b98a6e9b updating fields for jira move/sync 2025-04-24 11:27:42 -04:00
Tim Gross
c7cb49f205 testing: fix a panic in docker stats collection test (#25747)
When the context closes, the stats emitter closes its channel. It's possible
for the channel to be closed in the stats emitter goroutine before the `select`
in the test sees that the context has closed, which can result in a panic in the
test when we try to read the empty value off the channel.
2025-04-24 10:41:03 -04:00
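The race described in this commit comes from a general Go property: receiving from a closed channel returns the zero value immediately, so a pointer element arrives as `nil`. A minimal sketch of a guarded read follows; the `usage` type and `sumRSS` function are hypothetical and are not taken from the test in question.

```go
package main

import "fmt"

// usage is a hypothetical stand-in for one stats sample.
type usage struct{ RSS uint64 }

// sumRSS adds up samples until either done is closed or the samples channel
// is closed. The two-value receive keeps a closed channel from handing back
// a nil *usage that would panic on dereference.
func sumRSS(samples <-chan *usage, done <-chan struct{}) uint64 {
	var total uint64
	for {
		select {
		case <-done:
			return total
		case sample, ok := <-samples:
			if !ok {
				// Channel closed: sample is nil here, so stop reading.
				return total
			}
			total += sample.RSS
		}
	}
}

func main() {
	samples := make(chan *usage, 2)
	done := make(chan struct{})

	samples <- &usage{RSS: 10}
	samples <- &usage{RSS: 32}
	close(samples) // the emitter closes its channel before done is observed

	fmt.Println(sumRSS(samples, done)) // prints 42
}
```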
Tim Gross
1e744db38e refactor alloc drain to make intent more clear (#25731)
While working on #25726, I found a method in the drainer code that
creates a map of job IDs to allocations.

At first glance this looks like a bug because it effectively de-duplicates the
allocations per job. But the consumer of the map is only concerned with jobs,
not allocations, and simply reads the job off the allocation. Refactor this to
make it obvious we're looking at the job.

Ref: https://github.com/hashicorp/nomad/pull/25726
2025-04-24 09:54:44 -04:00
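A rough Go sketch of the intent behind this refactor: if the consumer only needs jobs, keying the map by job ID and storing the job itself makes the de-duplication look deliberate. The `job` and `allocation` types and the `jobsForAllocs` function are simplified, hypothetical stand-ins, not the drainer's real structures.

```go
package main

import "fmt"

// job and allocation are simplified, hypothetical stand-ins.
type job struct{ ID string }

type allocation struct {
	ID  string
	Job *job
}

// jobsForAllocs collects the distinct jobs referenced by the allocations.
// Several allocations of one job collapse into a single entry, which is the
// intended result when only the job matters.
func jobsForAllocs(allocs []*allocation) map[string]*job {
	jobs := make(map[string]*job)
	for _, alloc := range allocs {
		if alloc == nil || alloc.Job == nil {
			continue
		}
		jobs[alloc.Job.ID] = alloc.Job
	}
	return jobs
}

func main() {
	web := &job{ID: "web"}
	db := &job{ID: "db"}
	allocs := []*allocation{
		{ID: "a1", Job: web},
		{ID: "a2", Job: web},
		{ID: "a3", Job: db},
	}
	fmt.Println(len(jobsForAllocs(allocs))) // prints 2
}
```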
Tim Gross
5208ad4c2c scheduler: allow canaries to be migrated on node drain (#25726)
When a node is drained that has canaries that are not yet healthy, the canaries
may not be properly migrated and the deployment will halt. This happens only if
there are more than `migrate.max_parallel` canaries on the node and the canaries
are not yet healthy (ex. they have a long `update.min_healthy_time`). In this
circumstance, the first batch of canaries are marked for migration by the
drainer correctly. But then the reconciler counts these migrated canaries
against the total number of expected canaries and no longer progresses the
deployment. Because an insufficient number of allocations have reported they're
healthy, the deployment cannot be promoted.

When the reconciler looks for canaries to cancel, it leaves in the list any
canaries that are already terminal (because there shouldn't be any work to
do). But this ends up skipping the creation of a new canary to replace terminal
canaries that have been marked for migration. Add a conditional for this case to
cause the canary to be removed from the list of active canaries so we can
replace it.

Ref: https://hashicorp.atlassian.net/browse/NMD-560
Fixes: https://github.com/hashicorp/nomad/issues/17842
2025-04-24 09:24:28 -04:00
Piotr Kazmierczak
3ad0df71a8 docker: correct stat response for rss, cache and swap memory in cgroups v1 (#25741)
#25138 refactoring accidentally removed
some of the memory stats that weren't available as concrete types in
containerapi.
2025-04-24 15:17:56 +02:00
Tim Gross
4d7ed88a8d testing: use Docker Hub registry mirror for additional tests (#25733)
This image was missed in https://github.com/hashicorp/nomad/pull/25703 and is
resulting in rate limiting in tests.
2025-04-24 08:50:32 -04:00
James Rasell
4b40e10e68 e2e: Update UI playwright version to 1.52.0 (#25740) 2025-04-24 13:38:26 +01:00
James Rasell
717207bce0 e2e: Fix TestDocker/testRedis with increased timeout on deployment (#25739)
The fresh deployment of the Redis job took around 20s which is
also the default context timeout on the e2e util that monitors and
waits for a deployment to complete.

The tight timing meant the test often timed out but sometimes
would complete successfully. Increasing the timeout for this
deployment will remove the flakiness.
2025-04-24 09:09:33 +01:00
Juanadelacuesta
949571e313 func: read the config from the agent, don't reparse 2025-04-24 05:01:53 +02:00
Juana De La Cuesta
4b95517734 Update .changelog/25721.txt
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
2025-04-24 04:54:38 +02:00
Juanadelacuesta
46343ee56e func: use the client's configured drain deadline to calculate the graceful timeout when terminating an agent 2025-04-23 23:59:50 +02:00
Juanadelacuesta
c91f24681d style: add changelog 2025-04-23 23:28:54 +02:00
Juana De La Cuesta
9778a31e29 Update command/agent/command.go
Co-authored-by: Tim Gross <tgross@hashicorp.com>
2025-04-23 23:18:09 +02:00
Juana De La Cuesta
39b3d63172 Update command/agent/command.go
Co-authored-by: Tim Gross <tgross@hashicorp.com>
2025-04-23 23:18:02 +02:00
Juana De La Cuesta
313f430fdd Update command/agent/command.go
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
2025-04-23 23:17:36 +02:00
Matt McQuillan
9a30372426 Testing Revised Jira Fields to get Jira/GH integration working
Testing Revised Jira Fields
2025-04-23 16:12:58 -04:00
Matt McQuillan
2b437fd733 Fixing ordering and ending bracket of extraFields 2025-04-23 15:59:27 -04:00
Matt McQuillan
1754fb1ed8 Update .github/workflows/jira-sync.yml
Co-authored-by: Tim Gross <tgross@hashicorp.com>
2025-04-23 15:53:35 -04:00
Matt McQuillan
d9b0fdcb8e Testing Revised Jira Fields 2025-04-23 15:39:58 -04:00