Commit Graph

27023 Commits

Author SHA1 Message Date
Tu Nguyen
bee2400958 update iframe to videoembed (#25783) 2025-04-29 10:58:04 -05:00
Aimee Ukasick
4075b0b8ba Docs: Add garbage collection page (#25715)
* add garbage collection page

* finish client; add resources section

* finish server section; task driver section

* add front matter description

* fix typos

* Address Tim's feedback
2025-04-28 08:37:23 -05:00
Adrian Todorov
a4dd1c962e docs: Update Nvidia device driver docs to link to list of supported cards and newer versions (#25531)
Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>
2025-04-28 08:32:58 +01:00
dependabot[bot]
4be69dddd4 chore(deps): bump github.com/hashicorp/vault/api from 1.15.0 to 1.16.0 (#25763) 2025-04-28 07:49:49 +01:00
dependabot[bot]
71065af720 chore(deps): bump github.com/hashicorp/go-discover (#25764) 2025-04-28 07:14:28 +01:00
Piotr Kazmierczak
3e688cf928 acl: add missing JWT auth method validation (#25757) 2025-04-25 14:53:25 +02:00
Piotr Kazmierczak
32ca833c70 client: unflake TestClient_ACL_ResolveToken_InvalidClaims (#25758) 2025-04-25 14:53:09 +02:00
James Rasell
e928131482 ui: Only show paused icon when allocs in pending state are paused. (#25742)
Jobs were being marked incorrectly as having paused allocations
when terminal allocations were marked with the paused boolean. The
UI should only mark a job as including paused allocations when
these paused allocations are in the correct client state, which is
pending.

---------

Co-authored-by: Phil Renaud <phil@riotindustries.com>
2025-04-25 07:45:45 +01:00
Tim Gross
374e987b9b metrics: emit cache and rss stats on cgroup v2 (#25751)
In cgroups v2, a different map of memory stats is available from the kernel than
in v1. The Docker API reflects this change. But there are equivalent values in
the map for RSS (anonymously mapped memory) and cache (filesystem cache and
tmpfs), which the Docker driver is not currently emitting.

Fall back to these alternate values when the cgroups v1 values are not
available. Include the anonymous mapping in the "measured" allocation stats as
"RSS" so that they both show up in allocation metrics. We can do this on both
the `docker` driver and the Linux executor for `exec` and `java` drivers.

Fixes: https://github.com/hashicorp/nomad/issues/19185
Ref: https://hashicorp.atlassian.net/browse/NMD-437
Ref: https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#memory-interface-files
Ref: https://www.kernel.org/doc/Documentation/cgroup-v1/memory.txt
2025-04-24 12:48:18 -04:00
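The fallback described in 374e987b9b can be sketched roughly as follows. This is a minimal illustration, not the driver's actual code: the `memoryUsage` type and the `firstPresent` and `fromStats` helpers are hypothetical, and the key names (`rss`/`cache` for cgroups v1, `anon`/`file` for v2) are assumptions taken from the kernel `memory.stat` documentation linked in the commit.

```go
package main

import "fmt"

// memoryUsage is a hypothetical container for the two values discussed above.
type memoryUsage struct {
	RSS   uint64
	Cache uint64
}

// firstPresent returns the value of the first key that exists in stats,
// or 0 when none of the keys are present.
func firstPresent(stats map[string]uint64, keys ...string) uint64 {
	for _, k := range keys {
		if v, ok := stats[k]; ok {
			return v
		}
	}
	return 0
}

// fromStats prefers the cgroups v1 keys and falls back to the assumed
// cgroups v2 equivalents when the v1 keys are missing.
func fromStats(stats map[string]uint64) memoryUsage {
	return memoryUsage{
		RSS:   firstPresent(stats, "rss", "anon"),
		Cache: firstPresent(stats, "cache", "file"),
	}
}

func main() {
	v1 := map[string]uint64{"rss": 4096, "cache": 8192}
	v2 := map[string]uint64{"anon": 1024, "file": 2048}
	fmt.Println(fromStats(v1), fromStats(v2))
}
```

Checking the v1 keys first keeps behavior unchanged on hosts that still report them.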
Matt McQuillan
bd12d55eae Jira - GH Sync Updates
updating fields for jira move/sync
2025-04-24 11:47:31 -04:00
Matt McQuillan
74b98a6e9b updating fields for jira move/sync 2025-04-24 11:27:42 -04:00
Tim Gross
c7cb49f205 testing: fix a panic in docker stats collection test (#25747)
When the context closes, the stats emitter closes its channel. It's possible
for the channel to be closed in the stats emitter goroutine before the `select`
in the test sees that the context has closed, which can result in a panic in the
test when we try to read the empty value off the channel.
2025-04-24 10:41:03 -04:00
Tim Gross
1e744db38e refactor alloc drain to make intent more clear (#25731)
While working on #25726, I found a method in the drainer code that creates
a map of job IDs to allocations.

At first glance this looks like a bug because it effectively de-duplicates the
allocations per job. But the consumer of the map is only concerned with jobs,
not allocations, and simply reads the job off the allocation. Refactor this to
make it obvious we're looking at the job.

Ref: https://github.com/hashicorp/nomad/pull/25726
2025-04-24 09:54:44 -04:00
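The intent of the refactor in 1e744db38e can be illustrated with a small sketch. The `Job` and `Allocation` types and the `jobsByID` helper below are hypothetical stand-ins, not Nomad's real structs; the point is only that keying the map by job ID and storing the job makes the de-duplication explicit.

```go
package main

import "fmt"

// Job and Allocation are simplified, hypothetical stand-ins.
type Job struct {
	ID string
}

type Allocation struct {
	ID  string
	Job *Job
}

// jobsByID collapses a list of allocations into the set of jobs they
// belong to. Several allocations of the same job map to a single entry.
func jobsByID(allocs []*Allocation) map[string]*Job {
	jobs := make(map[string]*Job)
	for _, a := range allocs {
		jobs[a.Job.ID] = a.Job
	}
	return jobs
}

func main() {
	web := &Job{ID: "web"}
	db := &Job{ID: "db"}
	allocs := []*Allocation{
		{ID: "a1", Job: web},
		{ID: "a2", Job: web},
		{ID: "a3", Job: db},
	}
	fmt.Println(len(jobsByID(allocs)), "jobs from", len(allocs), "allocations")
}
```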
Tim Gross
5208ad4c2c scheduler: allow canaries to be migrated on node drain (#25726)
When a node is drained that has canaries that are not yet healthy, the canaries
may not be properly migrated and the deployment will halt. This happens only if
there are more than `migrate.max_parallel` canaries on the node and the canaries
are not yet healthy (e.g. they have a long `update.min_healthy_time`). In this
circumstance, the first batch of canaries are marked for migration by the
drainer correctly. But then the reconciler counts these migrated canaries
against the total number of expected canaries and no longer progresses the
deployment. Because an insufficient number of allocations have reported they're
healthy, the deployment cannot be promoted.

When the reconciler looks for canaries to cancel, it leaves in the list any
canaries that are already terminal (because there shouldn't be any work to
do). But this ends up skipping the creation of a new canary to replace terminal
canaries that have been marked for migration. Add a conditional for this case to
cause the canary to be removed from the list of active canaries so we can
replace it.

Ref: https://hashicorp.atlassian.net/browse/NMD-560
Fixes: https://github.com/hashicorp/nomad/issues/17842
2025-04-24 09:24:28 -04:00
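The added conditional from 5208ad4c2c can be pictured with a rough sketch. Everything here is an assumption for illustration: the `Canary` type, its `Terminal` and `Migrating` fields, and the `activeCanaries` helper are hypothetical and do not mirror the reconciler's real data structures.

```go
package main

import "fmt"

// Canary is a hypothetical, simplified view of a canary allocation.
type Canary struct {
	ID        string
	Terminal  bool // the allocation has already stopped
	Migrating bool // the drainer marked it for migration
}

// activeCanaries returns the canaries that still count toward the
// deployment, plus how many replacements need to be placed. A terminal
// canary is normally left in the list, except when it was marked for
// migration, in which case it is dropped so a new canary can replace it.
func activeCanaries(canaries []Canary) (active []Canary, replacements int) {
	for _, c := range canaries {
		if c.Terminal && c.Migrating {
			replacements++
			continue
		}
		active = append(active, c)
	}
	return active, replacements
}

func main() {
	canaries := []Canary{
		{ID: "c1"},
		{ID: "c2", Terminal: true},
		{ID: "c3", Terminal: true, Migrating: true},
	}
	active, replacements := activeCanaries(canaries)
	fmt.Println(len(active), "active,", replacements, "to replace")
}
```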
Piotr Kazmierczak
3ad0df71a8 docker: correct stat response for rss, cache and swap memory in cgroups v1 (#25741)
#25138 refactoring accidentally removed
some of the memory stats that weren't available as concrete types in
containerapi.
2025-04-24 15:17:56 +02:00
Tim Gross
4d7ed88a8d testing: use Docker Hub registry mirror for additional tests (#25733)
This image was missed in https://github.com/hashicorp/nomad/pull/25703 and is
resulting in rate limiting in tests.
2025-04-24 08:50:32 -04:00
James Rasell
4b40e10e68 e2e: Update UI playwright version to 1.52.0 (#25740) 2025-04-24 13:38:26 +01:00
James Rasell
717207bce0 e2e: Fix TestDocker/testRedis with increased timeout on deployment (#25739)
The fresh deployment of the Redis job took around 20s which is
also the default context timeout on the e2e util that monitors and
waits for a deployment to complete.

The tight timing meant the test often timed out but sometimes
would complete successfully. Increasing the timeout for this
deployment will remove the flakiness.
2025-04-24 09:09:33 +01:00
Matt McQuillan
9a30372426 Testing Revised Jira Fields to get Jira/GH integration working
Testing Revised Jira Fields
2025-04-23 16:12:58 -04:00
Matt McQuillan
2b437fd733 Fixing ordering and ending bracket of extraFields 2025-04-23 15:59:27 -04:00
Matt McQuillan
1754fb1ed8 Update .github/workflows/jira-sync.yml
Co-authored-by: Tim Gross <tgross@hashicorp.com>
2025-04-23 15:53:35 -04:00
Matt McQuillan
d9b0fdcb8e Testing Revised Jira Fields 2025-04-23 15:39:58 -04:00
Tim Gross
1ea3ffd311 testing: state store test improvements around deployments (#25732)
While working on #25726, I explored a hypothesis that the problem could be
in the state store, but this proved to be a dead end. While I was in this area
of the code I migrated the tests to `shoenig/test`.

Ref: https://github.com/hashicorp/nomad/pull/25726
2025-04-23 15:28:14 -04:00
scoss
01dad73a4e tls_verify fix (#25725) 2025-04-23 13:50:36 -05:00
Allison Larson
50513a87b7 Preserve core resources during inplace service alloc updates (#25705)
* Preserve core resources during inplace service alloc updates

When an alloc is running with the core resources specified, and the
alloc is able to be updated in place, the cores it is running on should
be preserved.

This fixes a bug where the allocation's task's core resources
(CPU.ReservedCores) would be recomputed each time the reconciler checked
that the allocation could continue to run on the given node. Under
circumstances where a different core on the node became available before
this check was made, the selection process could compute this new core
as the core to run on, regardless of the core the allocation was already
running on. The check takes into account other allocations running on
the node with reserved cores, but cannot check itself.

When this would happen for multiple allocations being evaluated in a
single plan, the selection process would see the other cores being
previously reserved but be unaware of the one it ran on, resulting in
the same core being chosen over and over for each allocation that was
being checked, and updated in the state store (but not on the node).
Once those cores were chosen and committed for multiple allocs, the node
appears to be exhausted on the cores dimension, and it would prevent any
additional allocations from being started on the node.

The reconciler check/computation for allocations that are being updated
in place and have resources.cores defined is effectively a check that
the node has the available cores to run on, not a computation that
should be changed. The fix still performs the check, but once it is
successful any existing ReservedCores are preserved. Because any change
to this resource is considered a "destructive change", this can be
confidently preserved during the inplace update.

* Adjust reservedCores scheduler test

* Add changelog entry
2025-04-23 10:38:47 -07:00
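The behavior described in 50513a87b7 (check that the node still fits, then keep the cores already in use) might look roughly like this. It is a sketch under stated assumptions: `coresForInplaceUpdate` is a hypothetical helper, the `fits` flag stands in for the scheduler's real feasibility check, and the `uint16` core IDs are an assumption.

```go
package main

import "fmt"

// coresForInplaceUpdate decides which reserved cores an allocation should
// carry after an in-place update.
//   - existing: cores the allocation is already running on
//   - proposed: cores a fresh selection pass would have picked
//   - fits:     result of the feasibility check against the node
//
// When the check fails, nil is returned. When it succeeds, any existing
// cores win over the freshly proposed ones.
func coresForInplaceUpdate(existing, proposed []uint16, fits bool) []uint16 {
	if !fits {
		return nil
	}
	if len(existing) > 0 {
		return existing
	}
	return proposed
}

func main() {
	existing := []uint16{2}
	proposed := []uint16{5}
	fmt.Println(coresForInplaceUpdate(existing, proposed, true))
}
```

Keeping the existing cores avoids the state store drifting from what is actually pinned on the node.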
Daniel Bennett
dbf44a6ed3 Merge pull request #25720 - Combined dependencies PR 2025-04-23 11:11:25 -04:00
Matt McQuillan
c349f1ddd3 Updating Jira Project Sync
Updating Jira Project
2025-04-23 09:57:48 -04:00
Matt McQuillan
0fc8e68460 Updating Jira Project
Switching from NET to NMD
2025-04-23 09:36:59 -04:00
Daniel Bennett
1fe1167c58 Merge remote-tracking branch 'origin/dependabot/npm_and_yarn/ui/ember-cli-sass-11.0.1' into combined-pr-branch 2025-04-22 15:13:03 -04:00
Daniel Bennett
06d4898005 Merge remote-tracking branch 'origin/dependabot/npm_and_yarn/website/hashicorp/platform-cli-2.8.0' into combined-pr-branch 2025-04-22 15:13:02 -04:00
Daniel Bennett
2040bac97c Merge remote-tracking branch 'origin/dependabot/npm_and_yarn/website/babel/traverse-7.24.7' into combined-pr-branch 2025-04-22 15:13:02 -04:00
Daniel Bennett
2bc8941c53 Merge remote-tracking branch 'origin/dependabot/npm_and_yarn/ui/babel-loader-10.0.0' into combined-pr-branch 2025-04-22 15:13:01 -04:00
Daniel Bennett
4985884f54 Merge remote-tracking branch 'origin/dependabot/npm_and_yarn/website/prettier-3.5.3' into combined-pr-branch 2025-04-22 15:13:00 -04:00
Daniel Bennett
75bca06b2f Merge remote-tracking branch 'origin/dependabot/npm_and_yarn/ui/prismjs-1.30.0' into combined-pr-branch 2025-04-22 15:13:00 -04:00
Daniel Bennett
b3d234ef83 Merge remote-tracking branch 'origin/dependabot/npm_and_yarn/ui/babel/helpers-7.26.10' into combined-pr-branch 2025-04-22 15:12:59 -04:00
Daniel Bennett
95e7dd5022 Merge remote-tracking branch 'origin/dependabot/npm_and_yarn/ui/ember-qunit-9.0.2' into combined-pr-branch 2025-04-22 15:12:58 -04:00
Daniel Bennett
5e297e0622 Merge remote-tracking branch 'origin/dependabot/npm_and_yarn/ui/image-size-1.2.1' into combined-pr-branch 2025-04-22 15:12:58 -04:00
Daniel Bennett
2ad0082d5d Merge remote-tracking branch 'origin/dependabot/npm_and_yarn/ui/lint-staged-15.5.1' into combined-pr-branch 2025-04-22 15:12:57 -04:00
Daniel Bennett
cb733524d1 Merge remote-tracking branch 'origin/dependabot/go_modules/golang.org/x/crypto-0.37.0' into combined-pr-branch 2025-04-22 15:12:57 -04:00
Daniel Bennett
08023936e8 Merge remote-tracking branch 'origin/dependabot/go_modules/google.golang.org/protobuf-1.36.6' into combined-pr-branch 2025-04-22 15:12:56 -04:00
Daniel Bennett
c28ad7527f Merge remote-tracking branch 'origin/dependabot/go_modules/github.com/docker/cli-28.1.1incompatible' into combined-pr-branch 2025-04-22 15:12:56 -04:00
Daniel Bennett
0c4d8fa4e2 Merge remote-tracking branch 'origin/dependabot/go_modules/github.com/aws/aws-sdk-go-v2/config-1.29.14' into combined-pr-branch 2025-04-22 15:12:55 -04:00
Daniel Bennett
b51a37829f Merge remote-tracking branch 'origin/dependabot/go_modules/github.com/containernetworking/cni-1.3.0' into combined-pr-branch 2025-04-22 15:12:55 -04:00
Piotr Kazmierczak
df3b00bce0 acl: use WhoAmI RPC endpoint in /acl/token/self (#25547)
ResolveToken RPC endpoint was only used by the /acl/token/self API. We should migrate to the WI-aware WhoAmI instead.

---------

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2025-04-22 17:53:39 +02:00
Daniel Bennett
c46521a80d cli: operator debug: respect NOMAD_REGION env var (#25716)
properly filter out regions other than the one specified
like the -namespace flag does
2025-04-21 17:06:50 -04:00
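The filtering described in c46521a80d can be sketched as below. `filterByRegion` is a hypothetical helper, not the CLI's real code, and treating an empty value as "keep every region" is an assumption; only the `NOMAD_REGION` variable name comes from the commit.

```go
package main

import (
	"fmt"
	"os"
)

// filterByRegion keeps only the requested region. An empty want is
// treated as "no filter" (an assumption for this sketch).
func filterByRegion(regions []string, want string) []string {
	if want == "" {
		return regions
	}
	var out []string
	for _, r := range regions {
		if r == want {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	regions := []string{"global", "eu-west", "us-east"}
	// NOMAD_REGION is read from the environment; it may be unset.
	fmt.Println(filterByRegion(regions, os.Getenv("NOMAD_REGION")))
}
```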
Michael Smithhisler
6036ab8b40 client: close namespace file handle and defensively lazy unmount (#25714) 2025-04-21 16:25:05 -04:00
dependabot[bot]
79a459ba49 chore(deps): bump github.com/containernetworking/cni from 1.2.3 to 1.3.0
Bumps [github.com/containernetworking/cni](https://github.com/containernetworking/cni) from 1.2.3 to 1.3.0.
- [Release notes](https://github.com/containernetworking/cni/releases)
- [Commits](https://github.com/containernetworking/cni/compare/v1.2.3...v1.3.0)

---
updated-dependencies:
- dependency-name: github.com/containernetworking/cni
  dependency-version: 1.3.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-04-20 09:37:28 +00:00
dependabot[bot]
e9f1cebcf2 chore(deps): bump github.com/aws/aws-sdk-go-v2/config
Bumps [github.com/aws/aws-sdk-go-v2/config](https://github.com/aws/aws-sdk-go-v2) from 1.29.9 to 1.29.14.
- [Release notes](https://github.com/aws/aws-sdk-go-v2/releases)
- [Changelog](https://github.com/aws/aws-sdk-go-v2/blob/main/changelog-template.json)
- [Commits](https://github.com/aws/aws-sdk-go-v2/compare/config/v1.29.9...config/v1.29.14)

---
updated-dependencies:
- dependency-name: github.com/aws/aws-sdk-go-v2/config
  dependency-version: 1.29.14
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-04-20 09:37:22 +00:00
dependabot[bot]
bf3116fb3b chore(deps): bump github.com/docker/cli
Bumps [github.com/docker/cli](https://github.com/docker/cli) from 28.0.4+incompatible to 28.1.1+incompatible.
- [Commits](https://github.com/docker/cli/compare/v28.0.4...v28.1.1)

---
updated-dependencies:
- dependency-name: github.com/docker/cli
  dependency-version: 28.1.1+incompatible
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-04-20 09:36:21 +00:00
dependabot[bot]
eaa2ab810d chore(deps): bump google.golang.org/protobuf from 1.36.5 to 1.36.6
Bumps google.golang.org/protobuf from 1.36.5 to 1.36.6.

---
updated-dependencies:
- dependency-name: google.golang.org/protobuf
  dependency-version: 1.36.6
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-04-20 09:36:14 +00:00