Commit Graph

24984 Commits

Author SHA1 Message Date
Phil Renaud
ba7892d0d2 Trim variable path before save (#18198) 2023-08-17 10:46:44 -04:00
James Rasell
d23ee134c5 deps: update hashicorp/go-set to v0.1.14 (#18240) 2023-08-17 15:03:15 +01:00
Luiz Aoqui
bff5ef78ae csi: prevent panic on volume delete (#18234)
When a CSI volume is deleted while its plugin is not running, the
function `volAndPluginLookup` returns a `nil` plugin value resulting in a
panic in the request handler.
2023-08-17 09:47:40 -04:00
Luiz Aoqui
140159511b demo: update image for hostpath CSI plugin (#18236) 2023-08-17 09:36:28 -04:00
Phil Renaud
d1a24309e2 [ui] Preserve HCL2 on stop/start via the web UI (#18120)
* long walk for a ham sandwich

* testfix for service job start

* hold point, breaks identified

* Testfixes for job start/stop helper
2023-08-17 09:32:42 -04:00
Piotr Kazmierczak
53ef6391a5 drivers/docker: fix a hostConfigMemorySwappiness panic (#18238)
cgroupslib.MaybeDisableMemorySwappiness returned an incorrect type, and was
incorrectly typecast to int64 causing a panic on non-linux and non-windows hosts.
2023-08-17 14:45:31 +02:00
Luiz Aoqui
e21ab7d948 docs: fix job dispatch documentation (#18225) 2023-08-16 17:22:55 -04:00
Luiz Aoqui
6d1a2a0f81 docs: move glossary to a top-level menu item (#18223) 2023-08-16 17:22:32 -04:00
Luiz Aoqui
01d71ca70e docs: expand documentation on node pools (#18109) 2023-08-16 11:16:06 -04:00
hashicorp-copywrite[bot]
9af2a9b396 [COMPLIANCE] License update (#18218) 2023-08-16 15:59:33 +01:00
hashicorp-copywrite[bot]
4f55df8306 Adding explicit MPL license for sub-package (#18219)
This directory and its subdirectories (packages) contain files licensed with the MPLv2 `LICENSE` file in this directory and are intentionally licensed separately from the BSL `LICENSE` file at the root of this repository.

Co-authored-by: hashicorp-copywrite[bot] <110428419+hashicorp-copywrite[bot]@users.noreply.github.com>
2023-08-16 09:59:07 -05:00
Seth Hoenig
6fca4fa715 test-e2e: no need to run vaultcompat tests as root (#18215)
6747ef8803 fixes the Nomad client to support using the raw_exec
driver while running as a non-root user. Remove the use of sudo
in the test-e2e workflow for running integration (vaultcompat)
tests.
2023-08-15 16:00:54 -05:00
Seth Hoenig
8833452d44 followup to numa/cgroups refactor (#18214)
* lang: note that Stack is not concurrency-safe

* client: use more descriptive name for wrangler hook in logs

* numalib: use correct name for receiver parameter
2023-08-15 14:12:17 -05:00
Tim Gross
f00bff09f1 fix multiple overflow errors in exponential backoff (#18200)
We use capped exponential backoff in several places in the code when handling
failures. The code we've copy-and-pasted all over has a check to see if the
backoff is greater than the limit, but this check happens after the bitshift and
we always increment the number of attempts. This causes an overflow with a
fairly small number of failures (ex. at one place I tested it occurs after only
24 iterations), resulting in a negative backoff which then never recovers. The
backoff becomes a tight loop consuming resources and/or DoS'ing a Nomad RPC
handler or an external API such as Vault. Note this doesn't occur in places
where we cap the number of iterations so the loop breaks (usually to return an
error), so long as the number of iterations is reasonable.

Introduce a helper with a check on the cap before the bitshift to avoid overflow in all 
places this can occur.

Fixes: #18199
Co-authored-by: stswidwinski <stan.swidwinski@gmail.com>
2023-08-15 14:38:18 -04:00
Seth Hoenig
6747ef8803 drivers/raw_exec: restore ability to run tasks without nomad running as root (#18206)
Although nomad officially does not support running the client as a non-root
user, doing so has been more or less possible with the raw_exec driver as
long as you don't expect features to work like networking or running tasks
as specific users. In the cgroups refactoring I bulldozed right over the
special casing we had in place for raw_exec to continue working if the cgroups
were unable to be created. This PR restores that behavior - you can now
(as before) run the nomad client as a non-root user and make use of the
raw_exec task driver.
2023-08-15 11:22:30 -05:00
Michael Schurter
0e22fc1a0b identity: add support for multiple identities + audiences (#18123)
Allows for multiple `identity{}` blocks for tasks along with user-specified audiences. This is a building block to allow workload identities to be used with Consul, Vault and 3rd party JWT based auth methods.

Expiration is still unimplemented and is necessary for JWTs to be used securely, so that's up next.

---------

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2023-08-15 09:11:53 -07:00
Seth Hoenig
77e139ea25 build: use modtool to format go.mod file (#18195) 2023-08-15 07:26:46 -05:00
Tim Gross
ac8604ede9 test: deflake node drain integration test (#18171)
The `TestDrainer_AllTypes_NoDeadline` test has been flaky. It looks like this
might be because the final update of batch allocations to complete is improperly
updating the state store directly rather than by RPC. If the service jobs have
restarted in the meantime, the `allocClientStateSimulator` will have updated the
index on the allocations table and that will prevent the drainer from
unblocking (and being marked complete) when the batch jobs are written with an
earlier index.

This changeset attempts to fix that by making the update via RPC (as it normally
would be in real code).
2023-08-14 16:17:25 -04:00
Tim Gross
464062d602 test: deflake job endpoint registration test (#18170)
We've seen test flakiness in the `TestJobEndpoint_Register_NonOverlapping` test,
which asserts that we don't try to place allocations for blocked evals until
resources have been actually freed by setting the client status of the previous
alloc to complete.

The flaky assertion includes sorting the two allocations by CreateIndex and this
appears to be a non-stable sort in the context of the test run, which results in
failures that shouldn't exist. There's no reason to sort the allocations instead
of just examining them by ID. This changeset does so.
2023-08-14 16:17:09 -04:00
Shantanu Gadgil
a170499c32 docs: ampersand and bash backgrounding problem (#18175)
the `&` symbol messes up the command when copy pasting into a shell
2023-08-14 15:11:09 -04:00
Esteban Barrios
65d562b760 config: add configurable content security policy (#18085) 2023-08-14 14:23:03 -04:00
dependabot[bot]
3c7a44daea build(deps): bump github.com/shoenig/test from 0.6.6 to 0.6.7 in /api (#18191)
Bumps [github.com/shoenig/test](https://github.com/shoenig/test) from 0.6.6 to 0.6.7.
- [Release notes](https://github.com/shoenig/test/releases)
- [Commits](https://github.com/shoenig/test/compare/v0.6.6...v0.6.7)

---
updated-dependencies:
- dependency-name: github.com/shoenig/test
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-08-14 09:21:29 -05:00
James Rasell
f9d70166e5 readme: update readme license badge (#18188)
* readme: update readme license badge

* tweak badge color

---------

Co-authored-by: Seth Hoenig <shoenig@duck.com>
2023-08-14 09:03:29 -05:00
Seth Hoenig
d9341f0664 update go1.21 (#18184)
* build: update to go1.21

* go: eliminate helpers in favor of min/max

* build: run go mod tidy

* build: swap depguard for semgrep

* command: fixup broken tls error check on go1.21
2023-08-14 08:43:27 -05:00
Sarah Thompson
fd1ae3427b update linux package license to BUSL-1.1 (#18192)
update copywrite.hcl to exclude MPL subdirs
2023-08-14 07:08:58 -05:00
Matt McQuillan
0ef5636d6e Merge pull request #18187 from hashicorp/compliance/license-changes
[COMPLIANCE] License changes
2023-08-10 18:55:13 -04:00
hashicorp-copywrite[bot]
a9d61ea3fd Update copyright file headers to BUSL-1.1 2023-08-10 17:27:29 -05:00
hashicorp-copywrite[bot]
2d35e32ec9 Update copyright file headers to BUSL-1.1 2023-08-10 17:27:15 -05:00
hashicorp-copywrite[bot]
f2acbdb49b Update copyright file headers to BUSL-1.1 2023-08-10 17:27:09 -05:00
hashicorp-copywrite[bot]
89e24d7405 Adding explicit MPL license for sub-package
This directory and its subdirectories (packages) contain files licensed with the MPLv2 `LICENSE` file in this directory and are intentionally licensed separately from the BSL `LICENSE` file at the root of this repository.
2023-08-10 17:27:01 -05:00
hashicorp-copywrite[bot]
b3e30b1dfa Updating the license from MPL to Business Source License
Going forward, this project will be licensed under the Business Source License v1.1. Please see our blog post for more details at https://hashi.co/bsl-blog, FAQ at https://hashi.co/license-faq, and details of the license at www.hashicorp.com/bsl.
2023-08-10 17:27:01 -05:00
Seth Hoenig
a4cc76bd3e numa: enable numa topology detection (#18146)
* client: refactor cgroups management in client

* client: fingerprint numa topology

* client: plumb numa and cgroups changes to drivers

* client: cleanup task resource accounting

* client: numa client and config plumbing

* lib: add a stack implementation

* tools: remove ec2info tool

* plugins: fixup testing for cgroups / numa changes

* build: update makefile and package tests and cl
2023-08-10 17:05:30 -05:00
Charlie Voiselle
5bc49e5208 unbreak the pre-push hook (#18185) 2023-08-10 10:38:18 -04:00
Charlie Voiselle
74f4381cb3 [chore] Update pre-push hook to handle more remote URL shapes (#17560)
* handle remotes without .git in their path
* Update check to use grep
2023-08-09 14:09:39 -04:00
Seth Hoenig
37dd4c4a69 e2e: modernize vaultcompat testing (#18179)
* e2e: modernize vaultcompat testing

* e2e: cr fixes for vaultcompat
2023-08-09 09:24:51 -05:00
Tim Gross
acfb4e679a docs: expand pprof documentation on goroutine profiles (#18172) 2023-08-08 08:33:42 -04:00
Devashish Taneja
472693d642 server: add config to tune job versions retention. #17635 (#17939) 2023-08-07 14:47:40 -04:00
Tim Gross
5d2c1d1f03 test: fix flaky RPC TLS enforcement test (#18155)
The RPC TLS enforcement test creates network connections to a server and these
are occasionally failing in testing with `write: broken pipe` errors. This has
been an ongoing issue where it'll appear to get fixed, then reoccur, and no one
seems to be able to reproduce outside of CI. The test assertion itself is
reliable, which is why it's been hard to spend effort to hunt this down.

The failing test cases are ones that are never supposed to work b/c they fail
our TLS cert role validation. The error message is coming from the TLS handshake
error. The RPC connection handler closes the connection immediately on getting
the error from the TLS handshake. The stdlib's TLS library flushes the
connection's buffer before returning the error. So the theory is that in the
failing case we don't get the error message before the connection is closed, but
do get the error return that allows the client to move on to a write, which
tries to write on the closed pipe.

I've been unable to reproduce this exactly, as the race is effectively between
the OS and the runtime. The equivalent test of the Raft TLS enforcement includes
handling of an EOF instead of the certificate error, so it appears this is actually
expected (or at least known) behavior. Because the code under test is operating
as expected, this changeset updates the assertion to accept the error.
2023-08-07 11:17:06 -04:00
Abbas Yazdanpanah
388198abef CLI: make snapshot name required when creating volume snapshots (#17958)
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
2023-08-04 10:36:07 +01:00
Tim Gross
902f640c80 docs: fix URL in agent pprof examples (#18142) 2023-08-03 16:05:53 -04:00
dependabot[bot]
9551441dff build(deps): bump github.com/hashicorp/go-kms-wrapping/v2 (#17957)
Bumps [github.com/hashicorp/go-kms-wrapping/v2](https://github.com/hashicorp/go-kms-wrapping) from 2.0.8 to 2.0.12.
- [Commits](https://github.com/hashicorp/go-kms-wrapping/compare/v2.0.8...v2.0.12)

---
updated-dependencies:
- dependency-name: github.com/hashicorp/go-kms-wrapping/v2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-08-03 15:43:14 -04:00
dependabot[bot]
02b572473b build(deps): bump github.com/opencontainers/runc from 1.1.5 to 1.1.8 (#18037)
Bumps [github.com/opencontainers/runc](https://github.com/opencontainers/runc) from 1.1.5 to 1.1.8.
- [Release notes](https://github.com/opencontainers/runc/releases)
- [Changelog](https://github.com/opencontainers/runc/blob/v1.1.8/CHANGELOG.md)
- [Commits](https://github.com/opencontainers/runc/compare/v1.1.5...v1.1.8)

---
updated-dependencies:
- dependency-name: github.com/opencontainers/runc
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-08-03 15:37:04 -04:00
dependabot[bot]
0d3f976a8a build(deps): bump github.com/hashicorp/consul/api from 1.18.0 to 1.23.0 (#18038)
Bumps [github.com/hashicorp/consul/api](https://github.com/hashicorp/consul) from 1.18.0 to 1.23.0.
- [Release notes](https://github.com/hashicorp/consul/releases)
- [Changelog](https://github.com/hashicorp/consul/blob/main/CHANGELOG.md)
- [Commits](https://github.com/hashicorp/consul/compare/api/v1.18.0...api/v1.23.0)

---
updated-dependencies:
- dependency-name: github.com/hashicorp/consul/api
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-08-03 15:01:34 -04:00
Tim Gross
b1742c7015 scheduler: filter device instance IDs by constraints (#18141)
When the scheduler assigns a device instance, it iterates over the feasible
devices and then picks the first instance with availability. If the jobspec uses
a constraint on device ID, this can lead to buggy/surprising behavior where the
node's device matches the constraint but then the individual device instance
does not.

Add a second filter based on the `${device.ids}` constraint after selecting a
node's device to ensure the device instance ID falls within the constraint as
well.

Fixes: #18112
2023-08-03 14:58:30 -04:00
James Rasell
9707aafc5b test: add tests for allocNameIndex core funcs (#18136) 2023-08-03 15:43:50 +01:00
Karuppiah Natarajan
2fd508d4f1 docs: fix link for stopping an agent (#18130) 2023-08-02 11:51:45 -04:00
Tim Gross
8ad663d1de allocwatcher: don't destroy local allocdir after migration (#18108)
When ephemeral disks are migrated from an allocation on the same node,
allocation logs for the previous allocation are lost.

There are two workflows for the best-effort attempt to migrate the allocation
data between the old and new allocations. For previous allocations on other
clients (the "remote" workflow), we create a local allocdir and download the
data from the previous client into it. That data is then moved into the new
allocdir and we delete the allocdir of the previous alloc.

For "local" previous allocations we don't need to create an extra directory for
the previous allocation and instead move the files directly from one to the
other. But we still delete the old allocdir _entirely_, which includes all the
logs!

There doesn't seem to be any reason to destroy the local previous allocdir, as
the usual client garbage collection should destroy it later on when needed. By
not deleting it, the previous allocation's logs are still available for the user
to read.

Fixes: #18034
2023-08-02 09:41:46 -04:00
Charlie Voiselle
585b0533c0 [dep] bump golang.org/x/exp (#18102)
There are some refactorings that have to be made in the getter and state
where the api changed in `slices`

* Bump golang.org/x/exp
* Bump golang.org/x/exp in api
* Update job_endpoint_test
* [feedback] unexport sort function
2023-08-01 11:50:17 -04:00
Luiz Aoqui
768978883d cli: search all namespaces for node volumes (#17925)
When looking for CSI volumes to display in the `node status` command the
CLI needs to search all namespaces.
2023-08-01 09:55:39 -04:00
Kevin Schoonover
4841791c86 fingerprint: fix 'default' alias not added to interface specified by network_interface (#18096) 2023-08-01 08:35:31 -04:00