844 Commits

Author SHA1 Message Date
Tim Gross
139a96ad12 e2e: fix bind name to allow Connect reachability (#18878)
The `BindName` for JWT authentication should always bind to the `nomad_service` field in the JWT and not include the namespace, as the `nomad_service` is what's actually registered in Consul. 

* Fix the binding rule for the `consulcompat` test 
* Add a reachability assertion so that we don't miss regressions.
* Ensure we have a clean shutdown so that we don't leak state (containers and iptables) between tests.
2023-10-27 10:15:17 -04:00
Piotr Kazmierczak
7f62dec473 consul WI: rename default auth method for services (#18867)
It should be called nomad-services instead of nomad-workloads.
2023-10-26 09:43:33 +02:00
Tim Gross
6c2d5a0fbb E2E: Consul compatibility matrix tests (#18799)
Set up a new test suite that exercises Nomad's compatibility with Consul. This
suite installs all currently supported versions of Consul, spins up a Consul
agent with appropriate configuration, and a Nomad agent running in dev
mode. Then it runs a Connect job against each pair.
2023-10-24 16:03:53 -04:00
Kerim Satirli
5e1bbf90fc docs: update all URLs to developer.hashicorp.com (#16247) 2023-10-24 11:00:11 -04:00
Luiz Aoqui
70b1862026 test: add E2E vaultcompat test for JWT auth flow (#18822)
Test the JWT auth flow using real Nomad and Vault agents.
2023-10-23 20:00:55 -04:00
Seth Hoenig
e3c8700ded deps: upgrade to go-set/v2 (#18638)
No functional changes, just cleaning up deprecated usages that are
removed in v2 and replacing one call of .Slice with .ForEach to avoid
making the intermediate copy.
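
The difference between the two call styles can be sketched with a stand-in set type. This is only an illustration under assumptions: `Set`, `From`, `Slice`, and `ForEach` below are hypothetical local definitions that mimic the shape of a generic set API, not the go-set implementation.

```go
package main

import "fmt"

// Set is a hypothetical stand-in for a generic set type; it is not the
// go-set implementation.
type Set[T comparable] struct {
	items map[T]struct{}
}

// From builds a Set from the given values.
func From[T comparable](values ...T) *Set[T] {
	s := &Set[T]{items: make(map[T]struct{}, len(values))}
	for _, v := range values {
		s.items[v] = struct{}{}
	}
	return s
}

// Slice copies every element into a newly allocated slice.
func (s *Set[T]) Slice() []T {
	out := make([]T, 0, len(s.items))
	for v := range s.items {
		out = append(out, v)
	}
	return out
}

// ForEach visits elements in place and stops early if visit returns false.
func (s *Set[T]) ForEach(visit func(T) bool) {
	for v := range s.items {
		if !visit(v) {
			return
		}
	}
}

// SumViaSlice allocates an intermediate copy before iterating.
func SumViaSlice(s *Set[int]) int {
	total := 0
	for _, v := range s.Slice() {
		total += v
	}
	return total
}

// SumViaForEach iterates over the set without the intermediate copy.
func SumViaForEach(s *Set[int]) int {
	total := 0
	s.ForEach(func(v int) bool {
		total += v
		return true
	})
	return total
}

func main() {
	s := From(1, 2, 3)
	fmt.Println(SumViaSlice(s), SumViaForEach(s))
}
```

Both functions compute the same result; only the first one pays for the extra slice allocation.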
2023-10-05 11:56:17 -05:00
Daniel Bennett
a51d46c65c e2e: packer windows from "ECS_Optimized" image (#18453)
"Containers" AMIs evaporated at some point...
https://aws.amazon.com/marketplace/pp/prodview-yfve3zjgfjtug
> This version has been removed and is no longer
> available to new customers.
2023-09-11 12:26:32 -05:00
Seth Hoenig
f5b0da1d55 all: swap exp packages for maps, slices (#18311) 2023-08-23 15:42:13 -05:00
James Rasell
6108f5c4c3 admin: rename _oss files to _ce (#18209) 2023-08-18 07:47:24 +01:00
Seth Hoenig
6fca4fa715 test-e2e: no need to run vaultcompat tests as root (#18215)
6747ef8803 fixes the Nomad client to support using the raw_exec
driver while running as a non-root user. Remove the use of sudo
in the test-e2e workflow for running integration (vaultcompat)
tests.
2023-08-15 16:00:54 -05:00
Seth Hoenig
6747ef8803 drivers/raw_exec: restore ability to run tasks without nomad running as root (#18206)
Although nomad officially does not support running the client as a non-root
user, doing so has been more or less possible with the raw_exec driver as
long as you don't expect features to work like networking or running tasks
as specific users. In the cgroups refactoring I bulldozed right over the
special casing we had in place for raw_exec to continue working if the cgroups
were unable to be created. This PR restores that behavior - you can now
(as before) run the nomad client as a non-root user and make use of the
raw_exec task driver.
2023-08-15 11:22:30 -05:00
hashicorp-copywrite[bot]
a9d61ea3fd Update copyright file headers to BUSL-1.1 2023-08-10 17:27:29 -05:00
Seth Hoenig
37dd4c4a69 e2e: modernize vaultcompat testing (#18179)
* e2e: modernize vaultcompat testing

* e2e: cr fixes for vaultcompat
2023-08-09 09:24:51 -05:00
Seth Hoenig
8d28946993 e2e podman private registry (#17642)
* e2e: add tests for using private registry with podman driver

This PR adds e2e tests that stand up a private docker registry
and have a podman task run a container from an image in that private
registry.

Tests
 - user:password set in task config
 - auth_soft_fail works for public images when auth is set in driver
 - credentials helper is set in driver auth config
 - config auth.json file is set in driver auth config

* packer: use nomad-driver-podman v0.5.0

* e2e: eliminate unnecessary chmod

Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>

* cr: no need to install nomad twice

* cl: no need to install docker twice

---------

Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>
2023-07-19 15:59:36 -05:00
Seth Hoenig
159bf51120 e2e: add some e2e tests for pledge task driver (#17909)
* e2e: setup nomad for pledge driver

* e2e: add some e2e tests for pledge task driver
2023-07-12 11:56:08 -05:00
Seth Hoenig
fd50f2bcb8 e2e: do not set a user for raw_exec tasks (#17901)
Cannot set a user for raw_exec tasks, because doing so does not work
with the 0700 root-owned client data directory that we set up in the e2e
cluster in accordance with the Nomad hardening guide.
2023-07-11 16:00:15 -05:00
James Rasell
f43a3c9f37 e2e: respect timeout value when waiting for allocs in v3. (#17800) 2023-07-10 09:47:10 +01:00
Daniel Bennett
6bd509869b e2e: use DNS instead of HTTP to get my_public_ipv4 (#17759) 2023-06-28 13:11:57 -05:00
hashicorp-copywrite[bot]
d778ecfc7d [COMPLIANCE] Add Copyright and License Headers (#17732)
Co-authored-by: hashicorp-copywrite[bot] <110428419+hashicorp-copywrite[bot]@users.noreply.github.com>
2023-06-26 11:11:17 -05:00
Seth Hoenig
37df529e7a e2e: refactor pids isolation tests (#17717)
This PR refactors some old PID isolation tests to make use of the e2e/v3
packages. Should be quite a bit easier to read. Adds 'alloc exec' capability
to the jobs3 package.
2023-06-26 09:51:18 -05:00
Seth Hoenig
5b5fbc0881 e2e: create a v3/ set of packages for creating Nomad e2e tests (#17620)
* e2e: create a v3/ set of packages for creating Nomad e2e tests

This PR creates an experimental set of packages under `e2e/v3/` for crafting
Nomad e2e tests. Unlike previous generations, this is an attempt at providing
a way to create tests in a declarative (ish) pattern, with a focus on being
easy to use, easy to clean up, and easy to debug.

@shoenig is just trying this out to see how it goes.

Lots of features need to be implemented.
Many more docs need to be written.
Breaking changes are to be expected.
There are known and unknown bugs.
No warranty.

Quick run of `example` with verbose logging.

```shell
➜ NOMAD_E2E_VERBOSE=1 go test -v
=== RUN   TestExample
=== RUN   TestExample/testSleep
    util3.go:25: register (service) job: "sleep-809"
    util3.go:25: checking eval: 9f0ae04d-7259-9333-3763-44d0592d03a1, status: pending
    util3.go:25: checking eval: 9f0ae04d-7259-9333-3763-44d0592d03a1, status: complete
    util3.go:25: checking deployment: a85ad2f8-269c-6620-d390-8eac7a9c397d, status: running
    util3.go:25: checking deployment: a85ad2f8-269c-6620-d390-8eac7a9c397d, status: running
    util3.go:25: checking deployment: a85ad2f8-269c-6620-d390-8eac7a9c397d, status: running
    util3.go:25: checking deployment: a85ad2f8-269c-6620-d390-8eac7a9c397d, status: running
    util3.go:25: checking deployment: a85ad2f8-269c-6620-d390-8eac7a9c397d, status: successful
    util3.go:25: deployment a85ad2f8-269c-6620-d390-8eac7a9c397d was a success
    util3.go:25: deregister job "sleep-809"
    util3.go:25: system gc
=== RUN   TestExample/testNamespace
    util3.go:25: apply namespace "example-291"
    util3.go:25: register (service) job: "sleep-967"
    util3.go:25: checking eval: a2a2303a-adf1-2621-042e-a9654292e569, status: pending
    util3.go:25: checking eval: a2a2303a-adf1-2621-042e-a9654292e569, status: complete
    util3.go:25: checking deployment: 3395e9a8-3ffc-8990-d5b8-cc0ce311f302, status: running
    util3.go:25: checking deployment: 3395e9a8-3ffc-8990-d5b8-cc0ce311f302, status: running
    util3.go:25: checking deployment: 3395e9a8-3ffc-8990-d5b8-cc0ce311f302, status: running
    util3.go:25: checking deployment: 3395e9a8-3ffc-8990-d5b8-cc0ce311f302, status: successful
    util3.go:25: deployment 3395e9a8-3ffc-8990-d5b8-cc0ce311f302 was a success
    util3.go:25: deregister job "sleep-967"
    util3.go:25: system gc
    util3.go:25: cleanup namespace "example-291"
=== RUN   TestExample/testEnv
    util3.go:25: register (batch) job: "env-582"
    util3.go:25: checking eval: 600f3bce-ea17-6d13-9d20-9d9eb2a784f7, status: pending
    util3.go:25: checking eval: 600f3bce-ea17-6d13-9d20-9d9eb2a784f7, status: complete
    util3.go:25: deregister job "env-582"
    util3.go:25: system gc
--- PASS: TestExample (10.08s)
    --- PASS: TestExample/testSleep (5.02s)
    --- PASS: TestExample/testNamespace (4.02s)
    --- PASS: TestExample/testEnv (1.03s)
PASS
ok      github.com/hashicorp/nomad/e2e/example  10.079s
```

* cluster3: use filter for kernel.name instead of filtering manually
2023-06-23 09:10:49 -05:00
Daniel Bennett
748aea1c61 e2e: fix windows client docker (#17572)
the windows docker install script stopped working.

after trying various things to fix the script,
I opted instead for a base image that comes with
docker already installed.

error output during build was:
  Installing Docker.
  WARNING: Cannot find path 'C:\Users\Administrator\AppData\Local\Temp\DockerMsftProvider\DockerDefault_DockerSearchIndex.json' because it does not exist.
  WARNING: Cannot bind argument to parameter 'downloadURL' because it is an empty string.
  WARNING: The property 'AbsoluteUri' cannot be found on this object. Verify that the property exists.
  WARNING: The property 'RequestMessage' cannot be found on this object. Verify that the property exists.
  Failed to install Docker.
  Install-Package : No match was found for the specified search criteria and package name 'docker'.
2023-06-20 10:17:16 -05:00
hashicorp-copywrite[bot]
4e2d131d39 [COMPLIANCE] Add Copyright and License Headers (#17596)
Co-authored-by: hashicorp-copywrite[bot] <110428419+hashicorp-copywrite[bot]@users.noreply.github.com>
2023-06-19 12:23:28 -04:00
Seth Hoenig
f5fcaba1c7 e2e: modernize podman test suite (#17564)
Use the new style of e2e test for the podman suite ... which is all of
one test case that was skipped out. Turn the case back on, and we will
add more tests in the near future.
2023-06-16 10:36:17 -05:00
Seth Hoenig
6975409386 e2e: cleanup podman installation in jammy image (#17558)
* e2e: cleanup podman installation in jammy image

The original steps were copied over from the bionic image and do a lot
of hoop jumping we do not need anymore.

For the moment just hard-code installing the v0.4.2 version of the driver,
but I may follow up and modify hc-install to support installing @latest
like go itself.

* use releases for hc-install
2023-06-15 18:17:31 -05:00
Seth Hoenig
6b2834559f e2e: purge bionic packer image scripts (#17559)
Bionic is dead, long live the Jammy!
2023-06-15 15:15:01 -05:00
Patric Stout
a1a5241606 Fix DevicesSets being removed when cpusets are reloaded with cgroup v2 (#17535)
* Fix DevicesSets being removed when cpusets are reloaded with cgroup v2

This meant that if any allocation was created or removed, all
active DevicesSets were removed from all cgroups of all tasks.

This was most noticeable with "exec" and "raw_exec", as it meant
they no longer had access to /dev files.

* e2e: add test for verifying cgroups do not interfere with access to devices

---------

Co-authored-by: Seth Hoenig <shoenig@duck.com>
2023-06-15 09:39:36 -05:00
Seth Hoenig
60e0404bb5 compliance: add headers with fixed copywrite tool (#17353)
Closes #17117
2023-05-30 09:20:32 -05:00
Seth Hoenig
3cc25949fa client: ignore restart issued to terminal allocations (#17175)
* client: ignore restart issued to terminal allocations

This PR fixes a bug where issuing a restart to a terminal allocation
would cause the allocation to run its hooks anyway. This was particularly
apparent with group_service_hook, which would then register services but
then never deregister them - as the allocation would be effectively in
a "zombie" state where it is prepped to run tasks but never will.

* e2e: add e2e test for alloc restart zombies

* cl: tweak text

Co-authored-by: Tim Gross <tgross@hashicorp.com>

---------

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2023-05-16 10:19:41 -05:00
Seth Hoenig
5d6409bfab cli: upload var file(s) content on job submission (#17128)
This PR makes it so that the content of any -var-file files is uploaded
to Nomad on job run.
2023-05-11 08:04:33 -05:00
Seth Hoenig
8dff1f758a api: set the job submission during job reversion (#17097)
* api: set the job submission during job reversion

This PR fixes a bug where the job submission would always be nil when
a job goes through a reversion to a previous version. Basically we need
to detect when this happens, look up the submission of the job version
being reverted to, and set that as the submission of the new job being
created.

* e2e: add e2e test for job submissions during reversion

This e2e test ensures a reverted job inherits the job submission
associated with the version of the job being reverted to.
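
The lookup described above can be sketched roughly as follows. This is not Nomad's implementation: `Submission`, `History`, `Register`, and `Revert` are hypothetical names, and the in-memory map stands in for whatever store actually holds job versions.

```go
package main

import "fmt"

// Submission is a hypothetical stand-in for the original job source.
type Submission struct {
	Source string
}

// History maps each job version to the submission that produced it.
type History struct {
	next        uint64
	submissions map[uint64]Submission
}

// NewHistory returns an empty History.
func NewHistory() *History {
	return &History{submissions: make(map[uint64]Submission)}
}

// Register records a new job version together with its submission and
// returns the version number assigned to it.
func (h *History) Register(sub Submission) uint64 {
	v := h.next
	h.submissions[v] = sub
	h.next++
	return v
}

// Revert creates a new job version from an earlier one. The new version
// inherits the submission of the target version instead of being left empty.
func (h *History) Revert(target uint64) (uint64, error) {
	sub, ok := h.submissions[target]
	if !ok {
		return 0, fmt.Errorf("no submission recorded for version %d", target)
	}
	return h.Register(sub), nil
}

func main() {
	h := NewHistory()
	v0 := h.Register(Submission{Source: "job v0"})
	h.Register(Submission{Source: "job v1"})

	v2, err := h.Revert(v0)
	if err != nil {
		fmt.Println("revert failed:", err)
		return
	}
	fmt.Println(v2, h.submissions[v2].Source)
}
```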
2023-05-08 14:18:34 -05:00
Seth Hoenig
889c5aa0f7 services: un-mark group services as deregistered if restart hook runs (#16905)
* services: un-mark group services as deregistered if restart hook runs

This PR may fix a bug where group services will never be deregistered if the
group undergoes a task restart.

* e2e: add test case for restart and deregister group service

* cl: add cl

* e2e: add wait for service list call
2023-04-24 14:24:51 -05:00
Tim Gross
f91bf84e12 drain: use client status to determine drain is complete (#14348)
If an allocation is slow to stop because of `kill_timeout` or `shutdown_delay`,
the node drain is marked as complete prematurely, even though drain monitoring
will continue to report allocation migrations. This impacts the UI or API
clients that monitor node draining to shut down nodes.

This changeset updates the behavior to wait until the client status of all
drained allocs is terminal before marking the node as done draining.
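
A minimal sketch of that completion check, under stated assumptions: the `Alloc` type and the `DrainComplete` helper are hypothetical, and the set of terminal client statuses (`complete`, `failed`, `lost`) is assumed here rather than taken from this changeset.

```go
package main

import "fmt"

// Alloc is a hypothetical, reduced view of an allocation.
type Alloc struct {
	ID           string
	ClientStatus string
}

// clientTerminal reports whether a client status is terminal. The list of
// terminal statuses is an assumption made for this sketch.
func clientTerminal(status string) bool {
	switch status {
	case "complete", "failed", "lost":
		return true
	}
	return false
}

// DrainComplete reports whether every drained allocation has reached a
// terminal client status. An alloc that is still shutting down (for example
// because of kill_timeout or shutdown_delay) keeps the drain incomplete.
func DrainComplete(allocs []Alloc) bool {
	for _, a := range allocs {
		if !clientTerminal(a.ClientStatus) {
			return false
		}
	}
	return true
}

func main() {
	allocs := []Alloc{
		{ID: "a1", ClientStatus: "complete"},
		{ID: "a2", ClientStatus: "running"}, // still stopping on the client
	}
	fmt.Println(DrainComplete(allocs))

	allocs[1].ClientStatus = "complete"
	fmt.Println(DrainComplete(allocs))
}
```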
2023-04-13 08:55:28 -04:00
Shawn
9898e85d09 fix: typo (#16873) 2023-04-12 16:18:13 -04:00
Tim Gross
6a90e8320f E2E: clarify drain -deadline and -force flag behaviors (#16868)
The `-deadline` and `-force` flags for the `nomad node drain` command only cause
the draining to ignore the `migrate` block's healthy deadline, max parallel,
etc. These flags don't have anything to do with the `kill_timeout` or
`shutdown_delay` options of the jobspec.

This changeset fixes the skipped E2E tests so that they validate the intended
behavior, and updates the docs for more clarity.
2023-04-12 15:27:24 -04:00
Seth Hoenig
a1ebd075c4 e2e: add e2e tests for job submission api (#16841)
* e2e: add e2e tests for job submission api

* e2e: fixup callers of AllocLogs

* fix typo
2023-04-12 08:36:17 -05:00
hashicorp-copywrite[bot]
f005448366 [COMPLIANCE] Add Copyright and License Headers 2023-04-10 15:36:59 +00:00
Tim Gross
b8a472d692 ephemeral disk: migrate should imply sticky (#16826)
The `ephemeral_disk` block's `migrate` field allows for best-effort migration of
the ephemeral disk data to new nodes. The documentation says the `migrate` field
is only respected if `sticky=true`, but in fact if client ACLs are not set the
data is migrated even if `sticky=false`.

The existing behavior when client ACLs are disabled has existed since the early
implementation, so "fixing" that case now would silently break backwards
compatibility. Additionally, having `migrate` not imply `sticky` seems
nonsensical: it suggests that if we place on a new node we migrate the data but
if we place on the same node, we throw the data away!

Update so that `migrate=true` implies `sticky=true` as follows:

* The failure mode when client ACLs are enabled comes from the server not passing
  along a migration token. Update the server so that the server provides a
  migration token whenever `migrate=true` and not just when `sticky=true` too.
* Update the scheduler so that `migrate` implies `sticky`.
* Update the client so that we check for `migrate || sticky` where appropriate.
* Refactor the E2E tests to move them off the old framework and make the intention
  of the test more clear.
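
The rule in the list above can be sketched as a pair of predicates. This is an illustration only: `EphemeralDisk` is a reduced, hypothetical form of the jobspec block, and the method names are not taken from the Nomad codebase.

```go
package main

import "fmt"

// EphemeralDisk is a hypothetical, reduced form of the ephemeral_disk block.
type EphemeralDisk struct {
	Sticky  bool
	Migrate bool
}

// EffectiveSticky treats migrate=true as implying sticky=true, which is the
// `migrate || sticky` check described above.
func (d EphemeralDisk) EffectiveSticky() bool {
	return d.Migrate || d.Sticky
}

// NeedsMigrationToken reports whether a migration token should be passed
// along. It depends on migrate alone, not on sticky also being set.
func (d EphemeralDisk) NeedsMigrationToken() bool {
	return d.Migrate
}

func main() {
	disks := []EphemeralDisk{
		{Sticky: false, Migrate: false},
		{Sticky: true, Migrate: false},
		{Sticky: false, Migrate: true},
		{Sticky: true, Migrate: true},
	}
	for _, d := range disks {
		fmt.Printf("sticky=%t migrate=%t -> effective sticky=%t, token=%t\n",
			d.Sticky, d.Migrate, d.EffectiveSticky(), d.NeedsMigrationToken())
	}
}
```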
2023-04-07 16:33:45 -04:00
Tim Gross
bca03b6c9c E2E: update subset of node drain tests off the old framework (#16823)
While working on several open drain issues, I'm fixing up the E2E tests. This
subset of tests being refactored are existing ones that already work. I'm
shipping these as their own PR to keep review sizes manageable when I push up
PRs in the next few days for #9902, #12314, and #12915.
2023-04-07 09:17:19 -04:00
Seth Hoenig
9ed5aa13c9 e2e/acl: export ACL resource Cleanup helpers (#16822)
The e2e/acl package has some nice helpers for tracking and cleaning up ACL
objects, but they are currently private. Export them so I can abuse them in
other e2e tests.
2023-04-06 14:35:22 -05:00
Seth Hoenig
beb480f332 e2e: swap assert for test package in e2eutil/jobs.go (#16820) 2023-04-06 10:02:27 -05:00
Tim Gross
6cb69e5609 E2E: test enforcement of ACL system (#16796)
This changeset provides a matrix test of ACL enforcement across several
dimensions:
  * anonymous vs bogus vs valid tokens
  * permitted vs not permitted by policy
  * request sent to server vs sent to client (and forwarded)
2023-04-06 09:11:20 -04:00
James Rasell
4848bfadec cli: stream both stdout and stderr when following an alloc. (#16556)
This update changes the behaviour when following logs from an
allocation, so that both the stdout and stderr files are streamed when the
operator supplies the follow flag. The previous behaviour is retained
for all other flags and situations.

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
2023-04-04 10:42:27 +01:00
Michael Schurter
337a8d2153 e2e: sleep to ensure logs are picked up (#16596)
:(
2023-03-21 14:10:50 -07:00
Tim Gross
aece7b061c E2E: fix events tests (#16595)
In #12916 we updated the events test as part of a larger set of changes around
mapstructure serialization fixes. But the changes to the jobs we're deploying in
the tests had invalid task configs so they never result in good deployments and
the test will always fail. Make the before/after jobs identical (except for the
version bump) and make them valid. Also wait for allocations for the 2nd job run
to appear before checking the deployment list, so that we don't race with the
scheduler.
2023-03-21 14:01:40 -07:00
Michael Schurter
a73a399162 Windows fixes for e2e tests (#16592)
* e2e: skip task api test when windows too old

* e2e: don't run proxy on windows
2023-03-21 13:55:32 -07:00
Michael Schurter
282e3bcfcc Enable ACLs on E2E test clients (#16530)
* e2e: uniformly enable acls across all agents

* docs: clarify that acls should be set everywhere
2023-03-16 14:22:41 -07:00
Seth Hoenig
098650e36c artifact: use specific version link for zipbomb artifact (#16513)
Fix the e2e case where we download the go-getter bomb.zip test file, which
is being removed on main. We can still get it from the version tag - yay git!
2023-03-16 10:18:46 -05:00
Michael Schurter
9fefc18b77 e2e fixes: cli output, timing issue, and some cleanups (#16418)
* e2e: job expects alloc to run until stopped

* e2e: fix case changed by #16306

* e2e: couldn't find a bug but improved test+jobspecs
2023-03-10 13:14:51 -08:00
Seth Hoenig
40ab325594 e2e: setup nomad permissions correctly (client vs. server) (#16399)
This PR configures

- server nodes with a systemd unit running the agent as the nomad service user
- client nodes with a root owned nomad data directory
2023-03-08 14:41:08 -06:00