Commit Graph

672 Commits

Author SHA1 Message Date
Seth Hoenig
5b3416bd97 e2e: set e2e/v3 debug logging on metrics test (#19263) 2023-12-01 10:03:55 -06:00
Tim Gross
05fe2ad191 E2E: fix assertion in CT native service lookup test (#19249)
When porting the `ConsulTemplate` test, I made a last-minute refactor to the
assertions for waiting on files, and accidentally inverted the test assertion in
the process.

Also, when running `jobs3.Submit` you need to include the `Namespace` option so
that the returned cleanup function deletes the job from the correct
namespace. Omitting it caused the namespace cleanup to fail, because the job
deletion had failed.
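The namespace pitfall can be reproduced in miniature with the minimal Go sketch below. Everything in it is hypothetical: `fakeCluster`, `submit`, and `withNamespace` are illustrative stand-ins, not the real `jobs3` API. The job lands in the namespace its jobspec declares. The returned cleanup, however, can only delete from the namespace it was told about, so omitting the option leaks the job.

```go
package main

import "fmt"

// fakeCluster is a hypothetical in-memory stand-in for a Nomad cluster:
// namespace -> job ID -> registered.
type fakeCluster struct {
	jobs map[string]map[string]bool
}

func newFakeCluster() *fakeCluster {
	return &fakeCluster{jobs: map[string]map[string]bool{}}
}

func (c *fakeCluster) register(ns, id string) {
	if c.jobs[ns] == nil {
		c.jobs[ns] = map[string]bool{}
	}
	c.jobs[ns][id] = true
}

// deregister reports whether the job existed in ns and was removed.
func (c *fakeCluster) deregister(ns, id string) bool {
	if !c.jobs[ns][id] {
		return false
	}
	delete(c.jobs[ns], id)
	return true
}

// submitOptions holds per-submission settings.
type submitOptions struct{ namespace string }

// option is a hypothetical functional option.
type option func(*submitOptions)

func withNamespace(ns string) option {
	return func(o *submitOptions) { o.namespace = ns }
}

// submit registers a job whose jobspec declares jobspecNS and returns a
// cleanup func. The cleanup only knows the namespace from the options,
// which defaults to "default".
func submit(c *fakeCluster, id, jobspecNS string, opts ...option) func() bool {
	o := &submitOptions{namespace: "default"}
	for _, opt := range opts {
		opt(o)
	}
	c.register(jobspecNS, id)
	return func() bool { return c.deregister(o.namespace, id) }
}

func main() {
	c := newFakeCluster()
	fmt.Println("without option:", submit(c, "web", "prod")())                      // false: job leaked in "prod"
	fmt.Println("with option:", submit(c, "api", "prod", withNamespace("prod"))()) // true
}
```

A leaked job leaves the namespace non-empty, which is why the namespace cleanup failed as well.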
2023-12-01 08:54:55 -05:00
Daniel Bennett
639c3f53c9 e2e: give node drain KillTimeout test more time (#19226)
and error more verbosely if it fails

also, add extra information to a failed evaluation
for more error visibility in other tests

---------

Co-authored-by: Juanadelacuesta <juanita.delacuestamorales@hashicorp.com>
2023-11-30 10:37:20 -06:00
Luiz Aoqui
969cdb0f46 test: add consul namespace rules to consulcompat (#19227)
When configuring Consul for multi-namespace support, the JWT auth method
needs to specify namespace rules. This attribute is set to `nil` in CE
but is used in Nomad ENT.
2023-11-30 10:13:08 -05:00
Tim Gross
4e7ad58d2d E2E: modernize ConsulTemplate test and fix some assertions (#19126)
The `TestTemplateUpdateTriggers` is flaky because of what turned out to be
incompatibility between the Consul agent on the E2E cluster and the HCP Consul
server we were running but hadn't upgraded in a while. Upgrading the HCP Consul
server seems to have fixed the tests, but while I'm in here I've updated this
test suite:

* Port the entire Consul template test suite off the old framework, and upgrade to
  using e2e "v3" where feasible.
* Clean up some of the assertions in the update triggers test to make the
  purpose of the test more clear.
* Remove unnecessary default fields from the job specs.

Closes: #19075
2023-11-29 12:16:41 -05:00
Daniel Bennett
f7adcefbb3 e2e: refactor vault secrets test (#19152)
fixes VaultSecrets test - it was failing due to a
regex mismatch (`^job` stopped matching when
copywrite headers got prepended to the jobspec).

but RegisterFromJobspec (which had the bug)
was only used in the one spot, so instead this
refactors the whole test to the v3 format
with testing.T and some additional fun stuff
that we can take advantage of with it.

some improvements:
* use a namespace
* use and extend existing test helpers
* add more test helpers
2023-11-28 10:00:27 -06:00
Piotr Kazmierczak
248b2ba5cd WI: use single auth method for Consul by default (#19169)
This simplifies the default setup of Nomad workloads WI-based
authentication for Consul by using a single auth method with 2 binding rules.

Users can still specify separate auth methods for services and tasks.
2023-11-28 12:22:27 +01:00
Daniel Bennett
eb56fce393 e2e: fix ui tests (#19138) 2023-11-27 12:26:19 -06:00
Tim Gross
ff928a8045 E2E: remove assertion from ACL role test (#19121)
The ACL role test asserts that the role has various permissions by listing jobs
in namespaces. It never creates jobs, because we can make all the assertions we
need by checking the error. But the test included an assertion that the
namespace was empty. Usually this will be the case, but if the previous test
case has not completed its GC (which is async), then it's possible a stopped job
will be in the namespace. Because this assertion is irrelevant for this test,
remove it.
2023-11-17 14:35:20 -05:00
Tim Gross
a3f8a52fd4 E2E: fix assertion in dynamic node metadata test (#19120)
In #18664 we changed how null values worked with dynamic node metadata so that
they were no longer returned if there wasn't also a static value for that
key. The test assertion in E2E was not updated to match the new behavior.

Fixes: #19112
2023-11-17 14:35:07 -05:00
Tim Gross
c144af2823 E2E: fix expected error message from token expiration test (#19119)
In Nomad 1.5 we started masking the specific error returned from the
authentication method and returning the "permission denied" error instead. Update
the E2E test that covers token expiration to ensure we're asserting the correct
error here.

Fixes: https://github.com/hashicorp/nomad/issues/16803
2023-11-17 14:00:47 -05:00
Daniel Bennett
4ec9343447 e2e: use tf variable defaults (#19108) 2023-11-16 14:50:11 -06:00
Tim Gross
da61d278d0 e2e: fix and modernize rescheduling test (#19105)
The E2E test suite for rescheduling had a few bugs:
* Using the command line to stop a job with a failing deployment returns a non-zero exit
  code, which would cause an otherwise passing test to fail.
* Two of the input jobs were actually invalid but were only correctly detected
  as such because of #17342

This changeset also updates the whole test suite to move it off the v1
"framework". A few test assertions are also de-flaked.

Fixes: #19076
2023-11-16 15:39:18 -05:00
Tim Gross
8fac70c92c E2E: refactor vaultcompat to allow for ENT tests (#19081)
We want to run the Vault compatibility E2E test with Vault Enterprise binaries
and use Vault namespaces. Refactor the `vaultcompat` test so as to parameterize
most of the test setup logic with the namespace, and add the appropriate build
tag for the CE version of the test.
2023-11-14 09:54:47 -05:00
Tim Gross
1c9c75cc83 E2E: refactor consulcompat to allow for ENT tests (#19068)
We want to run the Consul compatibility E2E test with Consul Enterprise binaries
and use Consul namespaces. Refactor the `consulcompat` test so as to
parameterize most of the test setup logic with the namespace, and add the
appropriate build tag for the CE version of the test.

Ref: https://github.com/hashicorp/nomad-enterprise/pull/1305
2023-11-10 15:05:51 -05:00
Seth Hoenig
5987ba434f e2ev3: wait for logs to become ready (#19067)
Just because an alloc is running does not mean nomad is ready to serve
task logs. In a test case where you immediately read logs after starting
a task, it could be that nomad responds with "no logs found" when you
try to read logs, in which case you just need to wait longer. Do so in
the v3 TaskLogs helper function.
2023-11-10 12:43:16 -06:00
Seth Hoenig
c17333d74a e2e refactor oversubscription (#19060)
* e2e: remove old oversubscription test

* e2e: fixup and cleanup oversubscription test suite

Fix and cleanup this old oversubscription test.

* use t.Cleanup instead of defer in tests
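One common reason to prefer `t.Cleanup` in shared helpers is timing. A `defer` inside a helper fires when the helper returns, while `t.Cleanup` callbacks run after the test finishes, in last-in, first-out order. The sketch accepts a small interface that `*testing.T` satisfies, so it can run outside `go test`. `fakeT` and both setup functions are hypothetical.

```go
package main

import "fmt"

// cleaner is the subset of *testing.T used here.
type cleaner interface {
	Cleanup(func())
}

// fakeT records cleanup funcs and runs them last-in, first-out on finish,
// mirroring how the testing package runs t.Cleanup callbacks.
type fakeT struct{ fns []func() }

func (f *fakeT) Cleanup(fn func()) { f.fns = append(f.fns, fn) }

func (f *fakeT) finish() {
	for i := len(f.fns) - 1; i >= 0; i-- {
		f.fns[i]()
	}
}

// events records the order in which things happen.
var events []string

// setupWithDefer tears the job down as soon as the helper returns: too early.
func setupWithDefer() {
	events = append(events, "start job (defer)")
	defer func() { events = append(events, "stop job (defer)") }()
}

// setupWithCleanup postpones teardown until the test itself finishes.
func setupWithCleanup(t cleaner) {
	events = append(events, "start job (cleanup)")
	t.Cleanup(func() { events = append(events, "stop job (cleanup)") })
}

func main() {
	setupWithDefer()
	events = append(events, "test body") // the job is already stopped

	ft := &fakeT{}
	setupWithCleanup(ft)
	events = append(events, "test body") // the job is still running
	ft.finish()

	for _, e := range events {
		fmt.Println(e)
	}
}
```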
2023-11-10 09:25:32 -06:00
Tim Gross
4e38b41d9d E2E: add template block to consulcompat test (#19055)
The Consul compatibility test focuses on Connect, but it'd be a good idea to
ensure we can successfully get template data out of Consul as well.

Also tightens up the test's Consul ACL policy for the Nomad agent.
2023-11-10 09:25:37 -05:00
Seth Hoenig
1f957947b4 e2e: refactor nomadexec test suite (#19054) 2023-11-10 07:09:24 -06:00
Seth Hoenig
2f8d94ae3e e2e: more cpu and memory for java tasks and some scripts (#19057) 2023-11-10 07:08:14 -06:00
Seth Hoenig
f211a0ab7c e2e: update terraform lock file for 1.6.3 (#19049)
Using the latest version of terraform, the lock file differs from the one
originally generated. It seems the http module is no longer needed
(or versioned, or present?).
2023-11-09 10:49:49 -06:00
Seth Hoenig
402540f7fb e2e: bump packer build instances because faster (#19046) 2023-11-09 09:33:30 -06:00
Seth Hoenig
a28e5b6965 e2e: refactor metrics test to use NSD and WI (#19022)
* e2e: remove old metrics suite

* e2e: install stress on e2e jammy image

* e2e: overhaul metrics test to use nomad service discovery, workload identity

* e2e: format metrics hcl files and copywrite

* e2e: undo tf lock file

* e2e: undo reg auth file perms

* e2e: format cpustress.hcl
2023-11-09 08:21:16 -06:00
Tim Gross
9d075c44b2 config: remove old Vault/Consul config blocks from parser (#18997)
Remove the now-unused original configuration blocks for Consul and Vault from
the agent configuration parsing. When the agent needs to refer to a Consul or
Vault block it will always be for a specific cluster for the task/service (or
the default cluster for the agent's own use).

This is the third of three changesets for this work.

Fixes: https://github.com/hashicorp/nomad/issues/18947
Ref: https://github.com/hashicorp/nomad/pull/18991
Ref: https://github.com/hashicorp/nomad/pull/18994
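The lookup rule described above can be sketched as follows. The sketch assumes a hypothetical `ConsulConfig` type keyed by cluster name and a default cluster named `default`. The same shape would apply to Vault blocks.

```go
package main

import "fmt"

// ConsulConfig is a hypothetical, trimmed-down agent consul block.
type ConsulConfig struct {
	Name    string
	Address string
}

// defaultCluster is assumed to be the name of the agent's own cluster.
const defaultCluster = "default"

// consulFor returns the block for the named cluster. An empty name (a task or
// service that doesn't pick a cluster, or the agent itself) means the default.
func consulFor(configs map[string]*ConsulConfig, cluster string) (*ConsulConfig, error) {
	if cluster == "" {
		cluster = defaultCluster
	}
	cfg, ok := configs[cluster]
	if !ok {
		return nil, fmt.Errorf("no consul block for cluster %q", cluster)
	}
	return cfg, nil
}

func main() {
	configs := map[string]*ConsulConfig{
		"default": {Name: "default", Address: "127.0.0.1:8500"},
		"infra":   {Name: "infra", Address: "10.0.0.5:8500"},
	}
	for _, name := range []string{"", "infra", "missing"} {
		cfg, err := consulFor(configs, name)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%q -> %s\n", name, cfg.Address)
	}
}
```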
2023-11-08 09:30:08 -05:00
Seth Hoenig
63da22063b e2e: update pledge driver to 0.3.0 (#19020) 2023-11-08 06:58:59 -06:00
Seth Hoenig
a2f7ab2645 e2e disable windows (#19012)
* e2e: disable windows client

* e2e: disable windows artifact test
2023-11-07 09:34:18 -06:00
Luiz Aoqui
bfb2dcd172 Vault small fixes (#18942)
* vault: remove `token_ttl` from `vaultcompat` setup

Since Nomad uses periodic tokens, the right value to set in the role is
`token_period`, not `token_ttl`.

* vault: set 1.11.0 as min version for JWT auth

In order to use workload identity JWT auth with Vault, it's required to
have a Vault cluster running v1.11.0+, which is the version where
`user_claim_json_pointer` was introduced.
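Gating JWT auth on the Vault version can be done with only the standard library, as the sketch below shows. `supportsJWTAuth` and `parseVersion` are hypothetical. The parser ignores pre-release and build suffixes, a simplification that a real semver library would handle properly.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion turns "1.11.0" (optionally "v"-prefixed) into [major, minor, patch].
// Pre-release and build suffixes are dropped, so 1.11.0-rc1 is treated as 1.11.0.
func parseVersion(s string) ([3]int, error) {
	var out [3]int
	s = strings.TrimPrefix(s, "v")
	if i := strings.IndexAny(s, "-+"); i >= 0 {
		s = s[:i]
	}
	parts := strings.Split(s, ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("invalid version %q", s)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("invalid version %q: %w", s, err)
		}
		out[i] = n
	}
	return out, nil
}

// supportsJWTAuth reports whether a Vault version is at least 1.11.0, the
// release that introduced user_claim_json_pointer.
func supportsJWTAuth(vaultVersion string) (bool, error) {
	v, err := parseVersion(vaultVersion)
	if err != nil {
		return false, err
	}
	minimum := [3]int{1, 11, 0}
	for i := range v {
		if v[i] != minimum[i] {
			return v[i] > minimum[i], nil
		}
	}
	return true, nil // exactly the minimum version
}

func main() {
	for _, v := range []string{"1.10.9", "1.11.0", "1.15.2+ent"} {
		ok, err := supportsJWTAuth(v)
		fmt.Println(v, ok, err)
	}
}
```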
2023-11-01 08:23:19 -04:00
Michael Schurter
66fbc0f67e identity: default to RS256 for new workload ids (#18882)
OIDC mandates support for the RS256 signing algorithm, so in order to maximize workload identity's usefulness, this change switches from using the EdDSA signing algorithm to RS256.

Old keys will continue to use EdDSA, but new keys will use RS256. The EdDSA generation code was left in place because it's fast and cheap, and, I'm not going to lie, I hope we get to use it again.

**Test Updates**

Most of our Variables and Keyring tests had a subtle assumption in them that the keyring would be initialized by the time the test server had elected a leader. ed25519 key generation is so fast that the fact that it was happening asynchronously with server startup didn't seem to cause problems. Sadly, RSA key generation is so slow that basically all of these tests failed.

I added a new `testutil.WaitForKeyring` helper to replace `testutil.WaitForLeader` in cases where the keyring must be initialized before the test may continue. However, this is mostly used in the `nomad/` package.

In the `api` and `command/agent` packages I decided to switch their helpers to wait for keyring initialization by default. This will slow down tests a bit, but allow those packages to not be as concerned with subtle server readiness details. On my machine RSA key generation takes 63ms, so hopefully the difference isn't significant on CI runners.
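The sketch below times one RSA and one ed25519 key generation, to give a feel for the gap on a given machine. The 2048-bit RSA size is an assumption, since the key size isn't stated here. Single samples are noisy and results vary by hardware, so no expected output is given.

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"crypto/rsa"
	"fmt"
	"time"
)

// timeKeygen times a single RSA and a single ed25519 key generation.
func timeKeygen(rsaBits int) (rsaDur, edDur time.Duration, err error) {
	start := time.Now()
	if _, err = rsa.GenerateKey(rand.Reader, rsaBits); err != nil {
		return 0, 0, err
	}
	rsaDur = time.Since(start)

	start = time.Now()
	if _, _, err = ed25519.GenerateKey(rand.Reader); err != nil {
		return 0, 0, err
	}
	edDur = time.Since(start)
	return rsaDur, edDur, nil
}

func main() {
	rsaDur, edDur, err := timeKeygen(2048)
	if err != nil {
		panic(err)
	}
	fmt.Printf("rsa-2048: %v, ed25519: %v\n", rsaDur, edDur)
}
```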

**TODO**

- Docs and changelog entries.
- Upgrades - right now upgrades won't get RS256 keys until their root key rotates either manually or after ~30 days.
- Observability - I'm not sure there's a way for operators to see if they're using EdDSA or RS256 unless they inspect a key. The JWKS endpoint can be inspected to see if EdDSA will be used for new identities, but it doesn't technically define which key is active. If upgrades can be fixed to automatically rotate keys, we probably don't need to worry about this.

**Requiem for ed25519**

When workload identities were first implemented we did not immediately consider OIDC compliance. Consul, Vault, and many other third parties support JWT auth methods without full OIDC compliance. For the machine<-->machine use cases workload identity is intended to fulfill, OIDC seemed like a bigger risk than an asset.

EdDSA/ed25519 is the signing algorithm we chose for workload identity JWTs because of all these lovely properties:

1. Deterministic keys that can be derived from our preexisting root keys. This was perhaps the biggest factor since we already had a root encryption key around from which we could derive a signing key.
2. Wonderfully compact: 64 byte private key, 32 byte public key, 64 byte signatures. Just glorious.
3. No parameters. No choices of encodings. It's all well-defined by [RFC 8032](https://datatracker.ietf.org/doc/html/rfc8032).
4. Fastest performing signing algorithm! We don't even care that much about the performance of our chosen algorithm, but what a free bonus!
5. Arguably one of the most secure signing algorithms widely available. Not just from a cryptanalysis perspective, but from an API and usage perspective too.

Life was good with ed25519, but sadly it could not last.

[IDPs](https://en.wikipedia.org/wiki/Identity_provider), such as AWS's IAM OIDC Provider, love OIDC. They have OIDC implemented for humans, so why not reuse that OIDC support for machines as well? Since OIDC mandates RS256, many implementations don't bother implementing other signing algorithms (or at least not advertising their support). A quick survey of OIDC Discovery endpoints revealed only 2 out of 10 OIDC providers advertised support for anything other than RS256:

- [PayPal](https://www.paypalobjects.com/.well-known/openid-configuration) supports HS256
- [Yahoo](https://api.login.yahoo.com/.well-known/openid-configuration) supports ES256

RS256 only:

- [GitHub](https://token.actions.githubusercontent.com/.well-known/openid-configuration)
- [GitLab](https://gitlab.com/.well-known/openid-configuration)
- [Google](https://accounts.google.com/.well-known/openid-configuration)
- [Intuit](https://developer.api.intuit.com/.well-known/openid_configuration)
- [Microsoft](https://login.microsoftonline.com/fabrikamb2c.onmicrosoft.com/v2.0/.well-known/openid-configuration)
- [SalesForce](https://login.salesforce.com/.well-known/openid-configuration)
- [SimpleLogin (acquired by ProtonMail)](https://app.simplelogin.io/.well-known/openid-configuration/)
- [TFC](https://app.terraform.io/.well-known/openid-configuration)
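Providers advertise their algorithms in the `id_token_signing_alg_values_supported` field of an OIDC Discovery document. The small sketch below reports any advertised algorithms other than RS256. The sample documents are made up for illustration, not copied from the providers listed.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// discovery holds the OIDC Discovery fields this sketch reads.
type discovery struct {
	Issuer string   `json:"issuer"`
	Algs   []string `json:"id_token_signing_alg_values_supported"`
}

// nonRS256Algs returns the advertised ID token signing algorithms other than RS256.
func nonRS256Algs(doc []byte) ([]string, error) {
	var d discovery
	if err := json.Unmarshal(doc, &d); err != nil {
		return nil, err
	}
	var others []string
	for _, alg := range d.Algs {
		if alg != "RS256" {
			others = append(others, alg)
		}
	}
	return others, nil
}

func main() {
	// Illustrative documents, not fetched from any real provider.
	docs := []string{
		`{"issuer":"https://a.example","id_token_signing_alg_values_supported":["RS256"]}`,
		`{"issuer":"https://b.example","id_token_signing_alg_values_supported":["RS256","ES256"]}`,
	}
	for _, doc := range docs {
		others, err := nonRS256Algs([]byte(doc))
		fmt.Println(others, err)
	}
}
```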
2023-10-31 11:25:20 -07:00
Tim Gross
139a96ad12 e2e: fix bind name to allow Connect reachability (#18878)
The `BindName` for JWT authentication should always bind to the `nomad_service` field in the JWT and not include the namespace, as the `nomad_service` is what's actually registered in Consul. 

* Fix the binding rule for the `consulcompat` test 
* Add a reachability assertion so that we don't miss regressions.
* Ensure we have a clean shutdown so that we don't leak state (containers and iptables) between tests.
2023-10-27 10:15:17 -04:00
Piotr Kazmierczak
7f62dec473 consul WI: rename default auth method for services (#18867)
It should be called nomad-services instead of nomad-workloads.
2023-10-26 09:43:33 +02:00
Tim Gross
6c2d5a0fbb E2E: Consul compatibility matrix tests (#18799)
Set up a new test suite that exercises Nomad's compatibility with Consul. This
suite installs all currently supported versions of Consul, spins up a Consul
agent with appropriate configuration, and a Nomad agent running in dev
mode. Then it runs a Connect job against each pair.
2023-10-24 16:03:53 -04:00
Kerim Satirli
5e1bbf90fc docs: update all URLs to developer.hashicorp.com (#16247) 2023-10-24 11:00:11 -04:00
Luiz Aoqui
70b1862026 test: add E2E vaultcompat test for JWT auth flow (#18822)
Test the JWT auth flow using real Nomad and Vault agents.
2023-10-23 20:00:55 -04:00
Seth Hoenig
e3c8700ded deps: upgrade to go-set/v2 (#18638)
No functional changes: this cleans up deprecated usages that are
removed in v2 and replaces one call of .Slice with .ForEach to avoid
making the intermediate copy.
2023-10-05 11:56:17 -05:00
Daniel Bennett
a51d46c65c e2e: packer windows from "ECS_Optimized" image (#18453)
"Containers" AMIs evaporated at some point...
https://aws.amazon.com/marketplace/pp/prodview-yfve3zjgfjtug
> This version has been removed and is no longer
> available to new customers.
2023-09-11 12:26:32 -05:00
Seth Hoenig
f5b0da1d55 all: swap exp packages for maps, slices (#18311) 2023-08-23 15:42:13 -05:00
James Rasell
6108f5c4c3 admin: rename _oss files to _ce (#18209) 2023-08-18 07:47:24 +01:00
Seth Hoenig
6fca4fa715 test-e2e: no need to run vaultcompat tests as root (#18215)
6747ef8803 fixes the Nomad client to support using the raw_exec
driver while running as a non-root user. Remove the use of sudo
in the test-e2e workflow for running integration (vaultcompat)
tests.
2023-08-15 16:00:54 -05:00
Seth Hoenig
6747ef8803 drivers/raw_exec: restore ability to run tasks without nomad running as root (#18206)
Although nomad officially does not support running the client as a non-root
user, doing so has been more or less possible with the raw_exec driver as
long as you don't expect features like networking or running tasks as
specific users to work. In the cgroups refactoring I bulldozed right over the
special casing we had in place for raw_exec to continue working if the cgroups
were unable to be created. This PR restores that behavior - you can now
(as before) run the nomad client as a non-root user and make use of the
raw_exec task driver.
2023-08-15 11:22:30 -05:00
hashicorp-copywrite[bot]
a9d61ea3fd Update copyright file headers to BUSL-1.1 2023-08-10 17:27:29 -05:00
Seth Hoenig
37dd4c4a69 e2e: modernize vaultcompat testing (#18179)
* e2e: modernize vaultcompat testing

* e2e: cr fixes for vaultcompat
2023-08-09 09:24:51 -05:00
Seth Hoenig
8d28946993 e2e podman private registry (#17642)
* e2e: add tests for using private registry with podman driver

This PR adds e2e tests that stand up a private docker registry
and have a podman task run a container from an image in that private
registry.

Tests
 - user:password set in task config
 - auth_soft_fail works for public images when auth is set in driver
 - credentials helper is set in driver auth config
 - config auth.json file is set in driver auth config
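
The auth.json case relies on the registry auth file format shared by Docker and Podman, where each registry's `auth` value is the base64 encoding of `user:password`. The round-trip sketch below builds and reads such a file. `buildAuthJSON`, `credentialsFor`, and the registry address are hypothetical.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
	"strings"
)

type authEntry struct {
	Auth string `json:"auth"`
}

type authFile struct {
	Auths map[string]authEntry `json:"auths"`
}

// buildAuthJSON renders an auth.json document with one registry entry.
func buildAuthJSON(registry, user, password string) ([]byte, error) {
	f := authFile{Auths: map[string]authEntry{
		registry: {Auth: base64.StdEncoding.EncodeToString([]byte(user + ":" + password))},
	}}
	return json.MarshalIndent(f, "", "  ")
}

// credentialsFor decodes the user and password stored for a registry.
func credentialsFor(doc []byte, registry string) (string, string, error) {
	var f authFile
	if err := json.Unmarshal(doc, &f); err != nil {
		return "", "", err
	}
	entry, ok := f.Auths[registry]
	if !ok {
		return "", "", fmt.Errorf("no credentials for %q", registry)
	}
	raw, err := base64.StdEncoding.DecodeString(entry.Auth)
	if err != nil {
		return "", "", err
	}
	user, password, ok := strings.Cut(string(raw), ":")
	if !ok {
		return "", "", fmt.Errorf("malformed auth entry for %q", registry)
	}
	return user, password, nil
}

func main() {
	doc, err := buildAuthJSON("localhost:5000", "e2euser", "example-password")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(doc))
	user, password, err := credentialsFor(doc, "localhost:5000")
	fmt.Println(user, password, err)
}
```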

* packer: use nomad-driver-podman v0.5.0

* e2e: eliminate unnecessary chmod

Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>

* cr: no need to install nomad twice

* cl: no need to install docker twice

---------

Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>
2023-07-19 15:59:36 -05:00
Seth Hoenig
159bf51120 e2e: add some e2e tests for pledge task driver (#17909)
* e2e: setup nomad for pledge driver

* e2e: add some e2e tests for pledge task driver
2023-07-12 11:56:08 -05:00
Seth Hoenig
fd50f2bcb8 e2e: do not set a user for raw_exec tasks (#17901)
Cannot set a user for raw_exec tasks, because doing so does not work
with the 0700 root owned client data directory that we setup in the e2e
cluster in accordance with the Nomad hardening guide.
2023-07-11 16:00:15 -05:00
James Rasell
f43a3c9f37 e2e: respect timeout value when waiting for allocs in v3. (#17800) 2023-07-10 09:47:10 +01:00
Daniel Bennett
6bd509869b e2e: use DNS instead of HTTP to get my_public_ipv4 (#17759) 2023-06-28 13:11:57 -05:00
hashicorp-copywrite[bot]
d778ecfc7d [COMPLIANCE] Add Copyright and License Headers (#17732)
Co-authored-by: hashicorp-copywrite[bot] <110428419+hashicorp-copywrite[bot]@users.noreply.github.com>
2023-06-26 11:11:17 -05:00
Seth Hoenig
37df529e7a e2e: refactor pids isolation tests (#17717)
This PR refactors some old PID isolation tests to make use of the e2e/v3
packages. Should be quite a bit easier to read. Adds 'alloc exec' capability
to the jobs3 package.
2023-06-26 09:51:18 -05:00
Seth Hoenig
5b5fbc0881 e2e: create a v3/ set of packages for creating Nomad e2e tests (#17620)
* e2e: create a v3/ set of packages for creating Nomad e2e tests

This PR creates an experimental set of packages under `e2e/v3/` for crafting
Nomad e2e tests. Unlike previous generations, this is an attempt at providing
a way to create tests in a declarative (ish) pattern, with a focus on being
easy to use, easy to cleanup, and easy to debug.

@shoenig is just trying this out to see how it goes.

Lots of features need to be implemented.
Many more docs need to be written.
Breaking changes are to be expected.
There are known and unknown bugs.
No warranty.

Quick run of `example` with verbose logging.

```shell
➜ NOMAD_E2E_VERBOSE=1 go test -v
=== RUN   TestExample
=== RUN   TestExample/testSleep
    util3.go:25: register (service) job: "sleep-809"
    util3.go:25: checking eval: 9f0ae04d-7259-9333-3763-44d0592d03a1, status: pending
    util3.go:25: checking eval: 9f0ae04d-7259-9333-3763-44d0592d03a1, status: complete
    util3.go:25: checking deployment: a85ad2f8-269c-6620-d390-8eac7a9c397d, status: running
    util3.go:25: checking deployment: a85ad2f8-269c-6620-d390-8eac7a9c397d, status: running
    util3.go:25: checking deployment: a85ad2f8-269c-6620-d390-8eac7a9c397d, status: running
    util3.go:25: checking deployment: a85ad2f8-269c-6620-d390-8eac7a9c397d, status: running
    util3.go:25: checking deployment: a85ad2f8-269c-6620-d390-8eac7a9c397d, status: successful
    util3.go:25: deployment a85ad2f8-269c-6620-d390-8eac7a9c397d was a success
    util3.go:25: deregister job "sleep-809"
    util3.go:25: system gc
=== RUN   TestExample/testNamespace
    util3.go:25: apply namespace "example-291"
    util3.go:25: register (service) job: "sleep-967"
    util3.go:25: checking eval: a2a2303a-adf1-2621-042e-a9654292e569, status: pending
    util3.go:25: checking eval: a2a2303a-adf1-2621-042e-a9654292e569, status: complete
    util3.go:25: checking deployment: 3395e9a8-3ffc-8990-d5b8-cc0ce311f302, status: running
    util3.go:25: checking deployment: 3395e9a8-3ffc-8990-d5b8-cc0ce311f302, status: running
    util3.go:25: checking deployment: 3395e9a8-3ffc-8990-d5b8-cc0ce311f302, status: running
    util3.go:25: checking deployment: 3395e9a8-3ffc-8990-d5b8-cc0ce311f302, status: successful
    util3.go:25: deployment 3395e9a8-3ffc-8990-d5b8-cc0ce311f302 was a success
    util3.go:25: deregister job "sleep-967"
    util3.go:25: system gc
    util3.go:25: cleanup namespace "example-291"
=== RUN   TestExample/testEnv
    util3.go:25: register (batch) job: "env-582"
    util3.go:25: checking eval: 600f3bce-ea17-6d13-9d20-9d9eb2a784f7, status: pending
    util3.go:25: checking eval: 600f3bce-ea17-6d13-9d20-9d9eb2a784f7, status: complete
    util3.go:25: deregister job "env-582"
    util3.go:25: system gc
--- PASS: TestExample (10.08s)
    --- PASS: TestExample/testSleep (5.02s)
    --- PASS: TestExample/testNamespace (4.02s)
    --- PASS: TestExample/testEnv (1.03s)
PASS
ok      github.com/hashicorp/nomad/e2e/example  10.079s
```

* cluster3: use filter for kernel.name instead of filtering manually
2023-06-23 09:10:49 -05:00
Daniel Bennett
748aea1c61 e2e: fix windows client docker (#17572)
the windows docker install script stopped working.

after trying various things to fix the script,
I opted instead for a base image that comes with
docker already installed.

error output during build was:
  Installing Docker.
  WARNING: Cannot find path 'C:\Users\Administrator\AppData\Local\Temp\DockerMsftProvider\DockerDefault_DockerSearchIndex.json' because it does not exist.
  WARNING: Cannot bind argument to parameter 'downloadURL' because it is an empty string.
  WARNING: The property 'AbsoluteUri' cannot be found on this object. Verify that the property exists.
  WARNING: The property 'RequestMessage' cannot be found on this object. Verify that the property exists.
  Failed to install Docker.
  Install-Package : No match was found for the specified search criteria and package name 'docker'.
2023-06-20 10:17:16 -05:00