750 Commits

Author SHA1 Message Date
Juana De La Cuesta
526c6375ad Make paths in e2e/terraform/ directory relative to the module (#24664)
* func: make paths relative

* func: make paths relative to the module inside the e2e terraform folder

* fix: add license files to gitignore

* func: move /etc and update all paths

* Uncomment forgotten code

* fix: update the path to the tls certificates to be local to the instance
2024-12-13 17:33:59 +01:00
Juana De La Cuesta
a9a0f71213 Remove sockaddr and use native tools (#24665)
* func: remove sockaddr and use native tools

* Update setup.sh
2024-12-13 17:24:53 +01:00
Yucong Sun
642e33ae41 CSI: fix topology matching logic (#24522)
Some plugins emit multiple topology segment entries for the same segment (ex. newer versions of AWS EBS) to accommodate convention changes in k8s. Check that segments are a superset instead of exactly equal to the plugin's topology segments.
2024-11-22 09:22:36 -05:00
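The superset check described in 642e33ae41 can be illustrated with a minimal sketch. This is not the code from the commit: the function name, the argument order, and the segment keys in `main` are assumptions made for illustration only.

```go
package main

import "fmt"

// segmentsSuperset reports whether every key/value pair in required is also
// present in offered. Extra entries in offered are ignored, which is what
// distinguishes a superset check from an exact-equality check.
// (Hypothetical helper; not the function used in the commit.)
func segmentsSuperset(offered, required map[string]string) bool {
	for k, v := range required {
		if got, ok := offered[k]; !ok || got != v {
			return false
		}
	}
	return true
}

func main() {
	// Assumed example: one side reports the same zone under two keys,
	// the other side only knows about one of them.
	withBothKeys := map[string]string{
		"topology.kubernetes.io/zone":   "us-east-1a",
		"topology.ebs.csi.aws.com/zone": "us-east-1a",
	}
	withOneKey := map[string]string{
		"topology.ebs.csi.aws.com/zone": "us-east-1a",
	}

	fmt.Println(segmentsSuperset(withBothKeys, withOneKey)) // true
	fmt.Println(segmentsSuperset(withOneKey, withBothKeys)) // false
}
```

An exact-equality comparison would reject both cases above, because the two maps differ in size even though they agree on every shared segment.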
Phil Renaud
0023edd3ec Updates Playwright in response to an E2E nightly failure (#24487) 2024-11-20 09:33:27 -05:00
Juana De La Cuesta
270b4f97a6 Update some details of the terraform readme file for e2e provisioning (#24451)
* docs: update instructions to provision e2e cluster

* Update e2e/terraform/README.md

Co-authored-by: Piotr Kazmierczak <470696+pkazmierczak@users.noreply.github.com>

* Update e2e/terraform/terraform.tfvars

Co-authored-by: Piotr Kazmierczak <470696+pkazmierczak@users.noreply.github.com>

* Update e2e/terraform/README.md

Co-authored-by: Piotr Kazmierczak <470696+pkazmierczak@users.noreply.github.com>

---------

Co-authored-by: Piotr Kazmierczak <470696+pkazmierczak@users.noreply.github.com>
2024-11-18 13:36:51 +01:00
Juana De La Cuesta
1f944196d9 Allow scaling system jobs to 0 (#24363)
* func: remove validation scaling for system jobs and don't canonicalize to 1

* test: update test to validate for 0 and improve error message

* func: remove the canonicalization to 1 from system jobs

* docs: add changelog

* func: add test for scaling system jobs

* temp: add logging to debug test

* fix: clean up after test is done

* fix: scaled down jobs will still have the stop allocation, update test to account for it

* Update the e2e test to accommodate system jobs having an alloc per node

* fix: filter to only count ready nodes on the node count

* fix: remove the datacenter constraint from the system job definition

* fix: compare alloc IDs to avoid flaky tests when verifying no alloc was stopped

* fix: remove duplicated code
2024-11-18 13:35:47 +01:00
Piotr Kazmierczak
73383ee755 e2e: unflake testDockerExecStdin (#24385) 2024-11-07 13:35:32 +01:00
Seth Hoenig
b18851617f docker: close response connection once stdin is exhausted (#24202) 2024-10-17 11:07:23 -05:00
Piotr Kazmierczak
a22e56390e e2e: fix failing tests due to docker plugin settings (#24234) 2024-10-17 11:12:59 +02:00
Piotr Kazmierczak
f9cbaaf6c7 docker: fix a bug where auth for private registries wasn't parsed correctly (#24215)
In #23966 we introduced an official Docker client and did not notice that in
contrast to our previous 3rd party client, the official SDK PullOptions object
expects a base64 encoded JSON with username and password, instead of username/
password pair.
2024-10-16 22:04:54 +02:00
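To make the format mismatch in f9cbaaf6c7 concrete, here is a minimal sketch of turning a username/password pair into a base64-encoded JSON string. It uses only the standard library; the struct, its JSON field names, the choice of URL-safe base64, and the function names are assumptions for illustration, and the Docker SDK provides its own types and helper for this that should be preferred in real code.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// authConfig is a hypothetical stand-in for the registry credentials object.
// The JSON field names are assumed for this sketch.
type authConfig struct {
	Username      string `json:"username,omitempty"`
	Password      string `json:"password,omitempty"`
	ServerAddress string `json:"serveraddress,omitempty"`
}

// encodeAuth serializes the credentials to JSON and base64-encodes the result.
func encodeAuth(cfg authConfig) (string, error) {
	buf, err := json.Marshal(cfg)
	if err != nil {
		return "", err
	}
	return base64.URLEncoding.EncodeToString(buf), nil
}

// decodeAuth reverses encodeAuth; useful for checking the round trip.
func decodeAuth(s string) (authConfig, error) {
	var cfg authConfig
	buf, err := base64.URLEncoding.DecodeString(s)
	if err != nil {
		return cfg, err
	}
	err = json.Unmarshal(buf, &cfg)
	return cfg, err
}

func main() {
	// Placeholder credentials for illustration only.
	enc, err := encodeAuth(authConfig{Username: "user", Password: "secret"})
	if err != nil {
		panic(err)
	}
	dec, err := decodeAuth(enc)
	if err != nil {
		panic(err)
	}
	fmt.Println(dec.Username, dec.Password)
}
```

Passing the raw username and password where the encoded string is expected is the kind of mismatch the commit message describes.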
Tim Gross
d261d58ea2 build: update hc-install to current (#24199)
Installing Vault and Consul from releases.hashicorp.com via `hc-install` has
been failing intermittently. Update the `hc-install` binaries to be current and
add one retry to downloads for our compat tests so that we can get builds more
reliably green while the underlying issue is being debugged.
2024-10-15 10:07:58 -04:00
Daniel Bennett
278a2df3af e2e: ui: update playwright to 1.48.0 (#24158)
steps to update:
 * edit run.sh IMAGE variable manually
 * run ./run.sh test
2024-10-09 10:34:53 -05:00
Tim Gross
e9ba630639 docker: fix script check execution (#24098)
In #24095 we made a fix for non-streaming exec into Docker tasks for script
checks and `change_mode = "script"`, but didn't complete E2E testing. We need to
use `ContainerExecAttach` in the new API in order to get stdout/stderr from
tasklets, but the previous `ContainerExecStart` call will prevent this from
running successfully with an error that the exec has already run.

* Ref: [NET-11202 (comment)](https://hashicorp.atlassian.net/browse/NET-11202?focusedCommentId=551618)
* This has shipped in Nomad 1.9.0-beta.1 but not production yet.
* This should fix the remaining issues in nightly E2E for Docker.
2024-10-01 16:41:38 -04:00
Michael Smithhisler
6b6aa7cc26 identity: adds ability to specify custom filepath for saving workload identities (#24038) 2024-09-23 10:27:00 -04:00
Tim Gross
9247dc9108 E2E: allow Consul version to omit tags (#24024)
When we start the Consul agent in the `consulcompat` test package, we check that
the version matches the version we expect. But Consul agents may omit non-core
parts of the version string (ex. `1.20.0-rc1` displays `1.20.0`). Compare only
the core portions of the version string.
2024-09-20 14:46:01 -04:00
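Comparing only the core portion of a version string, as 9247dc9108 describes, can be sketched as follows. The function name and the rule used to find the core (cut at the first `-` or `+`) are assumptions for illustration, not the code from the commit.

```go
package main

import (
	"fmt"
	"strings"
)

// coreVersion returns the MAJOR.MINOR.PATCH part of a version string,
// dropping an optional leading "v" and anything from the first "-" or "+"
// onward (prerelease tags and build metadata).
// (Hypothetical helper for illustration.)
func coreVersion(v string) string {
	v = strings.TrimPrefix(v, "v")
	if i := strings.IndexAny(v, "-+"); i >= 0 {
		v = v[:i]
	}
	return v
}

func main() {
	expected := "1.20.0-rc1" // version the test asked for
	reported := "1.20.0"     // version the agent displays

	fmt.Println(expected == reported)                           // false
	fmt.Println(coreVersion(expected) == coreVersion(reported)) // true
}
```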
Seth Hoenig
51215bf102 deps: update to go-set/v3 and refactor to use custom iterators (#23971)
* deps: update to go-set/v3

* deps: use custom set iterators for looping
2024-09-16 13:40:10 -05:00
Tim Gross
8739d7738c E2E: remove invalid HCLv1 field on submissions test (#23936)
HCLv1 support was removed entirely in #23912, but I missed this one test and
documentation reference.
2024-09-09 09:57:25 -04:00
Phil Renaud
faf95ef7b9 Update the pinned playwright version (#23929) 2024-09-06 15:57:19 -04:00
Tim Gross
a9beef7edd jobspec: remove HCL1 support (#23912)
This changeset removes support for parsing jobspecs via the long-deprecated
HCLv1.

Fixes: https://github.com/hashicorp/nomad/issues/20195
Ref: https://hashicorp.atlassian.net/browse/NET-10220
2024-09-05 09:02:45 -04:00
Seth Hoenig
4aeb279534 e2e: fix module name of an artifact we download (#23843)
Because this will definitely never change again, for sure, trust me.
2024-08-19 10:25:35 -05:00
Seth Hoenig
db0642099e build: update golangci-lint to 1.60.1 (#23807)
* build: update golangci-lint to 1.60.1

* ci: update golangci-lint to v1.60.1

Helps with go1.23 compatibility. Introduces some breaking changes / newly
enforced linter patterns so those are fixed as well.
2024-08-14 10:09:31 -05:00
Tim Gross
bc50eebebd workload identity: add support for extra claims config for Vault (#23675)
Although we encourage users to use Vault roles, sometimes they're going to want
to assign policies based on entity and pre-create entities and aliases based on
claims. This allows them to use a single default role (or at least a small number of
them) that has a templated policy, but have an escape hatch from that.

When defining Vault entities the `user_claim` must be unique. When writing Vault
binding rules for use with Nomad workload identities the binding rule won't be
able to create a 1:1 mapping because the selector language allows accessing only
a single field. The `nomad_job_id` claim isn't sufficient to uniquely identify a
job because of namespaces. It's possible to create a JWT auth role with
`bound_claims` to avoid this becoming a security problem, but this doesn't allow
for correct accounting of user claims.

Add support for an `extra_claims` block on the server's `default_identity`
blocks for Vault. This allows a cluster administrator to add a custom claim on
all allocations. The values for these claims are interpolatable with a limited
subset of fields, similar to how we interpolate the task environment.

Fixes: https://github.com/hashicorp/nomad/issues/23510
Ref: https://hashicorp.atlassian.net/browse/NET-10372
Ref: https://hashicorp.atlassian.net/browse/NET-10387
2024-08-05 15:01:54 -04:00
Daniel Bennett
10d3f1749b e2e: test all cni config formats (#23650) 2024-07-22 10:17:03 -05:00
Tim Gross
a29f9b6fc0 keyring: E2E testing for KMS/rotation (#23601)
In #23580 we're implementing support for encrypting Nomad's key material with
external KMS providers or Vault Transit. This changeset breaks out the E2E
infrastructure and testing from that PR to keep the review manageable.

Ref: https://hashicorp.atlassian.net/browse/NET-10334
Ref: https://github.com/hashicorp/nomad/issues/14852
Ref: https://github.com/hashicorp/nomad/pull/23580
2024-07-19 13:49:48 -04:00
Daniel Bennett
de10efa3fa e2e: hc-install consul-cni (#23612)
now that the version with tproxy CNI_ARGS is on releases.hashicorp.com
2024-07-17 14:26:40 -05:00
Daniel Bennett
afbd283c1b e2e: skip missing windows ami if windows clients=0 (#23610)
and tweak Makefile to generate a custom.tfvars
instead of specifying vars separately via CLI.
hoping this makes it a little more obvious
if there is no consul/nomad license.
2024-07-17 12:45:41 -05:00
Martina Santangelo
bc81c85ec7 e2e: cni args tests (#23597)
Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>
2024-07-15 17:08:50 -04:00
Piotr Kazmierczak
ddbb307fa6 e2e: purge the job in the UI stop_proxy() script (#23565)
otherwise namespace deletion fails due to non-terminal allocations
2024-07-12 10:13:51 +02:00
Deniz Onur Duzgun
c82dd76a1b security: update tls cipher suites (#23551) 2024-07-11 14:01:45 -04:00
Daniel Bennett
c84b4ad67b e2e: add test for task schedule{} (#23382) 2024-06-20 11:18:53 -05:00
Daniel Bennett
2da38ba9c4 e2e: jobs3 hcl vars differently (#23363)
and include jobspec and vars in registrations
(so they show up in the UI under job Definition)
2024-06-17 13:20:51 -05:00
Daniel Bennett
5a6e3d5ef0 e2e: add Enterprise Option for cluster3.Establish (#23362) 2024-06-17 12:59:37 -05:00
Tim Gross
288a048a2e e2e: add prerelease builds to Consul/Vault compatibility tests (#23287)
Update the Consul/Vault build downloader functions so that we include the
current prerelease build (if any) in our E2E compatibility testing we do on each
PR. This will automatically cycle out when the GA build is released, because
that build is "higher" in the sorted set.
2024-06-11 08:54:27 -04:00
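The remark in 288a048a2e that a GA build is "higher" in the sorted set than its prerelease can be illustrated with a simplified ordering. This sketch is an assumption-laden stand-in: the function names are hypothetical, prerelease labels are compared as plain strings, and real code would normally rely on a version library rather than hand-rolled parsing.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// parse splits "1.19.0-rc1" into its numeric core and its prerelease label.
// Malformed numeric parts count as 0 in this sketch.
func parse(v string) ([3]int, string) {
	core, pre, _ := strings.Cut(v, "-")
	var nums [3]int
	for i, p := range strings.SplitN(core, ".", 3) {
		nums[i], _ = strconv.Atoi(p)
	}
	return nums, pre
}

// less orders versions by numeric core first. For equal cores, a version
// with a prerelease label sorts before the one without.
func less(a, b string) bool {
	an, ap := parse(a)
	bn, bp := parse(b)
	for i := range an {
		if an[i] != bn[i] {
			return an[i] < bn[i]
		}
	}
	if ap == "" || bp == "" {
		return ap != "" && bp == ""
	}
	return ap < bp // simplification: plain string comparison of labels
}

func main() {
	// Hypothetical list of available builds.
	versions := []string{"1.19.0", "1.18.2", "1.19.0-rc1"}
	sort.Slice(versions, func(i, j int) bool {
		return less(versions[i], versions[j])
	})
	fmt.Println(versions)                  // [1.18.2 1.19.0-rc1 1.19.0]
	fmt.Println(versions[len(versions)-1]) // 1.19.0
}
```

Once the GA build appears in the list it sorts after the release candidate, so picking the highest entry naturally stops selecting the prerelease.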
Seth Hoenig
2054e87158 e2e: add tests for exec2 task driver (#22406)
* e2e: add tests for exec2 task driver

* e2e: use envoy 1.29.4 because consul

* e2e: add a bridge networking http test for exec driver

* e2e: split up http test so curl always starts after the server
2024-05-31 09:22:39 -05:00
Seth Hoenig
9fb2b10ab6 e2e: no longer need consul terraform module (#22396) 2024-05-28 08:04:03 -05:00
Tim Gross
91d422ec21 E2E: document how the AMIs are tagged and how those tags are used (#22237)
The process by which we tag AMIs with the commit SHA of the Packer directory
isn't documented in this repository, which makes it easy to accidentally build
an AMI that will break nightly E2E.
2024-05-24 11:11:00 -05:00
James Rasell
04ba358266 client: expose network namespace CNI config as task env vars. (#11810)
This change exposes CNI configuration details of a network
namespace as environment variables. This allows a task to use
these values to configure itself; a potential use case is to run
a Raft application binding to IP and Port details configured using
the bridge network mode.
2024-05-14 09:02:06 +01:00
Piotr Kazmierczak
abe9c0803a e2e: unflake TestWorkloadIdentity/testNobody (#20499)
sometimes the container quits too fast
2024-04-30 18:17:14 +02:00
Tim Gross
ff2d9de592 Revert "E2E: skip Vault 1.16.1 for JWT compatibility test (#20301)" (#20484)
This reverts commit 45b36371a12ffae5b5bfaaeadb08f801fb6bc98d. Now that Vault
1.16.2 has shipped, the E2E test will pick up only a working version.

Closes: https://github.com/hashicorp/nomad/issues/20298
2024-04-26 09:36:09 -04:00
Tim Gross
d40e23f939 E2E: clean up go mod cache after building consul-cni (#20378)
In #20296 we added a Go tool chain to the AMI we use for E2E tests, so that we
can build `consul-cni` for tproxy testing. This is intended to be temporary
until `consul-k8s` 1.4.2 is officially released. But the Go cache from building
`consul-k8s` uses up roughly 1.5GiB of space and the test machines have fairly
small disks. This causes the Nomad clients to aggressively GC client allocations
that stop, which breaks tests that run batch workloads and then read their logs.
2024-04-12 11:52:46 -04:00
Tim Gross
8298d39e78 Connect transparent proxy support
Add support for Consul Connect transparent proxies

Fixes: https://github.com/hashicorp/nomad/issues/10628
2024-04-10 11:00:18 -04:00
Tim Gross
548adb0fd4 tproxy: E2E tests (#20296)
Add the `consul-cni` plugin to the Linux AMI for E2E, and add a test case that
covers the transparent proxy feature. Add test assertions to the Connect tests
for upstream reachability.

Ref: https://github.com/hashicorp/nomad/pull/20175
2024-04-05 14:23:26 -04:00
Tim Gross
2382ab8776 E2E: ensure periodic test can't fail due to cron conflicts (#20300)
The E2E test for periodic dispatch jobs has a `cron` trigger for once a
minute. If the test happens to run at the top of the minute, it's possible for
the forced dispatch to run from the test code, then the periodic timer triggers
and leaves a running child job. This fails the test because it expects only a
single job in the "dead" state.

Make it so that the `cron` expression is implausible to run during our test
window, and migrate the test off the old framework while we're at it.
2024-04-05 08:45:35 -04:00
Tim Gross
648daceca1 E2E: skip Vault 1.16.1 for JWT compatibility test (#20301)
Vault 1.16.1 has a known issue around the JWT auth configuration that will
prevent this test from ever passing. Skip testing the JWT code path on
1.16.1. Once 1.16.2 ships it will no longer get skipped.

Ref: https://github.com/hashicorp/nomad/issues/20298
2024-04-04 17:00:35 -04:00
Tim Gross
c1f020d60f E2E: refactor Connect tests to use stdlib testing (#20278)
Migrate our E2E tests for Connect off the old framework in preparation for
writing E2E tests for transparent proxy and the updated workload identity
workflow. Mark the tests that cover the legacy Consul token submitted workflow.

Ref: https://github.com/hashicorp/nomad/pull/20175
2024-04-04 10:48:10 -04:00
Tim Gross
4ce728afbd E2E: make vault.create_from_role unique per cluster (#20267)
If an E2E cluster is destroyed after a different one has been created, the role
and policy we create in Vault for the cluster will be deleted and Vault-related
tests will fail. Note that before 1.9, we should figure out a way to give HCP
Vault access to the JWKS endpoint and have a different set of policies, but
we'll need to have a role-per-cluster in that case as well.

Fixes: https://github.com/hashicorp/nomad-e2e/issues/138 (internal)
2024-04-03 08:45:01 -04:00
Tim Gross
cf25cf5cd5 E2E: use a self-hosted Consul for easier WI testing (#20256)
Our `consulcompat` tests exercise both the Workload Identity and legacy Consul
token workflow, but they are limited to running single node tests. The E2E
cluster is network isolated, so using our HCP Consul cluster runs into a
problem validating WI tokens because it can't reach the JWKS endpoint. In real
production environments, you'd solve this with a CNAME pointing to a public IP
pointing to a proxy with a real domain name. But that's logistically
impractical for our ephemeral nightly cluster.

Migrate the HCP Consul to a single-node Consul cluster on AWS EC2 alongside our
Nomad cluster. Bootstrap TLS and ACLs in Terraform and ensure all nodes can
reach each other. This will allow us to update our Consul tests so they can use
Workload Identity, in a separate PR.

Ref: #19698
2024-04-02 15:24:51 -04:00
Tim Gross
de218d1919 E2E: change timing of vaultsecrets test to guarantee lease window (#20200)
We've been getting a couple of errors from this test on nightly where the
template hasn't rendered by the time we expect it to. I've run some tests
locally and this may be a timing issue introduced by recent code changes to
templates.

Move the start of the timer to after we're guaranteed that we've got a secret
lease TTL started, to eliminate this as a source of flakiness. In my tests this
adds another ~5s to a test that already takes over a minute to run anyways.
2024-03-22 16:12:00 -04:00
Daniel Bennett
e059adef98 e2e: PreCleanup and other jobs3 helpers (#19844) 2024-01-29 17:54:54 -06:00
Piotr Kazmierczak
543ba16e61 e2e: more retries for RequireConsulDeregistered (#19801) 2024-01-22 20:11:48 +01:00