Commit Graph

755 Commits

Author SHA1 Message Date
James Rasell
359571df01 e2e: Account for non-default region in Prometheus scrape config. (#24807) 2025-01-08 14:08:17 +00:00
Juana De La Cuesta
2eb2b6c739 fix: update the dnsconfig script to handle multiple interfaces (#24800) 2025-01-07 21:12:18 +01:00
James Rasell
8bb7c1315d e2e: fix failing tests due to region name change. (#24713) 2024-12-19 14:21:17 +00:00
Tim Gross
abeae5c47b E2E: use a variable for region (#24693)
In #24644 we set the region to "e2e" but forgot to setup the TLS certificate
names appropriately. Swap the region out for a variable instead.
2024-12-17 10:28:22 -05:00
Tim Gross
75b0202f7f api: don't copy previously parsed URL when setting new address (#24644)
In #16872 we added support for unix domain sockets, but this required mutating
the `Config` when parsing the address so as to remove the port number. In #23785
we fixed a bug where if the configuration was used across multiple clients that
mutation would happen multiple times and the address would be incorrectly
parsed.

When making `alloc log`, `alloc fs`, or `alloc exec` calls where we have
line-of-sight to the client, we attempt to make a HTTP API call directly to the
client node. So we create a new API client from the same configuration and then
set the address. But in this case we copy the private `url` field and that
causes the URL parsing to be skipped for the new client.

This results in the region always being set to the string literal
`"global"` (because of mTLS handling code introduced all the way back in
4d3b75d867), unless the user has set the region specifically. This fails with
an error "no path to region" when the cluster isn't non-global and requests are
sent to a non-leader.

Arguably the "right" way of fixing this would be for `ClientConfig` not to
change the API client's region to `"global"` in the first place, but as this is
a public API and extremely longstanding behavior, it could potentially be a
breaking change for some downstream consumers. Instead, we'll avoid copying the
private `url` field so that the new address is re-parsed.

Fixes: https://github.com/hashicorp/nomad/issues/24635
Fixes: https://github.com/hashicorp/nomad/issues/24609
Ref: https://github.com/hashicorp/nomad/pull/16872
Ref: https://github.com/hashicorp/nomad/pull/23785
Ref: 4d3b75d867
2024-12-16 11:05:29 -05:00
Juana De La Cuesta
526c6375ad Make paths in e2e/terraform/ directory relative to the module (#24664)
* func: make paths relative

* func: make paths relative to the module inside the e2e terraform folder

* fix: add license files to gitignore

* func: move /etc and update all paths

* Uncomment forgotten code

* fix: update the path to the tls certificates to be local to the instance
2024-12-13 17:33:59 +01:00
Juana De La Cuesta
a9a0f71213 Remove sockaddr and use native tools (#24665)
* func: remove sockaddr and use native tools

* Update setup.sh
2024-12-13 17:24:53 +01:00
Yucong Sun
642e33ae41 CSI: fix topology matching logic (#24522)
Some plugins emit multiple topology segment entries for the same segment (ex. newer versions of AWS EBS) to accommodate convention changes in k8s. Check that segments are a superset instead of exactly equal to the plugin's topology segments.
2024-11-22 09:22:36 -05:00
Phil Renaud
0023edd3ec Updates Playwright in response to an E2E nightly failure (#24487) 2024-11-20 09:33:27 -05:00
Juana De La Cuesta
270b4f97a6 Update some details of the terraform readme file for e2e provisioning (#24451)
* docs: update instructions to provision e2e cluster

* Update e2e/terraform/README.md

Co-authored-by: Piotr Kazmierczak <470696+pkazmierczak@users.noreply.github.com>

* Update e2e/terraform/terraform.tfvars

Co-authored-by: Piotr Kazmierczak <470696+pkazmierczak@users.noreply.github.com>

* Update e2e/terraform/README.md

Co-authored-by: Piotr Kazmierczak <470696+pkazmierczak@users.noreply.github.com>

---------

Co-authored-by: Piotr Kazmierczak <470696+pkazmierczak@users.noreply.github.com>
2024-11-18 13:36:51 +01:00
Juana De La Cuesta
1f944196d9 Allow scaling system jobs to 0 (#24363)
* func: remove validation scaling for system jobs and dont canonicalize to 1

* test: update test to validate for 0 and improve error message

* func: remove the canonicalization to 1 from system jobs

* docs: add changelog

* func: add test for scaling system jobs

* temp: add logging to debug test

* fix: clean up after test is done

* fix: scaled down jobs will still have the stop allocation, update test to account for it

* Update the e2e test to accomodate for system jobs to have an alloc per node

* fix: filter to only count ready nodes on the node count

* fix: remove the datacenter constrain from the system job definition

* fix: compare alloc IDs to avoid flaky tests when verifying no alloc was stoped

* fix: remove duplicated code
2024-11-18 13:35:47 +01:00
Piotr Kazmierczak
73383ee755 e2e: unflake testDockerExecStdin (#24385) 2024-11-07 13:35:32 +01:00
Seth Hoenig
b18851617f docker: close response connection once stdin is exhausted (#24202) 2024-10-17 11:07:23 -05:00
Piotr Kazmierczak
a22e56390e e2e: fix failing tests due to docker plugin settings (#24234) 2024-10-17 11:12:59 +02:00
Piotr Kazmierczak
f9cbaaf6c7 docker: fix a bug where auth for private registries wasn't parsed correctly (#24215)
In #23966 we introduced an official Docker client and did not notice that in
contrast to our previous 3rd party client, the official SDK PullOptions object
expects a base64 encoded JSON with username and password, instead of username/
password pair.
2024-10-16 22:04:54 +02:00
Tim Gross
d261d58ea2 build: update hc-install to current (#24199)
Installing Vault and Consul from releases.hashicorp.com via `hc-install` has
been failing intermittently. Update the `hc-install` binaries to be current and
add one retry to downloads for our compat tests so that we can get builds more
reliably green while the underlying issue is being debugged.
2024-10-15 10:07:58 -04:00
Daniel Bennett
278a2df3af e2e: ui: update playwright to 1.48.0 (#24158)
steps to update:
 * edit run.sh IMAGE variable manually
 * run ./run.sh test
2024-10-09 10:34:53 -05:00
Tim Gross
e9ba630639 docker: fix script check execution (#24098)
In #24095 we made a fix for non-streaming exec into Docker tasks for script
checks and `change_mode = "script"`, but didn't complete E2E testing. We need to
use `ContainerExecAttach` in the new API in order to get stdout/stderr from
tasklets, but the previous `ContainerExecStart` call will prevent this from
running successfully with an error that the exec has already run.

* Ref: [NET-11202 (comment)](https://hashicorp.atlassian.net/browse/NET-11202?focusedCommentId=551618)
* This has shipped in Nomad 1.9.0-beta.1 but not production yet.
* This should fix the remaining issues in nightly E2E for Docker.
2024-10-01 16:41:38 -04:00
Michael Smithhisler
6b6aa7cc26 identity: adds ability to specify custom filepath for saving workload identities (#24038) 2024-09-23 10:27:00 -04:00
Tim Gross
9247dc9108 E2E: allow Consul version to omit tags (#24024)
When we start the Consul agent in the `consulcompat` test package, we check that
the version matches the version we expect. But Consul agents may omit non-core
parts of the version string (ex. `1.20.0-rc1` displays `1.20.0`). Compare only
the core portions of the version string.
2024-09-20 14:46:01 -04:00
Seth Hoenig
51215bf102 deps: update to go-set/v3 and refactor to use custom iterators (#23971)
* deps: update to go-set/v3

* deps: use custom set iterators for looping
2024-09-16 13:40:10 -05:00
Tim Gross
8739d7738c E2E: remove invalid HCLv1 field on submissions test (#23936)
HCLv1 support was removed entirely in #23912, but I missed this one test and
documentation reference.
2024-09-09 09:57:25 -04:00
Phil Renaud
faf95ef7b9 Update the pinned playwright version (#23929) 2024-09-06 15:57:19 -04:00
Tim Gross
a9beef7edd jobspec: remove HCL1 support (#23912)
This changeset removes support for parsing jobspecs via the long-deprecated
HCLv1.

Fixes: https://github.com/hashicorp/nomad/issues/20195
Ref: https://hashicorp.atlassian.net/browse/NET-10220
2024-09-05 09:02:45 -04:00
Seth Hoenig
4aeb279534 e2e: fix module name of an artifact we download (#23843)
Because this will definitely never change again, for sure, trust me.
2024-08-19 10:25:35 -05:00
Seth Hoenig
db0642099e build: update golangci-lint to 1.60.1 (#23807)
* build: update golangci-lint to 1.60.1

* ci: update golangci-lint to v1.60.1

Helps with go1.23 compatability. Introduces some breaking changes / newly
enforced linter patterns so those are fixed as well.
2024-08-14 10:09:31 -05:00
Tim Gross
bc50eebebd workload identity: add support for extra claims config for Vault (#23675)
Although we encourage users to use Vault roles, sometimes they're going to want
to assign policies based on entity and pre-create entities and aliases based on
claims. This allows them to use single default role (or at least small number of
them) that has a templated policy, but have an escape hatch from that.

When defining Vault entities the `user_claim` must be unique. When writing Vault
binding rules for use with Nomad workload identities the binding rule won't be
able to create a 1:1 mapping because the selector language allows accessing only
a single field. The `nomad_job_id` claim isn't sufficient to uniquely identify a
job because of namespaces. It's possible to create a JWT auth role with
`bound_claims` to avoid this becoming a security problem, but this doesn't allow
for correct accounting of user claims.

Add support for an `extra_claims` block on the server's `default_identity`
blocks for Vault. This allows a cluster administrator to add a custom claim on
all allocations. The values for these claims are interpolatable with a limited
subset of fields, similar to how we interpolate the task environment.

Fixes: https://github.com/hashicorp/nomad/issues/23510
Ref: https://hashicorp.atlassian.net/browse/NET-10372
Ref: https://hashicorp.atlassian.net/browse/NET-10387
2024-08-05 15:01:54 -04:00
Daniel Bennett
10d3f1749b e2e: test all cni config formats (#23650) 2024-07-22 10:17:03 -05:00
Tim Gross
a29f9b6fc0 keyring: E2E testing for KMS/rotation (#23601)
In #23580 we're implementing support for encrypting Nomad's key material with
external KMS providers or Vault Transit. This changeset breaks out the E2E
infrastructure and testing from that PR to keep the review manageable.

Ref: https://hashicorp.atlassian.net/browse/NET-10334
Ref: https://github.com/hashicorp/nomad/issues/14852
Ref: https://github.com/hashicorp/nomad/pull/23580
2024-07-19 13:49:48 -04:00
Daniel Bennett
de10efa3fa e2e: hc-install consul-cni (#23612)
now that the version with tproxy CNI_ARGS is on releases.hashicorp.com
2024-07-17 14:26:40 -05:00
Daniel Bennett
afbd283c1b e2e: skip missing windows ami if windows clients=0 (#23610)
and tweak Makefile to generate a custom.tfvars
instead of specifying vars separately via CLI.
hoping this makes it a little more obvious
if there is no consul/nomad license.
2024-07-17 12:45:41 -05:00
Martina Santangelo
bc81c85ec7 e2e: cni args tests (#23597)
Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>
2024-07-15 17:08:50 -04:00
Piotr Kazmierczak
ddbb307fa6 e2e: purge the job in the UI stop_proxy() script (#23565)
otherwise namespace deletion fails due to non-terminal allocations
2024-07-12 10:13:51 +02:00
Deniz Onur Duzgun
c82dd76a1b security: update tls cipher suites (#23551) 2024-07-11 14:01:45 -04:00
Daniel Bennett
c84b4ad67b e2e: add test for task schedule{} (#23382) 2024-06-20 11:18:53 -05:00
Daniel Bennett
2da38ba9c4 e2e: jobs3 hcl vars differently (#23363)
and include jobspec and vars in registrations
(so they show up in the UI under job Definition)
2024-06-17 13:20:51 -05:00
Daniel Bennett
5a6e3d5ef0 e2e: add Enterprise Option for cluster3.Establish (#23362) 2024-06-17 12:59:37 -05:00
Tim Gross
288a048a2e e2e: add prerelease builds to Consul/Vault compatibility tests (#23287)
Update the Consul/Vault build downloader functions so that we include the
current prerelease build (if any) in our E2E compatibility testing we do on each
PR. This will automatically cycle out when the GA build is released, because
that build is "higher" in the sorted set.
2024-06-11 08:54:27 -04:00
Seth Hoenig
2054e87158 e2e: add tests for exec2 task driver (#22406)
* e2e: add tests for exec2 task driver

* e2e: use envoy 1.29.4 because consul

* e2e: add a bridge networking http test for exec driver

* e2e: split up http test so curl always starts after the server
2024-05-31 09:22:39 -05:00
Seth Hoenig
9fb2b10ab6 e2e: no lnoger need consul terraform module (#22396) 2024-05-28 08:04:03 -05:00
Tim Gross
91d422ec21 E2E: document how the AMIs are tagged and how those tags are used (#22237)
The process by which we tag AMIs with the commit SHA of the Packer directory
isn't documented in this repository, which makes it easy to accidentally build
an AMI that will break nightly E2E.
2024-05-24 11:11:00 -05:00
James Rasell
04ba358266 client: expose network namespace CNI config as task env vars. (#11810)
This change exposes CNI configuration details of a network
namespace as environment variables. This allows a task to use
these value to configure itself; a potential use case is to run
a Raft application binding to IP and Port details configured using
the bridge network mode.
2024-05-14 09:02:06 +01:00
Piotr Kazmierczak
abe9c0803a e2e: unflake TestWorkloadIdentity/testNobody (#20499)
sometimes the container quits too fast
2024-04-30 18:17:14 +02:00
Tim Gross
ff2d9de592 Revert "E2E: skip Vault 1.16.1 for JWT compatibility test (#20301)" (#20484)
This reverts commit 45b36371a12ffae5b5bfaaeadb08f801fb6bc98d. Now that Vault
1.16.2 has shipped, the E2E test will pick up only a working version.

Closes: https://github.com/hashicorp/nomad/issues/20298
2024-04-26 09:36:09 -04:00
Tim Gross
d40e23f939 E2E: clean up go mod cache after building consul-cni (#20378)
In #20296 we added a Go tool chain to the AMI we use for E2E tests, so that we
can build `consul-cni` for tproxy testing. This is intended to be temporary
until `consul-k8s` 1.4.2 is officially released. But the Go cache from building
`consul-k8s` uses up roughly 1.5GiB of space and the test machines have fairly
small disks. This causes the Nomad clients to aggressively GC client allocations
that stop, which breaks tests that run batch workloads and then read their logs.
2024-04-12 11:52:46 -04:00
Tim Gross
8298d39e78 Connect transparent proxy support
Add support for Consul Connect transparent proxies

Fixes: https://github.com/hashicorp/nomad/issues/10628
2024-04-10 11:00:18 -04:00
Tim Gross
548adb0fd4 tproxy: E2E tests (#20296)
Add the `consul-cni` plugin to the Linux AMI for E2E, and add a test case that
covers the transparent proxy feature. Add test assertions to the Connect tests
for upstream reachability

Ref: https://github.com/hashicorp/nomad/pull/20175
2024-04-05 14:23:26 -04:00
Tim Gross
2382ab8776 E2E: ensure periodic test can't fail due to cron conflicts (#20300)
The E2E test for periodic dispatch jobs has a `cron` trigger for once a
minute. If the test happens to run at the top of the minute, it's possible for
the forced dispatch to run from the test code, then the periodic timer triggers
and leaves a running child job. This fails the test because it expects only a
single job in the "dead" state.

Make it so that the `cron` expression is implausible to run during our test
window, and migrate the test off the old framework while we're at it.
2024-04-05 08:45:35 -04:00
Tim Gross
648daceca1 E2E: skip Vault 1.16.1 for JWT compatibility test (#20301)
Vault 1.16.1 has a known issue around the JWT auth configuration that will
prevent this test from ever passing. Skip testing the JWT code path on
1.16.1. Once 1.16.2 ships it will no longer get skipped.

Ref: https://github.com/hashicorp/nomad/issues/20298
2024-04-04 17:00:35 -04:00
Tim Gross
c1f020d60f E2E: refactor Connect tests to use stdlib testing (#20278)
Migrate our E2E tests for Connect off the old framework in preparation for
writing E2E tests for transparent proxy and the updated workload identity
workflow. Mark the tests that cover the legacy Consul token submitted workflow.

Ref: https://github.com/hashicorp/nomad/pull/20175
2024-04-04 10:48:10 -04:00