195 Commits

Author SHA1 Message Date
Seth Hoenig
2054e87158 e2e: add tests for exec2 task driver (#22406)
* e2e: add tests for exec2 task driver

* e2e: use envoy 1.29.4 because consul

* e2e: add a bridge networking http test for exec driver

* e2e: split up http test so curl always starts after the server
2024-05-31 09:22:39 -05:00
Seth Hoenig
9fb2b10ab6 e2e: no longer need consul terraform module (#22396) 2024-05-28 08:04:03 -05:00
Tim Gross
91d422ec21 E2E: document how the AMIs are tagged and how those tags are used (#22237)
The process by which we tag AMIs with the commit SHA of the Packer directory
isn't documented in this repository, which makes it easy to accidentally build
an AMI that will break nightly E2E.
2024-05-24 11:11:00 -05:00
Tim Gross
d40e23f939 E2E: clean up go mod cache after building consul-cni (#20378)
In #20296 we added a Go tool chain to the AMI we use for E2E tests, so that we
can build `consul-cni` for tproxy testing. This is intended to be temporary
until `consul-k8s` 1.4.2 is officially released. But the Go cache from building
`consul-k8s` uses up roughly 1.5GiB of space and the test machines have fairly
small disks. This causes the Nomad clients to aggressively GC client allocations
that stop, which breaks tests that run batch workloads and then read their logs.
2024-04-12 11:52:46 -04:00
Tim Gross
548adb0fd4 tproxy: E2E tests (#20296)
Add the `consul-cni` plugin to the Linux AMI for E2E, and add a test case that
covers the transparent proxy feature. Add test assertions to the Connect tests
for upstream reachability.

Ref: https://github.com/hashicorp/nomad/pull/20175
2024-04-05 14:23:26 -04:00
Tim Gross
4ce728afbd E2E: make vault.create_from_role unique per cluster (#20267)
If an E2E cluster is destroyed after a different one has been created, the role
and policy we create in Vault for the cluster will be deleted and Vault-related
tests will fail. Note that before 1.9, we should figure out a way to give HCP
Vault access to the JWKS endpoint and have a different set of policies, but
we'll need to have a role-per-cluster in that case as well.

Fixes: https://github.com/hashicorp/nomad-e2e/issues/138 (internal)
2024-04-03 08:45:01 -04:00
Tim Gross
cf25cf5cd5 E2E: use a self-hosted Consul for easier WI testing (#20256)
Our `consulcompat` tests exercise both the Workload Identity and legacy Consul
token workflow, but they are limited to running single node tests. The E2E
cluster is network isolated, so using our HCP Consul cluster runs into a
problem validating WI tokens because it can't reach the JWKS endpoint. In real
production environments, you'd solve this with a CNAME pointing to a public IP
pointing to a proxy with a real domain name. But that's logistically
impractical for our ephemeral nightly cluster.

Replace HCP Consul with a single-node Consul cluster on AWS EC2 alongside our
Nomad cluster. Bootstrap TLS and ACLs in Terraform and ensure all nodes can
reach each other. This will allow us to update our Consul tests so they can use
Workload Identity, in a separate PR.

Ref: #19698
2024-04-02 15:24:51 -04:00
Piotr Kazmierczak
8226a85263 e2e: remove deprecated template_file dependency for tf (#19313)
This also allows running tf for our e2e suite locally on darwin.
2024-01-15 18:42:28 +01:00
Piotr Kazmierczak
858a805d7d e2e: add a note about provisioning the infrastructure on macOS/Apple Silicon (#19727) 2024-01-12 14:09:50 +01:00
Matt Robenolt
656bb5cafa drivers/executor: set oom_score_adj for raw_exec (#19515)
* drivers/executor: set oom_score_adj for raw_exec

This might not be wholly true since I don't know all configurations of
Nomad, but in our use cases, we run some of our tasks as `raw_exec` for
reasons.

We observed that our tasks were running with `oom_score_adj = -1000`,
which prevents them from being OOM'd. This value is being inherited from
the nomad agent parent process, as configured by systemd.

Similar to #10698, we also were shocked to have this value inherited
down to every child process and believe that we should also set this
value to 0 explicitly.

I have no idea if there are other paths that might leverage this or
other ways that `raw_exec` can manifest, but this is how I was able to
observe and fix in one of our configurations.

We have been running our tasks in production wrapped in a script that
runs `echo 0 > /proc/self/oom_score_adj` to avoid this issue.

* drivers/executor: minor cleanup of setting oom adjustment

* e2e: add test for raw_exec oom adjust score

* e2e: set oom score adjust to -999

* cl: add cl

---------

Co-authored-by: Seth Hoenig <shoenig@duck.com>
2024-01-02 13:35:09 -06:00
Daniel Bennett
c7d01705f5 e2e: push nomad token to servers (#19312)
so humans with root shell access can use it to debug

not ideal security, but this is a short-lived test cluster
2023-12-05 08:54:57 -06:00
Daniel Bennett
4ec9343447 e2e: use tf variable defaults (#19108) 2023-11-16 14:50:11 -06:00
Seth Hoenig
f211a0ab7c e2e: update terraform lock file for 1.6.3 (#19049)
Using the latest version of terraform, the lock file is not the same
as when it was generated. Seems like the http module is not needed?
versioned? present? anymore.
2023-11-09 10:49:49 -06:00
Seth Hoenig
402540f7fb e2e: bump packer build instances because faster (#19046) 2023-11-09 09:33:30 -06:00
Seth Hoenig
a28e5b6965 e2e: refactor metrics test to use NSD and WI (#19022)
* e2e: remove old metrics suite

* e2e: install stress on e2e jammy image

* e2e: overhaul metrics test to use nomad service discovery, workload identity

* e2e: format metrics hcl files and copywrite

* e2e: undo tf lock file

* e2e: undo reg auth file perms

* e2e: format cpustress.hcl
2023-11-09 08:21:16 -06:00
Seth Hoenig
63da22063b e2e: update pledge driver to 0.3.0 (#19020) 2023-11-08 06:58:59 -06:00
Seth Hoenig
a2f7ab2645 e2e disable windows (#19012)
* e2e: disable windows client

* e2e: disable windows artifact test
2023-11-07 09:34:18 -06:00
Daniel Bennett
a51d46c65c e2e: packer windows from "ECS_Optimized" image (#18453)
"Containers" AMIs evaporated at some point...
https://aws.amazon.com/marketplace/pp/prodview-yfve3zjgfjtug
> This version has been removed and is no longer
> available to new customers.
2023-09-11 12:26:32 -05:00
hashicorp-copywrite[bot]
a9d61ea3fd Update copyright file headers to BUSL-1.1 2023-08-10 17:27:29 -05:00
Seth Hoenig
8d28946993 e2e podman private registry (#17642)
* e2e: add tests for using private registry with podman driver

This PR adds e2e tests that stand up a private docker registry
and have a podman task run a container from an image in that private
registry.

Tests
 - user:password set in task config
 - auth_soft_fail works for public images when auth is set in driver
 - credentials helper is set in driver auth config
 - config auth.json file is set in driver auth config

* packer: use nomad-driver-podman v0.5.0

* e2e: eliminate unnecessary chmod

Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>

* cr: no need to install nomad twice

* cl: no need to install docker twice

---------

Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>
2023-07-19 15:59:36 -05:00
Seth Hoenig
159bf51120 e2e: add some e2e tests for pledge task driver (#17909)
* e2e: setup nomad for pledge driver

* e2e: add some e2e tests for pledge task driver
2023-07-12 11:56:08 -05:00
Daniel Bennett
6bd509869b e2e: use DNS instead of HTTP to get my_public_ipv4 (#17759) 2023-06-28 13:11:57 -05:00
Daniel Bennett
748aea1c61 e2e: fix windows client docker (#17572)
the windows docker install script stopped working.

after trying various things to fix the script,
I opted instead for a base image that comes with
docker already installed.

error output during build was:
  Installing Docker.
  WARNING: Cannot find path 'C:\Users\Administrator\AppData\Local\Temp\DockerMsftProvider\DockerDefault_DockerSearchIndex.json' because it does not exist.
  WARNING: Cannot bind argument to parameter 'downloadURL' because it is an empty string.
  WARNING: The property 'AbsoluteUri' cannot be found on this object. Verify that the property exists.
  WARNING: The property 'RequestMessage' cannot be found on this object. Verify that the property exists.
  Failed to install Docker.
  Install-Package : No match was found for the specified search criteria and package name 'docker'.
2023-06-20 10:17:16 -05:00
Seth Hoenig
6975409386 e2e: cleanup podman installation in jammy image (#17558)
* e2e: cleanup podman installation in jammy image

The original steps were copied over from the bionic image and do a lot
of hoop jumping we do not need anymore.

For the moment just hard-code installing the v0.4.2 version of the driver,
but I may follow up and modify hc-install to support installing @latest
like go itself.

* use releases for hc-install
2023-06-15 18:17:31 -05:00
Seth Hoenig
6b2834559f e2e: purge bionic packer image scripts (#17559)
Bionic is dead, long live the Jammy!
2023-06-15 15:15:01 -05:00
Shawn
9898e85d09 fix: typo (#16873) 2023-04-12 16:18:13 -04:00
hashicorp-copywrite[bot]
f005448366 [COMPLIANCE] Add Copyright and License Headers 2023-04-10 15:36:59 +00:00
Tim Gross
6cb69e5609 E2E: test enforcement of ACL system (#16796)
This changeset provides a matrix test of ACL enforcement across several
dimensions:
  * anonymous vs bogus vs valid tokens
  * permitted vs not permitted by policy
  * request sent to server vs sent to client (and forwarded)
2023-04-06 09:11:20 -04:00
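The three dimensions above multiply out to 3 × 2 × 2 = 12 cases. Below is a minimal Go sketch of building that matrix as a table. The type and field names are hypothetical, and the expected-outcome rule is an assumption, not taken from the test itself: a bogus token is always rejected, while an anonymous or valid token succeeds only when its policy permits the request, whether the request goes to a server or is forwarded from a client.

```go
package main

import "fmt"

// aclCase is one cell of a hypothetical ACL enforcement matrix.
type aclCase struct {
	Token     string // "anonymous", "bogus", or "valid"
	Permitted bool   // whether the applicable policy allows the request
	Target    string // "server" or "client" (forwarded to a server)
}

// expectAllowed encodes the assumed rule: a bogus token never resolves, and
// otherwise the policy decides, regardless of which agent receives the request.
func (c aclCase) expectAllowed() bool {
	return c.Token != "bogus" && c.Permitted
}

func (c aclCase) name() string {
	return fmt.Sprintf("%s/permitted=%t/%s", c.Token, c.Permitted, c.Target)
}

// aclMatrix builds the cross product of the three dimensions.
func aclMatrix() []aclCase {
	var cases []aclCase
	for _, tok := range []string{"anonymous", "bogus", "valid"} {
		for _, permitted := range []bool{true, false} {
			for _, target := range []string{"server", "client"} {
				cases = append(cases, aclCase{tok, permitted, target})
			}
		}
	}
	return cases
}
```

Generating the cases rather than listing them by hand makes it harder to miss a combination, such as a denied request that only fails when it is forwarded from a client.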
Michael Schurter
282e3bcfcc Enable ACLs on E2E test clients (#16530)
* e2e: uniformly enable acls across all agents

* docs: clarify that acls should be set everywhere
2023-03-16 14:22:41 -07:00
Seth Hoenig
40ab325594 e2e: setup nomad permissions correctly (client vs. server) (#16399)
This PR configures

- server nodes with a systemd unit running the agent as the nomad service user
- client nodes with a root owned nomad data directory
2023-03-08 14:41:08 -06:00
Seth Hoenig
24af468b67 e2e: fix permissions on nomad data directory (#16376)
This PR updates the provisioning step where we create /opt/nomad/data
so that the directory is created with 0700 permissions, in line with our security guidance.
2023-03-07 14:41:54 -06:00
Tim Gross
517ad9c5bf E2E: add multi-home networking to test infrastructure (#16218)
Add an Elastic Network Interface (ENI) to each Linux host, on a secondary subnet
we have provisioned in each AZ. Revise security groups as follows:

* Split out client security groups from servers so that we can't have clients
  accidentally accessing serf addresses or other unexpected cross-talk.
* Add new security groups for the secondary subnet that only allow
  communication within the security group so we can exercise behaviors with
  multiple IPs.

This changeset doesn't include any Nomad configuration changes needed to take
advantage of the extra network interface. I'll include those with testing for
PR #16217.
2023-02-20 10:08:28 +01:00
Seth Hoenig
6e4410a9b1 e2e: fix 1 of 4 client disconnect tests (#15357)
This PR modifies the disconnect helper job to run as root, which is necessary
for manipulating iptables as it does. Also re-organizes the final test logic
to wait for client re-connect before looking for the replacement (3rd) allocation
in case that client was needed to run the alloc (also giving the scheduler more
time to do its thing).

Skips the other 3 tests, which fail for reasons I cannot yet figure out.
2022-11-22 08:51:53 -06:00
Seth Hoenig
78593daaee e2e: jammy image needs latest java lts (#15323) 2022-11-18 14:36:36 -06:00
Seth Hoenig
7c254ccdb8 e2e: disable systemd stub dns in jammy image (#15286) 2022-11-17 09:50:44 -06:00
Seth Hoenig
0e3606afa0 e2e: swap bionic image for jammy (#15220) 2022-11-16 10:37:18 -06:00
James Rasell
785b4dfad7 e2e: add acl test for token expiration. (#14418)
In order to add an E2E test to cover token expiration, the server
config has been updated to include a low minimum allowed TTL
value. For ease of reading, the max value is also set.
2022-09-01 09:36:09 +02:00
James Rasell
264d2dd375 e2e: add terraform init commands to readme doc. (#13655) 2022-07-08 16:52:35 +02:00
Tim Gross
aafcf97984 E2E: provide options for reverse proxy for web UI (#12671)
Our E2E test environment is deployed with mTLS, but it's impractical
for us to use mTLS in headless browsers for automated testing (or even
in manual testing). Provide certificates for proxying the web UI via
Nginx. This proxy uses client certs for proxying to the HTTP endpoint
and a self-signed cert for the browser-facing endpoint. We can accept
certificate errors in the automated tests we'll be adding in the next
step of this work.
2022-04-19 16:55:05 -04:00
Tim Gross
e2a8d45f2d E2E: terraform provisioner upgrades (#12652)
While working on infrastructure for testing the UI in E2E, we needed
to upgrade the certificate provider. Performing a provider upgrade via
the TF `init -upgrade` brought in updates for the file and AWS
providers as well. These updates include deprecating the use of
`sensitive_content` fields, removing CA algorithm parameters that can
be inferred from keys, and removing the requirement to manually
specify AWS assume role parameters in the provider config if they're
available in the calling environment's AWS config file (as they are
via doormat or our E2E environment).
2022-04-19 14:27:14 -04:00
Derek Strickland
8f7abae89f Update E2E terraform output command (#12561) 2022-04-13 16:46:09 -04:00
Tim Gross
247e20e10b scripts: fix interpreter for bash (#12549)
Many of our scripts have a non-portable interpreter line for bash and
use bash-specific variables like `BASH_SOURCE`. Update the interpreter
line to be portable between various Linuxes and macOS without
complaint from posix shell users.
2022-04-12 10:08:21 -04:00
Tim Gross
e8da15cae5 E2E: test exercising node drain behavior for CSI volumes (#12384) 2022-03-29 11:19:23 -04:00
Tim Gross
37d831712f E2E: namespace HCP vault and consul policies to avoid collisions (#12386)
Concurrent E2E runs can collide when provisioning policies on HCP
Consul and HCP Vault. Namespace these by the test run name, as we do
for most everything else.
2022-03-25 16:05:59 -04:00
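Concurrent runs collide because they share one HCP Consul and one HCP Vault. Below is a minimal Go sketch of namespacing a policy name by run name. It assumes a hypothetical `policyName` helper and a lowercase, hyphen-only naming rule. The actual naming used in this repository is not shown here.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// invalidChars matches runs of characters outside the assumed naming rule.
var invalidChars = regexp.MustCompile(`[^a-z0-9-]+`)

// policyName is a hypothetical helper that prefixes a base policy name with a
// sanitized run name, so two runs never create or delete the same policy.
func policyName(runName, base string) string {
	run := invalidChars.ReplaceAllString(strings.ToLower(runName), "-")
	return fmt.Sprintf("%s-%s", strings.Trim(run, "-"), base)
}
```

For example, `policyName("nightly_2022-03-25", "nomad-server")` yields `nightly-2022-03-25-nomad-server`, so a second run with a different name gets a distinct policy.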
Tim Gross
020fa6f8ba E2E with HCP Consul/Vault (#12267)
Use HCP Consul and HCP Vault for the Consul and Vault clusters used in E2E testing. This has the following benefits:

* Without the need to support mTLS bootstrapping for Consul and Vault, we can simplify the mTLS configuration by leaning on Terraform instead of janky bash shell scripting.
* Vault bootstrapping is no longer required, so we can eliminate even more janky shell scripting
* Our E2E exercises HCP, which is important to us as an organization
* With the reduction in configurability, we can simplify the Terraform configuration and drop the complicated `provision.sh`/`provision.ps1` scripts we were using previously. We can template Nomad configuration files and upload them with the `file` provisioner.
* Packer builds for Linux and Windows become much simpler.

tl;dr way less janky shell scripting!
2022-03-18 09:27:28 -04:00
Tim Gross
31b7de78fd e2e: configure prometheus for mTLS for Metrics suite (#12181)
The `Metrics` suite uses prometheus to scrape Nomad metrics so that
we're testing the full user experience of extracting metrics from
Nomad. With the addition of mTLS, we need to make sure prometheus also
has mTLS configuration because the metrics endpoint is protected.

Update the Nomad client configuration and prometheus job to bind-mount
the client's certs into the task so that the job can use these certs
to scrape the server. This is a temporary solution that gets the job
passing; we should give the job its own certificates (issued by
Vault?) when we've done some of the infrastructure rework we'd like.
2022-03-04 08:55:06 -05:00
Tim Gross
03a8d72dba CSI: implement support for topology (#12129) 2022-03-01 10:15:46 -05:00
Luiz Aoqui
827cdca490 e2e: enable Consul HTTPS port and always restart Nomad systemd unit 2022-01-18 16:56:26 -05:00
Tim Gross
1391c37ef5 hclfmt on some config files (#11611) 2021-12-02 15:25:46 -05:00
Derek Strickland
bfac8d8456 Fix Vault E2E TLS config (#11483)
* Update e2e/terraform configuration for Vault and default to mtls=true
2021-12-02 12:20:09 -05:00