* e2e: add tests for exec2 task driver
* e2e: use envoy 1.29.4 because consul
* e2e: add a bridge networking http test for exec driver
* e2e: split up http test so curl always starts after the server
The process by which we tag AMIs with the commit SHA of the Packer directory
isn't documented in this repository, which makes it easy to accidentally build
an AMI that will break nightly E2E.
In #20296 we added a Go toolchain to the AMI we use for E2E tests, so that we
can build `consul-cni` for tproxy testing. This is intended to be temporary
until `consul-k8s` 1.4.2 is officially released. But the Go cache from building
`consul-k8s` uses up roughly 1.5GiB of space and the test machines have fairly
small disks. This causes the Nomad clients to aggressively GC client allocations
that stop, which breaks tests that run batch workloads and then read their logs.
Add the `consul-cni` plugin to the Linux AMI for E2E, and add a test case that
covers the transparent proxy feature. Add test assertions to the Connect tests
for upstream reachability.
Ref: https://github.com/hashicorp/nomad/pull/20175
If an E2E cluster is destroyed after a different one has been created, the role
and policy we create in Vault for the cluster will be deleted and Vault-related
tests will fail. Note that before 1.9, we should figure out a way to give HCP
Vault access to the JWKS endpoint and have a different set of policies, but
we'll need to have a role-per-cluster in that case as well.
Fixes: https://github.com/hashicorp/nomad-e2e/issues/138 (internal)
Our `consulcompat` tests exercise both the Workload Identity and legacy Consul
token workflow, but they are limited to running single node tests. The E2E
cluster is network isolated, so using our HCP Consul cluster runs into a
problem validating WI tokens because it can't reach the JWKS endpoint. In a real
production environment you'd solve this with a real domain name and a CNAME
pointing to a public IP that fronts a proxy. But that's logistically
impractical for our ephemeral nightly cluster.
Migrate from HCP Consul to a single-node Consul cluster on AWS EC2 alongside our
Nomad cluster. Bootstrap TLS and ACLs in Terraform and ensure all nodes can
reach each other. This will allow us to update our Consul tests, in a separate
PR, so they can use Workload Identity.
Ref: #19698
* drivers/executor: set oom_score_adj for raw_exec
This might not hold for every Nomad configuration, but in our use case we
run some of our tasks as `raw_exec` for reasons of our own.
We observed that our tasks were running with `oom_score_adj = -1000`,
which prevents them from ever being OOM-killed. This value is inherited from
the Nomad agent parent process, as configured by systemd.
Similar to #10698, we were surprised to see this value inherited down to
every child process, and believe the executor should set it to 0 explicitly.
I don't know whether there are other code paths that hit this, or other ways
that `raw_exec` can manifest it, but this is how I was able to observe and
fix it in one of our configurations.
In production we have been working around the issue by wrapping our tasks in
a script that runs `echo 0 > /proc/self/oom_score_adj`.
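As an illustration of the mechanism (a minimal sketch, not the executor's
actual code), resetting the adjustment from Go looks roughly like this,
assuming the child's PID is known:

```go
package main

import (
	"fmt"
	"os"
)

// setOOMScoreAdj writes an OOM score adjustment (-1000..1000) for the given
// PID, overriding whatever value was inherited from the parent process.
func setOOMScoreAdj(pid, score int) error {
	path := fmt.Sprintf("/proc/%d/oom_score_adj", pid)
	return os.WriteFile(path, []byte(fmt.Sprintf("%d", score)), 0o644)
}

func main() {
	// Reset the current process to the kernel default so it is no longer
	// exempt from the OOM killer (the Go equivalent of
	// `echo 0 > /proc/self/oom_score_adj`).
	if err := setOOMScoreAdj(os.Getpid(), 0); err != nil {
		fmt.Fprintln(os.Stderr, "failed to set oom_score_adj:", err)
		os.Exit(1)
	}
}
```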
* drivers/executor: minor cleanup of setting oom adjustment
* e2e: add test for raw_exec oom adjust score
* e2e: set oom score adjust to -999
* cl: add cl
---------
Co-authored-by: Seth Hoenig <shoenig@duck.com>
Using the latest version of Terraform, the lock file no longer matches the
one that was originally generated. It seems the http module is no longer
needed (or versioned, or even present).
* e2e: add tests for using private registry with podman driver
This PR adds e2e tests that stand up a private Docker registry
and have a podman task run a container from an image in that private
registry.
Tests:
- user:password set in task config
- auth_soft_fail works for public images when auth is set in driver
- credentials helper is set in driver auth config
- config auth.json file is set in driver auth config
* packer: use nomad-driver-podman v0.5.0
* e2e: eliminate unnecessary chmod
Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>
* cr: no need to install nomad twice
* cl: no need to install docker twice
---------
Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>
The Windows Docker install script stopped working.
After trying various things to fix the script,
I opted instead for a base image that comes with
Docker already installed.
Error output during the build was:
Installing Docker.
WARNING: Cannot find path 'C:\Users\Administrator\AppData\Local\Temp\DockerMsftProvider\DockerDefault_DockerSearchIndex.json' because it does not exist.
WARNING: Cannot bind argument to parameter 'downloadURL' because it is an empty string.
WARNING: The property 'AbsoluteUri' cannot be found on this object. Verify that the property exists.
WARNING: The property 'RequestMessage' cannot be found on this object. Verify that the property exists.
Failed to install Docker.
Install-Package : No match was found for the specified search criteria and package name 'docker'.
* e2e: cleanup podman installation in jammy image
The original steps were copied over from the bionic image and do a lot
of hoop-jumping we no longer need.
For the moment we just hard-code installation of the v0.4.2 version of the
driver, but I may follow up and modify hc-install to support installing
@latest like Go itself.
* use releases for hc-install
This changeset provides a matrix test of ACL enforcement across several
dimensions:
* anonymous vs bogus vs valid tokens
* permitted vs not permitted by policy
* request sent to server vs sent to client (and forwarded)
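As a rough sketch of how that matrix maps onto a table-driven Go test (the
struct and case names here are illustrative, not the real e2e suite):

```go
package acl

import "testing"

// aclMatrixCase is one cell of the enforcement matrix described above.
type aclMatrixCase struct {
	name      string
	token     string // "" (anonymous), a random UUID (bogus), or a real token
	allowed   bool   // does the attached policy permit the operation?
	viaClient bool   // send the request to a client node so it gets forwarded
}

func TestACLEnforcementMatrix(t *testing.T) {
	cases := []aclMatrixCase{
		{name: "anonymous/denied/server", token: "", allowed: false},
		{name: "bogus/denied/client", token: "00000000-0000-0000-0000-000000000000", viaClient: true},
		{name: "valid/allowed/server", token: "replace-with-a-real-token", allowed: true},
		// ...remaining combinations of token kind x policy x target node
	}
	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			// Issue a request against a server or client address using tc.token,
			// then assert success only when tc.allowed is true.
			_ = tc
		})
	}
}
```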
This PR configures:
- server nodes with a systemd unit running the agent as the nomad service user
- client nodes with a root owned nomad data directory
Add an Elastic Network Interface (ENI) to each Linux host, on a secondary subnet
we have provisioned in each AZ. Revise security groups as follows:
* Split out client security groups from servers so that we can't have clients
accidentally accessing serf addresses or other unexpected cross-talk.
* Add new security groups for the secondary subnet that only allow
  communication within the security group, so we can exercise behaviors with
  multiple IPs.
This changeset doesn't include any Nomad configuration changes needed to take
advantage of the extra network interface. I'll include those with testing for
PR #16217.
This PR modifies the disconnect helper job to run as root, which is necessary
for manipulating iptables as it does. It also re-organizes the final test logic
to wait for client re-connect before looking for the replacement (3rd) allocation,
in case that client was needed to run the alloc (also giving the scheduler more
time to do its thing).
It skips the other 3 tests, which fail for reasons I cannot yet figure out.
In order to add an E2E test to cover token expiration, the server
config has been updated to include a low minimum allowed TTL
value. For ease of reading, the max value is also set.
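A hedged sketch of the kind of check this enables, using the
`github.com/hashicorp/nomad/api` client; it assumes the server's new minimum
allowed TTL is at or below 10 seconds and that a policy named "default"
exists:

```go
package main

import (
	"fmt"
	"time"

	"github.com/hashicorp/nomad/api"
)

func main() {
	// DefaultConfig picks up NOMAD_ADDR / NOMAD_TOKEN from the environment.
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		panic(err)
	}

	// Create a client token that expires shortly after creation.
	token, _, err := client.ACLTokens().Create(&api.ACLToken{
		Name:          "expiration-test",
		Type:          "client",
		Policies:      []string{"default"}, // assumes such a policy exists
		ExpirationTTL: 10 * time.Second,    // must be >= the configured minimum
	}, nil)
	if err != nil {
		panic(err)
	}

	// Use the new token; Self() should succeed now and fail after the TTL.
	client.SetSecretID(token.SecretID)
	if _, _, err := client.ACLTokens().Self(nil); err != nil {
		panic(fmt.Errorf("token should still be valid: %w", err))
	}
	time.Sleep(11 * time.Second)
	if _, _, err := client.ACLTokens().Self(nil); err == nil {
		panic("token should have expired")
	}
}
```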
Our E2E test environment is deployed with mTLS, but it's impractical
for us to use mTLS in headless browsers for automated testing (or even
in manual testing). Provide certificates for proxying the web UI via
Nginx. This proxy uses client certs for proxying to the HTTP endpoint
and a self-signed cert for the browser-facing endpoint. We can accept
certificate errors in the automated tests we'll be adding in the next
step of this work.
While working on infrastructure for testing the UI in E2E, we needed
to upgrade the certificate provider. Performing a provider upgrade via
`terraform init -upgrade` brought in updates for the file and AWS
providers as well. These updates include deprecating the use of
`sensitive_content` fields, removing CA algorithm parameters that can
be inferred from keys, and removing the requirement to manually
specify AWS assume role parameters in the provider config if they're
available in the calling environment's AWS config file (as they are
via doormat or our E2E environment).
Many of our scripts have a non-portable interpreter line for bash and
use bash-specific variables like `BASH_SOURCE`. Update the interpreter
line (e.g. to `#!/usr/bin/env bash`) so it is portable between various
Linuxes and macOS without complaint from POSIX shell users.
Concurrent E2E runs can collide when provisioning policies on HCP
Consul and HCP Vault. Namespace these by the test run name, as we do
for most everything else.
Use HCP Consul and HCP Vault for the Consul and Vault clusters used in E2E testing. This has the following benefits:
* Without the need to support mTLS bootstrapping for Consul and Vault, we can simplify the mTLS configuration by leaning on Terraform instead of janky bash shell scripting.
* Vault bootstrapping is no longer required, so we can eliminate even more janky shell scripting.
* Our E2E exercises HCP, which is important to us as an organization.
* With the reduction in configurability, we can simplify the Terraform configuration and drop the complicated `provision.sh`/`provision.ps1` scripts we were using previously. We can template Nomad configuration files and upload them with the `file` provisioner.
* Packer builds for Linux and Windows become much simpler.
tl;dr way less janky shell scripting!
The `Metrics` suite uses Prometheus to scrape Nomad metrics so that
we're testing the full user experience of extracting metrics from
Nomad. With the addition of mTLS, we need to make sure Prometheus also
has mTLS configuration because the metrics endpoint is protected.
Update the Nomad client configuration and prometheus job to bind-mount
the client's certs into the task so that the job can use these certs
to scrape the server. This is a temporary solution that gets the job
passing; we should give the job its own certificates (issued by
Vault?) when we've done some of the infrastructure rework we'd like.