nomad

mirror of https://github.com/kemko/nomad.git synced 2026-01-04 01:15:43 +03:00

Author	SHA1	Message	Date
Tim Gross	288a048a2e	e2e: add prerelease builds to Consul/Vault compatibility tests (#23287 ) Update the Consul/Vault build downloader functions so that we include the current prerelease build (if any) in our E2E compatibility testing we do on each PR. This will automatically cycle out when the GA build is released, because that build is "higher" in the sorted set.	2024-06-11 08:54:27 -04:00
Seth Hoenig	2054e87158	e2e: add tests for exec2 task driver (#22406 ) * e2e: add tests for exec2 task driver * e2e: use envoy 1.29.4 because consul * e2e: add a bridge networking http test for exec driver * e2e: split up http test so curl always starts after the server	2024-05-31 09:22:39 -05:00
Seth Hoenig	9fb2b10ab6	e2e: no lnoger need consul terraform module (#22396 )	2024-05-28 08:04:03 -05:00
Tim Gross	91d422ec21	E2E: document how the AMIs are tagged and how those tags are used (#22237 ) The process by which we tag AMIs with the commit SHA of the Packer directory isn't documented in this repository, which makes it easy to accidentally build an AMI that will break nightly E2E.	2024-05-24 11:11:00 -05:00
James Rasell	04ba358266	client: expose network namespace CNI config as task env vars. (#11810 ) This change exposes CNI configuration details of a network namespace as environment variables. This allows a task to use these value to configure itself; a potential use case is to run a Raft application binding to IP and Port details configured using the bridge network mode.	2024-05-14 09:02:06 +01:00
Piotr Kazmierczak	abe9c0803a	e2e: unflake TestWorkloadIdentity/testNobody (#20499 ) sometimes the container quits too fast	2024-04-30 18:17:14 +02:00
Tim Gross	ff2d9de592	Revert "E2E: skip Vault 1.16.1 for JWT compatibility test (#20301 )" (#20484 ) This reverts commit 45b36371a12ffae5b5bfaaeadb08f801fb6bc98d. Now that Vault 1.16.2 has shipped, the E2E test will pick up only a working version. Closes: https://github.com/hashicorp/nomad/issues/20298	2024-04-26 09:36:09 -04:00
Tim Gross	d40e23f939	E2E: clean up go mod cache after building `consul-cni` (#20378 ) In #20296 we added a Go tool chain to the AMI we use for E2E tests, so that we can build `consul-cni` for tproxy testing. This is intended to be temporary until `consul-k8s` 1.4.2 is officially released. But the Go cache from building `consul-k8s` uses up roughly 1.5GiB of space and the test machines have fairly small disks. This causes the Nomad clients to aggressively GC client allocations that stop, which breaks tests that run batch workloads and then read their logs.	2024-04-12 11:52:46 -04:00
Tim Gross	8298d39e78	Connect transparent proxy support Add support for Consul Connect transparent proxies Fixes: https://github.com/hashicorp/nomad/issues/10628	2024-04-10 11:00:18 -04:00
Tim Gross	548adb0fd4	tproxy: E2E tests (#20296 ) Add the `consul-cni` plugin to the Linux AMI for E2E, and add a test case that covers the transparent proxy feature. Add test assertions to the Connect tests for upstream reachability Ref: https://github.com/hashicorp/nomad/pull/20175	2024-04-05 14:23:26 -04:00
Tim Gross	2382ab8776	E2E: ensure periodic test can't fail due to cron conflicts (#20300 ) The E2E test for periodic dispatch jobs has a `cron` trigger for once a minute. If the test happens to run at the top of the minute, it's possible for the forced dispatch to run from the test code, then the periodic timer triggers and leaves a running child job. This fails the test because it expects only a single job in the "dead" state. Make it so that the `cron` expression is implausible to run during our test window, and migrate the test off the old framework while we're at it.	2024-04-05 08:45:35 -04:00
Tim Gross	648daceca1	E2E: skip Vault 1.16.1 for JWT compatibility test (#20301 ) Vault 1.16.1 has a known issue around the JWT auth configuration that will prevent this test from ever passing. Skip testing the JWT code path on 1.16.1. Once 1.16.2 ships it will no longer get skipped. Ref: https://github.com/hashicorp/nomad/issues/20298	2024-04-04 17:00:35 -04:00
Tim Gross	c1f020d60f	E2E: refactor Connect tests to use stdlib testing (#20278 ) Migrate our E2E tests for Connect off the old framework in preparation for writing E2E tests for transparent proxy and the updated workload identity workflow. Mark the tests that cover the legacy Consul token submitted workflow. Ref: https://github.com/hashicorp/nomad/pull/20175	2024-04-04 10:48:10 -04:00
Tim Gross	4ce728afbd	E2E: make `vault.create_from_role` unique per cluster (#20267 ) If a E2E cluster is destroyed after a different one has been created, the role and policy we create in Vault for the cluster will be deleted and Vault-related tests will fail. Note that before 1.9, we should figure out a way to give HCP Vault access to the JWKS endpoint and have a different set of policies, but we'll need to have a role-per-cluster in that case as well. Fixes: https://github.com/hashicorp/nomad-e2e/issues/138 (internal)	2024-04-03 08:45:01 -04:00
Tim Gross	cf25cf5cd5	E2E: use a self-hosted Consul for easier WI testing (#20256 ) Our `consulcompat` tests exercise both the Workload Identity and legacy Consul token workflow, but they are limited to running single node tests. The E2E cluster is network isolated, so using our HCP Consul cluster runs into a problem validating WI tokens because it can't reach the JWKS endpoint. In real production environments, you'd solve this with a CNAME pointing to a public IP pointing to a proxy with a real domain name. But that's logisitcally impractical for our ephemeral nightly cluster. Migrate the HCP Consul to a single-node Consul cluster on AWS EC2 alongside our Nomad cluster. Bootstrap TLS and ACLs in Terraform and ensure all nodes can reach each other. This will allow us to update our Consul tests so they can use Workload Identity, in a separate PR. Ref: #19698	2024-04-02 15:24:51 -04:00
Tim Gross	de218d1919	E2E: change timing of `vaultsecrets` test to guarantee lease window (#20200 ) We've been getting a couple of errors from this test on nightly where the template hasn't rendered by the time we expect it to. I've run some tests locally and this may be a timing issue introduced by recent code changes to templates. Move the start of the timer to after we're guaranteed that we've got a secret lease TTL started, to eliminate this as a source of flakiness. In my tests this adds another ~5s to a test that already takes over a minute to run anyways.	2024-03-22 16:12:00 -04:00
Daniel Bennett	e059adef98	e2e: PreCleanup and other jobs3 helpers (#19844 )	2024-01-29 17:54:54 -06:00
Piotr Kazmierczak	543ba16e61	e2e: more retries for RequireConsulDeregistered (#19801 )	2024-01-22 20:11:48 +01:00
Piotr Kazmierczak	8a4bd61caf	e2e: WaitForJobStopped correction (#19749 )	2024-01-22 11:38:22 +01:00
Piotr Kazmierczak	8226a85263	e2e: remove deprecated template_file dependency for tf (#19313 ) This also allows running tf for our e2e suite locally on darwin.	2024-01-15 18:42:28 +01:00
Piotr Kazmierczak	609f3a60b5	e2e: purging jobs removes all allocs (#19744 ) There's no need to wait for allocs since #19609, in fact waiting for allocs to stop will always fail leading to e2e failures.	2024-01-15 17:54:35 +01:00
Piotr Kazmierczak	858a805d7d	e2e: add a note about provisioning the infrastructure on macOS/Apple Silicon (#19727 )	2024-01-12 14:09:50 +01:00
Seth Hoenig	a58f0eca8e	e2e: move rawexec oversub tests into oversubscription e2e test suite (#19717 ) * e2e: move rawexec oversub tests into oversubscription e2e test suite This PR moves two tests for raw_exec and memory oversubscription into the oversubscription test suite, which has the necessary plumbing to activate and restore the oversubscription configuration of the scheduler during the test. * cr: rename files for better readability	2024-01-11 14:27:05 -06:00
Piotr Kazmierczak	930339a0fa	e2e: remove broken Consul WI test (#19697 )	2024-01-10 21:31:18 +01:00
Seth Hoenig	cb7d078c1d	drivers/raw_exec: enable configuring raw_exec task to have no memory limit (#19670 ) * drivers/raw_exec: enable configuring raw_exec task to have no memory limit This PR makes it possible to configure a raw_exec task to not have an upper memory limit, which is how the driver would behave pre-1.7. This is done by setting memory_max = -1. The cluster (or node pool) must have memory oversubscription enabled. * cl: add cl	2024-01-09 14:57:13 -06:00
Seth Hoenig	ccfb13a72d	e2e: add test for raw_exec memory_max configuration (#19596 ) * e2e: add test for raw_exec memory_max configuration * docs: note raw_exec supports memory_max in resources documentation	2024-01-04 08:25:56 -06:00
Piotr Kazmierczak	aa197cf824	e2e: pass Nomad address to Consul WI test (#19603 )	2024-01-04 08:52:39 +01:00
Piotr Kazmierczak	a87aa71f55	e2e: fix typo in Consul e2e (#19589 )	2024-01-03 09:34:38 +01:00
Matt Robenolt	656bb5cafa	drivers/executor: set oom_score_adj for raw_exec (#19515 ) * drivers/executor: set oom_score_adj for raw_exec This might not be wholly true since I don't know all configurations of Nomad, but in our use cases, we run some of our tasks as `raw_exec` for reasons. We observed that our tasks were running with `oom_score_adj = -1000`, which prevents them from being OOM'd. This value is being inherited from the nomad agent parent process, as configured by systemd. Similar to #10698, we also were shocked to have this value inherited down to every child process and believe that we should also set this value to 0 explicitly. I have no idea if there are other paths that might leverage this or other ways that `raw_exec` can manifest, but this is how I was able to observe and fix in one of our configurations. We have been running in production our tasks wrapped in a script that does: `echo 0 > /proc/self/oom_score_adj` to avoid this issue. * drivers/executor: minor cleanup of setting oom adjustment * e2e: add test for raw_exec oom adjust score * e2e: set oom score adjust to -999 * cl: add cl --------- Co-authored-by: Seth Hoenig <shoenig@duck.com>	2024-01-02 13:35:09 -06:00
Piotr Kazmierczak	bb3d2227a2	e2e: add a test for checking default WI Consul workflow for services and tasks (#19500 )	2024-01-02 16:02:32 +01:00
Daniel Bennett	eb23add189	e2e: sleep in docker job (#19434 )	2023-12-11 15:38:14 -06:00
Tom Davies	c983a8f0ad	Fixes Consul token checking when policies exist within namespaces (#18516 ) * e2e/connect: adds test for namespace policies * consul: use token namespace when fetching policies * changelog * fixup! e2e/connect: adds test for namespace policies	2023-12-11 10:07:32 -06:00
Seth Hoenig	f3cbe2e29a	e2e: sleep a bit in short lived docker jobs (#19384 )	2023-12-08 10:44:43 -06:00
Daniel Bennett	e9ff6d74d3	e2e: unflake oversubscription.testExec (#19373 ) poll with must.Wait() instead of hard-coded sleep waiting for poststart task to run, and wait for longer	2023-12-08 10:20:18 -06:00
Daniel Bennett	7baf3c012c	e2e: even more time for exec+java tests (#19347 )	2023-12-07 10:23:39 -06:00
Seth Hoenig	8cde7a4f70	e2e: turn of extreme verbose metrics test logging (#19330 )	2023-12-06 16:08:49 -06:00
Tim Gross	340c9ebd47	E2E: extend timeout on CSI snapshot test (#19338 ) The EBS snapshot operation can take a long time to complete. Recent runs have shown we sometimes get up to the 10s timeout on the context we're giving the CLI command. Extend this so that we're not getting spurious timeouts. Fixes: https://github.com/hashicorp/nomad/issues/19118	2023-12-06 16:34:54 -05:00
Daniel Bennett	36f69a8e88	e2e: more occasionally slow exec tasks (#19337 )	2023-12-06 15:22:15 -06:00
Daniel Bennett	9fe1f0aadc	e2e: fix ConsulNamespaces tests (#19325 ) * cleanup consul tokens by accessor id rather than secret id, which has been failing for some time with: > 404 (Cannot find token to delete) * expect subset of consul namespaces the consul test cluster may have namespaces from other unrelated tests	2023-12-06 12:21:27 -06:00
Seth Hoenig	87e7bf4ab2	e2e: skip connect test that does a restart of nomad agent (#19316 )	2023-12-05 09:15:09 -06:00
Seth Hoenig	35ccb7ecdb	e2e: use correct url to download zip file from go-getter repository (#19315 )	2023-12-05 09:11:08 -06:00
Seth Hoenig	cc65f39c82	e2e/v3: dump eval if detected as cancelled (#19310 )	2023-12-05 09:07:12 -06:00
Daniel Bennett	c7d01705f5	e2e: push nomad token to servers (#19312 ) so humans with root shell access can use it to debug not ideal security, but this is a short-lived test cluster	2023-12-05 08:54:57 -06:00
Seth Hoenig	6779d7c7b4	e2e: add a ShowState() option to cluster3.Establish options (#19303 ) This will dump much of the interesting parts of cluster state, including available nodes and their status, existing allocations and their status, and existing evaluations and their status.	2023-12-04 12:37:21 -06:00
Daniel Bennett	d34788896f	e2e: jobs3-submitted jobs automatically cleanup (#19284 ) so that cleanup occurs even if the job fails to run (unless configured not to)	2023-12-01 15:57:23 -06:00
Daniel Bennett	bfb2263f30	e2e: give isolation test jobs more time to start (#19276 )	2023-12-01 14:03:40 -06:00
Seth Hoenig	5b3416bd97	e2e: set e2e/v3 debug logging on metrics test (#19263 )	2023-12-01 10:03:55 -06:00
Tim Gross	05fe2ad191	E2E: fix assertion in CT native service lookup test (#19249 ) When porting the `ConsulTemplate` test, I made a last-minute refactor to the assertions for waiting on files, and accidentally inverted the test assertion in the process. Also, when running `jobs3.Submit` you need to include the `Namespace` option so that the cleanup function that gets return deletes the job from the correct namespace. This was causing the namespace cleanup to fail because the job deletion had failed.	2023-12-01 08:54:55 -05:00
Daniel Bennett	639c3f53c9	e2e: give node drain KillTimeout test more time (#19226 ) and error more verbosely if it fails also, add extra information to a failed evaluation for more error visibility in other tests --------- Co-authored-by: Juanadelacuesta <juanita.delacuestamorales@hashicorp.com>	2023-11-30 10:37:20 -06:00
Luiz Aoqui	969cdb0f46	test: add consul namespace rules to consulcompat (#19227 ) When configuring Consul for multi-namespace support, the JWT auth method needs to specify namespace rules. This attribute is set to `nil` in CE but is used in Nomad ENT.	2023-11-30 10:13:08 -05:00

1 2 3 4 5 ...

718 Commits