The new client intro test mimics the Consul and Vault compat tests
and uses local agents to perform the required setup. This method
gives us the flexibility, moving forward, to test with the enforcement
mode set to strict.
The test suite will now be triggered from the test-e2e CI run
and can also be called by a make target.
When we refactored the E2E provisioning to allow it to be reused by the upgrade
testing, we didn't thread the `instance_type` variable from the main module down
into the `provision-infra` module. This prevents setting a custom instance
type when deploying the E2E cluster manually.
* e2e: update standalone envoy binary version
fix for:
> === FAIL: e2e/exec2 TestExec2/testCountdash (21.25s)
> exec2_test.go:71:
> ...
> [warning][config] [./source/extensions/config_subscription/grpc/grpc_stream.h:155] DeltaAggregatedResources gRPC config stream to local_agent closed: 3, Envoy 1.29.4 is too old and is not supported by Consul
there's also this warning, but it doesn't seem so fatal:
> [warning][main] [source/server/server.cc:910] There is no configured limit to the number of allowed active downstream connections. Configure a limit in `envoy.resource_monitors.downstream_connections` resource monitor.
picked latest supported from latest consul (1.21.4):
```
$ curl -s localhost:8500/v1/agent/self | jq .xDS.SupportedProxies
{
  "envoy": [
    "1.34.1",
    "1.33.2",
    "1.32.5",
    "1.31.8"
  ]
}
```
* e2e: exec2: remove extraneous bits
* reschedule: no reschedule for batch jobs
* unveil: nomad paths get auto-unveiled with unveil_defaults
https://github.com/hashicorp/nomad-driver-exec2/blob/v0.1.0/plugin/driver.go#L514-L522
* fix panic from nil ReschedulePolicy
commit 279775082c (pr #26279)
intended to return an error for sysbatch jobs with a reschedule block,
but in bypassing populating the `ReschedulePolicy`'s pointer fields,
a nil pointer panic occurred before the job could get rejected
with the intended error.
in particular, in `command/agent/job_endpoint.go`, `func ApiTgToStructsTG`,
```
if taskGroup.ReschedulePolicy != nil {
	tg.ReschedulePolicy = &structs.ReschedulePolicy{
		Attempts: *taskGroup.ReschedulePolicy.Attempts,
		Interval: *taskGroup.ReschedulePolicy.Interval,
```
`*taskGroup.ReschedulePolicy.Interval` was a nil pointer.
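a self-contained sketch of the nil-safe copy pattern, using stand-in types rather than nomad's real ones:
```
package main

import (
	"fmt"
	"time"
)

// minimal stand-ins for the api and internal structs in the snippet above;
// illustrative only, not nomad's real definitions.
type apiReschedulePolicy struct {
	Attempts *int
	Interval *time.Duration
}

type reschedulePolicy struct {
	Attempts int
	Interval time.Duration
}

// convert nil-checks each pointer field before dereferencing, which avoids the
// panic when canonicalization has left a field unset.
func convert(in *apiReschedulePolicy) *reschedulePolicy {
	if in == nil {
		return nil
	}
	out := &reschedulePolicy{}
	if in.Attempts != nil {
		out.Attempts = *in.Attempts
	}
	if in.Interval != nil {
		out.Interval = *in.Interval
	}
	return out
}

func main() {
	attempts := 3
	// Interval deliberately left nil, mimicking the un-canonicalized sysbatch job.
	fmt.Printf("%+v\n", convert(&apiReschedulePolicy{Attempts: &attempts}))
}
```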
* fix e2e test jobs
* Update UI, code comment, and README links to docs, tutorials
* fix typo in ephemeral disks learn more link url
* feedback on typo
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Update our E2E compatibility test for Consul and Vault to only include back to
the oldest-supported LTS versions of Consul and Vault. This will still leave
a few unsupported non-LTS versions in the matrix between the two oldest LTS
versions, but this is a small number of tests and fixing it would mean
hard-coding the LTS support matrix in our tests.
In our E2E environment we've seen some flakiness with the Consul-related
tests. As it turns out, the Consul agents are getting restarted every 90s or so
because they're timing out their systemd notification.
> consul.service: start operation timed out. Terminating.
This appears to be a known issue in Consul and we'll try to help hunt down the
cause if they want it, but in the meantime let's remove the systemd notification
from our unit files for the Consul agents.
Ref: https://github.com/hashicorp/consul/issues/16844#issuecomment-1913282248
* E2E: fix scaling test assertion for extra Windows host
The scaling test assumes that all nodes will receive the system job. But the job
can only run on Linux hosts, so the count will be wrong if we're running a
Windows host as part of the cluster. Filter the expected count by the OS.
While we're touching this test, let's also migrate it off the legacy framework.
* address comments from code review
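A sketch of the OS-filtered expectation, using a hypothetical node type; the real test reads the OS from each node's fingerprinted attributes:
```
type nodeInfo struct {
	Attributes map[string]string
}

// expectedSystemAllocs counts only the nodes the Linux-only system job can land on.
func expectedSystemAllocs(nodes []nodeInfo) int {
	count := 0
	for _, n := range nodes {
		if n.Attributes["kernel.name"] == "linux" {
			count++
		}
	}
	return count
}
```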
Some time ago the Windows host we were using as a Nomad client agent test target
started failing to allow ssh connections. The underlying problem appears to be
with sysprep but I wasn't able to debug the exact cause as it's not an area I
have a lot of expertise in.
Swap out the deprecated Windows 2016 host for a Windows 2022 host. This will use
a base image provided by Amazon and then we'll use a userdata script to
bootstrap ssh and some target directories for Terraform to upload files to. The
more modern Windows will let us drop some of the extra powershell scripts we were
using as well.
Fixes: https://hashicorp.atlassian.net/browse/NMD-151
Fixes: https://github.com/hashicorp/nomad-e2e/issues/125
TestSingleAffinities never expected a node with affinity score set to 0 in
the set of returned nodes. However, since #25800, this can happen. What the
test should be checking for instead is that the node with the highest normalized
score has the right affinity.
The DNS configuration for our E2E cluster uses dnsmasq to pass all DNS through
Consul. But there's a circular reference in the systemd configurations that
sometimes causes the Docker service to fail, which is causing test flakes during
upgrade testing because we count the number of nodes and expect `system` jobs
using Docker to run on all nodes.
We no longer have any tests that require Consul DNS, so remove the complication
of dnsmasq to break the reference cycle. Also, while I was looking at this I
noticed we still had setup that would configure the ECS remote task driver
plugin, which is archived. Remove this as well.
Ref: https://hashicorp.atlassian.net/browse/NMD-162
The fresh deployment of the Redis job took around 20s which is
also the default context timeout on the e2e util that monitors and
waits for a deployment to complete.
The tight timing meant the test often timed out but sometimes
would complete successfully. Increasing the timeout for this
deployment will remove the flakiness.
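A sketch of the shape of the change; the watcher argument stands in for the e2e deployment util and the 60s value is illustrative headroom:
```
import (
	"context"
	"time"
)

// waitWithHeadroom widens the context deadline well past the ~20s the Redis
// deployment was observed to take, so the watcher no longer races the deploy.
func waitWithHeadroom(ctx context.Context, watch func(context.Context) error) error {
	ctx, cancel := context.WithTimeout(ctx, 60*time.Second)
	defer cancel()
	return watch(ctx)
}
```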
As of April 1, Docker Hub rate limits tightened. With only 10 pulls/hr/IP, we're
likely to encounter test failures. Switch all Docker images getting pulled from
this repository to use the HashiCorp managed registry mirror.
Note that most of our tests in `drivers/docker` don't pull from the remote
registry but load a local image, while others will need to pull from the remote
and fetch different images depending on OS/arch. Refactor the definition of test
task configuration to make it clear which is which, and de-factor some false
sharing of setup functions.
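A sketch of the split, with hypothetical helper names, image tags, and a placeholder mirror hostname:
```
import "runtime"

// localImageTaskCfg is for tests whose image is loaded from a tarball shipped
// with the tests and never touches a registry.
func localImageTaskCfg() map[string]interface{} {
	return map[string]interface{}{
		"image": "busybox:private",
		"load":  "busybox.tar",
	}
}

// remoteImageTaskCfg is for tests that pull at test time, routed through the
// mirror instead of docker.io, and varies by OS.
func remoteImageTaskCfg() map[string]interface{} {
	img := "registry-mirror.example.com/library/busybox:1"
	if runtime.GOOS == "windows" {
		img = "registry-mirror.example.com/library/busybox:windows-placeholder"
	}
	return map[string]interface{}{"image": img}
}
```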
Updates the E2E tests to use that registry by configuring the Docker
daemon. This required changing out a few container images that we don't have in
the registry, but these new images are all smaller. There are a couple of tests
that still use explicitly-tagged `docker.io` images or other third-party
registries, which have been left in place.
Ref: https://hashicorp.atlassian.net/browse/NET-12233
* update E2E images to those in the registry mirror
* fix windows and docklog test build
* fix stopsignal test
* mop-up
* more mop-up
The `ui.enabled` parameter is a non-pointer bool which means the
merge function is unable to differentiate between false and not
set. When e2e introduced the `ui.show_cli_hints` configuration parameter,
this merge behavior meant the UI became disabled.
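A simplified sketch of why pointer fields avoid this; these are not Nomad's real config types:
```
// With a plain bool the merge can't tell "false" apart from "not set", so
// merging a file that only sets show_cli_hints clobbers Enabled back to its
// zero value. Pointer fields keep "unset" distinct.
type uiConfig struct {
	Enabled      *bool // nil means the config file didn't set it
	ShowCLIHints *bool
}

func (u uiConfig) merge(other uiConfig) uiConfig {
	result := u
	if other.Enabled != nil {
		result.Enabled = other.Enabled
	}
	if other.ShowCLIHints != nil {
		result.ShowCLIHints = other.ShowCLIHints
	}
	return result
}
```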
I couldn't find any reason the exec2 HTTP jobs were not being run
with a generated cleanup function, so I added one.
The deletion of the DHV ACL policy does not seem like it would
have any negative impact.
Add an upgrade test workload for Consul service mesh with transparent
proxy. Note this breaks from the "countdash" demo. The dashboard application
can only verify the backend is up by making a websocket connection, which we
can't do as a health check, and the health check it exposes for that purpose
only passes once the websocket connection has been made. So replace the
dashboard with a minimal nginx reverse proxy to the count-api instead.
Ref: https://hashicorp.atlassian.net/browse/NET-12217
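As an illustration of the idea only (the actual workload uses nginx, and the upstream address is a placeholder), a minimal HTTP reverse proxy in front of count-api:
```
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

func main() {
	// Placeholder upstream: with transparent proxy the real workload reaches
	// count-api through the mesh rather than a hard-coded address.
	target, err := url.Parse("http://count-api.virtual.consul")
	if err != nil {
		log.Fatal(err)
	}
	// Anything answering plain HTTP here can be health-checked with an
	// ordinary HTTP probe, unlike the websocket-only dashboard.
	http.Handle("/", httputil.NewSingleHostReverseProxy(target))
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```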
Getting the CSI test to work with AWS EFS or EBS has proven to be awkward
because we're having to deal with external APIs with their own consistency
guarantees, as well as challenges around teardown. Make the CSI test entirely
self-contained by using a userland NFS server and the rocketduck CSI plugin.
Ref: https://hashicorp.atlassian.net/browse/NET-12217
Ref: https://gitlab.com/rocketduck/csi-plugin-nfs
In #25185 we changed the output of `volume status` to include both DHV and CSI
volumes by default. When the E2E test parses the output, it's not expecting the
new section header.
Ref: https://github.com/hashicorp/nomad/pull/25185
Nomad 1.10.0 is removing the legacy Vault token based workflow
which means the legacy e2e compatibility tests will no longer work.
The Nomad e2e cluster was using the legacy Vault token based
workflow for initial cluster build. This change migrates to using
the workload identity flow which utilizes authentication methods,
roles, and policies.
The Nomad server network has been modified to allow traffic from
the HCP Vault HVN which is a private network peered into our AWS
account. This is required, so that Vault can pull JWKS
information from the Nomad API without going over the public
internet.
The cluster build will now also configure a Vault KV v2 mount at
a unique identifier for the e2e cluster. This allows all Nomad
workloads and tests to use this if required.
The vaultsecrets suite has been updated to accommodate the new
changes and extended to test the default workload ID flow for
allocations which use Vault for secrets.
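For reference, a sketch of the Vault-side pieces using the Vault Go client; the real cluster build drives this through the provisioning code, and the auth method path, mount path, Nomad address, and role name below are placeholders:
```
package main

import (
	"log"

	vault "github.com/hashicorp/vault/api"
)

func main() {
	client, err := vault.NewClient(vault.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}

	// JWT auth method Vault uses to verify Nomad workload identities.
	if err := client.Sys().EnableAuthWithOptions("jwt-nomad", &vault.EnableAuthOptions{
		Type: "jwt",
	}); err != nil {
		log.Fatal(err)
	}

	// Point the auth method at the Nomad JWKS endpoint, reachable over the
	// peered HVN rather than the public internet.
	if _, err := client.Logical().Write("auth/jwt-nomad/config", map[string]interface{}{
		"jwks_url":     "https://nomad.internal.example:4646/.well-known/jwks.json",
		"default_role": "nomad-workloads",
	}); err != nil {
		log.Fatal(err)
	}

	// KV v2 mount at a unique path for this e2e cluster.
	if err := client.Sys().Mount("e2e-cluster-placeholder/", &vault.MountInput{
		Type:    "kv",
		Options: map[string]string{"version": "2"},
	}); err != nil {
		log.Fatal(err)
	}
}
```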
The nightly E2E run only builds a new AMI when required by changes to the
build. The AMI is tagged with the SHA of the commit that forced that build,
which may not be the commit that's spawning a particular test run. So we have a
resource in the `provision-infra` module that finds that SHA.
But when we run upgrade testing via Enos, we're running the E2E Terraform
configuration from outside the `e2e/terraform` folder. So the script that
resource runs will fail and prevent us from getting the AMI. Fix the script so
it can be run from any folder.
We also have duplicate resources for the "ubuntu jammy" AMI, but this is because
the Enos matrix might (in the near future) test with ARM64. For now, we'll pin
the Consul server to AMD64. Rename the resource appropriately to make the source
of the duplicate obvious.
* func: remove the lists used to override the nomad_local_binary for servers and clients
* docs: add a note to the terraform e2e readme
* fix: remove the extra 'windows' from the aws_ami filter
* style: hcl fmt
CE side of ENT PR:
task schedule: pauses are not restart "attempts"
distinguish between these two cases:
1. task dies because we "paused" it (on purpose)
   - should not count against restarts, because nothing is wrong.
2. task dies because it didn't work right
   - should count against restart attempts, so users can address application issues.
with this, the restart{} block is back to its normal
behavior, so its documentation applies without caveat.
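a minimal sketch of the distinction, with hypothetical names rather than nomad's actual restart tracker:
```
// a task killed because its schedule paused it is not an application failure,
// so it should not consume a restart attempt; a task that exited badly should.
func countsAgainstRestartAttempts(exitErr error, killedByPause bool) bool {
	if killedByPause {
		return false // case 1: paused on purpose, nothing is wrong
	}
	return exitErr != nil // case 2: the task actually failed
}
```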