Commit Graph

547 Commits

Author SHA1 Message Date
James Rasell
785b4dfad7 e2e: add acl test for token expiration. (#14418)
In order to add an E2E test to cover token expiration, the server
config has been updated to include a low minimum allowed TTL
value. For ease of reading, the max value is also set.
2022-09-01 09:36:09 +02:00
James Rasell
91e7d8497b e2e: add ACL test suite with ACL Role test. (#14398)
This adds a new ACL test suite to the e2e framework which includes
an initial test for ACL roles. The ACL test includes a helper to
track and clean created Nomad resources which keeps the test
cluster clean no matter if the test fails early or not.
2022-08-31 10:11:28 +02:00
Seth Hoenig
f8beb53471 e2e: add e2e tests for nomad service disco checks
This PR adds 2 e2e tests for ensuring nomad service discovery checks
get created and produce status results as expected.
2022-08-22 15:31:13 -05:00
Piotr Kazmierczak
c4be2c6078 cleanup: replace TypeToPtr helper methods with pointer.Of (#14151)
Bumping compile time requirement to go 1.18 allows us to simplify our pointer helper methods.
2022-08-17 18:26:34 +02:00
Seth Hoenig
0c62f445c3 build: run gofmt on all go source files
Go 1.19 will forecefully format all your doc strings. To get this
out of the way, here is one big commit with all the changes gofmt
wants to make.
2022-08-16 11:14:11 -05:00
Tim Gross
c182b40408 e2e: move namespaces test out of legacy framework (#13934)
This PR continues work we've started on other test suites to use the native
golang test runner instead of the custom framework.
2022-08-01 13:24:34 -04:00
Seth Hoenig
2d83f130fe e2e: add nsd simple load balancing test 2022-07-14 15:07:19 -05:00
James Rasell
264d2dd375 e2e: add terraform init commands to readme doc. (#13655) 2022-07-08 16:52:35 +02:00
James Rasell
24220d0a02 core: allow pausing and un-pausing of leader broker routine (#13045)
* core: allow pause/un-pause of eval broker on region leader.

* agent: add ability to pause eval broker via scheduler config.

* cli: add operator scheduler commands to interact with config.

* api: add ability to pause eval broker via scheduler config

* e2e: add operator scheduler test for eval broker pause.

* docs: include new opertor scheduler CLI and pause eval API info.
2022-07-06 16:13:48 +02:00
Derek Strickland
e78a5908b9 docker: update images to reference hashicorpdev Docker organization (#12903)
docker: update images to reference hashicorpdev dockerhub organization
generate job_init.bindata_assetfs.go

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
2022-06-08 15:06:00 -04:00
James Rasell
78553f9aad e2e: use longer wait in template update triggers to avoid flake. (#13271) 2022-06-07 14:49:03 +02:00
Tim Gross
e0c290794d e2e: upgrade playwright package and container image (#13080)
The nightly playwright tests are currently failing because of a
mismatch between the expected version of Chromium and what's in the
container image. Unfortunately the previous specific tag we were using
for the container image is no longer tagged on the registry. With some
testing, I was able to find an image tag that results in a good run.
2022-05-20 08:41:07 -04:00
Derek Strickland
de59d73009 e2e: Wait for deployment to finish before disconnect (#12795)
* Wait for deployment to finish
* Don't reschedule disconnect or restart-node jobs
2022-04-27 12:27:03 -04:00
Tim Gross
3671ea6a8f remove pre-0.9 driver code and related E2E test (#12791)
This test exercises upgrades between 0.8 and Nomad versions greater
than 0.9. We have not supported 0.8.x in a very long time and in any
case the test has been marked to skip because the downloader doesn't
work.
2022-04-27 09:53:37 -04:00
Tim Gross
059c89dff0 E2E: move volume mounts test to use golang's stdlib test runner (#12788)
Part of ongoing work to remove the old E2E framework code.
2022-04-26 14:28:20 -04:00
Tim Gross
26b0e04717 E2E: remove old CLI for driving provisioning (#12787)
We moved off the old provisioning process for nightly E2E to one driven
entirely by Terraform quite a while back now. We're in the slow
process of removing the framework code for this test-by-test, but this
chunk of code no longer has any callers.
2022-04-26 13:43:25 -04:00
Tim Gross
512338f0aa E2E: remove platform specific realpath code from UI run script (#12750)
We don't need the absolute path for any of the commands in this script
so long as we `cd` into the source directory path. Doing this removes
the need for weird platform-specific tricks we have to do with
realpath vs GNU realpath.
2022-04-22 10:10:18 -04:00
Tim Gross
a29023ef69 E2E: fix debug logging on disconnected clients test (#12621) 2022-04-22 09:07:05 -04:00
Tim Gross
cf913ba66b E2E: make UIs runnable from any working directory (#12739)
The E2E test runner is running from the root of the Nomad
repository. Make this run independent of the working directory for
convenience of developers and the test runner.
2022-04-21 17:00:01 -04:00
Tim Gross
5c17f91117 E2E: set longer timeout for CSI plugin alloc start (#12732)
The CSI plugin allocations take a while to be marked healthy,
sometimes causing E2E test flakes during the setup phase of the
tests. There's nothing CSI specific about marking plugin allocs
healthy, as the plugin supervisor hook does all the fingerprinting in
the postrun hook (the prestart hook just makes a couple of empty
directories). The timeouts we're seeing may be because of where we're
pulling the images from; most our jobs pull from a CDN-backed public
registry whereas these are pulling from ECR. Set a 1min timeout for
these to make sure we have enough time to pull the image and start the
task.
2022-04-21 11:11:43 -04:00
Tim Gross
d1aa801407 E2E: playwright configuration and smoke test (#12721)
Scripts for running playwright tests in a Docker container that has
chromium and webkit preinstalled. Includes a basic smoke test for
authentication so that we can be sure the test rig is working
end-to-end. Wiring this up in CI will be in an upcoming PR.
2022-04-21 09:13:10 -04:00
Tim Gross
aafcf97984 E2E: provide options for reverse proxy for web UI (#12671)
Our E2E test environment is deployed with mTLS, but it's impractical
for us to use mTLS in headless browsers for automated testing (or even
in manual testing). Provide certificates for proxying the web UI via
Nginx. This proxy uses client certs for proxying to the HTTP endpoint
and a self-signed cert for the browser-facing endpoint. We can accept
certificate errors in the automated tests we'll be adding in the next
step of this work.
2022-04-19 16:55:05 -04:00
Tim Gross
e2a8d45f2d E2E: terraform provisioner upgrades (#12652)
While working on infrastructure for testing the UI in E2E, we needed
to upgrade the certificate provider. Performing a provider upgrade via
the TF `init -upgrade` brought in updates for the file and AWS
providers as well. These updates include deprecating the use of
`sensitive_content` fields, removing CA algorithm parameters that can
be inferred from keys, and removing the requirement to manually
specify AWS assume role parameters in the provider config if they're
available in the calling environment's AWS config file (as they are
via doormat or our E2E environment).
2022-04-19 14:27:14 -04:00
Tim Gross
4ca980311c E2E: add debugging outputs for disconnected clients test (#12572)
This test has a failure that's happening only occassionally and not
very reproducibly. Print out the allocation status on test failure so
that we can do some post-mortum debugging of the test on nightly.
2022-04-14 17:03:57 -04:00
Derek Strickland
8f7abae89f Update E2E terraform output command (#12561) 2022-04-13 16:46:09 -04:00
Tim Gross
247e20e10b scripts: fix interpreter for bash (#12549)
Many of our scripts have a non-portable interpreter line for bash and
use bash-specific variables like `BASH_SOURCE`. Update the interpreter
line to be portable between various Linuxes and macOS without
complaint from posix shell users.
2022-04-12 10:08:21 -04:00
Tim Gross
86ca8f7e73 E2E: fix flaky event stream test (#12548)
This changeset fixes two sources of flakiness in the event stream test.

First, the stream request gets the event *closest* to the index, not
the exact match. Although events are written before raft entries
they're written asynchronously, so it's possible to race and get a
raft index from this query higher than the current head of the event
buffer. Ensure the job is running before we try to get the index, so
that we've given the event enough time to land in the buffer.

Second, the assertion that the found index is greater than the start
index is only true if the `PlanResult` event manages to land before we
do the second registration. Although it should now with the first fix
above, it's not a correct assertion for what we're testing.
2022-04-12 08:35:39 -04:00
Tim Gross
1c13deec86 E2E: oversubscription assertion needs to wait for stats (#12540)
The oversubscription test expects an output that requires the client
has polled the task for stats at least once. Wait long enough to
ensure that we've polled the stats before failing the test.
2022-04-11 11:40:51 -04:00
Tim Gross
62f2cd77fa E2E: test for nodes disconnected by netsplit (#12407) 2022-04-11 11:34:27 -04:00
James Rasell
bd415dfd85 e2e: add initial service discovery tests. (#12512)
Some tests may chose to deregister jobs to check Nomad cleanup
logic, however, it is still possible for the test to fail and exit
before this is hit. This therefore adds a cancellable cleanup func
which can be deferred, using context to control whether it gets
run or not.
2022-04-11 11:12:24 +02:00
James Rasell
a30c3f36bf e2e: fix eventual consistency failure within consultemplate suite. (#12494) 2022-04-07 17:03:10 +02:00
James Rasell
9dc0b88cb5 client: add Nomad template service functionality to runner. (#12458)
This change modifies the template task runner to utilise the
new consul-template which includes Nomad service lookup template
funcs.

In order to provide security and auth to consul-template, we use
a custom HTTP dialer which is passed to consul-template when
setting up the runner. This method follows Vault implementation.

Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
2022-04-06 19:17:05 +02:00
Seth Hoenig
1764a4da4c Merge pull request #12446 from shoenig/no-pkg-err
cleanup: purge github.com/pkg/errors
2022-04-04 09:22:44 -05:00
Tim Gross
b91d0e73cb E2E: ensure that CSI EBS tests are isolated from each other (#12443)
Tear down the volume-consuming job between subtests, rather than after
all the tests are complete. For good measure, use a different ID for
the volume-consuming job as well.
2022-04-04 09:44:55 -04:00
Seth Hoenig
6f37b28b87 cleanup: purge github.com/pkg/errors 2022-04-01 19:24:02 -05:00
Tim Gross
43eba61e4f E2E disconnected clients test refactor (#12402)
* Wait longer for node to go down in disconnected clients test.
  The existing helper only waits 10s, but there's a jitter on heartbeats
  that we need to account for. Wait for 30s for node to go down to give
  us plenty of room
* Port disconnected clients to stdlib-style test
2022-03-30 09:12:44 -04:00
Tim Gross
e8da15cae5 E2E: test exercising node drain behavior for CSI volumes (#12384) 2022-03-29 11:19:23 -04:00
Tim Gross
37d831712f E2E: namespace HCP vault and consul policies to avoid collisions (#12386)
Concurrent E2E runs can collide when provisioning policies on HCP
Consul and HCP Vault. Namespace these by the test run name, as we do
for most everything else.
2022-03-25 16:05:59 -04:00
Tim Gross
4d6ea8c593 E2E: move example test to use golangs stdlib test runner (#12383)
Our E2E "framework" has a bunch of features around test discovery and
standing up infra that were never completed or fully used, and we
ended up building out a large test suite that ignored all that in lieu
of Terraform-provided infrastructure for the last couple years.

This changeset is a proposal (and demonstration) for gradually
migrating our E2E tests off the framework code so that developers can
write fairly ordinary golang stdlib testing tests.
2022-03-25 14:44:16 -04:00
Tim Gross
a156ca7b71 e2e: test for allocations replacement on disconnected clients (#12375)
This test exercises the behavior of clients that become disconnected
and have their allocations replaced. Future test cases will exercise
the `max_client_disconnect` field on the job spec.
2022-03-25 12:26:43 -04:00
Tim Gross
02d26ceb1a CSI: set plugin CSI_ENDPOINT env var only if unset by user (#12257)
* Use unix:// prefix for CSI_ENDPOINT variable by default
* Some plugins have strict validation over the format of the
  `CSI_ENDPOINT` variable, and unfortunately not all plugins
  agree. Allow the user to override the `CSI_ENDPOINT` to workaround
  those cases.
* Update all demos and tests with CSI_ENDPOINT
2022-03-21 11:48:47 -04:00
Tim Gross
c3f53fa8d0 E2E: ensure ConnectACLsE2ETest has clean state before starting (#12334)
The `ConnectACLsE2ETest` checks that the SI tokens have been properly
cleaned up between tests, but following the change to use HCP the
previous `Connect` test suite will often have SI tokens that haven't
been cleaned up by the time this test suite runs. Wait for the SI
tokens to be cleaned up at the start of the test to ensure we have a
clean state.
2022-03-21 11:05:02 -04:00
Tim Gross
020fa6f8ba E2E with HCP Consul/Vault (#12267)
Use HCP Consul and HCP Vault for the Consul and Vault clusters used in E2E testing. This has the following benefits:

* Without the need to support mTLS bootstrapping for Consul and Vault, we can simplify the mTLS configuration by leaning on Terraform instead of janky bash shell scripting.
* Vault bootstrapping is no longer required, so we can eliminate even more janky shell scripting
* Our E2E exercises HCP, which is important to us as an organization
* With the reduction in configurability, we can simplify the Terraform configuration and drop the complicated `provision.sh`/`provision.ps1` scripts we were using previously. We can template Nomad configuration files and upload them with the `file` provisioner.
* Packer builds for Linux and Windows become much simpler.

tl;dr way less janky shell scripting!
2022-03-18 09:27:28 -04:00
Seth Hoenig
bee4c07fe2 ci: missing import for nomad09upgrade 2022-03-17 08:49:15 -05:00
Seth Hoenig
7e6767ddf8 e2e: have e2e use ci.Parallel
This is a followup to having tests run in serial in CI.

The e2e package isn't in CI, but lets use the helper anyway
so we can setup semgrep rules covering the entire repository.
2022-03-17 08:37:34 -05:00
Tim Gross
bc40222e3e csi: add pagination args to volume snapshot list (#12193)
The snapshot list API supports pagination as part of the CSI
specification, but we didn't have it plumbed through to the command
line.
2022-03-07 12:19:28 -05:00
Tim Gross
711a9d9a8f csi: volume snapshot list plugin option is required (#12197)
The RPC for listing volume snapshots requires a plugin ID. Update the
`volume snapshot list` command to find the specific plugin from the
provided prefix.
2022-03-07 09:58:29 -05:00
Tim Gross
67a6ba5e02 e2e: use context for executing external commands (#12185)
If any E2E test hangs, it'll eventually timeout and panic, causing the
all the remaining tests to fail. External commands should use a short
context whenever possible so we can fail the test quickly and move on
to the next test.
2022-03-04 08:55:36 -05:00
Tim Gross
a69bb6bd3b e2e: StopJob should tolerate progress deadline expired (#12179)
The `TestRescheduleProgressDeadlineFail` E2E test failed during test
cleanup because the error message "progress deadline expired" that it
emits when we stop the job does not match the one expected from
monitoring the `job stop` command. Update the `StopJob` helper to
tolerate this use case as well.
2022-03-04 08:55:22 -05:00
Tim Gross
31b7de78fd e2e: configure prometheus for mTLS for Metrics suite (#12181)
The `Metrics` suite uses prometheus to scrape Nomad metrics so that
we're testing the full user experience of extracting metrics from
Nomad. With the addition of mTLS, we need to make sure prometheus also
has mTLS configuration because the metrics endpoint is protected.

Update the Nomad client configuration and prometheus job to bind-mount
the client's certs into the task so that the job can use these certs
to scrape the server. This is a temporary solution that gets the job
passing; we should give the job its own certificates (issued by
Vault?) when we've done some of the infrastructure rework we'd like.
2022-03-04 08:55:06 -05:00