Commit Graph

507 Commits

Author SHA1 Message Date
Tim Gross
02d26ceb1a CSI: set plugin CSI_ENDPOINT env var only if unset by user (#12257)
* Use unix:// prefix for CSI_ENDPOINT variable by default
* Some plugins have strict validation over the format of the
  `CSI_ENDPOINT` variable, and unfortunately not all plugins
  agree. Allow the user to override the `CSI_ENDPOINT` to workaround
  those cases.
* Update all demos and tests with CSI_ENDPOINT
2022-03-21 11:48:47 -04:00
Tim Gross
c3f53fa8d0 E2E: ensure ConnectACLsE2ETest has clean state before starting (#12334)
The `ConnectACLsE2ETest` checks that the SI tokens have been properly
cleaned up between tests, but following the change to use HCP the
previous `Connect` test suite will often have SI tokens that haven't
been cleaned up by the time this test suite runs. Wait for the SI
tokens to be cleaned up at the start of the test to ensure we have a
clean state.
2022-03-21 11:05:02 -04:00
Tim Gross
020fa6f8ba E2E with HCP Consul/Vault (#12267)
Use HCP Consul and HCP Vault for the Consul and Vault clusters used in E2E testing. This has the following benefits:

* Without the need to support mTLS bootstrapping for Consul and Vault, we can simplify the mTLS configuration by leaning on Terraform instead of janky bash shell scripting.
* Vault bootstrapping is no longer required, so we can eliminate even more janky shell scripting
* Our E2E exercises HCP, which is important to us as an organization
* With the reduction in configurability, we can simplify the Terraform configuration and drop the complicated `provision.sh`/`provision.ps1` scripts we were using previously. We can template Nomad configuration files and upload them with the `file` provisioner.
* Packer builds for Linux and Windows become much simpler.

tl;dr way less janky shell scripting!
2022-03-18 09:27:28 -04:00
Seth Hoenig
bee4c07fe2 ci: missing import for nomad09upgrade 2022-03-17 08:49:15 -05:00
Seth Hoenig
7e6767ddf8 e2e: have e2e use ci.Parallel
This is a followup to having tests run in serial in CI.

The e2e package isn't in CI, but lets use the helper anyway
so we can setup semgrep rules covering the entire repository.
2022-03-17 08:37:34 -05:00
Tim Gross
bc40222e3e csi: add pagination args to volume snapshot list (#12193)
The snapshot list API supports pagination as part of the CSI
specification, but we didn't have it plumbed through to the command
line.
2022-03-07 12:19:28 -05:00
Tim Gross
711a9d9a8f csi: volume snapshot list plugin option is required (#12197)
The RPC for listing volume snapshots requires a plugin ID. Update the
`volume snapshot list` command to find the specific plugin from the
provided prefix.
2022-03-07 09:58:29 -05:00
Tim Gross
67a6ba5e02 e2e: use context for executing external commands (#12185)
If any E2E test hangs, it'll eventually timeout and panic, causing the
all the remaining tests to fail. External commands should use a short
context whenever possible so we can fail the test quickly and move on
to the next test.
2022-03-04 08:55:36 -05:00
Tim Gross
a69bb6bd3b e2e: StopJob should tolerate progress deadline expired (#12179)
The `TestRescheduleProgressDeadlineFail` E2E test failed during test
cleanup because the error message "progress deadline expired" that it
emits when we stop the job does not match the one expected from
monitoring the `job stop` command. Update the `StopJob` helper to
tolerate this use case as well.
2022-03-04 08:55:22 -05:00
Tim Gross
31b7de78fd e2e: configure prometheus for mTLS for Metrics suite (#12181)
The `Metrics` suite uses prometheus to scrape Nomad metrics so that
we're testing the full user experience of extracting metrics from
Nomad. With the addition of mTLS, we need to make sure prometheus also
has mTLS configuration because the metrics endpoint is protected.

Update the Nomad client configuration and prometheus job to bind-mount
the client's certs into the task so that the job can use these certs
to scrape the server. This is a temporary solution that gets the job
passing; we should give the job its own certificates (issued by
Vault?) when we've done some of the infrastructure rework we'd like.
2022-03-04 08:55:06 -05:00
Tim Gross
0292fd402d e2e: use UUID for CSI idempotency token (#12183)
The AWS EBS plugin appears to use the name field of the volume as an
idempotency token that persists across the entire AWS account, not
just the plugin lifespan.

Also fix the regex for the volume ID, which was originally taken from
the job ID regex but isn't actually the same. This hasn't failed tests
for us because we've always passed in the same volume ID.
2022-03-03 17:00:00 -05:00
Tim Gross
c27d9b1b67 e2e: use operator api for Networking suite validation (#12180)
With mTLS enabled, using `curl` in a bash script for validation
involves having to configure arguments to `curl` based on whether or
not the test infrastructure is using mTLS, whether ACLs are enabled,
etc. Use the new `operator api` command instead to pick up the client
configuration from the test environment automatically.
2022-03-03 15:17:29 -05:00
Tim Gross
03a8d72dba CSI: implement support for topology (#12129) 2022-03-01 10:15:46 -05:00
James Rasell
228be20f22 e2e: moved missed volume test stop command to util helper. 2022-02-02 08:42:58 +01:00
James Rasell
c2b5a5ebd9 e2e: account for new job stop CLI exit behaviour.
PR #11550 changed the job stop exit behaviour when monitoring the
deployment. When stopping a job, the deployment becomes cancelled
and therefore the CLI now exits with status code 1 as it see this
as an error.

This change adds a new utility e2e function that accounts for this
behaviour.
2022-02-01 14:16:37 +01:00
Luiz Aoqui
827cdca490 e2e: enable Consul HTTPS port and always restart Nomad systemd unit 2022-01-18 16:56:26 -05:00
James Rasell
ab9ba35e6a chore: fixup inconsistent method receiver names. (#11704) 2021-12-20 11:44:21 +01:00
Tim Gross
1391c37ef5 hclfmt on some config files (#11611) 2021-12-02 15:25:46 -05:00
Derek Strickland
bfac8d8456 Fix Vault E2E TLS config (#11483)
* Update e2e/terraform configuration for Vault and default to mtls=true
2021-12-02 12:20:09 -05:00
James Rasell
80dcae7216 core: allow setting and propagation of eval priority on job de/registration (#11532)
This change modifies the Nomad job register and deregister RPCs to
accept an updated option set which includes eval priority. This
param is optional and override the use of the job priority to set
the eval priority.

In order to ensure all evaluations as a result of the request use
the same eval priority, the priority is shared to the
allocReconciler and deploymentWatcher. This creates a new
distinction between eval priority and job priority.

The Nomad agent HTTP API has been modified to allow setting the
eval priority on job update and delete. To keep consistency with
the current v1 API, job update accepts this as a payload param;
job delete accepts this as a query param.

Any user supplied value is validated within the agent HTTP handler
removing the need to pass invalid requests to the server.

The register and deregister opts functions now all for setting
the eval priority on requests.

The change includes a small change to the DeregisterOpts function
which handles nil opts. This brings the function inline with the
RegisterOpts.
2021-11-23 09:23:31 +01:00
Luiz Aoqui
89447fb42e Revert "Return SchedulerConfig instead of SchedulerConfigResponse struct (#10799)" (#11433) 2021-11-02 17:42:52 -04:00
Charlie Voiselle
8ba714e211 Return SchedulerConfig instead of SchedulerConfigResponse struct (#10799) 2021-10-13 21:23:13 -04:00
Mahmood Ali
6c414cd5f9 gofmt all the files
mostly to handle build directives in 1.17.
2021-10-01 10:14:28 -04:00
James Rasell
017c51f9df Merge pull request #11194 from hashicorp/b-fix-e2e-acl-tls-provision
e2e: fix provisioning when ACLs and TLS enabled.
2021-09-17 08:11:10 +02:00
James Rasell
5e25277183 e2e: fix provisioning when ACLs and TLS enabled; no nightly TLS. 2021-09-16 17:15:41 +02:00
James Rasell
e34fa583f9 allow configuration of Docker hostnames in bridge mode (#11173)
Add a new hostname string parameter to the network block which
allows operators to specify the hostname of the network namespace.
Changing this causes a destructive update to the allocation and it
is omitted if empty from API responses. This parameter also supports
interpolation.

In order to have a hostname passed as a configuration param when
creating an allocation network, the CreateNetwork func of the
DriverNetworkManager interface needs to be updated. In order to
minimize the disruption of future changes, rather than add another
string func arg, the function now accepts a request struct along with
the allocID param. The struct has the hostname as a field.

The in-tree implementations of DriverNetworkManager.CreateNetwork
have been modified to account for the function signature change.
In updating for the change, the enhancement of adding hostnames to
network namespaces has also been added to the Docker driver, whilst
the default Linux manager does not current implement it.
2021-09-16 08:13:09 +02:00
Luiz Aoqui
a2274de261 e2e: use absolute path for mTLS env vars (#11126) 2021-09-03 12:59:21 -04:00
James Rasell
2998c4b05f Merge pull request #11098 from hashicorp/b-fixup-all-incorrect-docstrings
chore: fix incorrect docstring formatting.
2021-08-31 09:46:18 +02:00
Mahmood Ali
a144cb31ff Support mTLS clusters for e2e testing (#11092)
This allows us to spin up e2e clusters with mTLS configured for all HashiCorp services, i.e. Nomad, Consul, and Vault. Used it for testing #11089 .

mTLS is disabled by default. I have not updated Windows provisioning scripts yet - Windows also lacks ACL support from before. I intend to follow up for them in another round.
2021-08-30 10:18:16 -04:00
James Rasell
3bffe443ac chore: fix incorrect docstring formatting. 2021-08-30 11:08:12 +02:00
James Rasell
a8403ac9cf test: update e2e and dev scripts to use cni plugins v1.0.0 2021-08-27 11:14:47 +02:00
Mahmood Ali
2bb03170b6 e2e: Run system jobs on all datacenters (#11060)
Target all e2e datacenters for system and sysbatch e2e tests.  They
require that the system jobs run on all linux clients.

However, the jobs currenly only target `dc1` datacenter, but the nightly
e2e cluster has 4 clients spread in `dc1` and `dc2` datacenters, causing
the tests to fail.

I missed this problem in e2e dev cluster because it only used a single
dc1 datacenter.
2021-08-17 11:01:47 -04:00
Mahmood Ali
141ea605f7 e2e: fix tests
Use basic sleeps in busybox images. busybox are very light, and ping has
permissions complications, and it may fail for network related
issues.
2021-08-03 11:38:35 -04:00
Seth Hoenig
61ee443ee6 core: implement system batch scheduler
This PR implements a new "System Batch" scheduler type. Jobs can
make use of this new scheduler by setting their type to 'sysbatch'.

Like the name implies, sysbatch can be thought of as a hybrid between
system and batch jobs - it is for running short lived jobs intended to
run on every compatible node in the cluster.

As with batch jobs, sysbatch jobs can also be periodic and/or parameterized
dispatch jobs. A sysbatch job is considered complete when it has been run
on all compatible nodes until reaching a terminal state (success or failed
on retries).

Feasibility and preemption are governed the same as with system jobs. In
this PR, the update stanza is not yet supported. The update stanza is sill
limited in functionality for the underlying system scheduler, and is
not useful yet for sysbatch jobs. Further work in #4740 will improve
support for the update stanza and deployments.

Closes #2527
2021-08-03 10:30:47 -04:00
Mahmood Ali
26685b2efa e2e: wait for allocs and deployments (#10967)
As we moved to using `-detach` for registering jobs, we should wait
until allocs and deployments are created before asserting their
properties.

Fixing `TestNodeDrainIgnoreSystem` and `TestRescheduleProgressDeadlineFail` tests as they seem particularly flaky, failing 9 and 7 times (respectively) in the last two weeks.
2021-07-29 10:52:04 -04:00
Mahmood Ali
8d1a6ea1d5 e2e: use -detach mode when registering jobs with cli (#10877)
Pick up 15d39f0dee but for RegisterFromJobspec:

>  This PR changes the e2e helper thingy to set -detach option
>  when registering a job with the CLI instead of the API. This is
>  necessary for jobs which never become healthy, as the deployment
>  never finishes for failing jobs and the command never returns,
>  causing the test to timeout after 10 minutes.

This case occurs in TestVaultSecrets
2021-07-09 09:25:44 -04:00
Seth Hoenig
15d39f0dee e2e: use -detach mode when registering jobs with cli
This PR changes the e2e helper thingy to set -detach option
when registering a job with the CLI instead of the API. This is
necessary for jobs which never become healthy, as the deployment
never finishes for failing jobs and the command never returns,
causing the test to timeout after 10 minutes.
2021-06-18 12:18:40 -05:00
James Rasell
a7d055a584 Merge pull request #10744 from hashicorp/b-remove-duplicate-imports
chore: remove duplicate import statements
2021-06-11 16:42:34 +02:00
James Rasell
55f0611aec e2e: remove duplicate import statements. 2021-06-11 09:37:23 +02:00
Michael Schurter
57a79de929 e2e: use api.ipify.org
ipv4.icanhazip.com returns ipv6 addresses
2021-06-07 15:12:42 -07:00
Mahmood Ali
9c8f7624c9 remove unused Spark security group rules 2021-06-04 11:49:43 -04:00
Mahmood Ali
f6d503ddd0 e2e: pass nomad_url variable 2021-06-04 10:32:51 -04:00
Mahmood Ali
8d03f4ccbc e2e: NOMAD_VERSION is not set when installing url 2021-06-04 10:31:37 -04:00
Mahmood Ali
b73b136c2a restrict ingress ip 2021-06-04 10:31:35 -04:00
Luiz Aoqui
19792e5a6b e2e: fix terraform output environment command instruction (#10674) 2021-06-01 10:10:12 -04:00
Mahmood Ali
e694000f46 Merge pull request #10657 from hashicorp/b-alloc-exec-closing
Handle `nomad exec` termination events in order
2021-05-25 14:50:58 -04:00
Mahmood Ali
99b8e3191c e2e: Spin clusters with custom url binaries (#10656)
Ease spinning up a cluster, where binaries are fetched from arbitrary
urls.  These could be CircleCI `build-binaries` job artifacts, or
presigned S3 urls.

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2021-05-25 13:47:39 -04:00
Mahmood Ali
b774accc43 e2e: stop suppressing unexpected EOF errors 2021-05-24 13:35:08 -04:00
Tim Gross
bfa8f90ba0 e2e: update TF lockfile 2021-05-18 09:35:57 -04:00
Tim Gross
ef0ebcd59f E2E: remove references to nomad_sha 2021-05-10 16:42:39 -04:00