Commit Graph

20 Commits

Author SHA1 Message Date
Tim Gross
9cc0e2eae0 upgrade testing: make cluster name prefix a variable (#25281)
During initial development of upgrade testing, we had a hard-coded prefix to
distinguish between clusters created for upgrade testing and those created by GHA
runners. Update the prefix to be a variable so that developers can add their own
prefix during test workload development.
2025-03-04 11:11:02 -05:00
Juana De La Cuesta
2dadf9fe6c Improve stability (#25244)
* func: add dependencies to avoid race conditions and move the update to each client to the main upgrade scenario

* Update enos/enos-scenario-upgrade.hcl

Co-authored-by: Tim Gross <tgross@hashicorp.com>

* Update enos/enos-scenario-upgrade.hcl

Co-authored-by: Tim Gross <tgross@hashicorp.com>

---------

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2025-03-04 16:23:07 +01:00
Tim Gross
4a62d1b75c upgrade tests: add CSI workload (#25223)
Add an upgrade test workload for CSI with the AWS EFS plugin. In order to
validate this workload, we'll need to deploy the plugin job and then register a
volume with it. So this extends the `run_workloads` module to allow for "pre
scripts" and "post scripts" to be run before and after a given job has been
deployed. We can use that as a model for other test workloads.

Ref: https://hashicorp.atlassian.net/browse/NET-12217
2025-02-27 15:16:04 -05:00
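
To illustrate the "pre scripts" and "post scripts" idea from 4a62d1b75c, here is a minimal sketch of a runner that executes an optional script before and after deploying a job. The names `run_workload` and `deploy_job`, the argument order, and the demo scripts are all hypothetical; this is not the actual `run_workloads` module, and the deploy step is a stand-in that only prints a message.

```bash
#!/usr/bin/env bash
set -eu

# Stand-in for the real deployment step (hypothetical; prints only).
deploy_job() {
    echo "deploying $1"
}

# run_workload <job> <pre_script> <post_script>
# An empty string means "no script for this phase". Because of `set -e`,
# a failing pre-script stops the run before the job is deployed.
run_workload() {
    local job="$1" pre="$2" post="$3"
    if [ -n "$pre" ]; then
        bash "$pre" "$job"
    fi
    deploy_job "$job"
    if [ -n "$post" ]; then
        bash "$post" "$job"
    fi
}

# Demo: write two throwaway scripts and run them around the deploy step.
workdir=$(mktemp -d)
cat > "$workdir/pre.sh" <<'EOF'
echo "pre-script for $1"
EOF
cat > "$workdir/post.sh" <<'EOF'
echo "post-script for $1"
EOF

output=$(run_workload csi-demo "$workdir/pre.sh" "$workdir/post.sh")
rm -rf "$workdir"
echo "$output"
```

For the CSI case described above, the post-script slot is where a step like volume registration would go, since it has to happen after the plugin job is deployed.
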
Tim Gross
6ae1444cf4 upgrade testing: debugging assistance (#25232)
Enos buries the Terraform output from provisioning. Add a shell script to load
the environment from provisioning for debugging Nomad during development of
upgrade tests.
2025-02-27 08:35:45 -05:00
Juana De La Cuesta
461d4268e2 func: add python servers to raw exec workloads (#25230)
2025-02-26 18:05:46 +01:00
Juana De La Cuesta
b13132043b Add new workloads (#25106)
* func: Add more workloads

* Update jobs.sh

* Update versions.sh

* style: format

* Update enos/modules/test_cluster_health/scripts/allocs.sh

Co-authored-by: Tim Gross <tgross@hashicorp.com>

* docs: improve outputs descriptions

* func: change docker workloads to be redis boxes and add healthchecks

* func: register the services on consul

* style: format

---------

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2025-02-26 17:02:27 +01:00
Tim Gross
8c95f5f17e upgrade testing: make sure we capture last error if not exiting (#25186)
While testing #25172 I found a few spots where #25152 wasn't capturing the
errors from transient failures correctly or was exiting early instead of
retrying.

Ref: https://hashicorp.atlassian.net/browse/NET-11546
2025-02-24 09:37:17 -05:00
Juana De La Cuesta
0529c0247d Only take one snapshot when upgrading servers (#25187)
* func: add possibility of having different binaries for server and clients

* style: rename binaries modules

* func: remove the check for last configuration log, and only take one snapshot when upgrading the servers

* Update enos/modules/upgrade_servers/main.tf

Co-authored-by: Tim Gross <tgross@hashicorp.com>

---------

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2025-02-24 15:06:16 +01:00
Juana De La Cuesta
4a75d2de63 Adjust the servers to be always linux instances (#25172)
* func: add possibility of having different binaries for server and clients

* style: rename binaries modules

* docs: update comments

* fix: correct the token input variable for fetch binaries
2025-02-24 13:09:57 +01:00
Tim Gross
73cd934e1a upgrade testing: make script error handling more robust (#25152)
We're using `set -eo pipefail` everywhere in the Enos scripts, but several of the
scripts used for checking assertions didn't take advantage of pipefail in such a
way that we could avoid early exits from transient errors. This meant that if a
server was slightly late to come back up, we'd hit an error and exit the whole
script instead of polling as expected.

While fixing this, I've made a number of other improvements to the shell scripts:

* I've changed the design of the polling loops so that we're calling a function
that returns an exit code and sets a `last_error` value, along with any global
variables required by downstream functions. This makes the loops more readable
by reducing the number of global variables, and helped identify some places
where we're exiting instead of returning into the loop.

* Using `shellcheck -s bash` I fixed some unused variables and undefined
variables that we were missing because they were only used on the error paths.
2025-02-20 08:44:35 -05:00
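
The polling-loop design described in 73cd934e1a can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the Enos scripts: `poll`, `check_servers`, `fake_query`, `EXPECTED`, and `servers_count` are hypothetical names, and `fake_query` stands in for a real query by failing twice before succeeding. The point is that the check function returns an exit code and records the reason in `last_error` rather than exiting, so a transient failure under `set -eo pipefail` leads to a retry.

```bash
#!/usr/bin/env bash
set -eo pipefail

last_error=""
attempts=0
servers_count=""
EXPECTED=3

# Stand-in for a real query (hypothetical): fails until the third attempt,
# imitating a server that is slightly late to come back up.
fake_query() {
    if [ "$1" -lt 3 ]; then
        echo "connection refused" >&2
        return 1
    fi
    echo 3
}

# Returns an exit code and sets last_error; never calls exit itself.
# Running the query inside `if !` keeps `set -e` from aborting the script.
check_servers() {
    attempts=$((attempts + 1))
    local out
    if ! out=$(fake_query "$attempts" 2>&1); then
        last_error="query failed: $out"
        return 1
    fi
    servers_count="$out"
    if [ "$servers_count" -ne "$EXPECTED" ]; then
        last_error="expected $EXPECTED servers, found $servers_count"
        return 1
    fi
    return 0
}

# poll <check_function> <max_attempts> <interval_seconds>
poll() {
    local check="$1" max_attempts="$2" interval="$3"
    local i=1
    while [ "$i" -le "$max_attempts" ]; do
        if "$check"; then
            return 0
        fi
        sleep "$interval"
        i=$((i + 1))
    done
    # Only report the last recorded error once retries are exhausted.
    echo "timed out: $last_error" >&2
    return 1
}

poll check_servers 5 0
echo "servers up: $servers_count after $attempts attempts"
```

Keeping the retry logic in one `poll` function means each check only has to decide between "return 0" and "set `last_error` and return 1".
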
Juana De La Cuesta
af735dce16 F net 11478 enos versions (#25092)
* fix: change the value of the version used for testing to account for ent versions

* func: add more specific test for servers stability

* func: change the criteria we use to verify the cluster stability after server upgrades

* style: syntax
2025-02-13 10:32:43 +01:00
Tim Gross
f0d3c2834e upgrade testing: add README and fix authorization header (#25059)
Add a README describing the setup required for running upgrade testing via
Enos. Also fix the authorization header of our `wget` to use the proper header
for short-lived tokens, and the output path variable of the artifactory step.

Co-authored-by: Juanadelacuesta <8647634+Juanadelacuesta@users.noreply.github.com>
2025-02-12 08:56:47 -05:00
Juana De La Cuesta
c5d74a96a3 Add module to upgrade clients (#25055)
* func: add module to upgrade clients

* func: add polling to verify the metadata to make sure all clients are up

* style: remove unused code

* fix: Give the allocations a little time to get to the expected number on the test health check, to avoid possible flaky tests in the future

* fix: set the upgrade version as clients version for the last health check
2025-02-10 17:03:54 +01:00
Juana De La Cuesta
cae81182dd fix: refactor to avoid flakiness (#25047)
2025-02-10 10:53:39 +01:00
Tim Gross
d0a6424844 enos: improve documentation around required variables (#25051)
The variable definitions for Enos upgrade scenarios have a couple of unused
variables and some of the documentation strings are ambiguous:

* `nomad_region` and `binary_local_path` variables are unused and can be removed.
* `nomad_local_binary` refers to the directory where the binaries will be
  downloaded, not the binaries themselves. Rename to make it clear this belongs to
  the artifactory fetch and not the provisioning step (which uses the
  artifactory fetch outputs).
2025-02-07 11:35:50 -05:00
Juana De La Cuesta
cf0a046364 Module to upgrade servers (#24971)
* func: add initial enos skeleton

* style: add headers

* func: change the variables input to a map of objects to simplify the workloads creation

* style: formatting

* Add tests for servers and clients

* style: separate the tests in different scripts

* style: add missing headers

* func: add tests for allocs

* style: improve output

* func: add step to copy remote upgrade version

* style: hcl formatting

* fix: remove the terraform nomad provider

* fix: Add clean token to remove extra new line added in provision

* fix: add missing license headers

* style: hcl fmt

* style: rename variables and fix format

* func: remove the template step on the workloads module and chop the nomad token output on the provide module

* fix: correct the jobspec path on the workloads module

* fix: add missing variable definitions on job specs for workloads

* style: formatting

* fix: Add clean token to remove extra new line added in provision

* func: add module to upgrade servers

* style: missing headers

* func: add upgrade module

* func: add install for windows as well

* func: add an intermediate module that runs the upgrade server for each server

* fix: add missing license headers

* fix: remove extra input variables and connect upgrade servers to the scenario

* fix: rename missing env variables for cluster health scripts

* func: move the cluster health test outside of the modules and into the upgrade scenario

* fix: fix the regex to ignore snap files on the gitignore file

* fix: Add clean token to remove extra new line added in provision

* fix: remove extra input variables and connect upgrade servers to the scenario

* style: formatting

* fix: move taking and restoring snapshots out of the upgrade_single_server to avoid possible race conditions

* fix: rename variable in health test

* fix: Add clean token to remove extra new line added in provision

* func: add an intermediate module that runs the upgrade server for each server

* fix: Add clean token to remove extra new line added in provision

* func: fix the last_log_index check and add a versions check

* func: don't use for_each when upgrading the servers; hardcode each one to ensure they are upgraded one by one

* Update enos/modules/upgrade_instance/variables.tf

Co-authored-by: Tim Gross <tgross@hashicorp.com>

* Update enos/modules/upgrade_instance/variables.tf

Co-authored-by: Tim Gross <tgross@hashicorp.com>

* Update enos/modules/upgrade_instance/variables.tf

Co-authored-by: Tim Gross <tgross@hashicorp.com>

* func: make snapshot by calling every server and allowing stale data

* style: formatting

* fix: make the source for the upgrade binary unknown until apply

* func: use enos bundle to install remote upgrade version, enos_files is not meant for dynamic files

---------

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2025-02-07 10:26:03 +01:00
Juana De La Cuesta
caeee0f238 Fix the last_log_index check and add a versions check (#24989)
* func: fix the last_log_index check and add a versions check

* fix: add small window to consider raft index equal
2025-02-05 10:34:11 +01:00
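
The "small window" for treating Raft indexes as equal (caeee0f238) can be pictured with the sketch below. The function name `indexes_close`, the window size, and the sample index values are assumptions made for illustration; the actual check in the Enos scripts may differ.

```bash
#!/usr/bin/env bash
set -eu

# indexes_close <index_a> <index_b> <window>
# Succeeds when the two indexes differ by at most <window>, so that writes
# landing between two reads do not make in-sync servers look out of sync.
indexes_close() {
    local a="$1" b="$2" window="$3"
    local diff=$((a - b))
    if [ "$diff" -lt 0 ]; then
        diff=$((0 - diff))
    fi
    [ "$diff" -le "$window" ]
}

if indexes_close 1042 1040 5; then
    echo "in sync"
fi

if ! indexes_close 1042 1030 5; then
    echo "lagging"
fi
```
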
Juana De La Cuesta
3861c40220 func: add initial enos skeleton (#24787)
* func: add initial enos skeleton

* style: add headers

* func: change the variables input to a map of objects to simplify the workloads creation

* style: formatting

* Add tests for servers and clients

* style: separate the tests in different scripts

* style: add missing headers

* func: add tests for allocs

* style: improve output

* func: add step to copy remote upgrade version

* style: hcl formatting

* fix: remove the terraform nomad provider

* fix: Add clean token to remove extra new line added in provision

* fix: add missing license headers

* style: hcl fmt

* style: rename variables and fix format

* func: remove the template step on the workloads module and chop the nomad token output on the provide module

* fix: correct the jobspec path on the workloads module

* fix: add missing variable definitions on job specs for workloads

* style: formatting

* fix: rename variable in health test
2025-01-30 16:37:55 +01:00
Juana De La Cuesta
039da61d8f [F-net-11478] Make keys directory cluster grouped (#24883)
* func: make windows arch dependent

* func: unify keys and make them cluster grouped

* Update README.md

* Update e2e/terraform/provision-infra/provision-nomad/variables.tf

Co-authored-by: Tim Gross <tgross@hashicorp.com>

* Update .gitignore

* style: add an output with the cluster identifier

---------

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2025-01-20 10:18:38 +01:00
Juana De La Cuesta
b29a3736a4 Update e2e infra provision to expect providers (#24694)
* func: move infra provisioning to a module and remove providers

* func: update paths

* func: update more paths

* func: update path inside bootstrap script

* style: remove debug prints on bootstrap scripts

* Delete e2e/terraform/csi/input/volume-efs.hcl

* fix: update keys path to use module path instead of root

* fix: add missing headers

* fix: update keys directory inside provision-nomad

* style: format hcl files

* Update compute.tf

* Update e2e/terraform/main.tf

Co-authored-by: Tim Gross <tgross@hashicorp.com>

* Update e2e/terraform/provision-infra/compute.tf

Co-authored-by: Tim Gross <tgross@hashicorp.com>

* fix: update more paths

* fix: fmt hcl files

* func: final paths revision for running e2e locally

* fix: make path of certs relative to module for the bootstrap

* func: final paths revision for running e2e locally

* Update network.tf

* fix: fix typo and add success message

* fix: remove the test name from token to avoid long names and use name for vol to avoid collisions

* func: unify the uploads folder

* func: make the uploads file one per cluster

* func: Add outputs with all data necessary to connect to the cluster

* fix: make nomad token a sensitive output

* Update bootstrap-nomad.sh

---------

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2025-01-13 15:59:40 +01:00