The CSI workload we're using for upgrade testing has been flaky coming up: the
plugin jobs don't launch in a timely fashion despite several attempts. To avoid
blocking the rest of the upgrade testing, let's disable this workload
temporarily. We'll fix this in NET-12430.
Ref: https://hashicorp.atlassian.net/browse/NET-12430
Prerelease builds live in a different Artifactory repository than release
builds. Make the repository a variable so we can test prerelease builds in the
nightly/weekly runs.
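A rough sketch of how a prerelease run might set this; the variable name and repository path below are placeholders, not the actual names in the scenario:

    # Hypothetical variable name and repository path; the real names may differ.
    # Prerelease builds typically land in a staging repo rather than the stable one.
    export ENOS_VAR_artifactory_repo="hashicorp-crt-staging-local*"

    enos scenario launch upgrade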
Add an upgrade test workload that continuously writes to a Nomad
Variable. To run this workload, we'll need to deploy a
workload-associated ACL policy, so this extends the `run_workloads` module to
allow a "pre script" to be run before a given job is deployed. We can use
that as a model for other test workloads.
Ref: https://hashicorp.atlassian.net/browse/NET-12217
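As an illustration, the pre script for this workload might apply a workload-associated ACL policy along these lines; the job name, policy name, and Variable path are placeholders, not necessarily what the workload actually uses:

    #!/usr/bin/env bash
    set -euo pipefail

    # variable-writer.policy.hcl (shipped alongside this script) grants the job's
    # workload identity write access to its own Variable path, roughly:
    #
    #   namespace "default" {
    #     variables {
    #       path "nomad/jobs/variable-writer" {
    #         capabilities = ["write", "read"]
    #       }
    #     }
    #   }

    # Associate the policy with the job so its workload identity picks it up.
    nomad acl policy apply \
      -namespace default -job variable-writer \
      variable-writer variable-writer.policy.hcl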
Add an upgrade test workload for Consul service mesh with transparent
proxy. Note this breaks from the "countdash" demo. The dashboard application
can only verify the backend is up by making a websocket connection, which we
can't do as a health check, and the health check it exposes for that purpose
only passes once the websocket connection has been made. So replace the
dashboard with a minimal nginx reverse proxy to the count-api instead.
Ref: https://hashicorp.atlassian.net/browse/NET-12217
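For example, the test can now verify the service end to end with a plain HTTP check through the proxy, along these lines; the address variable and the shape of the count-api response are assumptions here, not taken from the jobspec:

    # Hypothetical smoke test: with transparent proxy, the nginx workload forwards
    # plain HTTP to count-api over the mesh, so a simple GET proves the backend is up.
    proxy_addr="${PROXY_ADDR:-localhost:8080}"   # placeholder; comes from the test harness
    curl -sf "http://${proxy_addr}/" | jq -e 'has("count")'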
Getting the CSI test to work with AWS EFS or EBS has proven awkward: we have to
deal with external APIs that have their own consistency guarantees, as well as
challenges around teardown. Make the CSI test entirely self-contained by using
a userland NFS server and the rocketduck CSI plugin.
Ref: https://hashicorp.atlassian.net/browse/NET-12217
Ref: https://gitlab.com/rocketduck/csi-plugin-nfs
The CSI workload is failing and creating complications for teardown, so I'm
reworking it. That work is taking a while to finish, so while it's in
progress let's disable the CSI workload so that we're running the upgrade tests
all the way through to the end. I expect to be able to revert this in the next
couple of days.
During initial development of upgrade testing, we had a hard-coded prefix to
distinguish clusters created for this work from those created by GHA
runners. Make the prefix a variable so that developers can set their own
prefix during test workload development.
* func: add dependencies to avoid race conditions and move the update of each client into the main upgrade scenario
* Update enos/enos-scenario-upgrade.hcl
Co-authored-by: Tim Gross <tgross@hashicorp.com>
* Update enos/enos-scenario-upgrade.hcl
Co-authored-by: Tim Gross <tgross@hashicorp.com>
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Add an upgrade test workload for CSI with the AWS EFS plugin. To validate this
workload, we'll need to deploy the plugin job and then register a volume with
it, so this extends the `run_workloads` module to allow "pre scripts" and
"post scripts" to be run before and after a given job has been deployed. We
can use that as a model for other test workloads.
Ref: https://hashicorp.atlassian.net/browse/NET-12217
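A sketch of what the post script might do here; the plugin ID, volume IDs, and EFS filesystem ID are placeholders, not the real values used by the module:

    #!/usr/bin/env bash
    set -euo pipefail

    # Placeholder plugin ID; the real one comes from the plugin jobspec.
    plugin_id="aws-efs0"

    # Wait for the CSI plugin to report at least one healthy node before registering.
    while ! nomad plugin status "$plugin_id" | grep -qE 'Nodes Healthy += +[1-9]'; do
      echo "waiting for CSI plugin $plugin_id to become healthy..."
      sleep 5
    done

    # efs-volume.hcl (shipped alongside this script) points at the pre-created
    # EFS filesystem, roughly:
    #
    #   id          = "efs-test"
    #   name        = "efs-test"
    #   type        = "csi"
    #   plugin_id   = "aws-efs0"
    #   external_id = "fs-0123456789abcdef0"
    #
    #   capability {
    #     access_mode     = "multi-node-multi-writer"
    #     attachment_mode = "file-system"
    #   }

    nomad volume register efs-volume.hcl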
* func: Add more workloads
* Update jobs.sh
* Update versions.sh
* style: format
* Update enos/modules/test_cluster_health/scripts/allocs.sh
Co-authored-by: Tim Gross <tgross@hashicorp.com>
* docs: improve outputs descriptions
* func: change docker workloads to be redis boxes and add healthchecks
* func: register the services on consul
* style: format
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
* func: add the possibility of having different binaries for servers and clients
* style: rename binaries modules
* docs: update comments
* fix: correct the token input variable for fetch binaries
We're using `set -eo pipefail` everywhere in the Enos scripts, but several of
the scripts used for checking assertions weren't structured to avoid early
exits on transient errors under those settings. This meant that if a server
was slightly late to come back up, we'd hit an error and exit the whole script
instead of polling as expected.
While fixing this, I've made a number of other improvements to the shell scripts:
* I've changed the design of the polling loops so that we call a function
that returns an exit code and sets a `last_error` value, along with any global
variables required by downstream functions (see the sketch after this
list). This makes the loops more readable by reducing the number of global
variables, and it helped identify some places where we were exiting instead of
returning into the loop.
* Using `shellcheck -s bash` I fixed some unused and undefined variables that
we were missing because they were only used on the error paths.
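A minimal sketch of that polling pattern, with illustrative names rather than the actual script contents:

    #!/usr/bin/env bash
    set -eo pipefail

    # SERVER_COUNT and TIMEOUT are expected from the environment in this sketch.

    # Each check returns an exit code and sets last_error instead of exiting the
    # script, so a transient failure just sends us back around the polling loop.
    checkServersUp() {
      local members running
      members=$(nomad server members) ||
        { last_error="could not reach the Nomad API"; return 1; }

      running=$(echo "$members" | grep -c alive || true)
      if [[ "$running" -lt "$SERVER_COUNT" ]]; then
        last_error="expected $SERVER_COUNT servers alive, found $running"
        return 1
      fi
      return 0
    }

    # Retry the check until it passes or the deadline expires, and only then
    # surface the last error.
    checkWithRetries() {
      local check=$1 deadline=$((SECONDS + TIMEOUT))
      while ! "$check"; do
        if [[ "$SECONDS" -ge "$deadline" ]]; then
          echo "timed out: $last_error" >&2
          return 1
        fi
        sleep 5
      done
    }

    checkWithRetries checkServersUp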
* fix: change the value of the version used for testing to account for ent versions
* func: add more specific test for servers stability
* func: change the criteria we use to verify the cluster stability after server upgrades
* style: syntax
* func: add module to upgrade clients
* func: add polling to verify the metadata to make sure all clients are up
* style: remove unused code
* fix: Give the allocations a little time to reach the expected number in the test health check, to avoid possible flaky tests in the future
* fix: set the upgrade version as clients version for the last health check
The variable definitions for the Enos upgrade scenarios include a couple of
unused variables, and some of the documentation strings are ambiguous:
* `nomad_region` and `binary_local_path` variables are unused and can be removed.
* `nomad_local_binary` refers to the directory where the binaries will be
downloaded, not the binaries themselves. Rename it to make clear this belongs
to the artifactory fetch and not the provisioning step (which uses the
artifactory fetch outputs).
* func: add initial enos skeleton
* style: add headers
* func: change the variables input to a map of objects to simplify the workloads creation
* style: formatting
* Add tests for servers and clients
* style: separate the tests into different scripts
* style: add missing headers
* func: add tests for allocs
* style: improve output
* func: add step to copy remote upgrade version
* style: hcl formatting
* fix: remove the terraform nomad provider
* fix: Add clean token to remove extra new line added in provision
* fix: Add clean token to remove extra new line added in provision
* fix: Add clean token to remove extra new line added in provision
* fix: add missing license headers
* style: hcl fmt
* style: rename variables and fix format
* func: remove the template step on the workloads module and chop the nomad token output on the provision module
* fix: correct the jobspec path on the workloads module
* fix: add missing variable definitions on job specs for workloads
* style: formatting
* fix: Add clean token to remove extra new line added in provision
* func: add module to upgrade servers
* style: missing headers
* func: add upgrade module
* func: add install for windows as well
* func: add an intermediate module that runs the upgrade server for each server
* fix: add missing license headers
* fix: remove extra input variables and connect upgrade servers to the scenario
* fix: rename missing env variables for cluster health scripts
* func: move the cluster health test outside of the modules and into the upgrade scenario
* fix: fix the regex to ignore snap files on the gitignore file
* fix: Add clean token to remove extra new line added in provision
* fix: Add clean token to remove extra new line added in provision
* fix: Add clean token to remove extra new line added in provision
* fix: remove extra input variables and connect upgrade servers to the scenario
* style: formatting
* fix: move taking and restoring snapshots out of the upgrade_single_server to avoid possible race conditions
* fix: rename variable in health test
* fix: Add clean token to remove extra new line added in provision
* func: add an intermediate module that runs the upgrade server for each server
* fix: Add clean token to remove extra new line added in provision
* fix: Add clean token to remove extra new line added in provision
* fix: Add clean token to remove extra new line added in provision
* func: fix the last_log_index check and add a versions check
* func: don't use for_each when upgrading the servers, hardcode each one to ensure they are upgraded one by one
* Update enos/modules/upgrade_instance/variables.tf
Co-authored-by: Tim Gross <tgross@hashicorp.com>
* Update enos/modules/upgrade_instance/variables.tf
Co-authored-by: Tim Gross <tgross@hashicorp.com>
* Update enos/modules/upgrade_instance/variables.tf
Co-authored-by: Tim Gross <tgross@hashicorp.com>
* func: make snapshot by calling every server and allowing stale data
* style: formatting
* fix: make the source for the upgrade binary unknown until apply
* func: use enos bundle to install remote upgrade version, enos_files is not meant for dynamic files
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>