8 Commits

Author SHA1 Message Date
Tim Gross
6c9f2fdd29 reduce upgrade testing flakes (#25839)
This changeset includes several adjustments to the upgrade testing scripts to
reduce flakes and make problems more understandable:

* When a node is drained prior to the 3rd client upgrade, it's entirely
  possible that the 3rd client to be upgraded is the drained node. This results in
  miscounting the expected number of allocations because many of them will be
  "complete" (service/batch) or "pending" (system). Leave the system jobs running
  during drains and only count the running allocations at that point as the
  expected set. Move the inline script that gets this count into a script file for
  legibility.

* When the last initial workload is deployed, it can briefly still be
  "pending" when we move to the next step. Poll for a short window until the
  expected count of jobs is reached.

* Make sure that any scripts that are being run right after a server or client
  is coming back up can handle temporary unavailability gracefully.

* Change the debugging output of several scripts to avoid having the debug
  output run into the error message (e.g. "some allocs are not running" looked
  like the first running allocation was the missing allocation).

* Add some notes to the README about running locally with `-dev` builds and
  tagging a cluster with your own name.
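
The polling and retry behavior described in the bullets above can be sketched
roughly as follows. This is a minimal illustration, not the code from the
changeset: the function name `poll_for_count`, its arguments, and its messages
are all hypothetical, and the command that prints the current count is left to
the caller.

```shell
#!/usr/bin/env sh
# Hypothetical helper (not from the changeset): run a command repeatedly until
# it prints the expected count, tolerating temporary failures along the way.
#
# usage: poll_for_count EXPECTED ATTEMPTS DELAY_SECONDS CMD [ARGS...]
poll_for_count() {
    local expected="$1" attempts="$2" delay="$3"
    shift 3
    local got="" i=1
    while [ "$i" -le "$attempts" ]; do
        # A failing command (for example, a server that is still coming back
        # up) is handled like a wrong count: wait, then try again.
        if got="$("$@" 2>/dev/null)" && [ "$got" = "$expected" ]; then
            echo "found expected count: $got"
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    # Print the error on its own line on stderr so that it cannot run into
    # any debug output that precedes it.
    echo "error: expected count $expected, last saw '${got:-nothing}'" >&2
    return 1
}
```

A caller would pass whatever command prints the current count, for example
`poll_for_count 3 10 2 my_count_cmd` for up to 10 tries spaced 2 seconds
apart, where `my_count_cmd` is a placeholder for the real query.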

Ref: https://hashicorp.atlassian.net/browse/NMD-162
2025-05-13 08:40:22 -04:00
Juanadelacuesta
ebeb3047c8 docs: add note about workloads life expectancy 2025-03-12 16:51:03 +01:00
Juana De La Cuesta
3de2a6b1d6 Update README.md 2025-03-11 17:51:56 +01:00
Juana De La Cuesta
b1ea04a4d1 Update README.md 2025-03-11 17:50:26 +01:00
Juana De La Cuesta
859f257d32 Update README.md 2025-03-11 17:48:45 +01:00
Juanadelacuesta
08f386e8e5 docs: Add section of readme to add workloads 2025-03-11 17:48:14 +01:00
Tim Gross
6ae1444cf4 upgrade testing: debugging assistance (#25232)
Enos buries the Terraform output from provisioning. Add a shell script to load
the environment from provisioning for debugging Nomad during development of
upgrade tests.
2025-02-27 08:35:45 -05:00
Tim Gross
f0d3c2834e upgrade testing: add README and fix authorization header (#25059)
Add a README describing the setup required for running upgrade testing via
Enos. Also fix the authorization header of our `wget` to use the proper header
for short-lived tokens, and the output path variable of the artifactory step.

Co-authored-by: Juanadelacuesta <8647634+Juanadelacuesta@users.noreply.github.com>
2025-02-12 08:56:47 -05:00