* Add factory hooks for jobs to have previously stable versions and stopped status
* Since #24973 node-read isn't presupposed and so should regex match only on the common url parts
* Job detail tests for title buttons are now bimodal and default to having previously-stable version in history
* prettier plz
* Breaking a thing on purpose to see if my other broken thing is broken
* continue-on-error set to false to get things red when appropriate
* OK what if continue-on-error=true but we do a separate failure reporting after the fact
* fail-fast are you the magic incantation that I need?
* Re-fix my test now that fast-fail is off
* Fix to server-leader by adding a region first, and always()-append to uploading partition results
* Express failure step lists failing tests so you don't have to click back into ember-exam step
* temporary snapshot and logging for flakey test in service job detail
* Bunch of region and tasklogs test fixups
* using allocStatusDistribution to ensure service job always has a non-queued alloc
This change creates a reusable workflow for notifying Slack on CI
failures. The message will include useful links and information
about the failure, so product engineers can investigate and fix
any problems.
The new workflow is used by selected workflows which trigger on
merges to main or release/* branches. The notification is only
sent on failure and when the event was a push (PR merge) meaning
the number of notifications should be minimal.
The aim is to help identify and draw attention to failure across
our release branches, in particular when automated processes
happen.
* retain artifacts from test runs including test timing
* Pinning commit hashes for action helpers
* trigger for ui-test run
* Trying to isolate down to a simple upload
* Once more with mkdir
* What if we just wrote our own test reporter tho
* Let the partitioned runs handle placement
* Filter out common token logs, add a summary at the end, and note failures in logtime
* Custom reporter cannot also have an output file, he finds out two days late
* Aggregate summary, duration, and removing failure case
* Conditional test report generation
* Timeouts are errors
* Trying with un-partitioned input json file
* Remove the commented-out lines for main-only runs
* combine-ui-test-results as its own script
Trusted Supply Chain Component Registry (TSCCR) enforcement starts Monday and an
internal report shows our semgrep action is pinned to a version that's not
currently permitted. Update all the action versions to whatever's the new
hotness to maximum the time-to-live on these until we have automated pinning
setup.
Also version bumps our chromedriver action, which randomly broke upstream today.
so in enterprise we can use Vault for secrets,
without merge conflicts from oss->ent.
also:
* use hashicorp/setup-golang
* setup-js for self-hosted runners
they don't come with yarn, nor chrome,
and might not always match node version.
Since the matrix exercises different test cases, it's better to allow
all partitions to completely run, even if one of them fails, so it's
easier to catch multiple test failures.
This seems to finish in about 20 minutes, or run for 6+ hours until hitting
a default timeout. Set a timeout to 30 minutes so we aren't wasting time
and runners.
* Exam to parallelize tests
* Logging to try to solve test flakiness
* Logging in another failure
* Hardening for one test and snapshot for another
* Explicitly set the first one as the servicedAlloc instead of randomly picking
* A wild CircleCI test failure appears
* de-log
namely, these workflows:
test-e2e, test-ui, and test-windows
extra-curricularly, as part of the overall
migration effort company-wide, this also includes
some standardization such as:
* explicit permissions:read on various workflows
* pinned action version shas (per https://github.com/hashicorp/security-public-tsccr)
* actionlint, which among other things runs
shellcheck on GHA run steps
Co-authored-by: emilymianeil <eneil@hashicorp.com>
Co-authored-by: Daniel Kimsey <daniel.kimsey@hashicorp.com>