If multiple dynamic host volumes are created in quick succession, it's possible
for the server to attempt placement on a host where another volume has been
placed but not yet fingerprinted as ready. Once a `VolumeCreate` RPC returns a
response, we've already invoked the plugin successfully and written to state, so
we're just waiting on the fingerprint for scheduling purposes. Change the
placement selection so that we skip a node if it has a volume, regardless of
whether that volume is ready yet.
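For illustration only, here's a minimal sketch of the new selection rule using
hypothetical types (stand-ins, not the actual scheduler structures):

```go
package placement

// hostVolume and Node are hypothetical stand-ins for the server's view of a
// client node and the dynamic host volumes recorded against it.
type hostVolume struct {
	Name  string
	Ready bool // set once the client fingerprints the volume
}

type Node struct {
	ID      string
	Volumes []hostVolume
}

// selectFeasibleNodes keeps only candidates with no volume of the requested
// name at all, ready or not. Previously only ready volumes disqualified a
// node, which raced with the client fingerprint.
func selectFeasibleNodes(candidates []*Node, volumeName string) []*Node {
	feasible := make([]*Node, 0, len(candidates))
	for _, node := range candidates {
		hasVolume := false
		for _, vol := range node.Volumes {
			if vol.Name == volumeName {
				// Skip regardless of vol.Ready: the VolumeCreate RPC has
				// already written the volume to state, so the fingerprint
				// is only a matter of time.
				hasVolume = true
				break
			}
		}
		if !hasVolume {
			feasible = append(feasible, node)
		}
	}
	return feasible
}
```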
Prerelease builds are in a different Artifactory repository than release
builds. Make the repository a variable so we can test prerelease builds in the
nightly/weekly runs.
If the auth-url API is being DoS'd, we do not expect it to keep functioning;
we only protect the rest of the system. Users will need a break-glass ACL token
if they need Nomad UI/API access during such a denial of service.
* docs: fix missing api version on acl auth method path
* docs: fix missing api version on acl binding rules path
* docs: fix missing api version on acl policies path
* docs: fix missing api version on acl roles path
* docs: fix missing api version on acl tokens path
Errors from `volume create` or `volume delete` only get logged by the client
agent, which may make it harder for volume authors to debug these operations if
they are not also cluster administrators with access to host logs.
Allow plugins to include an optional error message in their response. Because we
can't count on receiving this response (the error could come before the plugin
executes), we parse this message optimistically and include it only if
available.
Ref: https://hashicorp.atlassian.net/browse/NET-12087
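A rough sketch of the "parse optimistically" idea, assuming a hypothetical
JSON response shape rather than the real plugin wire format:

```go
package plugin

import (
	"encoding/json"
	"fmt"
	"strings"
)

// createResponse is a hypothetical shape for what a host volume plugin might
// print on stdout; only the optional Error field matters here.
type createResponse struct {
	Path  string `json:"path"`
	Error string `json:"error,omitempty"`
}

// wrapPluginError decorates execErr with the plugin's own message when the
// output parses and carries one; otherwise execErr is returned unchanged,
// since the plugin may have failed before producing any response at all.
func wrapPluginError(execErr error, stdout []byte) error {
	if execErr == nil {
		return nil
	}
	var resp createResponse
	if err := json.Unmarshal(stdout, &resp); err != nil {
		return execErr // no parsable response; keep the original error
	}
	if msg := strings.TrimSpace(resp.Error); msg != "" {
		return fmt.Errorf("%w: plugin reported: %s", execErr, msg)
	}
	return execErr
}
```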
Add an upgrade test workload that continuously writes to a Nomad
Variable. In order to run this workload, we need to deploy a
workload-associated ACL policy, so this extends the `run_workloads` module to
allow a "pre script" to run before a given job is deployed. We can use that as
a model for other test workloads.
Ref: https://hashicorp.atlassian.net/browse/NET-12217
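A rough sketch of what such a workload can boil down to, written against the
Variables HTTP API. The endpoint path, payload shape, and variable path here
are my assumptions (check the API docs), and it assumes the job exposes a
token via `NOMAD_TOKEN`, for example through a workload identity with
`env = true`:

```go
package main

import (
	"bytes"
	"encoding/json"
	"log"
	"net/http"
	"os"
	"strconv"
	"time"
)

func main() {
	addr := os.Getenv("NOMAD_ADDR") // e.g. http://127.0.0.1:4646
	token := os.Getenv("NOMAD_TOKEN")

	for i := 0; ; i++ {
		body, _ := json.Marshal(map[string]any{
			"Items": map[string]string{"counter": strconv.Itoa(i)},
		})
		// PUT /v1/var/:path upserts a Variable; the workload-associated ACL
		// policy deployed by the pre script must grant write on this
		// (hypothetical) path.
		req, _ := http.NewRequest(http.MethodPut,
			addr+"/v1/var/upgrade-test/counter", bytes.NewReader(body))
		req.Header.Set("X-Nomad-Token", token)

		resp, err := http.DefaultClient.Do(req)
		if err != nil {
			log.Printf("write failed: %v", err)
		} else {
			resp.Body.Close()
			log.Printf("wrote counter=%d status=%s", i, resp.Status)
		}
		time.Sleep(1 * time.Second)
	}
}
```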
* DHV UI init
* /csi routes to /storage routes and a routeRedirector util (#25163)
* /csi routes to /storage routes and a routeRedirector util
* Tests and routes move csi/ to storage/
* Changelog added
* [ui] Storage UI overhaul + Dynamic Host Volumes UI (#25226)
* Storage index page and DHV model properties
* Naive version of a storage overview page
* Experimental fetch of alloc data dirs
* Fetch ephemeral disks and static host volumes as an ember concurrency task and nice table stylings
* Playing nice with section header labels to make eslint happy even though wcag was already cool with it
* inlined the storage type explainers and reordered things, plus tooltips and keynav
* Bones of a dynamic host volume individual page
* Woooo dynamic host volume model, adapter, and serializer with embedded alloc relationships
* Couple test fixes
* async:false relationship for dhv.hasMany('alloc') to prevent a ton of xhr requests
* DHV request type at index routemodel and better serialization
* Pagination and searching and query params oh my
* Test retrofits for csi volumes
* Really fantastic flake gets fixed
* DHV detail page acceptance test and a bunch of mirage hooks
* Seed so that the actions test has a guaranteed task
* removed ephemeral disk and static host volume manual scanning
* CapacityBytes and capabilities table added to DHV detail page
* Debugging actions flyout test
* Editing faker.seed was causing havoc elsewhere, so rather than boil the ocean, just tell this test explicitly what data to use
* Post-create job gets taskCount instead of count
* CSI volumes now get /csi route prefix at detail level
* lazyclick method for unused keynav removed
* keyboard nav and table-watcher for DHV added
* Addressed PR comments, changed up capabilities table and id references, etc.
* Capabilities table for DHV and ID in details header
* Testfixes for pluginID and capabilities table on DHV page
PKCE is enabled by default for new/updated auth methods.
* ref: https://oauth.net/2/pkce/
Client assertions are an optional, more secure replacement for client secrets.
* ref: https://oauth.net/private-key-jwt/
A change to the existing flow, even without these new options, is that the
oidc.Req is now retained on the Nomad server (leader) between the auth-url and
complete-auth calls, and some fields in the auth method config are now more
strictly required.
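For context on what PKCE adds, here's a standalone sketch of the
verifier/challenge pair the client side generates (the RFC 7636 S256 method in
general, not Nomad's actual implementation):

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/base64"
	"fmt"
)

// newPKCEPair returns a random code_verifier and its S256 code_challenge.
// The challenge goes on the authorization URL; the verifier is held back and
// only sent on the token exchange, so an intercepted auth code is useless on
// its own.
func newPKCEPair() (verifier, challenge string, err error) {
	raw := make([]byte, 32)
	if _, err = rand.Read(raw); err != nil {
		return "", "", err
	}
	verifier = base64.RawURLEncoding.EncodeToString(raw)
	sum := sha256.Sum256([]byte(verifier))
	challenge = base64.RawURLEncoding.EncodeToString(sum[:])
	return verifier, challenge, nil
}

func main() {
	v, c, err := newPKCEPair()
	if err != nil {
		panic(err)
	}
	fmt.Println("code_verifier:", v)
	fmt.Println("code_challenge:", c, "(code_challenge_method=S256)")
}
```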
When a task included a template block, Nomad was adding a Consul
identity by default, which allowed the template to use Consul API
template functions even when they were not needed or desired.
This change removes the implicit addition of Consul identities to
tasks when they include a template block. Job specification
authors will now need to add a Consul identity or Consul block to
their task if they have a template which uses Consul API functions.
This change also removes the default addition of a Consul block to
all task groups registered and processed by the API package.
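For job authors, the practical change looks roughly like the sketch below,
expressed with the nomad/api Go package. The identity name `consul_default`
and audience `consul.io` follow common defaults but depend on cluster
configuration, and field names should be confirmed against the Nomad version
in use:

```go
package example

import "github.com/hashicorp/nomad/api"

func strPtr(s string) *string { return &s }

// exampleTask sketches a task whose template uses a Consul function and which
// therefore now declares its Consul identity explicitly instead of relying on
// the old implicit one.
func exampleTask() *api.Task {
	return &api.Task{
		Name:   "web",
		Driver: "docker",
		Templates: []*api.Template{{
			DestPath:     strPtr("local/upstream.txt"),
			EmbeddedTmpl: strPtr(`{{ range service "redis" }}{{ .Address }}{{ end }}`),
		}},
		// Explicit Consul identity; name and audience are assumed defaults.
		Identities: []*api.WorkloadIdentity{{
			Name:     "consul_default",
			Audience: []string{"consul.io"},
		}},
	}
}
```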
When Nomad registers a service in Consul, it is registered as a node
service. For Nomad workloads to read these services, their Consul ACL
token must include a policy with node_prefix read; if it does not, the
service is filtered out of the results.
This change adds the required permission to the Consul setup
command.
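The resulting policy is roughly the rules below; the sketch applies them with
the consul/api Go client, assuming its `ACL().PolicyCreate` call (double-check
against the client version in use, and the policy name here is made up):

```go
package main

import (
	"fmt"
	"log"

	consulapi "github.com/hashicorp/consul/api"
)

// nomadWorkloadRules is the shape of policy the setup command now writes:
// service read plus the node read needed to see node services.
const nomadWorkloadRules = `
service_prefix "" {
  policy = "read"
}
node_prefix "" {
  policy = "read"
}
`

func main() {
	client, err := consulapi.NewClient(consulapi.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}
	policy, _, err := client.ACL().PolicyCreate(&consulapi.ACLPolicy{
		Name:        "nomad-workloads-example",
		Description: "example: read services registered as node services",
		Rules:       nomadWorkloadRules,
	}, nil)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("created policy", policy.ID)
}
```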
Add an upgrade test workload for Consul service mesh with transparent
proxy. Note this breaks from the "countdash" demo: the dashboard application
can only verify that the backend is up by making a websocket connection, which
we can't do in a health check, and the health check it exposes for that purpose
only passes once the websocket connection has been made. So replace the
dashboard with a minimal nginx reverse proxy to the count-api instead.
Ref: https://hashicorp.atlassian.net/browse/NET-12217
* Basic implementation for server members and node status
* Commands for alloc status and job status
* -ui flag for most commands
* url hints for variables
* url hints for job dispatch, evals, and deployments
* agent config ui.cli_url_links to disable
* Fix an issue where path prefix was presumed for variables
* driver uncomment and general cleanup
* -ui flag on the generic status endpoint
* Job run command gets namespaces, and no longer gets ui hints for --output flag
* Dispatch command hints get a namespace, and bunch o tests
* Lots of tests depend on specific output, so let's not mess with them
* figured out what flagAddress is all about for testServer, oof
* Parallel outside of test instances
* Browser-opening test, sorta
* Env var for disabling/enabling CLI hints
* Addressing a few PR comments
* CLI docs available flags now all have -ui
* PR comments addressed; switched the env var to be consistent and scrunched monitor-adjacent hints a bit more
* ui.Output -> ui.Warn; moves hints from stdout to stderr
* isTerminal check and parseBool on command option
* terminal.IsTerminal check removed for test-runner-not-being-terminal reasons
When a CSI plugin is launched, we probe it until the csi_plugin.health_timeout
expires (by default 30s). But if the plugin never becomes healthy, we're not
restarting the task as documented.
Update the plugin supervisor to trigger a restart instead. We still exit the
supervisor loop at that point to avoid having the supervisor send probes to a
task that isn't running yet. This requires reworking the poststart hook to allow
the supervisor loop to be restarted when the task restarts.
In doing so, I identified that we weren't respecting the task kill context from
the poststart hook, which would leave the supervisor running in the window
between a failed task being killed and its stop hooks being triggered. Combine
the two contexts to make sure we stop the supervisor whichever one is closed
first.
Fixes: https://github.com/hashicorp/nomad/issues/25293
Ref: https://hashicorp.atlassian.net/browse/NET-12264
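The context combination is the standard "cancel on whichever closes first"
pattern; here's a minimal standalone sketch (Go 1.21+ for `context.AfterFunc`),
not the actual hook code:

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// joinContexts returns a context that is canceled as soon as either parent is
// done, so the supervisor stops whether the task kill or the stop hook comes
// first.
func joinContexts(a, b context.Context) (context.Context, context.CancelFunc) {
	ctx, cancel := context.WithCancel(a)
	stop := context.AfterFunc(b, cancel) // propagate b's cancellation into ctx
	return ctx, func() {
		stop()
		cancel()
	}
}

func main() {
	killCtx, kill := context.WithCancel(context.Background())
	shutdownCtx := context.Background()

	superCtx, cancel := joinContexts(killCtx, shutdownCtx)
	defer cancel()

	go func() {
		time.Sleep(100 * time.Millisecond)
		kill() // task killed because it failed
	}()

	<-superCtx.Done() // supervisor loop exits promptly
	fmt.Println("supervisor stopped:", context.Cause(superCtx))
}
```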
The check that reads back node metadata depends on a resource that waits for
the Nomad API, but that resource doesn't wait for the metadata to be written in
the first place (or for the client to then be upgraded). Add this dependency so
that reading back the node metadata is the last step.
Ref: https://github.com/hashicorp/nomad-e2e/actions/runs/13690355150/job/38282457406