In anticipation of having quotas for dynamic host volumes, we want the user
experience of the storage limits to feel integrated with the other resource
limits. This is currently prevented by reusing the `Resources` type instead of
having a specific type for `QuotaResources`.
Update the quota limit/usage types to use a `QuotaResources` that includes a new
storage resources quota block. The wire format for the two types are compatible
such that we can migrate the existing variables limit in the FSM.
Also fixes improper parallelism in the quota init test where we change working
directory to avoid file write conflicts but this breaks when multiple tests are
executed in the same process.
Ref: https://github.com/hashicorp/nomad-enterprise/pull/2096
* Update who-uses-nomad.mdx
Our new contract with Roblox states that we can't mention anywhere on our sites that they use us.
* Update who-uses-nomad.mdx
Edited the sentence above the companies list to more accurately reflect them.
Also added Target to the list with a link to their case study.
When the Nomad client restarts and restores allocations, the network namespace
for an allocation may exist but no longer be correctly configured. For example,
if the host is rebooted and the task was a Docker task using a pause container,
the network namespace may be recreated by the docker daemon.
When we restore an allocation, use the CNI "check" command to verify that any
existing network namespace matches the expected configuration. This requires CNI
plugins of at least version 1.2.0 to avoid a bug in older plugin versions that
would cause the check to fail.
If the check fails, destroy the network namespace and try to recreate it from
scratch once. If that fails in the second pass, fail the restore so that the
allocation can be recreated (rather than silently having networking fail).
This should fix the gap left #24650 for Docker task drivers and any other
drivers with the `MustInitiateNetwork` capability.
Fixes: https://github.com/hashicorp/nomad/issues/24292
Ref: https://github.com/hashicorp/nomad/pull/24650
Adds an additional check in the Keyring.Delete RPC to make sure we're not
trying to delete a key that's been used to encrypt a variable. It also adds a
-force flag for the CLI/API to sidestep that check.
Adds new topics to the event stream for CSI volumes and CSI plugins. We'll emit
event when either is created or deleted, and when CSI volumes are claimed.
Add a new topic to the event stream for host volumes. We'll emit events when a
dynamic host volume is registered or deregistered, and whenever a node
fingerprints with a changed volume.
Ref: https://hashicorp.atlassian.net/browse/NET-11549
The Nomad client can now optionally emit telemetry data from the
prerun and prestart hooks. This allows operators to monitor and
alert on failures and time taken to complete.
The new datapoints are:
- nomad.client.alloc_hook.prerun.success (counter)
- nomad.client.alloc_hook.prerun.failed (counter)
- nomad.client.alloc_hook.prerun.elapsed (sample)
- nomad.client.task_hook.prestart.success (counter)
- nomad.client.task_hook.prestart.failed (counter)
- nomad.client.task_hook.prestart.elapsed (sample)
The hook execution time is useful to Nomad engineering and will
help optimize code where possible and understand job specification
impacts on hook performance.
Currently only the PreRun and PreStart hooks have telemetry
enabled, so we limit the number of new metrics being produced.
This PR adds Consul Template's executeTemplate function to the denylist by
default, in order to prevent accidental or malicious infinitely recursive
execution.
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
A more comprehensive env.denylist that now includes more token, token file and
license variables.
---------
Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>
In order to help users understand multi-region federated
deployments, this change adds two new sections to the website.
The first expands the architecture page, so we can add further
detail over time with an initial federation page. The second adds
a federation operations page which goes into failure planning and
mitigation.
Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
this opens up dispatching parameterized jobs by systems
that do not allow modifying what http request body they send
e.g. these two things are equal:
POST '{"Payload": "'"$(base64 <<< "hello")"'"}' /v1/job/my-job/dispatch
POST 'hello' /v1/job/my-job/dispatch/payload
* escaping newlines is not allowed in go-sockaddr template
* client{} block in client section
* tiny extra clarification that the NOMAD_ADDR is an example
Core scheduler relies on a special table in the state store—the TimeTable—to
figure out which objects can be GC'd. The TimeTable correlates Raft indices
with objects insertion time, a solution we used before most of the objects we
store in the state contained timestamps. This introduced a bit of a memory
overhead and complexity, but most importantly meant that any GC threshold users
set greater than timeTableLimit = 72 * time.Hour was ignored. This PR removes
the TimeTable and relies on object timestamps to determine whether they could
be GCd or not.
* initial content from Daniel's doc
* Add IPv6 support doc to operations section.
* daniel obsessively re-refactors his docs
* Style guide edits
* a few more style nits
---------
Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>
The parsing of the idempotency_token requires snake case, as it is a URL query parameter and not part of the JSON request body.
See also: 2df473c561/command/agent/http.go (L951)
When using transparent proxy mode with the `connect` block, the UID of the
workload cannot be the same as the UID of the Envoy sidecar (currently 101 in
the default Envoy container image).
Fixes: https://github.com/hashicorp/nomad/issues/23508