This helper returns the token as well as the ACL policy, to be used in a later
commit for logging the token info associated with nomad exec invocation.
This command will be used to send a signal to either a single task within an
allocation, or all of the tasks if <task-name> is omitted. If the sent signal
terminates the allocation, it will be treated as if the allocation has crashed,
rather than as if it was operator-terminated.
Signal validation is currently handled by the driver itself and nomad
does not attempt to restrict or validate them.
Noticed that `detected drivers` log line was misleading - when a driver
doesn't fingerprint before timeout, their health status is empty string
`""` which we would mark as detected.
Now, we log all drivers along with their state to ease driver
fingerprint debugging.
I noticed that `watchNodeUpdates()` almost immediately after
`registerAndHeartbeat()` calls `retryRegisterNode()`, well after 5
seconds.
This call is unnecessary and made debugging a bit harder. So here, we
ensure that we only re-register node for new node events, not for
initial registration.
Here we retain 0.8.7 behavior of waiting for driver fingerprints before
registering a node, with some timeout. This is needed for system jobs,
as system job scheduling for node occur at node registration, and the
race might mean that a system job may not get placed on the node because
of missing drivers.
The timeout isn't strictly necessary, but raising it to 1 minute as it's
closer to indefinitely blocked than 1 second. We need to keep the value
high enough to capture as much drivers/devices, but low enough that
doesn't risk blocking too long due to misbehaving plugin.
Fixes https://github.com/hashicorp/nomad/issues/5579
Currently, when logmon fails to reattach, we will retry reattachment to
the same pid until the task restart specification is exhausted.
Because we cannot clear hook state during error conditions, it is not
possible for us to signal to a future restart that it _shouldn't_
attempt to reattach to the plugin.
Here we revert to explicitly detecting reattachment seperately from a
launch of a new logmon, so we can recover from scenarios where a logmon
plugin has failed.
This is a net improvement over the current hard failure situation, as it
means in the most common case (the pid has gone away), we can recover.
Other reattachment failure modes where the plugin may still be running
could potentially cause a duplicate process, or a subsequent failure to launch
a new plugin.
If there was a duplicate process, it could potentially cause duplicate
logging. This is better than a production workload outage.
If there was a subsequent failure to launch a new plugin, it would fail
in the same (retry until restarts are exhausted) as the current failure
mode.
Renewal time was being calculated as 10s+Intn(lease-10s), so the renewal
time could be very rapid or within 1s of the deadline: [10s, lease)
This commit fixes the renewal time by calculating it as:
(lease/2) +/- 10s
For a lease of 60s this means the renewal will occur in [20s, 40s).
Revert "fingerprint Constraints and Affinities have Equals, as set"
This reverts commit 596f16fb5f.
Revert "client tests assert the independent handling of interface and speed"
This reverts commit 7857ac5993.
Revert "structs missed applying a style change from the review"
This reverts commit 658916e327.
Revert "client, structs comments"
This reverts commit be2838d6ba.
Revert "client fingerprint updateNetworks preserves the network configuration"
This reverts commit fc309cb430.
Revert "client_test cleanup comments from review"
This reverts commit bc0bf4efb9.
Revert "client Networks Equals is set equality"
This reverts commit f8d432345b.
Revert "struct cleanup indentation in RequestedDevice Equals"
This reverts commit f4746411ca.
Revert "struct Equals checks for identity before value checking"
This reverts commit 0767a4665e.
Revert "fix client-test, avoid hardwired platform dependecy on lo0"
This reverts commit e89dbb2ab1.
Revert "refactor error in client fingerprint to include the offending data"
This reverts commit a7fed726c6.
Revert "add client updateNodeResources to merge but preserve manual config"
This reverts commit 84bd433c7e.
Revert "refactor struts.RequestedDevice to have its own Equals"
This reverts commit 6897825240.
Revert "refactor structs.Resource.Networks to have its own Equals"
This reverts commit 49e2e6c77b.
Revert "refactor structs.Resource.Devices to have its own Equals"
This reverts commit 4ede9226bb.
Revert "add COMPAT(0.10): Remove in 0.10 notes to impl for structs.Resources"
This reverts commit 49fbaace52.
Revert "add structs.Resources Equals"
This reverts commit 8528a2a2a6.
Revert "test that fingerprint resources are updated, net not clobbered"
This reverts commit 8ee02ddd23.
This adds a `nomad alloc restart` command and api that allows a job operator
with the alloc-lifecycle acl to perform an in-place restart of a Nomad
allocation, or a given subtask.
Remove runLaunched tracking as Run is *always* called for killable
TaskRunners. TaskRunners which fail before Run can be called (during
NewTaskRunner or Restore) are not killable as they're never added to the
client's alloc map.