The group utility struct does not support asynchronously launched
goroutines (goroutines-inside-of-goroutines), so switch those uses to a
normal go call.
This means watchNodeUpdates and watchNodeEvents may not be shutdown when
Shutdown() exits. During nomad agent shutdown this does not matter.
During tests this means a test may leak those goroutines or be unable to
know when those goroutines have exited.
Since there's no runtime impact and these goroutines do not affect alloc
state syncing it seems ok to risk leaking them.
We were incorrectly returning a 0 duration to the taskrunner when
determining when a task should restart. This would cause tasks to be
restarted immediately, ignoring the restart {} stanza in a users
configuration.
This commit causes us to return the restart duration to the task runner
so it may correctly delay further execution.
This change makes few compromises:
* Looks up the devices associated with tasks at look up time. Given
that `nomad alloc status` is called rarely generally (compared to stats
telemetry and general job reporting), it seems fine. However, the
lookup overhead grows bounded by number of `tasks x total-host-devices`,
which can be significant.
* `client.Client` performs the task devices->statistics lookup. It
passes self to alloc/task runners so they can look up the device statistics
allocated to them.
* Currently alloc/task runners are responsible for constructing the
entire RPC response for stats
* The alternatives for making task runners device statistics aware
don't seem appealing (e.g. having task runners contain reference to hostStats)
* On the alloc aggregation resource usage, I did a naive merging of task device statistics.
* Personally, I question the value of such aggregation, compared to
costs of struct duplication and bloating the response - but opted to be
consistent in the API.
* With naive concatination, device instances from a single device group used by separate tasks in the alloc, would be aggregated in two separate device group statistics.
In state values, we need to be able to distinguish between zero values
(e.g. `false`) and unset values (e.g. `nil`).
We can alternatively use protobuf `oneOf` and nested map to ensure
consistency of fields that are set together, but the golang
representation does not represent that well and introducing a mismatch
between representations. Thus, I opted not to use it.