The recent change to collection via a "one-shot" Docker API call
did not update the stream boolean argument. This results in the
PreCPUStats values being zero and therefore breaking the CPU
calculations which rely on this data. The base fix is to update
the passed boolean parameter to match the desired non-streaming
behaviour. The non-streaming API call correctly returns the
PreCPUStats data which can be seen in the added unit test.
The most recent change also modified the behaviour of the
collectStats go routine, so that any error encountered results in
the routine exiting. In the event this was a transient error, the
container will continue to run, however, no stats will be collected
until the task is stopped and replaced. This PR reverts the
behaviour, so that an error encountered during a stats collection
run results in the error being logged but the collection process
continuing with a backoff used.
* windows: revert process listing logic to that of v1.6.10
In Nomad 1.7 much of the process management code was refactored, including
a rewrite of how the process tree of an executor was determined on Windows
machines. Unfortunately that rewrite has been cursed with performance issues
and bugs. Instead, revert to the logic used in v1.6.10.
* changelog
Recently we moved from github.com/syndtr/gocapability to
github.com/moby/sys/capability due to the former package no longer being
maintainer. The new package's capability function works differently: the
known/supported functionality is split now, and the .ListSupported() call will
always return an empty list on non-linux systems. This means Nomad agents won't
start on darwin or windows.
github.com/moby/sys/capability is a fork of the (no longer maintained)
github.com/syndtr/gocapability package.
For changes since the fork took place, see
https://github.com/moby/sys/blob/main/capability/CHANGELOG.md
Note that the "workaround for RHEL6" is removed for a number of reasons.
Feel free to choose the one you like the most, either is sufficient:
1. /proc/sys/kernel/cap_last_cap is available since RHEL 6.7
(kernel 2.6.32-573.el6), released 9 years ago (2015-07-22).
2. It incorrectly returns CAP_BLOCK_SUSPEND (36), which was only added
in kernel v3.5 and was never backported to RHEL6 kernels. The
correct value for RHEL6 would be CAP_MAC_ADMIN (33).
3. As far as upstream kernels go, /proc/sys/kernel/cap_last_cap was
added in kernel v3.2, and a correct value depends on the kernel
version. It could be CAP_WAKE_ALARM (35), added to kernel v3.0, or
CAP_SYSLOG (34), added to kernel v2.6.38, or possibly a lesser value
for even older kernels.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
* drivers: move executor process out of v1 task cgroup after process starts
This PR changes the behavior of the raw exec task driver on old cgroups v1
systems such that the executor process is no longer a member of the cgroups
created for the task. Now, the executor process is placed into those
cgroups and starts the task child process (just as before), but now then
exits those cgroups and exists in the nomad parent cgroup. This change
makes the behavior sort of similar to cgroups v2 systems, where we never
have the executor enter the task cgroup to begin with (because we can
directly clone(3) the task process into it).
Fixes#23951
* executor: handle non-linux case
* cgroups: add test case for no executor process in task cgroup (v1)
* add changelog
* drivers: also move executor out of cpuset cgroup