Typically the `LOGNAME` environment variable should be set according
to the values within `/etc/passwd` and represents the name of the
logged-in user. This should be set, where possible, alongside the
`USER` and `HOME` variables for all drivers that use the shared
executor and do not use a sub-shell.
* docs: revert to labels={"foo.bar": "baz"} style
Back in #24074 I thought it was necessary to wrap labels in a list to
support quoted keys in hcl2. This... doesn't appear to be true at all?
The simpler `labels={...}` syntax appears to work just fine.
I updated the docs and a test (and modernized it a bit). I also switched
some other examples to the `labels = {}` format from the old `labels{}`
format.
* copywronged
* fmtd
the executor dies, leaving an orphaned process still running.
the panic fix:
* don't `panic()`
* and return an empty, but non-nil, func on cgroup error
feature fix:
* allow non-root agent to proceed with exec when cgroups are off
In #25963 we added normalization of CPU shares for large hosts where the total
compute was larger than the maximum CPU shares. But if the result after
normalization is less than 2, runc will have an integer overflow. We prevent
this in the shared executor for the `exec`/`rawexec` driver by clamping to the
safe minimum value. Do this for the `docker` driver as well and add test
coverage of it for the shared executor too.
Fixes: https://github.com/hashicorp/nomad/issues/26080
Ref: https://github.com/hashicorp/nomad/pull/25963
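
The clamping step described above could look roughly like the sketch below. The names `clampCPUShares` and `minCPUShares` are illustrative and not taken from the Nomad source; the value 2 is the threshold named in this message.

```go
package main

import "fmt"

// minCPUShares is the assumed safe lower bound; the commit message only
// states that results below 2 trigger an integer overflow in runc.
const minCPUShares int64 = 2

// clampCPUShares (hypothetical helper) raises too-small share values to the
// safe minimum and leaves all other values untouched.
func clampCPUShares(shares int64) int64 {
	if shares < minCPUShares {
		return minCPUShares
	}
	return shares
}

func main() {
	for _, s := range []int64{0, 1, 2, 1024} {
		fmt.Printf("%d -> %d\n", s, clampCPUShares(s))
	}
}
```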
The `resources.cpu` field is scheduled in MHz. On most Linux task drivers, this
value is then mapped to a `cpu.shares` (cgroups v1) or `cpu.weight` (cgroups
v2). But this means on very large hosts where the total compute is greater than
the Linux kernel defined maximum CPU shares, you can't set a `resources.cpu`
value large enough to consume the entire host.
The `cpu.shares`/`cpu.weight` value is relative within the parent cgroup's slice,
which is owned by Nomad. So we can fix this by re-normalizing the weight on very
large hosts such that the maximum `resources.cpu` matches up with the largest
possible CPU share. This happens in the task driver so that the rest of Nomad
doesn't need to be aware of this implementation detail. Note that these functions
will result in bad share config if the request is more than the available compute, but that's
supposed to be caught in the scheduler, so by not catching it here we intentionally
hit the runc error.
Fixes: https://hashicorp.atlassian.net/browse/NMD-297
Fixes: https://github.com/hashicorp/nomad/issues/7731
Ref: https://go.hashi.co/rfc/nmd-211
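
As a rough illustration of the re-normalization idea, the sketch below scales a requested MHz value so that the host's total compute maps onto the largest share value. The function name `normalizeCPUShares` and the use of integer arithmetic are assumptions for illustration, not the actual driver code; 262144 is the cgroups v1 upper bound for `cpu.shares`.

```go
package main

import "fmt"

// maxCPUShares is the upper bound the Linux kernel accepts for cgroups v1
// cpu.shares.
const maxCPUShares int64 = 262144

// normalizeCPUShares (hypothetical helper) returns the share value for a
// task requesting requestedMHz on a host with totalComputeMHz available.
// Hosts at or below the maximum keep the 1:1 mapping; larger hosts are
// scaled so that requesting the whole host yields maxCPUShares.
func normalizeCPUShares(requestedMHz, totalComputeMHz int64) int64 {
	if totalComputeMHz <= maxCPUShares {
		return requestedMHz
	}
	return requestedMHz * maxCPUShares / totalComputeMHz
}

func main() {
	fmt.Println(normalizeCPUShares(1000, 100000))    // small host: unchanged
	fmt.Println(normalizeCPUShares(524288, 524288))  // whole large host
	fmt.Println(normalizeCPUShares(262144, 524288))  // half of a large host
}
```

A very small request on a very large host can round down toward zero under this scheme, so a lower bound on the result may also be needed.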
Collecting metrics from processes is expensive, especially on platforms like
Windows. The executor code has a 5s cache of stats to ensure that we don't
thrash syscalls on nodes running many allocations. But the timestamp used to
calculate TTL of this cache was never being set, so we were always treating it
as expired. This causes excess CPU utilization on client nodes.
Ensure that when we fill the cache, we set the timestamp. In testing on Windows,
this reduces executor CPU overhead by roughly 75%.
This changeset includes two other related items:
* The `telemetry.publish_allocation_metrics` field correctly prevents a node
from publishing metrics, but the stats hook on the taskrunner still collects
the metrics, which can be expensive. Thread the configuration value into the
stats hook so that we don't collect if `telemetry.publish_allocation_metrics =
false`.
* The `linuxProcStats` type in the executor's `procstats` package is misnamed as
a result of a couple rounds of refactoring. It's used by all task executors,
not just Linux. Rename this and move a comment about how Windows processes are
listed so that the comment is closer to where the logic is implemented.
Fixes: https://github.com/hashicorp/nomad/issues/23323
Fixes: https://hashicorp.atlassian.net/browse/NMD-455
When the context closes, the stats emitter closes its channel. It's possible
for the channel to be closed in the stats emitter goroutine before the `select`
in the test sees that the context has closed, which can result in a panic in the
test when we try to read the empty value off the channel.
As of April 1, Docker Hub rate limits tightened. With only 10 pulls/hr/IP, we're
likely to encounter test failures. Switch all Docker images getting pulled from
this repository to use the HashiCorp managed registry mirror.
Note that most of our tests in `drivers/docker` don't pull from the remote
registry but load a local image, while others will need to pull from the remote
and fetch different images depending on OS/arch. Refactor the definition of test
task configuration to make it clear which is which, and de-factor some false
sharing of setup functions.
Updates the E2E tests to use that registry by configuring the Docker
daemon. This required changing out a few container images that we don't have in
the registry, but these new images are all smaller. There are a couple of tests
that still use explicitly-tagged `docker.io` images or other third-party
registries, which have been left in place.
Ref: https://hashicorp.atlassian.net/browse/NET-12233
update E2E images to those in the registry mirror
fix windows and docklog test build
fix stopsignal test
mop-up
more mop-up
In #25496 we introduced the ability to have `task.user` set on Windows, so
long as the user ID fits a particular shape. But this uncovered a 7 year old bug
in the `java` driver introduced in #5143, where we set the `task.user` to the
non-existent Unix user `nobody`, even if we're running on Windows.
Prior to the change in #25496 we always ignored the `task.user`, so this was not
a problem. We don't set the `task.user` in the `raw_exec` driver, and the
otherwise very similar `exec` driver is Linux-only, so we never see the problem
there.
Fix the bug in the `java` driver by gating the change to the `task.user` on not
being Windows. Also add a check to the new code path that the user is non-empty
before parsing it, so that any third party drivers that might be borrowing the
executor code don't hit the same problem on Windows.
Ref: https://github.com/hashicorp/nomad/pull/5143
Ref: https://github.com/hashicorp/nomad/pull/25496
Fixes: https://github.com/hashicorp/nomad/issues/25638
* modify rawexec TaskConfig and Config to accept envvar denylist
* update rawexec driver docs to include deniedEnvars options
Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>
---------
Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>
Every now and then TestExecutor_OOMKilled would fail with: "unable to start
container process: container init was OOM-killed (memory limit too low?)" which
started happening after we upgraded libcontainer.
This PR removes manual (and arbitrary) resource limits on the test
task, since it should be OOMd with resources inherited from the
testExecutorCommandWithChroot, and it fixes a small possible goroutine leak in
the OOM checker in exec driver.
The Nomad driver incorrectly reports exit code 0 when the executor fails.
This corrects that behavior.
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Docker driver's TestDockerDriver_OOMKilled should run on cgroups v2 now, since
we're running the docker v27 client library and our runners run docker v26, which
contains the containerd fix containerd/containerd#6323.
* fix: fix the docker image parser to account for private repos
* style: change the local regex for docker image identifiers and use docker package instead
* func: return early when no repo found on the image name
* func: return error if no path found in image
* Update drivers/docker/utils.go
Co-authored-by: Tim Gross <tgross@hashicorp.com>
* Update coordinator.go
* Update driver.go
* Update network.go
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
at least one bug has been created because it's
easy to miss a future.set() in pullImageImpl()
this pulls future.set() out to PullImage(),
the same level where it's created and wait()ed
The recent change to collection via a "one-shot" Docker API call
did not update the stream boolean argument. This results in the
PreCPUStats values being zero and therefore breaking the CPU
calculations which rely on this data. The base fix is to update
the passed boolean parameter to match the desired non-streaming
behaviour. The non-streaming API call correctly returns the
PreCPUStats data which can be seen in the added unit test.
The most recent change also modified the behaviour of the
`collectStats` goroutine, so that any error encountered results in
the goroutine exiting. If the error was transient, the
container continues to run, but no stats are collected
until the task is stopped and replaced. This PR reverts the
behaviour, so that an error encountered during a stats collection
run results in the error being logged but the collection process
continuing with a backoff used.
* windows: revert process listing logic to that of v1.6.10
In Nomad 1.7 much of the process management code was refactored, including
a rewrite of how the process tree of an executor was determined on Windows
machines. Unfortunately that rewrite has been cursed with performance issues
and bugs. Instead, revert to the logic used in v1.6.10.
* changelog
Recently we moved from github.com/syndtr/gocapability to
github.com/moby/sys/capability due to the former package no longer being
maintained. The new package's capability function works differently: the
known/supported functionality is split now, and the .ListSupported() call will
always return an empty list on non-linux systems. This means Nomad agents won't
start on darwin or windows.
github.com/moby/sys/capability is a fork of the (no longer maintained)
github.com/syndtr/gocapability package.
For changes since the fork took place, see
https://github.com/moby/sys/blob/main/capability/CHANGELOG.md
Note that the "workaround for RHEL6" is removed for a number of reasons.
Feel free to choose the one you like the most; any one of them is sufficient:
1. /proc/sys/kernel/cap_last_cap is available since RHEL 6.7
(kernel 2.6.32-573.el6), released 9 years ago (2015-07-22).
2. It incorrectly returns CAP_BLOCK_SUSPEND (36), which was only added
in kernel v3.5 and was never backported to RHEL6 kernels. The
correct value for RHEL6 would be CAP_MAC_ADMIN (33).
3. As far as upstream kernels go, /proc/sys/kernel/cap_last_cap was
added in kernel v3.2, and a correct value depends on the kernel
version. It could be CAP_WAKE_ALARM (35), added to kernel v3.0, or
CAP_SYSLOG (34), added to kernel v2.6.38, or possibly a lesser value
for even older kernels.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
* drivers: move executor process out of v1 task cgroup after process starts
This PR changes the behavior of the raw exec task driver on old cgroups v1
systems such that the executor process is no longer a member of the cgroups
created for the task. The executor process is still placed into those
cgroups and starts the task child process (just as before), but it then
leaves those cgroups and lives in the nomad parent cgroup. This change
makes the behavior sort of similar to cgroups v2 systems, where we never
have the executor enter the task cgroup to begin with (because we can
directly clone(3) the task process into it).
Fixes #23951
* executor: handle non-linux case
* cgroups: add test case for no executor process in task cgroup (v1)
* add changelog
* drivers: also move executor out of cpuset cgroup