Commit Graph

859 Commits

Author SHA1 Message Date
Juana De La Cuesta
a9e7166b6b [gh-24339] Move from streaming stats to polling for docker (#24525)
* fix: don't stream the docker stats, read them one by one

* func: add a NewSafeTicker to the helper functions

* style: remove commented code
2024-11-21 17:36:53 +01:00
Seth Hoenig
dd396a3900 windows: revert process listing logic to that of v1.6.10 (#24494)
* windows: revert process listing logic to that of v1.6.10

In Nomad 1.7 much of the process management code was refactored, including
a rewrite of how the process tree of an executor was determined on Windows
machines. Unfortunately that rewrite has been cursed with performance issues
and bugs. Instead, revert to the logic used in v1.6.10.

* changelog
2024-11-20 11:20:20 -06:00
Piotr Kazmierczak
5dfb38d806 drivers: fix capabilities on non-linux systems (#24450)
Recently we moved from github.com/syndtr/gocapability to
github.com/moby/sys/capability due to the former package no longer being
maintained. The new package's capability function works differently: the
known/supported functionality is split now, and the .ListSupported() call will
always return an empty list on non-linux systems. This means Nomad agents won't
start on darwin or windows.
2024-11-13 15:58:25 +01:00
Kir Kolyshkin
d09c8ddf21 deps: switch to moby/sys/capability (#24093)
github.com/moby/sys/capability is a fork of the (no longer maintained)
github.com/syndtr/gocapability package.

For changes since the fork took place, see
https://github.com/moby/sys/blob/main/capability/CHANGELOG.md

Note that the "workaround for RHEL6" is removed for a number of reasons.
Feel free to choose the one you like the most; any one of them is sufficient:

1. /proc/sys/kernel/cap_last_cap is available since RHEL 6.7
   (kernel 2.6.32-573.el6), released 9 years ago (2015-07-22).

2. It incorrectly returns CAP_BLOCK_SUSPEND (36), which was only added
   in kernel v3.5 and was never backported to RHEL6 kernels. The
   correct value for RHEL6 would be CAP_MAC_ADMIN (33).

3. As far as upstream kernels go, /proc/sys/kernel/cap_last_cap was
   added in kernel v3.2, and a correct value depends on the kernel
   version. It could be CAP_WAKE_ALARM (35), added to kernel v3.0, or
   CAP_SYSLOG (34), added to kernel v2.6.38, or possibly a lesser value
   for even older kernels.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2024-11-11 14:07:31 -05:00
Seth Hoenig
a0ff07393b drivers: provide empty implementations of cgroup helpers for non-root nomad (#24392) 2024-11-07 12:24:37 -06:00
Seth Hoenig
b58abf48c1 drivers: move executor process out of v1 task cgroup after process starts (#24340)
* drivers: move executor process out of v1 task cgroup after process starts

This PR changes the behavior of the raw exec task driver on old cgroups v1
systems such that the executor process is no longer a member of the cgroups
created for the task. Now, the executor process is placed into those
cgroups and starts the task child process (just as before), but then
exits those cgroups and exists in the nomad parent cgroup. This change
makes the behavior sort of similar to cgroups v2 systems, where we never
have the executor enter the task cgroup to begin with (because we can
directly clone(3) the task process into it).

Fixes #23951

* executor: handle non-linux case

* cgroups: add test case for no executor process in task cgroup (v1)

* add changelog

* drivers: also move executor out of cpuset cgroup
2024-11-07 07:31:38 -06:00
Michael Smithhisler
0f97574eae test: fix rawexec driver unix test imports (#24352) 2024-11-01 12:10:03 -04:00
Michael Smithhisler
658c429d75 Drivers: add work_dir config to exec/raw_exec/java drivers (#24249)
---------

Co-authored-by: wurosh <uros.m.perisic@gmail.com>
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
Co-authored-by: Tim Gross <tgross@hashicorp.com>
2024-11-01 11:04:40 -04:00
Juanadelacuesta
80e398bbf7 test: add tests for validateBounds 2024-10-31 14:54:27 +01:00
Juanadelacuesta
8752bb0a65 func: move the user lookup into the validation, it's used everywhere the function is called 2024-10-31 10:34:26 +01:00
Juana De La Cuesta
f1439f54f7 Update drivers/shared/validators/validators.go
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
2024-10-31 09:32:51 +01:00
Juanadelacuesta
3f884bb3fa fix: remove the setConfig and modify the test driver to include idValidator to avoid panics 2024-10-30 17:38:54 +01:00
Juanadelacuesta
f954a1a5e8 fix: remove the setConfig and modify the test driver to include idValidator to avoid panics 2024-10-30 16:16:42 +01:00
Juanadelacuesta
a86e951f03 style: rename DeniedHostGidsStr to reflect refactor 2024-10-30 15:22:50 +01:00
Juanadelacuesta
a90eda628d func: implement mock validator to avoid changes on the rawexec tests 2024-10-30 15:07:47 +01:00
Juanadelacuesta
088417163b fix: add set config to populate idValidator on tests 2024-10-30 13:40:19 +01:00
Juanadelacuesta
445b19ce3e docs: update func docs 2024-10-30 12:35:06 +01:00
Juanadelacuesta
f707a02f4d fix: update test to force recreation of idvalidator 2024-10-30 12:28:59 +01:00
Juanadelacuesta
bba0407250 style: remove unused code and duplicated test 2024-10-30 11:43:04 +01:00
Juanadelacuesta
3fa2717195 style: remove unused code 2024-10-30 11:36:25 +01:00
Juanadelacuesta
a491ceff5f fix: put back MSL license header 2024-10-30 11:25:27 +01:00
Juanadelacuesta
e1a0c7cb43 fix: move exclusive unix test back from driver tests 2024-10-30 11:22:41 +01:00
Juanadelacuesta
9a6d2648c8 style: improve debug logging 2024-10-30 11:21:51 +01:00
Juanadelacuesta
2b9bb7a289 license: change missing file to BUSL 2024-10-30 10:24:35 +01:00
Juanadelacuesta
1751b618e4 func: Add conditional to validation init, to allow for easy testing 2024-10-29 16:45:33 +01:00
Juanadelacuesta
a9a452341c license: update headers to BUSL 2024-10-29 15:54:09 +01:00
Juanadelacuesta
0227788e22 fix: update tests configuration 2024-10-29 15:24:12 +01:00
Juanadelacuesta
0cd1b5ff13 func: move the validation to a dependency and use id sets 2024-10-28 18:59:51 +01:00
Juanadelacuesta
65be613be9 fix: rename test to avoid conflict 2024-10-28 12:17:57 +01:00
Juanadelacuesta
d77dc7dfa4 style: format 2024-10-28 11:46:51 +01:00
Juanadelacuesta
ed04b1bf64 style: remove print 2024-10-28 11:35:03 +01:00
Mike Nomitch
fd7e81dbce Fixing accidental move of helper fn to unix only validators file 2024-10-28 11:15:41 +01:00
Mike Nomitch
c4f2a41da6 Splitting validators unix functions into own file 2024-10-28 11:15:41 +01:00
Mike Nomitch
ff5ab3776c Tweaking user lookup code 2024-10-28 11:15:41 +01:00
Mike Nomitch
e1c226e633 Restructuring IDRange 2024-10-28 11:15:41 +01:00
Mike Nomitch
0fbf592131 moving user out of validators 2024-10-28 11:15:41 +01:00
Mike Nomitch
916af5a948 Moving idrange struct location 2024-10-28 11:15:41 +01:00
Mike Nomitch
9565dde138 Only parsing id ranges once 2024-10-28 11:15:41 +01:00
Mike Nomitch
d0049b1e63 Fixed error in denied_uids spec 2024-10-28 11:15:41 +01:00
Mike Nomitch
6b6a1b5bc4 Fixed windows build error 2024-10-28 11:15:41 +01:00
Mike Nomitch
cf36509474 Removing unnecessary int conversion 2024-10-28 11:15:40 +01:00
Mike Nomitch
9cc3992ca6 Adds ability to restrict uid and gids in exec and raw_exec 2024-10-28 11:15:37 +01:00
Seth Hoenig
b539b54c9e docker: close hijacked write connection when exec ends (#24244) 2024-10-17 11:41:29 -05:00
Seth Hoenig
b18851617f docker: close response connection once stdin is exhausted (#24202) 2024-10-17 11:07:23 -05:00
Piotr Kazmierczak
1ac14f4869 docker: always use API version negotiation when initializing clients (#24237)
During a refactoring of the docker driver in #23966 we introduced a bug: the API
version negotiation option was not passed to every new client call.
2024-10-17 15:23:14 +02:00
Tim Gross
d12128c380 docker: use streaming stats collection to correct CPU stats (#24229)
In #23966 we switched to the official Docker SDK for the `docker` driver. In the
process we refactored code around stats collection to use the "one shot" version
of stats. Unfortunately this "one shot" stats collection does not include the
`PreCPU` stats, which are the stats from the previous read. This breaks the
calculation we use to determine CPU ticks, because now we're subtracting 0 from
the current value to get the delta.

Switch back to using the streaming stats collection. Add a test that fully
exercises the `TaskStats` API.

Fixes: https://github.com/hashicorp/nomad/issues/24224
Ref: https://hashicorp.atlassian.net/browse/NET-11348
2024-10-17 08:25:59 -04:00
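The failure mode described in d12128c380 can be shown with a small sketch. The `cpuSample` type and `cpuPercent` function are hypothetical simplifications, not the driver's actual types or its exact tick calculation; they only show why a zero-valued previous sample distorts a delta over cumulative counters.

```go
package main

import "fmt"

// cpuSample is a hypothetical, simplified stand-in for one stats reading.
// Both counters are cumulative since the container (or host) started.
type cpuSample struct {
	TotalUsage  uint64 // CPU time consumed by the container
	SystemUsage uint64 // CPU time consumed by the whole host
}

// cpuPercent returns the container's share of host CPU time between two
// cumulative samples. It returns 0 if the counters did not advance.
func cpuPercent(prev, cur cpuSample) float64 {
	if cur.TotalUsage < prev.TotalUsage || cur.SystemUsage <= prev.SystemUsage {
		return 0
	}
	cpuDelta := float64(cur.TotalUsage - prev.TotalUsage)
	sysDelta := float64(cur.SystemUsage - prev.SystemUsage)
	return cpuDelta * 100 / sysDelta
}

func main() {
	prev := cpuSample{TotalUsage: 950, SystemUsage: 9000}
	cur := cpuSample{TotalUsage: 1000, SystemUsage: 10000}

	// With a real previous sample the result reflects the last interval.
	fmt.Println(cpuPercent(prev, cur))

	// With a zero previous sample (as when the previous read is missing),
	// the "delta" is the lifetime total, not the recent usage.
	fmt.Println(cpuPercent(cpuSample{}, cur))
}
```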
Piotr Kazmierczak
f9cbaaf6c7 docker: fix a bug where auth for private registries wasn't parsed correctly (#24215)
In #23966 we introduced the official Docker client and did not notice that, in
contrast to our previous 3rd party client, the official SDK PullOptions object
expects a base64 encoded JSON with username and password, instead of a
username/password pair.
2024-10-16 22:04:54 +02:00
Tim Gross
6b8ddff1fa windows: set job object for executor and children (#24214)
On Windows, if the `raw_exec` driver's executor exits, the child processes are
not also killed. Create a Windows "job object" (not to be confused with a Nomad
job) and add the executor to it. Child processes of the executor will inherit
the job automatically. When the handle to the job object is freed (on executor
exit), the job itself is destroyed and this causes all processes in that job to
exit.

Fixes: https://github.com/hashicorp/nomad/issues/23668
Ref: https://learn.microsoft.com/en-us/windows/win32/procthread/job-objects
2024-10-16 09:20:26 -04:00
Tim Gross
fec91d1dc8 windows: trade heap for stack to build process tree for stats in linear space (#24182)
In #20619 we overhauled how we were gathering stats for Windows
processes. Unlike in Linux where we can ask for processes in a cgroup, on
Windows we have to make a single expensive syscall to get all the processes and
then build the tree ourselves. Our algorithm to do so is recursive and quadratic
in both steps and space with the number of processes on the host. For busy hosts
this hits the stack limit and panics the Nomad client.

We already build a map of parent PID to PID, so modify this to be a map of
parent PID to slice of children and then traverse that tree only from the root
we care about (the executor PID). This moves the allocations to the heap but
makes the stats gathering linear in steps and space required.

This changeset also moves as much of this code as possible into an area
not conditionally-compiled by OS, as the tagged test file was not being run in CI.

Fixes: https://github.com/hashicorp/nomad/issues/23984
2024-10-14 11:26:38 -04:00
Tim Gross
e9ba630639 docker: fix script check execution (#24098)
In #24095 we made a fix for non-streaming exec into Docker tasks for script
checks and `change_mode = "script"`, but didn't complete E2E testing. We need to
use `ContainerExecAttach` in the new API in order to get stdout/stderr from
tasklets, but the previous `ContainerExecStart` call will prevent this from
running successfully with an error that the exec has already run.

* Ref: [NET-11202 (comment)](https://hashicorp.atlassian.net/browse/NET-11202?focusedCommentId=551618)
* This has shipped in Nomad 1.9.0-beta.1 but not production yet.
* This should fix the remaining issues in nightly E2E for Docker.
2024-10-01 16:41:38 -04:00