29 Commits

Author SHA1 Message Date
Jorge Marey
25426f0777 fingerprint: add config option to disable dmidecode (#25108) 2025-02-13 11:20:48 -05:00
Michael Smithhisler
658c429d75 Drivers: add work_dir config to exec/raw_exec/java drivers (#24249)
---------

Co-authored-by: wurosh <uros.m.perisic@gmail.com>
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
Co-authored-by: Tim Gross <tgross@hashicorp.com>
2024-11-01 11:04:40 -04:00
Seth Hoenig
591394fb62 drivers: plumb hardware topology via grpc into drivers (#18504)
* drivers: plumb hardware topology via grpc into drivers

This PR swaps out the temporary use of detecting system hardware manually
in each driver for using the Client's detected topology by plumbing the
data over gRPC. This ensures that Client configuration is taken to account
consistently in all references to system topology.

* cr: use enum instead of bool for core grade

* cr: fix test slit tables to be possible
2023-09-18 08:58:07 -05:00
Seth Hoenig
2e1974a574 client: refactor cpuset partitioning (#18371)
* client: refactor cpuset partitioning

This PR updates the way Nomad client manages the split between tasks
that make use of resources.cpus vs. resources.cores.

Previously, each task was explicitly assigned which CPU cores they were
able to run on. Every time a task was started or destroyed, all other
tasks' cpusets would need to be updated. This was inefficient and would
crush the Linux kernel when a client would try to run ~400 or so tasks.

Now, we make use of cgroup heirarchy and cpuset inheritence to efficiently
manage cpusets.

* cr: tweaks for feedback
2023-09-12 09:11:11 -05:00
hashicorp-copywrite[bot]
2d35e32ec9 Update copyright file headers to BUSL-1.1 2023-08-10 17:27:15 -05:00
Seth Hoenig
a4cc76bd3e numa: enable numa topology detection (#18146)
* client: refactor cgroups management in client

* client: fingerprint numa topology

* client: plumb numa and cgroups changes to drivers

* client: cleanup task resource accounting

* client: numa client and config plumbing

* lib: add a stack implementation

* tools: remove ec2info tool

* plugins: fixup testing for cgroups / numa changes

* build: update makefile and package tests and cl
2023-08-10 17:05:30 -05:00
hashicorp-copywrite[bot]
f005448366 [COMPLIANCE] Add Copyright and License Headers 2023-04-10 15:36:59 +00:00
Lance Haig
3160c76209 deps: Update ioutil library references to os and io respectively for drivers package (#16331)
* Update ioutil library references to os and io respectively for drivers package

No user facing changes so I assume no change log is required

* Fix failing tests
2023-03-08 10:31:09 -06:00
Seth Hoenig
5da1a31e94 client: enable support for cgroups v2
This PR introduces support for using Nomad on systems with cgroups v2 [1]
enabled as the cgroups controller mounted on /sys/fs/cgroups. Newer Linux
distros like Ubuntu 21.10 are shipping with cgroups v2 only, causing problems
for Nomad users.

Nomad mostly "just works" with cgroups v2 due to the indirection via libcontainer,
but not so for managing cpuset cgroups. Before, Nomad has been making use of
a feature in v1 where a PID could be a member of more than one cgroup. In v2
this is no longer possible, and so the logic around computing cpuset values
must be modified. When Nomad detects v2, it manages cpuset values in-process,
rather than making use of cgroup heirarchy inheritence via shared/reserved
parents.

Nomad will only activate the v2 logic when it detects cgroups2 is mounted at
/sys/fs/cgroups. This means on systems running in hybrid mode with cgroups2
mounted at /sys/fs/cgroups/unified (as is typical) Nomad will continue to
use the v1 logic, and should operate as before. Systems that do not support
cgroups v2 are also not affected.

When v2 is activated, Nomad will create a parent called nomad.slice (unless
otherwise configured in Client conifg), and create cgroups for tasks using
naming convention <allocID>-<task>.scope. These follow the naming convention
set by systemd and also used by Docker when cgroups v2 is detected.

Client nodes now export a new fingerprint attribute, unique.cgroups.version
which will be set to 'v1' or 'v2' to indicate the cgroups regime in use by
Nomad.

The new cpuset management strategy fixes #11705, where docker tasks that
spawned processes on startup would "leak". In cgroups v2, the PIDs are
started in the cgroup they will always live in, and thus the cause of
the leak is eliminated.

[1] https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html

Closes #11289
Fixes #11705 #11773 #11933
2022-03-23 11:35:27 -05:00
Seth Hoenig
b242957990 ci: swap ci parallelization for unconstrained gomaxprocs 2022-03-15 12:58:52 -05:00
Seth Hoenig
9bb4b8fa04 drivers/java: enable setting allow_caps on java driver
Enable setting allow_caps on the java task driver plugin, along
with the associated cap_add and cap_drop options in java task
configuration.
2021-05-17 12:37:40 -06:00
Seth Hoenig
b682371a22 drivers/exec+java: Add configuration to restore previous PID/IPC namespace behavior.
This PR adds default_pid_mode and default_ipc_mode options to the exec and java
task drivers. By default these will default to "private" mode, enabling PID and
IPC isolation for tasks. Setting them to "host" mode disables isolation. Doing
so is not recommended, but may be necessary to support legacy job configurations.

Closes #9969
2021-02-05 15:52:11 -06:00
Mahmood Ali
6e9edd53b3 tests: failover to copying when symlinking fails
Symlinking busybox may fail when the test code and the test temporary
directory live on different volumes/partitions; so we should copy
instead.  This situation arises in the Vagrant setup, where the code
repository live on special file sharing volume.

Somewhat unrelated, remove `f.Sync()` invocation from a test copyFile
helper function.  Sync is useful only for crash recovery, and isn't
necessary in our test setup.  The sync invocation is a significant
overhead as it requires the OS to flush any cached writes to disk.
2020-09-30 09:58:22 -04:00
Nick Ethier
af6aee3b8a fix test failures from rebase 2020-06-18 11:05:32 -07:00
Nick Ethier
e9ff8a8daa Task DNS Options (#7661)
Co-Authored-By: Tim Gross <tgross@hashicorp.com>
Co-Authored-By: Seth Hoenig <shoenig@hashicorp.com>
2020-06-18 11:01:31 -07:00
Mahmood Ali
d6c75e301e cleanup driver eventor goroutines
This fixes few cases where driver eventor goroutines are leaked during
normal operations, but especially so in tests.

This change makes few modifications:

First, it switches drivers to use `Context`s to manage shutdown events.
Previously, it relied on callers invoking `.Shutdown()` function that is
specific to internal drivers only and require casting.  Using `Contexts`
provide a consistent idiomatic way to manage lifecycle for both internal
and external drivers.

Also, I discovered few places where we don't clean up a temporary driver
instance in the plugin catalog code, where we dispense a driver to
inspect and validate the schema config without properly cleaning it up.
2020-05-26 11:04:04 -04:00
Mahmood Ali
74e5e20c0b drivers: implement streaming exec for executor based drivers
These simply delegate call to backend executor.
2019-05-10 19:17:14 -04:00
Mahmood Ali
d6250ec0d6 tests: IsTravis() -> IsCI()
Replace IsTravis() references that is intended for more CI environments
rather than for Travis environment specifically.
2019-02-20 08:21:03 -05:00
Mahmood Ali
941f89e0fe tests: add hcl task driver config parsing tests (#5314)
* drivers: add config parsing tests

Add basic tests for parsing and encoding task config.

* drivers/docker: fix some config declarations

* refactor and document config parse helpers
2019-02-12 14:46:37 -05:00
Alex Dadgar
e46d67a889 Driver tests do not use hcl2/hcl, hclspec, or hclutils 2019-01-22 15:43:34 -08:00
Alex Dadgar
437f03d877 recover 2019-01-07 14:49:40 -08:00
Alex Dadgar
cd6879409c Drivers 2018-12-18 15:50:11 -08:00
Mahmood Ali
2fb5e35012 Merge pull request #4950 from hashicorp/b-exc-libcontainer-kill
executor: kill all container processes
2018-12-08 09:52:42 -05:00
Mahmood Ali
7a49e9b68e drivers/exec: refactor stop/kill tests
Simplify the tests to do all assertions within the main goroutine and
account for status propagation delay.
2018-12-04 20:34:43 -05:00
Danielle Tomlinson
1c98fdf935 plugins: Move driver testing support to subpackage
this allows us to drop a cyclical import, but is subobptimal as it
requires BaseDriver tests to move. This falls firmly into the realm of
being a hack. Alternatives welcome.
2018-12-01 17:29:39 +01:00
Preetha Appan
829bf74aa8 modify fingerprint interface to use typed attribute struct 2018-11-28 10:01:03 -06:00
Mahmood Ali
cc52606a07 distinguish java driver tests from others 2018-11-13 10:21:40 -05:00
Mahmood Ali
a24fd59d4b Fix java driver tests 2018-11-13 10:21:40 -05:00
Mahmood Ali
0022496e5b add java driver tests 2018-11-06 12:41:39 -08:00