nomad

mirror of https://github.com/kemko/nomad.git synced 2026-01-01 16:05:42 +03:00

Author	SHA1	Message	Date
James Rasell	1916a16311	exec: Set LOGNAME env var on exec based drivers. (#26703 ) Typically the `LOGNAME` environment variable should be set according to the values within `/etc/passwd` and represents the name of the logged in user. This should be set, where possible, alongside the USER and HOME variables for all drivers that use the shared executor and do not use a sub-shell.	2025-09-05 14:07:27 +01:00
Daniel Bennett	7c633f8109	exec: don't panic on rootless raw_exec tasks (#26401 ) the executor dies, leaving an orphaned process still running. the panic fix: * don't `panic()` * and return an empty, but non-nil, func on cgroup error feature fix: * allow non-root agent to proceed with exec when cgroups are off	2025-08-04 13:58:35 -04:00
James Rasell	5989d5862a	ci: Update golangci-lint to v2 and fix highlighted issues. (#26334 )	2025-07-25 10:44:08 +01:00
Tim Gross	c8dcd3c2db	docker: clamp CPU shares to minimum of 2 (#26081 ) In #25963 we added normalization of CPU shares for large hosts where the total compute was larger than the maximum CPU shares. But if the result after normalization is less than 2, runc will have an integer overflow. We prevent this in the shared executor for the `exec`/`rawexec` driver by clamping to the safe minimum value. Do this for the `docker` driver as well and add test coverage of it for the shared executor too. Fixes: https://github.com/hashicorp/nomad/issues/26080 Ref: https://github.com/hashicorp/nomad/pull/25963	2025-06-19 13:48:06 -04:00
Tim Gross	34e96932a1	drivers: normalize CPU shares/weights to fit large hosts (#25963 ) The `resources.cpu` field is scheduled in MHz. On most Linux task drivers, this value is then mapped to a `cpu.share` (cgroups v1) or `cpu.weight` (cgroups v2). But this means on very large hosts where the total compute is greater than the Linux kernel defined maximum CPU shares, you can't set a `resources.cpu` value large enough to consume the entire host. The `cpu.share`/`cpu.weight` value is relative within the parent cgroup's slice, which is owned by Nomad. So we can fix this by re-normalizing the weight on very large hosts such that the maximum `resources.cpu` matches up with largest possible CPU share. This happens in the task driver so that the rest of Nomad doesn't need to be aware of this implementation detail. Note that these functions will result in bad share config if the request is more than the available, but that's supposed to be caught in the scheduler so by not catching it here we intentionally hit the runc error. Fixes: https://hashicorp.atlassian.net/browse/NMD-297 Fixes: https://github.com/hashicorp/nomad/issues/7731 Ref: https://go.hashi.co/rfc/nmd-211	2025-06-03 15:57:40 -04:00
Tim Gross	77c8acb422	telemetry: fix excessive CPU consumption in executor (#25870 ) Collecting metrics from processes is expensive, especially on platforms like Windows. The executor code has a 5s cache of stats to ensure that we don't thrash syscalls on nodes running many allocations. But the timestamp used to calculate TTL of this cache was never being set, so we were always treating it as expired. This causes excess CPU utilization on client nodes. Ensure that when we fill the cache, we set the timestamp. In testing on Windows, this reduces executor CPU overhead by roughly 75%. This changeset includes two other related items: * The `telemetry.publish_allocation_metrics` field correctly prevents a node from publishing metrics, but the stats hook on the taskrunner still collects the metrics, which can be expensive. Thread the configuration value into the stats hook so that we don't collect if `telemetry.publish_allocation_metrics = false`. * The `linuxProcStats` type in the executor's `procstats` package is misnamed as a result of a couple rounds of refactoring. It's used by all task executors, not just Linux. Rename this and move a comment about how Windows processes are listed so that the comment is closer to where the logic is implemented. Fixes: https://github.com/hashicorp/nomad/issues/23323 Fixes: https://hashicorp.atlassian.net/browse/NMD-455	2025-05-19 09:24:13 -04:00
Piotr Kazmierczak	0fa0624576	exec: Fix incorrect `HOME` and `USER` env variables for tasks that have `user` set (#25859 ) Co-authored-by: Tim Gross <tgross@hashicorp.com>	2025-05-16 15:02:45 +02:00
Tim Gross	374e987b9b	metrics: emit cache and rss stats on cgroup v2 (#25751 ) In cgroups v2, a different map of memory stats is available from the kernel than in v1. The Docker API reflects this change. But there are equivalent values in the map for RSS (anonymously mapped memory) and cache (filesystem cache and tmpfs), which the Docker driver is not currently emitting. Fallback to these alternate values when the cgroups v1 values are not available. Include the anonymous mapping in the "measured" allocation stats as "RSS" so that they both show up in allocation metrics. We can do this on both the `docker` driver and the Linux executor for `exec` and `java` drivers. Fixes: https://github.com/hashicorp/nomad/issues/19185 Ref: https://hashicorp.atlassian.net/browse/NMD-437 Ref: https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#memory-interface-files Ref: https://www.kernel.org/doc/Documentation/cgroup-v1/memory.txt	2025-04-24 12:48:18 -04:00
James Rasell	c85c723336	ci: Run core tests groups workflow on amd64 and arm64 runners. (#25695 )	2025-04-17 15:16:29 +01:00
Tim Gross	48f304d0ca	java: only set nobody user on Unix (#25648 ) In #25496 we introduced the ability to have `task.user` set on Windows, so long as the user ID fits a particular shape. But this uncovered a 7 year old bug in the `java` driver introduced in #5143, where we set the `task.user` to the non-existent Unix user `nobody`, even if we're running on Windows. Prior to the change in #25496 we always ignored the `task.user`, so this was not a problem. We don't set the `task.user` in the `raw_exec` driver, and the otherwise very similar `exec` driver is Linux-only, so we never see the problem there. Fix the bug in the `java` driver by gating the change to the `task.user` on not being Windows. Also add a check to the new code path that the user is non-empty before parsing it, so that any third party drivers that might be borrowing the executor code don't hit the same problem on Windows. Ref: https://github.com/hashicorp/nomad/pull/5143 Ref: https://github.com/hashicorp/nomad/pull/25496 Fixes: https://github.com/hashicorp/nomad/issues/25638	2025-04-10 10:34:34 -04:00
Denis Rodin	aca0ff438a	raw_exec windows: add support for setting the task user (#25496 )	2025-04-03 11:21:13 -04:00
Piotr Kazmierczak	e9ebbed32c	drivers: unflake `TestExecutor_OOMKilled` (#25521 ) Every now and then TestExecutor_OOMKilled would fail with: "unable to start container process: container init was OOM-killed (memory limit too low?)" which started happening since we upgraded libcontainer. This PR removes manual (and arbitrary) resource limits on the test task, since it should be OOMd with resources inherited from the testExecutorCommandWithChroot, and it fixes a small possible goroutine leak in the OOM checker in exec driver.	2025-03-28 11:35:02 +01:00
Piotr Kazmierczak	16bbdd9833	drivers: adapt shared executor code to use opencontainers/runc 1.2 (#25138 ) Co-authored-by: Michael Smithhisler <michael.smithhisler@hashicorp.com>	2025-03-17 14:32:16 +01:00
Simon Zou	73ceacd236	ListProcesses through PID when cgroup is not found in Linux (#25198 ) * ListProcesses through PID when cgroup is not found * add changelog entry * update the ListByPid for windows	2025-03-06 17:41:51 +01:00
Juana De La Cuesta	6ffe441983	[gh-24931] Return dummy function for moving processes when running rootless (#24944 ) * fix: stop executor launch if nomad doesnt have permissions * func: return move function if c group is not enabled	2025-03-06 10:34:21 +01:00
Jorge Marey	25426f0777	fingerprint: add config option to disable dmidecode (#25108 )	2025-02-13 11:20:48 -05:00
Michael Smithhisler	4e2d9675e7	executor: fail early on reattach if listener is not executor (#24538 )	2024-12-02 09:56:00 -05:00
Piotr Kazmierczak	3a18f22c18	goflags: go:build linux for tests that won't compile on other platforms (#24559 ) I'm a heavy LSP user and I frequently goto:next_error. This confuses my editor on macOS.	2024-11-28 15:05:00 +01:00
Seth Hoenig	dd396a3900	windows: revert process listing logic to that of v1.6.10 (#24494 ) * windows: revert process listing logic to that of v1.6.10 In Nomad 1.7 much of the process management code was refactored, including a rewrite of how the process tree of an executor was determined on Windows machines. Unfortunately that rewrite has been cursed with performance issues and bugs. Instead, revert to the logic used in v1.6.10. * changelog	2024-11-20 11:20:20 -06:00
Kir Kolyshkin	d09c8ddf21	deps: switch to moby/sys/capability (#24093 ) github.com/moby/sys/capability is a fork of the (no longer maintained) github.com/syndtr/gocapability package. For changes since the fork took place, see https://github.com/moby/sys/blob/main/capability/CHANGELOG.md Note that the "workaround for RHEL6" is removed for a number of reasons. Feel free to choose the one you like the most, either is sufficient: 1. /proc/sys/kernel/cap_last_cap is available since RHEL 6.7 (kernel 2.6.32-573.el6), released 9 years ago (2015-07-22). 2. It incorrectly returns CAP_BLOCK_SUSPEND (36), which was only added in kernel v3.5 and was never backported to RHEL6 kernels. The correct value for RHEL6 would be CAP_MAC_ADMIN (33). 3. As far as upstream kernels go, /proc/sys/kernel/cap_last_cap was added in kernel v3.2, and a correct value depends on the kernel version. It could be CAP_WAKE_ALARM (35), added to kernel v3.0, or CAP_SYSLOG (34), added to kernel v2.6.38, or possibly a lesser value for even older kernels. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2024-11-11 14:07:31 -05:00
Seth Hoenig	a0ff07393b	drivers: provide empty implementations of cgroup helpers for non-root nomad (#24392 )	2024-11-07 12:24:37 -06:00
Seth Hoenig	b58abf48c1	drivers: move executor process out of v1 task cgroup after process starts (#24340 ) * drivers: move executor process out of v1 task cgroup after process starts This PR changes the behavior of the raw exec task driver on old cgroups v1 systems such that the executor process is no longer a member of the cgroups created for the task. Now, the executor process is placed into those cgroups and starts the task child process (just as before), but now then exits those cgroups and exists in the nomad parent cgroup. This change makes the behavior sort of similar to cgroups v2 systems, where we never have the executor enter the task cgroup to begin with (because we can directly clone(3) the task process into it). Fixes #23951 * executor: handle non-linux case * cgroups: add test case for no executor process in task cgroup (v1) * add changelog * drivers: also move executor out of cpuset cgroup	2024-11-07 07:31:38 -06:00
Michael Smithhisler	658c429d75	Drivers: add work_dir config to exec/raw_exec/java drivers (#24249 ) --------- Co-authored-by: wurosh <uros.m.perisic@gmail.com> Co-authored-by: Michael Schurter <mschurter@hashicorp.com> Co-authored-by: Tim Gross <tgross@hashicorp.com>	2024-11-01 11:04:40 -04:00
Tim Gross	6b8ddff1fa	windows: set job object for executor and children (#24214 ) On Windows, if the `raw_exec` driver's executor exits, the child processes are not also killed. Create a Windows "job object" (not to be confused with a Nomad job) and add the executor to it. Child processes of the executor will inherit the job automatically. When the handle to the job object is freed (on executor exit), the job itself is destroyed and this causes all processes in that job to exit. Fixes: https://github.com/hashicorp/nomad/issues/23668 Ref: https://learn.microsoft.com/en-us/windows/win32/procthread/job-objects	2024-10-16 09:20:26 -04:00
Tim Gross	fec91d1dc8	windows: trade heap for stack to build process tree for stats in linear space (#24182 ) In #20619 we overhauled how we were gathering stats for Windows processes. Unlike in Linux where we can ask for processes in a cgroup, on Windows we have to make a single expensive syscall to get all the processes and then build the tree ourselves. Our algorithm to do so is recursive and quadratic in both steps and space with the number of processes on the host. For busy hosts this hits the stack limit and panics the Nomad client. We already build a map of parent PID to PID, so modify this to be a map of parent PID to slice of children and then traverse that tree only from the root we care about (the executor PID). This moves the allocations to the heap but makes the stats gathering linear in steps and space required. This changeset also moves as much of this code as possible into an area not conditionally-compiled by OS, as the tagged test file was not being run in CI. Fixes: https://github.com/hashicorp/nomad/issues/23984	2024-10-14 11:26:38 -04:00
Seth Hoenig	51215bf102	deps: update to go-set/v3 and refactor to use custom iterators (#23971 ) * deps: update to go-set/v3 * deps: use custom set iterators for looping	2024-09-16 13:40:10 -05:00
Tim Gross	b25f1b66ce	resources: allow job authors to configure size of secrets tmpfs (#23696 ) On supported platforms, the secrets directory is a 1MiB tmpfs. But some tasks need larger space for downloading large secrets. This is especially the case for tasks using `templates`, which need extra room to write a temporary file to the secrets directory that gets renamed to the old file atomically. This changeset allows increasing the size of the tmpfs in the `resources` block. Because this is a memory resource, we need to include it in the memory we allocate for scheduling purposes. The task is already prevented from using more memory in the tmpfs than the `resources.memory` field allows, but can bypass that limit by writing to the tmpfs via `template` or `artifact` blocks. Therefore, we need to account for the size of the tmpfs in the allocation resources. Simply adding it to the memory needed when we create the allocation allows it to be accounted for in all downstream consumers, and then we'll subtract that amount from the memory resources just before configuring the task driver. For backwards compatibility, the default value of 1MiB is "free" and ignored by the scheduler. Otherwise we'd be increasing the allocated resources for every existing alloc, which could cause problems across upgrades. If a user explicitly sets `resources.secrets = 1` it will no longer be free. Fixes: https://github.com/hashicorp/nomad/issues/2481 Ref: https://hashicorp.atlassian.net/browse/NET-10070	2024-08-05 16:06:58 -04:00
Piotr Kazmierczak	85430be6dd	raw_exec: oom_score_adj support (#23308 )	2024-06-14 11:36:27 +02:00
Luke Palmer	75874136ac	fix cgroup setup for non-default devices (#22518 )	2024-06-13 09:27:19 -04:00
Seth Hoenig	7d00a494d9	windows: fix inefficient gathering of task processes (#20619 ) * windows: fix inefficient gathering of task processes * return set of just executor pid in case of ps error	2024-05-17 09:46:23 -05:00
Juana De La Cuesta	169818b1bd	[gh-6980] Client: clean up old allocs before running new ones using the `exec` task driver. (#20500 ) Whenever the "exec" task driver is being used, Nomad runs a plugin that in turn runs the task in a container under the hood. If for any reason the executor is killed, the task is reparented to the init service and won't be stopped by Nomad when the job is updated or stopped. This commit introduces two mechanisms to avoid this behaviour: * Adds signal catching and handling to the executor, so in case of a SIGTERM, the signal will also be passed on to the task. * Adds a pre-start cleanup of the processes in the container, ensuring only the ones the executor runs are present at any given time.	2024-05-14 09:51:27 +02:00
Tim Gross	623486b302	deps: vendor containernetworking/plugins functions for net NS utils (#20556 ) We bring in `containernetworking/plugins` for the contents of a single file, which we use in a few places for running a goroutine in a specific network namespace. This code hasn't needed an update in a couple of years, and a good chunk of what we need was previously vendored into `client/lib/nsutil` already. Updating the library via dependabot is causing errors in Docker driver tests because it updates a lot of transient dependencies, and it's bringing in a pile of new transient dependencies like opentelemetry. Avoid this problem going forward by vendoring the remaining code we hadn't already. Ref: https://github.com/hashicorp/nomad/pull/20146	2024-05-13 09:10:16 -04:00
Seth Hoenig	14a022cbc0	drivers/raw_exec: enable setting cgroup override values (#20481 ) * drivers/raw_exec: enable setting cgroup override values This PR enables configuration of cgroup override values on the `raw_exec` task driver. WARNING: setting cgroup override values eliminates any guarantee Nomad can make about resource availability for any task on the client node. For cgroup v2 systems, set a single unified cgroup path using `cgroup_v2_override`. The path may be either absolute or relative to the cgroup root. config { cgroup_v2_override = "custom.slice/app.scope" } or config { cgroup_v2_override = "/sys/fs/cgroup/custom.slice/app.scope" } For cgroup v1 systems, set a per-controller path for each controller using `cgroup_v1_override`. The path(s) may be either absolute or relative to the controller root. config { cgroup_v1_override = { "pids": "custom/app", "cpuset": "custom/app", } } or config { cgroup_v1_override = { "pids": "/sys/fs/cgroup/pids/custom/app", "cpuset": "/sys/fs/cgroup/cpuset/custom/app", } } * drivers/rawexec: ensure only one of v1/v2 cgroup override is set * drivers/raw_exec: executor should error if setting cgroup does not work * drivers/raw_exec: create cgroups in raw_exec tests * drivers/raw_exec: ensure we fail to start if custom cgroup set and non-root * move custom cgroup func into shared file --------- Co-authored-by: Michael Schurter <mschurter@hashicorp.com>	2024-05-07 16:46:27 -07:00
Seth Hoenig	05937ab75b	exec2: add client support for unveil filesystem isolation mode (#20115 ) * exec2: add client support for unveil filesystem isolation mode This PR adds support for a new filesystem isolation mode, "Unveil". The mode introduces a "alloc_mounts" directory where tasks have user-owned directory structure which are bind mounts into the real alloc directory structure. This enables a task driver to use landlock (and maybe the real unveil on openbsd one day) to isolate a task to the task owned directory structure, providing sandboxing. * actually create alloc-mounts-dir directory * fix doc strings about alloc mount dir paths	2024-03-13 08:24:17 -05:00
carrychair	5f5b34db0e	remove repetitive words (#20110 ) Signed-off-by: carrychair <linghuchong404@gmail.com>	2024-03-11 08:52:08 +00:00
Luiz Aoqui	b52a44717e	executor: limit the value of CPU shares (#19935 ) The value for the executor cgroup CPU weight must be within the limits imposed by the Linux kernel. Nomad used the task `resource.cpu`, an unbounded value, directly as the cgroup CPU weight, causing it to potentially go outside the imposed values. This commit clamps the CPU shares values to be within the limits allowed. Co-authored-by: Tim Gross <tgross@hashicorp.com>	2024-02-09 16:29:14 -05:00
Tim Gross	110d93ab25	windows: remove LazyDLL calls for system modules (#19925 ) On Windows, Nomad uses `syscall.NewLazyDLL` and `syscall.LoadDLL` functions to load a few system DLL files, which does not prevent DLL hijacking attacks. Hypothetically a local attacker on the client host that can place an abusive library in a specific location could use this to escalate privileges to the Nomad process. Although this attack does not fall within the Nomad security model, it doesn't hurt to follow good practices here. We can remove two of these DLL loads by using wrapper functions provided by the stdlib in `x/sys/windows` Co-authored-by: dduzgun-security <deniz.duzgun@hashicorp.com>	2024-02-09 08:47:48 -05:00
Seth Hoenig	9410c519ff	drivers/raw_exec: remove plumbing for ineffective no_cgroups configuration (#19599 ) * drivers/raw_exec: remove plumbing for ineffective no_cgroups configuration * fix tests	2024-01-11 08:20:15 -06:00
Seth Hoenig	cb7d078c1d	drivers/raw_exec: enable configuring raw_exec task to have no memory limit (#19670 ) * drivers/raw_exec: enable configuring raw_exec task to have no memory limit This PR makes it possible to configure a raw_exec task to not have an upper memory limit, which is how the driver would behave pre-1.7. This is done by setting memory_max = -1. The cluster (or node pool) must have memory oversubscription enabled. * cl: add cl	2024-01-09 14:57:13 -06:00
Marvin Chin	d75293d2ab	Add OOM detection for exec driver (#19563 ) * Add OomKilled field to executor proto format * Teach linux executor to detect and report OOMs * Teach exec driver to propagate OOMKill information * Fix data race * use tail /dev/zero to create oom condition * use new test framework * minor tweaks to executor test * add cl entry * remove type conversion --------- Co-authored-by: Marvin Chin <marvinchin@users.noreply.github.com> Co-authored-by: Seth Hoenig <shoenig@duck.com>	2024-01-03 09:50:27 -06:00
Matt Robenolt	656bb5cafa	drivers/executor: set oom_score_adj for raw_exec (#19515 ) * drivers/executor: set oom_score_adj for raw_exec This might not be wholly true since I don't know all configurations of Nomad, but in our use cases, we run some of our tasks as `raw_exec` for reasons. We observed that our tasks were running with `oom_score_adj = -1000`, which prevents them from being OOM'd. This value is being inherited from the nomad agent parent process, as configured by systemd. Similar to #10698, we also were shocked to have this value inherited down to every child process and believe that we should also set this value to 0 explicitly. I have no idea if there are other paths that might leverage this or other ways that `raw_exec` can manifest, but this is how I was able to observe and fix in one of our configurations. We have been running in production our tasks wrapped in a script that does: `echo 0 > /proc/self/oom_score_adj` to avoid this issue. * drivers/executor: minor cleanup of setting oom adjustment * e2e: add test for raw_exec oom adjust score * e2e: set oom score adjust to -999 * cl: add cl --------- Co-authored-by: Seth Hoenig <shoenig@duck.com>	2024-01-02 13:35:09 -06:00
Luiz Aoqui	e4e70b086a	ci: run linter in `./api` package (#19513 )	2023-12-19 15:59:47 -05:00
Tim Gross	0236bd0907	qemu: fix panic from missing resources block (#19089 ) The `qemu` driver uses our universal executor to run the qemu command line tool. Because qemu owns the resource isolation, we don't pass in the resource block that the universal executor uses to configure cgroups and core pinning. This resulted in a panic. Fix the panic by returning early in the cgroup configuration in the universal executor. This fixes `qemu` but also any third-party drivers that might exist and are using our executor code without passing in the resource block. In future work, we should ensure that the `resources` block is being translated into qemu equivalents, so that we have support for things like NUMA-aware scheduling for that driver. Fixes: https://github.com/hashicorp/nomad/issues/19078	2023-11-14 16:26:44 -05:00
Seth Hoenig	e3c8700ded	deps: upgrade to go-set/v2 (#18638 ) No functional changes, just cleaning up deprecated usages that are removed in v2 and replace one call of .Slice with .ForEach to avoid making the intermediate copy.	2023-10-05 11:56:17 -05:00
Seth Hoenig	591394fb62	drivers: plumb hardware topology via grpc into drivers (#18504 ) * drivers: plumb hardware topology via grpc into drivers This PR swaps out the temporary use of detecting system hardware manually in each driver for using the Client's detected topology by plumbing the data over gRPC. This ensures that Client configuration is taken to account consistently in all references to system topology. * cr: use enum instead of bool for core grade * cr: fix test slit tables to be possible	2023-09-18 08:58:07 -05:00
Seth Hoenig	2e1974a574	client: refactor cpuset partitioning (#18371 ) * client: refactor cpuset partitioning This PR updates the way Nomad client manages the split between tasks that make use of resources.cpu vs. resources.cores. Previously, each task was explicitly assigned which CPU cores they were able to run on. Every time a task was started or destroyed, all other tasks' cpusets would need to be updated. This was inefficient and would crush the Linux kernel when a client would try to run ~400 or so tasks. Now, we make use of cgroup hierarchy and cpuset inheritance to efficiently manage cpusets. * cr: tweaks for feedback	2023-09-12 09:11:11 -05:00
Seth Hoenig	6747ef8803	drivers/raw_exec: restore ability to run tasks without nomad running as root (#18206 ) Although nomad officially does not support running the client as a non-root user, doing so has been more or less possible with the raw_exec driver as long as you don't expect features to work like networking or running tasks as specific users. In the cgroups refactoring I bulldozed right over the special casing we had in place for raw_exec to continue working if the cgroups were unable to be created. This PR restores that behavior - you can now (as before) run the nomad client as a non-root user and make use of the raw_exec task driver.	2023-08-15 11:22:30 -05:00
Seth Hoenig	a4cc76bd3e	numa: enable numa topology detection (#18146 ) * client: refactor cgroups management in client * client: fingerprint numa topology * client: plumb numa and cgroups changes to drivers * client: cleanup task resource accounting * client: numa client and config plumbing * lib: add a stack implementation * tools: remove ec2info tool * plugins: fixup testing for cgroups / numa changes * build: update makefile and package tests and cl	2023-08-10 17:05:30 -05:00
Patric Stout	e190eae395	Use config "cpu_total_compute" (if set) for all CPU statistics (#17628 ) Before this commit, it was only used for fingerprinting, but not for CPU stats on nodes or tasks. This meant that if the auto-detection failed, setting the cpu_total_compute didn't resolve the issue. This issue was most noticeable on ARM64, as there auto-detection always failed.	2023-07-19 13:30:47 -05:00
grembo	6f04b91912	Add `disable_file` parameter to job's `vault` stanza (#13343 ) This complements the `env` parameter, so that the operator can author tasks that don't share their Vault token with the workload when using `image` filesystem isolation. As a result, more powerful tokens can be used in a job definition, allowing it to use template stanzas to issue all kinds of secrets (database secrets, Vault tokens with very specific policies, etc.), without sharing that issuing power with the task itself. This is accomplished by creating a directory called `private` within the task's working directory, which shares many properties of the `secrets` directory (tmpfs where possible, not accessible by `nomad alloc fs` or Nomad's web UI), but isn't mounted into/bound to the container. If the `disable_file` parameter is set to `false` (its default), the Vault token is also written to the NOMAD_SECRETS_DIR, so the default behavior is backwards compatible. Even if the operator never changes the default, they will still benefit from the improved behavior of Nomad never reading the token back in from that - potentially altered - location.	2023-06-23 15:15:04 -04:00

254 Commits