In #25496 we introduced the ability to have `task.user` set for on Windows, so
long as the user ID fits a particular shape. But this uncovered a 7 year old bug
in the `java` driver introduced in #5143, where we set the `task.user` to the
non-existent Unix user `nobody`, even if we're running on Windows.
Prior to the change in #25496 we always ignored the `task.user`, so this was not
a problem. We don't set the `task.user` in the `raw_exec` driver, and the
otherwise very similar `exec` driver is Linux-only, so we never see the problem
there.
Fix the bug in the `java` driver by gating the change to the `task.user` on not
being Windows. Also add a check to the new code path that the user is non-empty
before parsing it, so that any third party drivers that might be borrowing the
executor code don't hit the same probem on Windows.
Ref: https://github.com/hashicorp/nomad/pull/5143
Ref: https://github.com/hashicorp/nomad/pull/25496
Fixes: https://github.com/hashicorp/nomad/issues/25638
Nomad driver handles incorrectly set exit code 0 in case of executor failure.
This corrects that behavior.
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
* exec2: add client support for unveil filesystem isolation mode
This PR adds support for a new filesystem isolation mode, "Unveil". The
mode introduces a "alloc_mounts" directory where tasks have user-owned
directory structure which are bind mounts into the real alloc directory
structure. This enables a task driver to use landlock (and maybe the
real unveil on openbsd one day) to isolate a task to the task owned
directory structure, providing sandboxing.
* actually create alloc-mounts-dir directory
* fix doc strings about alloc mount dir paths
* drivers: plumb hardware topology via grpc into drivers
This PR swaps out the temporary use of detecting system hardware manually
in each driver for using the Client's detected topology by plumbing the
data over gRPC. This ensures that Client configuration is taken to account
consistently in all references to system topology.
* cr: use enum instead of bool for core grade
* cr: fix test slit tables to be possible
Log lines which include an error should use the full term "error"
as the context key. This provides consistency across the codebase
and avoids a Go style which operators might not be aware of.
This test exercises upgrades between 0.8 and Nomad versions greater
than 0.9. We have not supported 0.8.x in a very long time and in any
case the test has been marked to skip because the downloader doesn't
work.
This PR improves the regular expression used for matching the java
version string, which varies a lot depending on the java vendor and
version.
These are the example strings we now test for:
java version "1.7.0_80"
openjdk version "11.0.1" 2018-10-16
openjdk version "11.0.1" 2018-10-16
java version "1.6.0_36"
openjdk version "1.8.0_192"
openjdk 11.0.11 2021-04-20 LTS
The last one is a new test added on behalf of #6081, which is
still broken on today's CentOS 7 default JDK package.
openjdk 11.0.11 2021-04-20 LTS
OpenJDK Runtime Environment 18.9 (build 11.0.11+9-LTS)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.11+9-LTS, mixed mode, sharing)
==> Evaluation "21c6caf7" finished with status "complete" but failed to place all allocations:
Task Group "example" (failed to place 1 allocation):
* Constraint "${driver.java.version} >= 11.0.0": 1 nodes excluded by filter
Evaluation "2b737d48" waiting for additional capacity to place remainder
Fixes#6081
This changeset does not introduce any functional change for the
docker driver, but rather cleans up the implementation around
computing configured capabilities by re-using code written for
the exec/java task drivers.
This PR adds pid_mode and ipc_mode options to the exec and java task
driver config options. By default these will defer to the default_pid_mode
and default_ipc_mode agent plugin options created in #9969. Setting
these values to "host" mode disables isolation for the task. Doing so
is not recommended, but may be necessary to support legacy job configurations.
Closes#9970
This PR adds default_pid_mode and default_ipc_mode options to the exec and java
task drivers. By default these will default to "private" mode, enabling PID and
IPC isolation for tasks. Setting them to "host" mode disables isolation. Doing
so is not recommended, but may be necessary to support legacy job configurations.
Closes#9969
This fixes few cases where driver eventor goroutines are leaked during
normal operations, but especially so in tests.
This change makes few modifications:
First, it switches drivers to use `Context`s to manage shutdown events.
Previously, it relied on callers invoking `.Shutdown()` function that is
specific to internal drivers only and require casting. Using `Contexts`
provide a consistent idiomatic way to manage lifecycle for both internal
and external drivers.
Also, I discovered few places where we don't clean up a temporary driver
instance in the plugin catalog code, where we dispense a driver to
inspect and validate the schema config without properly cleaning it up.
When an allocation runs for a task driver that can't support volume mounts,
the mounting will fail in a way that can be hard to understand. With host
volumes this usually means failing silently, whereas with CSI the operator
gets inscrutable internals exposed in the `nomad alloc status`.
This changeset adds a MountConfig field to the task driver Capabilities
response. We validate this when the `csi_hook` or `volume_hook` fires and
return a user-friendly error.
Note that we don't currently have a way to get driver capabilities up to the
server, except through attributes. Validating this when the user initially
submits the jobspec would be even better than what we're doing here (and could
be useful for all our other capabilities), but that's out of scope for this
changeset.
Also note that the MountConfig enum starts with "supports all" in order to
support community plugins in a backwards compatible way, rather than cutting
them off from volume mounting unexpectedly.
Looks like the RecoverTask doesn't set taskHandle.logger field causing
a panic when the handle attempts to log (e.g. when Shutdown or Signaling
fails).
Without passing the network isolation configuration to the executor,
java tasks are not placed in the same network namespace as the other
processes in their task group, which breaks Consul Connect.
* master: (23 commits)
tests: avoid assertion in goroutine
spell check
ci: run checkscripts
tests: deflake TestRktDriver_StartWaitRecoverWaitStop
drivers/rkt: Remove unused github.com/rkt/rkt
drivers/rkt: allow development on non-linux
cli: Hide `nomad docker_logger` from help output
api: test api and structs are in sync
goimports until make check is happy
nil check node resources to prevent panic
tr: use context in as select statement
move pluginutils -> helper/pluginutils
vet
goimports
gofmt
Split hclspec
move hclutils
Driver tests do not use hcl2/hcl, hclspec, or hclutils
move reattach config
loader and singleton
...
plugins/driver: update driver interface to support streaming stats
client/tr: use streaming stats api
TODO:
* how to handle errors and closed channel during stats streaming
* prevent tight loop if Stats(ctx) returns an error
drivers: update drivers TaskStats RPC to handle streaming results
executor: better error handling in stats rpc
docker: better control and error handling of stats rpc
driver: allow stats to return a recoverable error
This PR fixes various instances of plugins being launched without using
the parent loggers. This meant that logs would not all go to the same
output, break formatting etc.