Commit Graph

182 Commits

Author SHA1 Message Date
Michael Schurter
eeb1da8a2e test: update tests to properly use AllocDir
Also use t.TempDir when possible.
2021-10-19 10:49:07 -07:00
Michael Schurter
37f053ff89 client: never embed alloc_dir in chroot
Fixes #2522

Skip embedding client.alloc_dir when building chroot. If a user
configures a Nomad client agent so that the chroot_env will embed the
client.alloc_dir, Nomad will happily infinitely recurse while building
the chroot until something horrible happens. The best case scenario is
the filesystem's path length limit is hit. The worst case scenario is
disk space is exhausted.

A bad agent configuration will look something like this:

```hcl
data_dir = "/tmp/nomad-badagent"

client {
  enabled = true

  chroot_env {
    # Note that the source matches the data_dir
    "/tmp/nomad-badagent" = "/ohno"
    # ...
  }
}
```

Note that `/ohno/client` (the state_dir) will still be created but not
`/ohno/alloc` (the alloc_dir).
While I cannot think of a good reason why someone would want to embed
Nomad's client (and possibly server) directories in chroots, there
should be no cause for harm. chroots are only built when Nomad runs as
root, and Nomad disables running exec jobs as root by default. Therefore
even if client state is copied into chroots, it will be inaccessible to
tasks.

Skipping the `data_dir` and `{client,server}.state_dir` is possible, but
this PR attempts to implement the minimum viable solution to reduce risk
of unintended side effects or bugs.

When running tests as root in a vm without the fix, the following error
occurs:

```
=== RUN   TestAllocDir_SkipAllocDir
    alloc_dir_test.go:520:
                Error Trace:    alloc_dir_test.go:520
                Error:          Received unexpected error:
                                Couldn't create destination file /tmp/TestAllocDir_SkipAllocDir1457747331/001/nomad/test/testtask/nomad/test/testtask/.../nomad/test/testtask/secrets/.nomad-mount: open /tmp/TestAllocDir_SkipAllocDir1457747331/001/nomad/test/.../testtask/secrets/.nomad-mount: file name too long
                Test:           TestAllocDir_SkipAllocDir
--- FAIL: TestAllocDir_SkipAllocDir (22.76s)
```

Also removed unused Copy methods on AllocDir and TaskDir structs.

Thanks to @eveld for not letting me forget about this!
2021-10-18 09:22:01 -07:00
Mahmood Ali
feb450a393 executor: set CpuWeight in cgroup-v2 (#11287)
Cgroup-v2 uses `cpu.weight` property instead of cpu shares:
https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#cpu-interface-files
. And it uses a different range (i.e. `[1, 10000]`) from cpu.shares
(i.e. `[2, 262144]`) to make things more interesting.

Luckily, the libcontainer provides a helper function to perform the
conversion
[`ConvertCPUSharesToCgroupV2Value`](https://pkg.go.dev/github.com/opencontainers/runc@v1.0.2/libcontainer/cgroups#ConvertCPUSharesToCgroupV2Value).

I have confirmed that docker/libcontainer performs the conversion as
well in
https://github.com/opencontainers/runc/blob/v1.0.2/libcontainer/specconv/spec_linux.go#L536-L541
, and that CpuShares is ignored by libcontainer in
https://github.com/opencontainers/runc/blob/v1.0.2/libcontainer/cgroups/fs2/cpu.go#L24-L29
.
2021-10-14 08:46:07 -04:00
Mahmood Ali
bc2a51d43a executor: suppress spurious log messages (#11273)
Suppress stats streaming error log messages when task finishes.
Streaming errors are expected when a task finishes and they aren't
actionable to users.

Also, note that the task runner Stats hook retries collecting stats
after a delay. If the connection terminates prematurely, it will be
retried, and closing the stats stream is not very disruptive.

Ideally, executor terminates cleanly when task exits, but that's a more
substantial change that may require changing the executor/drivers interface.

Fixes #10814
2021-10-06 12:42:35 -04:00
Mahmood Ali
6c414cd5f9 gofmt all the files
mostly to handle build directives in 1.17.
2021-10-01 10:14:28 -04:00
James Rasell
3bffe443ac chore: fix incorrect docstring formatting. 2021-08-30 11:08:12 +02:00
James Rasell
b3fe60a7f3 drivers: remove duplicate import statements. 2021-06-11 09:38:09 +02:00
Mahmood Ali
0be58d72f4 drivers/exec: Don't inherit Nomad oom_score_adj value (#10698)
Explicitly set the `oom_score_adj` value for `exec` and `java` tasks.

We recommend that the Nomad service to have oom_score_adj of a low value
(e.g. -1000) to avoid having nomad agent OOM Killed if the node is
oversubscriped.

However, Nomad's workloads should not inherit Nomad's process, which is
the default behavior.

Fixes #10663
2021-06-03 14:15:50 -04:00
Seth Hoenig
595cef8136 drivers/exec: pass capabilities through executor RPC
Add capabilities to the LaunchRequest proto so that the
capabilities set actually gets plumbed all the way through
to task launch.
2021-05-17 12:37:40 -06:00
Seth Hoenig
191144c3bf drivers/exec: enable setting allow_caps on exec driver
This PR enables setting allow_caps on the exec driver
plugin configuration, as well as cap_add and cap_drop in
exec task configuration. These options replicate the
functionality already present in the docker task driver.

Important: this change also reduces the default set of
capabilities enabled by the exec driver to match the
default set enabled by the docker driver. Until v1.0.5
the exec task driver would enable all capabilities supported
by the operating system. v1.0.5 removed NET_RAW from that
list of default capabilities, but left may others which
could potentially also be leveraged by compromised tasks.

Important: the "root" user is still special cased when
used with the exec driver. Older versions of Nomad enabled
enabled all capabilities supported by the operating system
for tasks set with the root user. To maintain compatibility
with existing clusters we continue supporting this "feature",
however we maintain support for the legacy set of capabilities
rather than enabling all capabilities now supported on modern
operating systems.
2021-05-17 12:37:40 -06:00
Seth Hoenig
003d68fe6d drivers/docker+exec+java: disable net_raw capability by default
The default Linux Capabilities set enabled by the docker, exec, and
java task drivers includes CAP_NET_RAW (for making ping just work),
which has the side affect of opening an ARP DoS/MiTM attack between
tasks using bridge networking on the same host network.

https://docs.docker.com/engine/reference/run/#runtime-privilege-and-linux-capabilities

This PR disables CAP_NET_RAW for the docker, exec, and java task
drivers. The previous behavior can be restored for docker using the
allow_caps docker plugin configuration option.

A future version of nomad will enable similar configurability for the
exec and java task drivers.
2021-05-12 13:22:09 -07:00
Nick Ethier
e0a599ed9c nit: code cleanup/organization 2021-04-16 15:14:29 -04:00
Nick Ethier
5377be43ff executor: add support for cpuset cgroup 2021-04-15 10:24:31 -04:00
Yoan Blanc
a814f0253f chore: bump golangci-lint from v1.24 to v1.39
Signed-off-by: Yoan Blanc <yoan@dosimple.ch>
2021-04-03 09:50:23 +02:00
Mahmood Ali
4532272931 drivers/exec: Account for cgroup-v2 memory stats
If the host is running with cgroup-v2, RSS and Max Usage doesn't get
reported anymore.
2021-04-01 12:13:21 -04:00
zhsj
46b335d652 deps: update runc to v1.0.0-rc93
includes updates for breaking changes in runc v1.0.0-rc93
2021-03-31 10:57:02 -04:00
Mahmood Ali
43549b46fc driver/exec: set soft memory limit
Linux offers soft memory limit:
https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v1/memory.html#soft-limits
, and
https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html?highlight=memory.low
.

We can set soft memory limits through libcontainer
`Resources.MemoryReservation`: https://pkg.go.dev/github.com/opencontainers/runc@v0.1.1/libcontainer/configs#Resources
2021-03-30 16:55:58 -04:00
Mahmood Ali
5e3fbd5774 oversubscription: driver/exec to honor MemoryMaxMB 2021-03-30 16:55:58 -04:00
Tim Gross
14568b3e00 deps: bump gopsutil to v3.21.2 2021-03-30 16:02:51 -04:00
Charlie Voiselle
d914990e5f Fixup uses of sanity (#10187)
* Fixup uses of `sanity`
* Remove unnecessary comments.

These checks are better explained by earlier comments about
the context of the test. Per @tgross, moved the tests together
to better reinforce the overall shared context.

* Update nomad/fsm_test.go
2021-03-16 18:05:08 -04:00
Seth Hoenig
836ee9e4a2 drivers/exec+java: Add task configuration to restore previous PID/IPC isolation behavior
This PR adds pid_mode and ipc_mode options to the exec and java task
driver config options. By default these will defer to the default_pid_mode
and default_ipc_mode agent plugin options created in #9969. Setting
these values to "host" mode disables isolation for the task. Doing so
is not recommended, but may be necessary to support legacy job configurations.

Closes #9970
2021-02-08 14:26:35 -06:00
Seth Hoenig
6dd5de4b69 docs: fixup comments, var names 2021-02-08 10:58:44 -06:00
Seth Hoenig
b682371a22 drivers/exec+java: Add configuration to restore previous PID/IPC namespace behavior.
This PR adds default_pid_mode and default_ipc_mode options to the exec and java
task drivers. By default these will default to "private" mode, enabling PID and
IPC isolation for tasks. Setting them to "host" mode disables isolation. Doing
so is not recommended, but may be necessary to support legacy job configurations.

Closes #9969
2021-02-05 15:52:11 -06:00
Chris Baker
7f06adf1af Merge tag 'v1.0.3' into post-release-1.0.3
Version 1.0.3
2021-01-29 19:30:08 +00:00
Kris Hicks
f4fe79ad89 Backfill unit test for NEWIPC 2021-01-28 12:03:19 +00:00
Chris Baker
109fb53e50 put exec process in a new IPC namespace 2021-01-28 12:03:19 +00:00
Kris Hicks
41681fa1d2 Add unit test for container namespacing 2021-01-28 12:03:19 +00:00
Kris Hicks
75cea8ad48 Always check that resource constraints were applied 2021-01-28 12:03:19 +00:00
Kris Hicks
677353a205 Add PID namespacing and e2e test 2021-01-28 12:03:19 +00:00
Kris Hicks
bcd4752fc9 executor_linux: Remove unreachable PATH= code (#9778)
This has to have been unused because the HasPrefix operation is
backwards, meaning a Command.Env that includes PATH= never would have
worked; the default path was always used.
2021-01-15 11:19:09 -08:00
Kris Hicks
7747124ef0 Apply some suggested fixes from staticcheck (#9598) 2020-12-10 07:29:18 -08:00
Kris Hicks
071f4c7596 Add gocritic to golangci-lint config (#9556) 2020-12-08 12:47:04 -08:00
Mahmood Ali
6d57a1c69b use comment ignores (#9448)
Use targetted ignore comments for the cases where we are bound by
backward compatibility.

I've left some file based linters, especially when the file is riddled
with linter voilations (e.g. enum names), or if it's a property of the
file (e.g. package and file names).

I encountered an odd behavior related to RPC_REQUEST_RESPONSE_UNIQUE and
RPC_REQUEST_STANDARD_NAME.  Apparently, if they target a `stream` type,
we must separate them into separate lines so that the ignore comment
targets the type specifically.
2020-11-25 16:03:01 -05:00
Mahmood Ali
edaa16589b honor task user when execing into raw_exec task (#9439)
Fix #9210 .

This update the executor so it honors the User when using nomad alloc exec. The bug was that the exec task didn't honor the init command when execing.
2020-11-25 09:34:10 -05:00
Mahmood Ali
bd745fa3e5 raw_exec: don't use cgroups when no_cgroup is set (#9328)
When raw_exec is configured with [`no_cgroups`](https://www.nomadproject.io/docs/drivers/raw_exec#no_cgroups), raw_exec shouldn't attempt to create a cgroup.

Prior to this change, we accidentally always required freezer cgroup to do stats PID tracking. We already have the proper fallback in place for metrics, so only need to ensure that we don't create a cgroup for the task.

Fixes https://github.com/hashicorp/nomad/issues/8565
2020-11-11 16:20:34 -05:00
Mahmood Ali
669f9cbcbe Merge pull request #9304 from hashicorp/b-legacy-executors-are-executors
Legacy executors are executors after all
2020-11-10 12:54:03 -05:00
Kris Hicks
b3825e74b5 protos: Update .proto files not to use Go package name (#9301)
Previously, it was required that you `go get github.com/hashicorp/nomad` to be
able to build protos, as the protoc invocation added an include directive that
pointed to `$GOPATH/src`, which is how dependent protos were discovered. As
Nomad now uses Go modules, it won't necessarily be cloned to `$GOPATH`.
(Additionally, if you _had_ go-gotten Nomad at some point, protoc compilation
would have possibly used the _wrong_ protos, as those wouldn't necessarily be
the most up-to-date ones.)

This change modifies the proto files and the `protoc` invocation to handle
discovering dependent protos via protoc plugin modifier statements that are
specific to the protoc plugin being used.

In this change, `make proto` was run to recompile the protos, which results in
changes only to the gzipped `FileDescriptorProto`.
2020-11-10 08:42:35 -08:00
Mahmood Ali
5847b93c94 Legacy executors are executors after all
This fixes a bug where pre-0.9 executors fail to recover after an
upgrade.

The bug is that legacyExecutorWrappers didn't get updated with
ExecStreaming function, and thus failed to implement the Executor
function. Sadly, this meant that all recovery attempts fail, as the
runtime check in
b312aacbc9/drivers/shared/executor/utils.go (L103-L110)
.
2020-11-10 10:20:07 -05:00
Mahmood Ali
0f2af169ad tests: copy permissions when copying files
On the failover path, copy the permission bits (a.k.a. file mode),
specially the execution bit.
2020-10-01 10:23:14 -04:00
Mahmood Ali
7b9d6a3552 tests: ignore empty cgroup
My latest Vagrant box contains an empty cgroup name that isn't used for
isolation:

```
$ cat /proc/self/cgroup  | grep ::
0::/user.slice/user-1000.slice/session-17.scope
```
2020-10-01 10:23:13 -04:00
Mahmood Ali
6e9edd53b3 tests: failover to copying when symlinking fails
Symlinking busybox may fail when the test code and the test temporary
directory live on different volumes/partitions; so we should copy
instead.  This situation arises in the Vagrant setup, where the code
repository live on special file sharing volume.

Somewhat unrelated, remove `f.Sync()` invocation from a test copyFile
helper function.  Sync is useful only for crash recovery, and isn't
necessary in our test setup.  The sync invocation is a significant
overhead as it requires the OS to flush any cached writes to disk.
2020-09-30 09:58:22 -04:00
Mahmood Ali
d7f14d12c1 Upgrade to golang 1.15 (#8858)
Upgrade to golang 1.15

Starting with golang 1.5, setting Ctty value result in `Setctty set but Ctty not valid in child` error, as part of https://github.com/golang/go/issues/29458 .
This commit lifts the fix in https://github.com/creack/pty/pull/97 .
2020-09-09 15:59:29 -04:00
Shengjing Zhu
274bf2ee1c Adjust cgroup change in libcontainer 2020-08-20 00:31:07 +08:00
Mahmood Ali
2f1c63e548 tests: ensure that test is long enough to configure cgroups 2020-05-31 10:42:06 -04:00
Mahmood Ali
d2ddef5ba3 update grpc
Upgrade grpc to v1.27.1 and protobuf plugins to v1.3.4.
2020-03-03 08:39:54 -05:00
Thomas Lefebvre
5a017acd0b client: support no_pivot_root in exec driver configuration 2020-02-18 09:27:16 -08:00
Mahmood Ali
87c0c92ac7 Pass stats interval colleciton to executor
This fixes a bug where executor based drivers emit stats every second,
regardless of user configuration.

When serializing the Stats request across grpc, the nomad agent dropped
the Interval value, and then executor uses 1s as a default value.
2020-01-31 14:17:15 -05:00
Mahmood Ali
f794b49ec6 simplify cgroup path lookup 2019-12-11 12:43:25 -05:00
Mahmood Ali
596d0be5d8 executor: stop joining executor to container cgroup
Stop joining libcontainer executor process into the newly created task
container cgroup, to ensure that the cgroups are fully destroyed on
shutdown, and to make it consistent with other plugin processes.

Previously, executor process is added to the container cgroup so the
executor process resources get aggregated along with user processes in
our metric aggregation.

However, adding executor process to container cgroup adds some
complications with much benefits:

First, it complicates cleanup.  We must ensure that the executor is
removed from container cgroup on shutdown.  Though, we had a bug where
we missed removing it from the systemd cgroup.  Because executor uses
`containerState.CgroupPaths` on launch, which includes systemd, but
`cgroups.GetAllSubsystems` which doesn't.

Second, it may have advese side-effects.  When a user process is cpu
bound or uses too much memory, executor should remain functioning
without risk of being killed (by OOM killer) or throttled.

Third, it is inconsistent with other drivers and plugins.  Logmon and
DockerLogger processes aren't in the task cgroups.  Neither are
containerd processes, though it is equivalent to executor in
responsibility.

Fourth, in my experience when executor process moves cgroup while it's
running, the cgroup aggregation is odd.  The cgroup
`memory.usage_in_bytes` doesn't seem to capture the full memory usage of
the executor process and becomes a red-harring when investigating memory
issues.

For all the reasons above, I opted to have executor remain in nomad
agent cgroup and we can revisit this when we have a better story for
plugin process cgroup management.
2019-12-11 11:28:09 -05:00
Mahmood Ali
2f4b9da61a drivers/exec: test all cgroups are destroyed 2019-12-11 11:12:29 -05:00