Commit Graph

551 Commits

Author SHA1 Message Date
Mahmood Ali
76324a1511 don't GC images in tests by default 2020-05-26 21:24:55 -04:00
Mahmood Ali
5de673e6f2 tests: don't delete images after tests complete
Fix some docker test flakiness where image cleanup process may
contaminate other tests. A clean up process may attempt to delete an
image while it's used by another test.
2020-05-26 18:53:24 -04:00
Mahmood Ali
d6c75e301e cleanup driver eventor goroutines
This fixes few cases where driver eventor goroutines are leaked during
normal operations, but especially so in tests.

This change makes few modifications:

First, it switches drivers to use `Context`s to manage shutdown events.
Previously, it relied on callers invoking `.Shutdown()` function that is
specific to internal drivers only and require casting.  Using `Contexts`
provide a consistent idiomatic way to manage lifecycle for both internal
and external drivers.

Also, I discovered few places where we don't clean up a temporary driver
instance in the plugin catalog code, where we dispense a driver to
inspect and validate the schema config without properly cleaning it up.
2020-05-26 11:04:04 -04:00
Tim Gross
8860b72bc3 volumes: return better error messages for unsupported task drivers (#8030)
When an allocation runs for a task driver that can't support volume mounts,
the mounting will fail in a way that can be hard to understand. With host
volumes this usually means failing silently, whereas with CSI the operator
gets inscrutable internals exposed in the `nomad alloc status`.

This changeset adds a MountConfig field to the task driver Capabilities
response. We validate this when the `csi_hook` or `volume_hook` fires and
return a user-friendly error.

Note that we don't currently have a way to get driver capabilities up to the
server, except through attributes. Validating this when the user initially
submits the jobspec would be even better than what we're doing here (and could
be useful for all our other capabilities), but that's out of scope for this
changeset.

Also note that the MountConfig enum starts with "supports all" in order to
support community plugins in a backwards compatible way, rather than cutting
them off from volume mounting unexpectedly.
2020-05-21 09:18:02 -04:00
Mahmood Ali
a6e59eb748 Use an image managed by nomad account
This is a retag of stefanscherer/busybox-windows@sha256:af396324c4c62e369a388ebb38d4efd44211dc7c95a438e6feb62b4ae4194c5b
2020-05-15 12:55:22 -04:00
Mahmood Ali
4c679d4c64 Use a pinned tag of stefanscherer/busybox-windows 2020-05-15 12:20:37 -04:00
Michele
d09937e41a Move appveyor tests to circle 2020-05-15 10:15:37 -04:00
Mahmood Ali
72c08e0591 docker: Fix docker image gc tracking
This fixes a bug where docker images may not be GCed.  The cause of the
bug is that we track the task using `task.ID+task.Name` on task start
but remove on plain `task.ID`.

This haromize the two paths by using `task.ID`, as it's unique enough
and it's also used in the `loadImage` path (path when loading an image
from a local tarball instead of dockerhub).
2020-05-13 12:33:17 -04:00
Mahmood Ali
dd06346435 Merge pull request #7932 from hashicorp/f-docker-custom-runtimes
Docker runtimes
2020-05-12 11:59:36 -04:00
Mahmood Ali
44c93e3598 update tests 2020-05-12 11:39:09 -04:00
Mahmood Ali
9aa10cc4cd use allow_runtimes for consistency
Other allow lists use allow_ prefix (e.g. allow_caps, allow_privileged).
2020-05-12 11:03:08 -04:00
Mahmood Ali
0f499a37ef Apply suggestions from code review
Co-authored-by: Tim Gross <tgross@hashicorp.com>
2020-05-12 10:56:47 -04:00
Mahmood Ali
e393e71ef8 more tests 2020-05-12 10:14:54 -04:00
Mahmood Ali
88ec571375 Add a knob to restrict docker runtimes 2020-05-12 10:14:43 -04:00
Juan Larriba
65f09ed119 Run Linux Images (LCOW) and Windows Containers side by side (#7850)
Makes it possible to run Linux Containers On Windows with Nomad alongside Windows Containers. Fingerprint prevents only to run Nomad in Windows 10 with Linux Containers
2020-05-04 13:08:47 -04:00
Mahmood Ali
9db46fde84 driver/docker: protect against nil container
Protect against a panic when we attempt to start a container with a name
that conflicts with an existing one.  If the existing one is being
deleted while nomad first attempts to create the container, the
createContainer will fail with `container already exists`, but we get
nil container reference from the `containerByName` lookup, and cause a
crash.

I'm not certain how we get into the state, except for being very
unlucky.  I suspect that this case may be the result of a concurrent
restart or the docker engine API not being fully consistent (e.g. an
earlier call purged the container, but docker didn't free up resources
yet to create a new container with the same name immediately yet).

If that's the case, then re-attempting creation will hopefully succeed,
or we'd at least fail enough times for the alloc to be rescheduled to
another node.
2020-04-19 15:34:45 -04:00
Ben Buzbee
3bd675ea9d Rename OCIRuntime to Runtime; allow gpu conflicts is they are the same runtime; add conflict test 2020-04-03 12:15:11 -07:00
Ben Buzbee
5f58ac619a Support custom docker runtimes
This enables customers who want to use gvisor and have it configured on their clients.
2020-04-03 11:07:37 -07:00
Mahmood Ali
b931961053 Merge pull request #7554 from benbuzbee/benbuz/fix-seccomp-file
Parse security_opts before sending them to docker daemon
2020-03-31 11:54:17 -04:00
Ben Buzbee
a3c3f7e88f Parse security_opts before sending them to docker daemon
Fixes #6720

Copy the parsing function from the docker CLI. Docker daemon expects to see JSON for seccomp file not a path.
2020-03-31 08:34:41 -07:00
Mahmood Ali
b1bf952d47 Merge pull request #7550 from hashicorp/vendor-fsouza-go-docker-client-20200330
Vendor fsouza/go-docker-client update
2020-03-31 08:46:30 -04:00
Mahmood Ali
2def8de33e driver/docker: fix memory swapping
MemorySwappiness can only be set in non-Windows options: https://ci.appveyor.com/project/hashicorp/nomad/builds/31832149

Also fixes https://github.com/hashicorp/nomad/issues/6085
2020-03-30 16:51:16 -04:00
Mahmood Ali
860b83c55b Merge pull request #7508 from greut/docker-drain-timer
docker: drain fingerprint timer
2020-03-30 16:37:53 -04:00
Yoan Blanc
a272a18ba9 Update drivers/docker/fingerprint.go
Co-Authored-By: Mahmood Ali <mahmood@notnoop.com>
2020-03-30 22:11:42 +02:00
Mahmood Ali
d851b006dc vendors: update fsouza/go-docker-client to v.1.6.3 2020-03-30 15:10:53 -04:00
Mahmood Ali
a7999c2d1c Merge pull request #7531 from greut/docker-v19.03.8
Docker v19.03.8
2020-03-30 14:45:10 -04:00
Mahmood Ali
a57fb198e4 tests: attempt to deflake TestDockerDriver_PidsLimit
This is an attemp to deflake TestDockerDriver_PidsLimit by having one
more process and ensuring they run for longer.
2020-03-30 07:06:52 -04:00
Mahmood Ali
4667440572 Resolve docker types conflict
Looks like the latest `github.com/docker/docker/registry.ResolveAuthConfig` expect
`github.com/docker/docker/api/types.AuthConfig` rather than
`github.com/docker/cli/cli/config/types.AuthConfig`. The two types are
identical but live in different packages.

Here, we embed `registry.ResolveAuthConfig` from upstream repo, but with
the signature we need.
2020-03-28 17:29:06 +01:00
Yoan Blanc
ace5ab113e docker: v19.03.8
Signed-off-by: Yoan Blanc <yoan@dosimple.ch>
2020-03-28 17:29:04 +01:00
Mahmood Ali
2aa49252ae Merge pull request #7257 from bbckr/avoid-resolving-dot-in-named-pipe
Avoid resolving dotted segments when host path for volume is named pipe
2020-03-26 16:59:29 -04:00
Yoan Blanc
270100b89e fixup! docker: drain fingerprint timer
Signed-off-by: Yoan Blanc <yoan@dosimple.ch>
2020-03-26 16:02:20 +01:00
Yoan Blanc
1075021262 docker: drain fingerprint timer
Signed-off-by: Yoan Blanc <yoan@dosimple.ch>
2020-03-26 16:00:53 +01:00
Mahmood Ali
4822c66d2b Revert "vendor: fsouza/go-docker-client v1.6.3" 2020-03-23 10:48:47 -04:00
Yoan Blanc
39076e0f6b docker: disable swap in Windows only
Signed-off-by: Yoan Blanc <yoan@dosimple.ch>
2020-03-23 08:35:09 +01:00
Yoan Blanc
460c09f009 fixup! fixup! vendor: fsouza/go-docker-client v1.6.3
Signed-off-by: Yoan Blanc <yoan@dosimple.ch>
2020-03-22 10:04:52 +01:00
Yoan Blanc
2770a0f316 vendor: fsouza/go-docker-client v1.6.3
Signed-off-by: Yoan Blanc <yoan@dosimple.ch>
2020-03-22 09:25:46 +01:00
Mahmood Ali
ac53c110a6 Merge pull request #7236 from hashicorp/b-remove-rkt
Remove rkt as a built-in driver
2020-03-17 09:07:35 -04:00
Yoan Blanc
9032cbe304 docker: v18.09.9
Signed-off-by: Yoan Blanc <yoan@dosimple.ch>
2020-03-15 08:23:11 +01:00
bckr
d9cf32657e Remove argument passing runtime GOOS 2020-03-03 15:39:43 +01:00
bckr
1549ed827a Fix too many arguments 2020-03-03 15:38:38 +01:00
Mahmood Ali
d2ddef5ba3 update grpc
Upgrade grpc to v1.27.1 and protobuf plugins to v1.3.4.
2020-03-03 08:39:54 -05:00
bckr
2a2bb0430e Avoid resolving dotted segments when host path for volume is named pipe 2020-03-03 14:00:19 +01:00
Mahmood Ali
e265e4c7b2 Remove rkt as a built-in driver
Rkt has been archived and is no longer an active project:
* https://github.com/rkt/rkt
* https://github.com/rkt/rkt/issues/4024

The rkt driver will continue to live as an external plugin.
2020-02-26 22:16:41 -05:00
Thomas Lefebvre
5a017acd0b client: support no_pivot_root in exec driver configuration 2020-02-18 09:27:16 -08:00
Mahmood Ali
87c0c92ac7 Pass stats interval colleciton to executor
This fixes a bug where executor based drivers emit stats every second,
regardless of user configuration.

When serializing the Stats request across grpc, the nomad agent dropped
the Interval value, and then executor uses 1s as a default value.
2020-01-31 14:17:15 -05:00
John Schlederer
81592734b5 Making pull activity timeout configurable in Docker
* Making pull activity timeout configurable in Docker plugin config, first pass

* Fixing broken function call

* Fixing broken tests

* Fixing linter suggestion

* Adding documentation on new parameter in Docker plugin config

* Adding unit test

* Setting min value for pull_activity_timeout, making pull activity duration a private var
2019-12-18 12:58:53 +01:00
Mahmood Ali
20f8227c0a Merge pull request #6820 from hashicorp/f-skip-docker-logging-knob
driver: allow disabling log collection
2019-12-13 11:41:20 -05:00
Mahmood Ali
e82dad732b address review comments 2019-12-13 11:21:00 -05:00
Mahmood Ali
f794b49ec6 simplify cgroup path lookup 2019-12-11 12:43:25 -05:00
Mahmood Ali
596d0be5d8 executor: stop joining executor to container cgroup
Stop joining libcontainer executor process into the newly created task
container cgroup, to ensure that the cgroups are fully destroyed on
shutdown, and to make it consistent with other plugin processes.

Previously, executor process is added to the container cgroup so the
executor process resources get aggregated along with user processes in
our metric aggregation.

However, adding executor process to container cgroup adds some
complications with much benefits:

First, it complicates cleanup.  We must ensure that the executor is
removed from container cgroup on shutdown.  Though, we had a bug where
we missed removing it from the systemd cgroup.  Because executor uses
`containerState.CgroupPaths` on launch, which includes systemd, but
`cgroups.GetAllSubsystems` which doesn't.

Second, it may have advese side-effects.  When a user process is cpu
bound or uses too much memory, executor should remain functioning
without risk of being killed (by OOM killer) or throttled.

Third, it is inconsistent with other drivers and plugins.  Logmon and
DockerLogger processes aren't in the task cgroups.  Neither are
containerd processes, though it is equivalent to executor in
responsibility.

Fourth, in my experience when executor process moves cgroup while it's
running, the cgroup aggregation is odd.  The cgroup
`memory.usage_in_bytes` doesn't seem to capture the full memory usage of
the executor process and becomes a red-harring when investigating memory
issues.

For all the reasons above, I opted to have executor remain in nomad
agent cgroup and we can revisit this when we have a better story for
plugin process cgroup management.
2019-12-11 11:28:09 -05:00