Commit Graph

52 Commits

Author SHA1 Message Date
Mahmood Ali
011315ba4c logging.Type over logging.Driver 2019-02-28 16:40:18 -05:00
Mahmood Ali
314d7a0f41 drivers/docker: rename logging type to driver
Docker uses the term logging `driver` in its public documentations: in
`docker` daemon config[1], `docker run` arguments [2] and in docker compose file[3].
Interestingly, docker used `type` in its API [4] instead of everywhere
else.

It's unfortunate that Nomad used `type` modeling after the Docker API
rather than the user facing documents.  Nomad using `type` feels very
non-user friendly as it's disconnected from how Docker markets the flag
and shows internal representation instead.

Here, we rectify the situation by introducing `driver` field and
prefering it over `type` in logging.

[1] https://docs.docker.com/config/containers/logging/configure/
[2] https://docs.docker.com/engine/reference/run/#logging-drivers---log-driver
[3] https://docs.docker.com/compose/compose-file/#logging
[4] https://docs.docker.com/engine/api/v1.39/#operation/ContainerCreate
2019-02-28 16:04:03 -05:00
Danielle Tomlinson
2657bf02c0 Merge pull request #5355 from hashicorp/dani/windows-dockerstats
docker: Support Stats on Windows
2019-02-26 16:39:48 +01:00
Danielle Tomlinson
6c774e7b46 docker: Return undetected before first detection
This commit causes the docker driver to return undetected before it
first establishes a connection to the docker daemon.

This fixes a bug where hosts without docker installed would return as
unhealthy, rather than undetected.
2019-02-25 11:02:42 +01:00
Danielle Tomlinson
6624d3667b docker: Support stats on Windows 2019-02-22 14:19:58 +01:00
Danielle Tomlinson
df57099e6f docker: Avoid leaking containers during Reattach
Currently if a docker_logger cannot be reattached to, we will leak the
container that was being used. This is problematic if e.g using static
ports as it means you can never recover your task, or if a service is
expensive to run and will then be running without supervision.
2019-02-20 17:47:06 +01:00
Danielle Tomlinson
274a3485b2 docker: Respawn docker logger during recovery
Sometimes the nomad docker_logger may be killed by a service manager
when restarting the client for upgrades or reliability reasons.

Currently if this happens, we leak the users container and try to
reschedule over it.

This commit adds a new step to the recovery process that will spawn a
new docker logger process that will fetch logs from _the current
timestamp_. This is to avoid restarting users tasks because our logging
sidecar has failed.
2019-02-20 17:12:56 +01:00
Danielle Tomlinson
e2244cd0d4 drivers/docker: SIGTERM to stop containers
Windows Docker daemon does not support SIGINT, SIGTERM is the semantic
equivalent that allows for graceful shutdown before being followed up by
a SIGKILL.
2019-02-14 15:38:54 +00:00
Nick Ethier
bed9efae44 Merge branch 'master' into f-driver-upgradepath-test
* master: (23 commits)
  tests: avoid assertion in goroutine
  spell check
  ci: run checkscripts
  tests: deflake TestRktDriver_StartWaitRecoverWaitStop
  drivers/rkt: Remove unused github.com/rkt/rkt
  drivers/rkt: allow development on non-linux
  cli: Hide `nomad docker_logger` from help output
  api: test api and structs are in sync
  goimports until make check is happy
  nil check node resources to prevent panic
  tr: use context in as select statement
  move pluginutils -> helper/pluginutils
  vet
  goimports
  gofmt
  Split hclspec
  move hclutils
  Driver tests do not use hcl2/hcl, hclspec, or hclutils
  move reattach config
  loader and singleton
  ...
2019-01-23 21:01:24 -05:00
Nick Ethier
a9060f44eb drivers: add docker upgrade path and e2e test 2019-01-23 14:44:42 -05:00
Alex Dadgar
2d23f4a038 move reattach config 2019-01-22 15:11:58 -08:00
Nick Ethier
994c66f7d7 drivers: use consts for task handle version 2019-01-18 18:31:01 -05:00
Nick Ethier
2f91ac88f7 cleanup code comments and small fixes from refactor 2019-01-18 18:31:01 -05:00
Mahmood Ali
9f7619344e Merge pull request #5190 from hashicorp/f-memory-usage
Track Basic Memory Usage as reported by cgroups
2019-01-18 16:46:02 -05:00
Preetha Appan
3e16b52361 clean up read access 2019-01-16 11:04:11 -06:00
Preetha Appan
80c00fc268 Refactor logging in drivers to use a tri-state boolean
Changes logging warnings/errors only if the state changes
from healthy to unhealthy
2019-01-16 10:19:31 -06:00
Preetha Appan
35ab26658c Make docker driver logging less redundant 2019-01-16 10:16:57 -06:00
oleksii.shyman
cc98f282d4 Add support for docker runtimes
- docker fingerprint issues a docker api system info call to get the
  list of supported OCI runtimes.
  - OCI runtimes are reported as comma separated list of names
  - docker driver is aware of GPU runtime presence
  - docker driver throws an error when user tries to run container with
  GPU, when GPU runtime is not present
  - docker GPU runtime name is configurable
2019-01-15 11:34:47 -08:00
Danielle Tomlinson
f9a4594095 docker: Terminate dockerlogger
Previously, we did not attempt to stop Docker Logger processes until
DestroyTask, which means that under many circumstances, we will never
successfully close the plugin client.

This commit terminates the plugin process when `run` terminates, or when
`DestroyTask` is called.

Steps to repro:

```
$ nomad agent -dev
$ nomad init
$ nomad run example.nomad
$ nomad stop example
$ ps aux | grep nomad # See docker logger process running
$ signal the dev agent
$ ps aux | grep nomad # See docker logger process running
```
2019-01-15 14:58:05 +01:00
Mahmood Ali
b5c20aa50b Track Basic Memory Usage as reported by cgroups
Track current memory usage, `memory.usage_in_bytes`, in addition to
`memory.max_memory_usage_in_bytes` and friends.  This number is closer
what Docker reports.

Related to https://github.com/hashicorp/nomad/issues/5165 .
2019-01-14 18:47:52 -05:00
Nick Ethier
fbf9a4c772 executor: implement streaming stats API
plugins/driver: update driver interface to support streaming stats

client/tr: use streaming stats api

TODO:
 * how to handle errors and closed channel during stats streaming
 * prevent tight loop if Stats(ctx) returns an error

drivers: update drivers TaskStats RPC to handle streaming results

executor: better error handling in stats rpc

docker: better control and error handling of stats rpc

driver: allow stats to return a recoverable error
2019-01-12 12:18:22 -05:00
Mahmood Ali
800a3522e3 drivers: re-export ResourceUsage structs
Re-export the ResourceUsage structs in drivers package to avoid drivers
directly depending on the internal client/structs package directly.

I attempted moving the structs to drivers, but that caused some import
cycles that was a bit hard to disentagle.  Alternatively, I added an
alias here that's sufficient for our purposes of avoiding external
drivers depend on internal packages, while allowing us to restructure
packages in future without breaking source compatibility.
2019-01-08 09:11:47 -05:00
Mahmood Ali
c0162fab35 move cstructs.DeviceNetwork to drivers pkg 2019-01-08 09:11:47 -05:00
Danielle Tomlinson
476e44b4e4 drivers: Implement InternalPluginDriver interface
This implements the InternalPluginDriver interface in each driver, and
calls the cancellation fn for their respective eventers.

This fixes a per task goroutine leak during test suite execution.
2019-01-08 13:49:31 +01:00
Nick Ethier
6951ca487d drivermanager: use allocID and task name to route task events 2018-12-18 23:01:51 -05:00
Danielle Tomlinson
ad4bac8d77 docker: Delete Task on Destroy
Currently the docker driver does not remove tasks from its state map
when destroying the task, which leads to issues when restarting tasks in
place, and leaks expired handles over time.
2018-12-18 15:53:31 +01:00
Mahmood Ali
1678a8499b drivers/docker: enforce volumes.enabled (#4983)
When volumes.enable flag is off in Docker driver, disable all mounts of
paths outside alloc dir.
2018-12-11 14:22:50 -05:00
Mahmood Ali
04af097008 driver/docker: honor plugin devices 2018-12-04 21:31:28 -05:00
Mahmood Ali
65e9b3b8be refactor device manipulation 2018-12-04 20:55:59 -05:00
Danielle Tomlinson
b8b4ce2fa6 Merge pull request #4936 from hashicorp/f-legacy-refactor
Refactor and repackage client/driver
2018-11-30 13:38:06 +01:00
Mahmood Ali
ef424132d0 Merge pull request #4926 from hashicorp/f-docker-image-ref
Use user provided image name to launch container
2018-11-30 07:27:39 -05:00
Danielle Tomlinson
03db4cf82d client: Rename drivers/shared/env => client/taskenv 2018-11-30 12:18:39 +01:00
Danielle Tomlinson
6756ffd052 drivers: Move client/drivers/env to drivers/shared/env
As part of deprecating legacy drivers, we're moving the env package to a
new drivers/shared tree, as it is used by the modern docker and rkt
driver packages, and is useful for 3rd party plugins.
2018-11-30 10:46:13 +01:00
Mahmood Ali
2310600d91 Use user provided image name to launch container
This allows the container to be tagged with a user friendly image name
(e.g. `redis:3.2`) rather than the image ID (e.g.
`sha256:87856cc39862cec77541d68382e4867d7ccb29a85a17221446c857ddaebca916`).

Useful for human debugging, as well as some debugging and image scanning
tools.

This risks two bad changes:
1. Discrepancy in image resolution between docker and Nomad's image
loader.
  * I checked the image creation paths in Nomad, and noticed that we
either pulled the image or inspect the image with the user provided
name.

2. A race in image tagging where the tag is modified between image
loading and container creation.
  * I, personally, don't think this case is cause for concern, as it is
analogous to the task running a bit later.  As long as the image is
still present, creating the container should be good.
2018-11-27 16:12:15 -05:00
Mahmood Ali
9c89ea4e08 Support docker bind mounts 2018-11-27 07:20:17 -05:00
Mahmood Ali
2aa034e174 Merge pull request #4908 from hashicorp/f-docker-opts-storageopt
Add support for docker storage options
2018-11-20 21:08:27 -05:00
Nick Ethier
fc53f5f635 docker: sync access to exit result within a handle 2018-11-20 20:41:32 -05:00
Michael Schurter
ace09b3a84 Apply suggestions from code review
Co-Authored-By: nickethier <ncethier@gmail.com>
2018-11-20 20:33:31 -05:00
Mahmood Ali
2c7bea7190 Add support for storage opt 2018-11-20 16:11:02 -05:00
Nick Ethier
249dbfffd2 docker: move config RPCs to config.go 2018-11-19 22:59:18 -05:00
Nick Ethier
0462c8a7f8 docker: remove container pointer from task handle 2018-11-19 22:59:18 -05:00
Nick Ethier
d7631e3b23 docker: move volume driver options to seperate block 2018-11-19 22:59:18 -05:00
Nick Ethier
577d4a2ea8 docker: group common config into blocks 2018-11-19 22:59:17 -05:00
Michael Schurter
455e75492c Apply suggestions from code review
Co-Authored-By: nickethier <ncethier@gmail.com>
2018-11-19 22:59:17 -05:00
Nick Ethier
3468880f50 docker: remove global pull coordinator 2018-11-19 22:59:17 -05:00
Nick Ethier
0c62b9adf4 docker: moved fingerprint code to it's own file 2018-11-19 22:59:17 -05:00
Nick Ethier
3601e4241d plugins/driver: remove NodeResources from task Resources and use PercentTicks field for docker driver 2018-11-19 22:59:17 -05:00
Nick Ethier
37ed75502e docker: move recoverable error proto to shared structs 2018-11-19 22:59:16 -05:00
Nick Ethier
396f6ab1fb docker: implement recover task logic 2018-11-19 22:59:16 -05:00
Nick Ethier
902eb9475c docker: finished porting tests 2018-11-19 22:59:16 -05:00