Commit Graph

61 Commits

Author SHA1 Message Date
Alex Dadgar
6bb99c93d0 Review comments 2019-01-07 14:50:28 -08:00
Alex Dadgar
144866a87b Mock driver has recovery, stats 2019-01-07 14:49:40 -08:00
Alex Dadgar
b300306c4a comments 2019-01-07 14:49:40 -08:00
Alex Dadgar
437f03d877 recover 2019-01-07 14:49:40 -08:00
Mahmood Ali
17b7490891 taskrunner: emit TaskReceived event
Preserve pre-0.9, where task runner emits `Received: Task received by
client` event on task runner creation.
2019-01-04 14:32:29 -05:00
Alex Dadgar
99df4c98c7 Store device envs separately and pass to drivers 2018-12-19 14:23:09 -08:00
Nick Ethier
6951ca487d drivermanager: use allocID and task name to route task events 2018-12-18 23:01:51 -05:00
Nick Ethier
331793e283 client: batch initial fingerprinting in plugin manangers
drivermanager: fix pr comments/feedback
2018-12-18 22:56:19 -05:00
Nick Ethier
39ca1b00dd client/drivermananger: add driver manager
The driver manager is modeled after the device manager and is started by the client.
It's responsible for handling driver lifecycle and reattachment state, as well as
processing the incomming fingerprint and task events from each driver. The mananger
exposes a method for registering event handlers for task events that is used by the
task runner to update the server when a task has been updated with an event.

Since driver fingerprinting has been implemented by the driver manager, it is no
longer needed in the fingerprint mananger and has been removed.
2018-12-18 22:55:18 -05:00
Alex Dadgar
517bf1c35f Fix unit tests + upgrade pathing resources 2018-12-18 15:50:44 -08:00
Alex Dadgar
d5512c39f0 Lint 2018-12-18 15:50:44 -08:00
Alex Dadgar
7a0b73341a LinuxResources doesn't use task.Resources 2018-12-18 15:50:44 -08:00
Alex Dadgar
cd6879409c Drivers 2018-12-18 15:50:11 -08:00
Alex Dadgar
da6925bfc1 utilities 2018-12-18 15:48:52 -08:00
Danielle Tomlinson
b92bc1178d taskrunner: Use a random suffix for Task Config
The RestartCount is not really suitable for use as a source of
uniqueness within task invocations as it is not monotonic, and interacts
with the restart stanza in a users config, so conflates restarts due to
task failures, with restarts due to enviromental changes, such as consul
template or vault secrets changing.

Here we instead use a substring from a uuid, which is more random than
we strictly need, but is nicer than rolling our own random string
generator here.
2018-12-19 00:38:54 +01:00
Alex Dadgar
8b624340ad Fix various bugs with task events
Fixes the following:
* Emitting events when the task fails to start
* Don't double emit events on task shutdown (nomad stop)
* Don't emit a OOM kill metric unless actually OOM'd
2018-12-05 14:27:07 -08:00
Danielle Tomlinson
03db4cf82d client: Rename drivers/shared/env => client/taskenv 2018-11-30 12:18:39 +01:00
Danielle Tomlinson
6756ffd052 drivers: Move client/drivers/env to drivers/shared/env
As part of deprecating legacy drivers, we're moving the env package to a
new drivers/shared tree, as it is used by the modern docker and rkt
driver packages, and is useful for 3rd party plugins.
2018-11-30 10:46:13 +01:00
Alex Dadgar
429c5bb885 Device hook and devices affect computed node class
This PR introduces a device hook that retrieves the device mount
information for an allocation. It also updates the computed node class
computation to take into account devices.

TODO Fix the task runner unit test. The environment variable is being
lost even though it is being properly set in the prestart hook.
2018-11-27 17:25:33 -08:00
Chris Baker
e3108507c5 Merge pull request #4891 from hashicorp/b-1150-rkt-volume-names
drivers/rkt: fix invalid volumes
2018-11-27 18:55:00 -05:00
Danielle Tomlinson
d361015562 Merge pull request #4909 from hashicorp/b-restart-delay
taskrunner: Return the restart delay correctly
2018-11-27 23:55:54 +01:00
Michael Schurter
5d6d4bf290 Merge pull request #4883 from hashicorp/f-graceful-shutdown
Support graceful shutdowns in agent
2018-11-27 15:55:15 -06:00
Michael Schurter
021c0cc4bf client: document how AR/TR Run methods behave 2018-11-26 12:50:35 -08:00
Chris Baker
790fe0b1db modified TaskConfig to include AllocID
use this for volume names in drivers/rkt to address #1150
2018-11-26 18:54:26 +00:00
Danielle Tomlinson
d73c2f3c8d taskrunner: Return the restart delay correctly
We were incorrectly returning a 0 duration to the taskrunner when
determining when a task should restart. This would cause tasks to be
restarted immediately, ignoring the restart {} stanza in a users
configuration.

This commit causes us to return the restart duration to the task runner
so it may correctly delay further execution.
2018-11-20 21:52:23 +01:00
Nick Ethier
6dfa2309c4 task_runner: use NodeResources instead of deprecated struct 2018-11-20 13:46:39 -05:00
Nick Ethier
4dfcc80a0b task_runner: use task and alloc copies instead of referencing the original pointer 2018-11-20 13:34:46 -05:00
Nick Ethier
cf69895d31 task_runner: emit event on task exit with exit result details 2018-11-19 22:59:17 -05:00
Nick Ethier
3601e4241d plugins/driver: remove NodeResources from task Resources and use PercentTicks field for docker driver 2018-11-19 22:59:17 -05:00
Nick Ethier
9c154d7135 drivers: added NodeResources to drivers.TaskConfig 2018-11-19 22:59:16 -05:00
Nick Ethier
98b295d617 docker: started work on porting docker driver to new plugin framework 2018-11-19 22:59:15 -05:00
Michael Schurter
31f113ba4d client: support graceful shutdowns
Client.Shutdown now blocks until all AllocRunners and TaskRunners have
exited their Run loops. Tasks are left running.
2018-11-19 16:39:30 -08:00
Mahmood Ali
6b7953af21 Merge pull request #4884 from hashicorp/f-alloc-devices-cli
Report alloc device statistics in API and CLI
2018-11-16 18:04:54 -05:00
Mahmood Ali
fd49039f09 address review comments 2018-11-16 17:13:01 -05:00
Mahmood Ali
58cbafe913 Populate alloc stats API with device stats
This change makes few compromises:

* Looks up the devices associated with tasks at look up time.  Given
that `nomad alloc status` is called rarely generally (compared to stats
telemetry and general job reporting), it seems fine.  However, the
lookup overhead grows bounded by number of `tasks x total-host-devices`,
which can be significant.

* `client.Client` performs the task devices->statistics lookup.  It
passes self to alloc/task runners so they can look up the device statistics
allocated to them.
  * Currently alloc/task runners are responsible for constructing the
entire RPC response for stats
  * The alternatives for making task runners device statistics aware
don't seem appealing (e.g. having task runners contain reference to hostStats)

* On the alloc aggregation resource usage, I did a naive merging of task device statistics.
  * Personally, I question the value of such aggregation, compared to
costs of struct duplication and bloating the response - but opted to be
consistent in the API.
  * With naive concatination, device instances from a single device group used by separate tasks in the alloc, would be aggregated in two separate device group statistics.
2018-11-16 10:26:32 -05:00
Michael Schurter
a552a0c42c client/tr: add a bit of context to envbuilder errors 2018-11-15 16:26:25 -08:00
Michael Schurter
b878bbf3f7 client: add new nested variables to task's hcl ctx
The error messages are really bad, but it's extremely difficult to
produce good error messages without the original HCL.
2018-11-15 16:26:25 -08:00
Michael Schurter
43b359914b client: interpolate driver configurations
Also add missing SetDriverNetwork calls.
2018-11-15 16:25:57 -08:00
Michael Schurter
740ca8e6ca client: fix tr lifecycle logic and shutdown delay
ShutdownDelay must be honored whenever the task is killed or restarted.
Services were not being deregistered prior to restarting.
2018-11-05 12:32:05 -08:00
Michael Schurter
fdbe446ea6 client: first pass at implementing task restoring
Task restoring works but dead tasks may be restarted
2018-11-05 12:32:05 -08:00
Nick Ethier
da7563b8c3 Merge pull request #4795 from hashicorp/f-plugin-config
Pass client configuration to plugins through loader
2018-10-29 18:42:27 -07:00
Michael Schurter
d71e7666bd ar: fix leader handling, state restoring, and destroying unrun ARs
* Migrated all of the old leader task tests and got them passing
* Refactor and consolidate task killing code in AR to always kill leader
  tasks first
* Fixed lots of issues with state restoring
* Fixed deadlock in AR.Destroy if AR.Run had never been called
* Added a new in memory statedb for testing
2018-10-19 09:45:45 -07:00
Michael Schurter
2aed3e8527 ar: refactor task killing into 1 method
Update comments and address some PR comments from #4775
2018-10-17 10:06:59 -07:00
Michael Schurter
2417ec5621 ar: fix task leader, update, and stop handling 2018-10-17 10:06:59 -07:00
Nick Ethier
3244a4cc57 plumb NomadConfig into plugins 2018-10-16 22:47:22 -04:00
Michael Schurter
cf42289c8b fix linter errors 2018-10-16 16:56:57 -07:00
Michael Schurter
e026d6e80a tr: remove unused DriverHandle interface
was causing typed nil interface panics and served no purpose
2018-10-16 16:56:56 -07:00
Michael Schurter
2256917936 Port client portion of #4392 to new taskrunner
PR #4392 was merged to master *after* allocrunnerv2 was branched, so the
client-specific portions must be ported from master to arv2.
2018-10-16 16:56:56 -07:00
Nick Ethier
44cc52a0d4 client: simplify driver plugin logic from review comments 2018-10-16 16:56:56 -07:00
Nick Ethier
4f9522dd54 client: review comments and fixup/skip tests 2018-10-16 16:56:56 -07:00