Commit Graph

103 Commits

Author SHA1 Message Date
Alex Dadgar
c7fc39d38d Merge pull request #5168 from hashicorp/b-kill-race
Improve Kill handling on task runner
2019-01-09 12:05:10 -08:00
Alex Dadgar
d916c0dd12 add more comments 2019-01-09 12:04:22 -08:00
Michael Schurter
e44d51f4d0 Spelling fix
Co-Authored-By: dadgar <alex@hashicorp.com>
2019-01-09 11:42:40 -08:00
Mahmood Ali
d1fbd735f3 Merge pull request #5157 from hashicorp/r-drivers-no-cstructs
drivers: avoid referencing client/structs package
2019-01-09 13:06:46 -05:00
Alex Dadgar
a5ba15591a Improve Kill handling on task runner
This PR improves how killing a task is handled. Before the kill function
directly orchestrated the killing and was only valid while the task was
running. The new behavior is to mark the desired state and wait for the
task runner to converge to that state.
2019-01-08 16:42:26 -08:00
Michael Schurter
1ae8261139 client: emit Killing/Killed task events
We were just emitting Killed/Terminated events before. In v0.8 we
emitted Killing/Killed, but lacked Terminated when explicitly stopping
a task. This change makes it so Terminated is always included, whether
explicitly stopping a task or it exiting on its own.

New output:

2019-01-04T14:58:51-08:00  Killed            Task successfully killed
2019-01-04T14:58:51-08:00  Terminated        Exit Code: 130, Signal: 2
2019-01-04T14:58:51-08:00  Killing           Sent interrupt
2019-01-04T14:58:51-08:00  Leader Task Dead  Leader Task in Group dead
2019-01-04T14:58:49-08:00  Started           Task started by client
2019-01-04T14:58:49-08:00  Task Setup        Building Task Directory
2019-01-04T14:58:49-08:00  Received          Task received by client

Old (v0.8.6) output:

2019-01-04T22:14:54Z  Killed            Task successfully killed
2019-01-04T22:14:54Z  Killing           Sent interrupt. Waiting 5s before force killing
2019-01-04T22:14:54Z  Leader Task Dead  Leader Task in Group dead
2019-01-04T22:14:53Z  Started           Task started by client
2019-01-04T22:14:53Z  Task Setup        Building Task Directory
2019-01-04T22:14:53Z  Received          Task received by client
2019-01-08 07:20:54 -08:00
Mahmood Ali
c0162fab35 move cstructs.DeviceNetwork to drivers pkg 2019-01-08 09:11:47 -05:00
Mahmood Ali
694e3010c2 use drivers.FSIsolation 2019-01-08 09:11:47 -05:00
Mahmood Ali
607e7f2dde remove always false parameter
Simplify allocDir.Build() function to avoid depending on client/structs,
and remove a parameter that's always set to `false`.

The motivation here is to avoid a dependency cycle between
drivers/cstructs and alloc_dir.
2019-01-08 09:11:47 -05:00
Alex Dadgar
6bb99c93d0 Review comments 2019-01-07 14:50:28 -08:00
Alex Dadgar
5424d3b540 vet 2019-01-07 14:49:41 -08:00
Alex Dadgar
19e67a0916 Test recovery 2019-01-07 14:49:41 -08:00
Alex Dadgar
144866a87b Mock driver has recovery, stats 2019-01-07 14:49:40 -08:00
Alex Dadgar
b300306c4a comments 2019-01-07 14:49:40 -08:00
Alex Dadgar
3257eb6d86 Fix hooks 2019-01-07 14:49:40 -08:00
Alex Dadgar
437f03d877 recover 2019-01-07 14:49:40 -08:00
Mahmood Ali
17b7490891 taskrunner: emit TaskReceived event
Preserve pre-0.9, where task runner emits `Received: Task received by
client` event on task runner creation.
2019-01-04 14:32:29 -05:00
Danielle Tomlinson
54dde24bcb taskrunner: Persist environment from hooks
https://github.com/hashicorp/nomad/pull/5032 introduced a regression
where the origHookState was used in place of the response from the hook.
2019-01-03 13:13:57 +01:00
Alex Dadgar
99df4c98c7 Store device envs separately and pass to drivers 2018-12-19 14:23:09 -08:00
Michael Schurter
784706a1e5 client/state: support upgrading from 0.8->0.9
Also persist and load DeploymentStatus to avoid rechecking health after
client restarts.
2018-12-19 10:39:27 -08:00
Michael Schurter
3a41fd7b31 tr: fix HookState Copy() and Equal() methods
They did not take into account the Env field.
2018-12-19 09:58:06 -08:00
Nick Ethier
6951ca487d drivermanager: use allocID and task name to route task events 2018-12-18 23:01:51 -05:00
Nick Ethier
331793e283 client: batch initial fingerprinting in plugin manangers
drivermanager: fix pr comments/feedback
2018-12-18 22:56:19 -05:00
Nick Ethier
2f010a2f25 client/drivermananger: fixup issues from rebase and address PR comments 2018-12-18 22:55:38 -05:00
Nick Ethier
32aaedd6b7 tr: deregister task handler on cleanup 2018-12-18 22:55:38 -05:00
Nick Ethier
39ca1b00dd client/drivermananger: add driver manager
The driver manager is modeled after the device manager and is started by the client.
It's responsible for handling driver lifecycle and reattachment state, as well as
processing the incomming fingerprint and task events from each driver. The mananger
exposes a method for registering event handlers for task events that is used by the
task runner to update the server when a task has been updated with an event.

Since driver fingerprinting has been implemented by the driver manager, it is no
longer needed in the fingerprint mananger and has been removed.
2018-12-18 22:55:18 -05:00
Alex Dadgar
517bf1c35f Fix unit tests + upgrade pathing resources 2018-12-18 15:50:44 -08:00
Alex Dadgar
d5512c39f0 Lint 2018-12-18 15:50:44 -08:00
Alex Dadgar
7a0b73341a LinuxResources doesn't use task.Resources 2018-12-18 15:50:44 -08:00
Alex Dadgar
cd6879409c Drivers 2018-12-18 15:50:11 -08:00
Alex Dadgar
da6925bfc1 utilities 2018-12-18 15:48:52 -08:00
Danielle Tomlinson
b92bc1178d taskrunner: Use a random suffix for Task Config
The RestartCount is not really suitable for use as a source of
uniqueness within task invocations as it is not monotonic, and interacts
with the restart stanza in a users config, so conflates restarts due to
task failures, with restarts due to enviromental changes, such as consul
template or vault secrets changing.

Here we instead use a substring from a uuid, which is more random than
we strictly need, but is nicer than rolling our own random string
generator here.
2018-12-19 00:38:54 +01:00
Danielle Tomlinson
61a17621e3 taskrunner: Use hook errors for artifacts 2018-12-17 10:39:38 +01:00
Danielle Tomlinson
4d4201331c taskrunner: Emit task events when a hook fails 2018-12-13 18:20:18 +01:00
Alex Dadgar
8b624340ad Fix various bugs with task events
Fixes the following:
* Emitting events when the task fails to start
* Don't double emit events on task shutdown (nomad stop)
* Don't emit a OOM kill metric unless actually OOM'd
2018-12-05 14:27:07 -08:00
Danielle Tomlinson
03db4cf82d client: Rename drivers/shared/env => client/taskenv 2018-11-30 12:18:39 +01:00
Danielle Tomlinson
756325bcbd client: Merge driver/shared/structs and client/structs 2018-11-30 10:56:45 +01:00
Danielle Tomlinson
d4ef3b68d9 driver: Flatten SetEnvvars into taskdirhook 2018-11-30 10:46:13 +01:00
Danielle Tomlinson
bacd6175f5 client: Migrate DriverStats optout to drivers/shared/structs 2018-11-30 10:46:13 +01:00
Danielle Tomlinson
6756ffd052 drivers: Move client/drivers/env to drivers/shared/env
As part of deprecating legacy drivers, we're moving the env package to a
new drivers/shared tree, as it is used by the modern docker and rkt
driver packages, and is useful for 3rd party plugins.
2018-11-30 10:46:13 +01:00
Michael Schurter
b959d9831c add nil check around task resources in device hook
Looking at NewTaskRunner I'm unsure whether TaskRunner.TaskResources
(from which req.TaskResources is set) is intended to be nil at times or
if the TODO in NewTaskRunner is intended to ensure it is always non-nil.
2018-11-27 17:25:33 -08:00
Michael Schurter
c7b4ee1546 assume that slices contain only non-nil items 2018-11-27 17:25:33 -08:00
Michael Schurter
a13607f2d9 client: properly support hook env vars
The old approach was incomplete. Hook env vars are now:

 * persisted and restored between agent restarts
 * deterministic (LWW if 2 hooks set the same key)
2018-11-27 17:25:33 -08:00
Alex Dadgar
429c5bb885 Device hook and devices affect computed node class
This PR introduces a device hook that retrieves the device mount
information for an allocation. It also updates the computed node class
computation to take into account devices.

TODO Fix the task runner unit test. The environment variable is being
lost even though it is being properly set in the prestart hook.
2018-11-27 17:25:33 -08:00
Chris Baker
e3108507c5 Merge pull request #4891 from hashicorp/b-1150-rkt-volume-names
drivers/rkt: fix invalid volumes
2018-11-27 18:55:00 -05:00
Danielle Tomlinson
d361015562 Merge pull request #4909 from hashicorp/b-restart-delay
taskrunner: Return the restart delay correctly
2018-11-27 23:55:54 +01:00
Michael Schurter
5d6d4bf290 Merge pull request #4883 from hashicorp/f-graceful-shutdown
Support graceful shutdowns in agent
2018-11-27 15:55:15 -06:00
Michael Schurter
021c0cc4bf client: document how AR/TR Run methods behave 2018-11-26 12:50:35 -08:00
Chris Baker
790fe0b1db modified TaskConfig to include AllocID
use this for volume names in drivers/rkt to address #1150
2018-11-26 18:54:26 +00:00
Danielle Tomlinson
d73c2f3c8d taskrunner: Return the restart delay correctly
We were incorrectly returning a 0 duration to the taskrunner when
determining when a task should restart. This would cause tasks to be
restarted immediately, ignoring the restart {} stanza in a users
configuration.

This commit causes us to return the restart duration to the task runner
so it may correctly delay further execution.
2018-11-20 21:52:23 +01:00