Commit Graph

88 Commits

Author SHA1 Message Date
Alex Dadgar
20189c2f7d Merge pull request #4969 from hashicorp/f-alloc-hooks
Make alloc health watcher a postrun hook rather than shutdown hook
2018-12-12 14:34:36 -08:00
Danielle Tomlinson
419743f165 allocrunner: Test alloc runners should include a noop migrator 2018-12-11 13:12:35 +01:00
Danielle Tomlinson
d44d4b57de client: Unify handling of previous and preempted allocs 2018-12-11 13:12:35 +01:00
Danielle Tomlinson
a4cf83d00c client: Wait for preempted allocs to terminate
When starting an allocation that is preempting other allocs, we create a
new group allocation watcher, and then wait for the allocations to
terminate in the allocation PreRun hooks.

If there's no preempted allocations, then we simply provide a
NoopAllocWatcher.
2018-12-11 00:59:18 +01:00
Alex Dadgar
75ac7c4be0 Make alloc health watcher a postrun hook rather than shutdown hook 2018-12-06 12:30:31 -08:00
Danielle Tomlinson
62ac40ab09 allocrunner: Basic test alloc runner 2018-12-06 12:28:23 +01:00
Alex Dadgar
8b624340ad Fix various bugs with task events
Fixes the following:
* Emitting events when the task fails to start
* Don't double emit events on task shutdown (nomad stop)
* Don't emit a OOM kill metric unless actually OOM'd
2018-12-05 14:27:07 -08:00
Danielle Tomlinson
03db4cf82d client: Rename drivers/shared/env => client/taskenv 2018-11-30 12:18:39 +01:00
Danielle Tomlinson
756325bcbd client: Merge driver/shared/structs and client/structs 2018-11-30 10:56:45 +01:00
Danielle Tomlinson
d4ef3b68d9 driver: Flatten SetEnvvars into taskdirhook 2018-11-30 10:46:13 +01:00
Danielle Tomlinson
bacd6175f5 client: Migrate DriverStats optout to drivers/shared/structs 2018-11-30 10:46:13 +01:00
Danielle Tomlinson
6756ffd052 drivers: Move client/drivers/env to drivers/shared/env
As part of deprecating legacy drivers, we're moving the env package to a
new drivers/shared tree, as it is used by the modern docker and rkt
driver packages, and is useful for 3rd party plugins.
2018-11-30 10:46:13 +01:00
Michael Schurter
b959d9831c add nil check around task resources in device hook
Looking at NewTaskRunner I'm unsure whether TaskRunner.TaskResources
(from which req.TaskResources is set) is intended to be nil at times or
if the TODO in NewTaskRunner is intended to ensure it is always non-nil.
2018-11-27 17:25:33 -08:00
Michael Schurter
c7b4ee1546 assume that slices contain only non-nil items 2018-11-27 17:25:33 -08:00
Michael Schurter
a13607f2d9 client: properly support hook env vars
The old approach was incomplete. Hook env vars are now:

 * persisted and restored between agent restarts
 * deterministic (LWW if 2 hooks set the same key)
2018-11-27 17:25:33 -08:00
Alex Dadgar
429c5bb885 Device hook and devices affect computed node class
This PR introduces a device hook that retrieves the device mount
information for an allocation. It also updates the computed node class
computation to take into account devices.

TODO Fix the task runner unit test. The environment variable is being
lost even though it is being properly set in the prestart hook.
2018-11-27 17:25:33 -08:00
Michael Schurter
e660046f06 Merge pull request #4896 from hashicorp/b-prevalloc-deadlock
Fix deadlock in previous alloc watcher by emitting last alloc update
2018-11-27 19:07:16 -06:00
Michael Schurter
3a61c2bb37 fix test breakage caused by rebase 2018-11-27 16:34:01 -08:00
Chris Baker
e3108507c5 Merge pull request #4891 from hashicorp/b-1150-rkt-volume-names
drivers/rkt: fix invalid volumes
2018-11-27 18:55:00 -05:00
Danielle Tomlinson
d361015562 Merge pull request #4909 from hashicorp/b-restart-delay
taskrunner: Return the restart delay correctly
2018-11-27 23:55:54 +01:00
Michael Schurter
6d49163b12 client: emit last sent alloc to new listeners
Fixes a deadlock where the allocwatcher would block forever waiting for
an update from a terminal alloc.

Made the broadcaster easier to debug as well.
2018-11-27 14:06:08 -08:00
Michael Schurter
5d6d4bf290 Merge pull request #4883 from hashicorp/f-graceful-shutdown
Support graceful shutdowns in agent
2018-11-27 15:55:15 -06:00
Michael Schurter
134c04744e client/ar: remove useless wait ch from runTasks
Arguably this makes task.WaitCh() useless, but I think exposing a wait
chan from TaskRunners is a generically useful API.
2018-11-26 12:51:18 -08:00
Michael Schurter
021c0cc4bf client: document how AR/TR Run methods behave 2018-11-26 12:50:35 -08:00
Chris Baker
790fe0b1db modified TaskConfig to include AllocID
use this for volume names in drivers/rkt to address #1150
2018-11-26 18:54:26 +00:00
Danielle Tomlinson
d73c2f3c8d taskrunner: Return the restart delay correctly
We were incorrectly returning a 0 duration to the taskrunner when
determining when a task should restart. This would cause tasks to be
restarted immediately, ignoring the restart {} stanza in a users
configuration.

This commit causes us to return the restart duration to the task runner
so it may correctly delay further execution.
2018-11-20 21:52:23 +01:00
Nick Ethier
6dfa2309c4 task_runner: use NodeResources instead of deprecated struct 2018-11-20 13:46:39 -05:00
Nick Ethier
4dfcc80a0b task_runner: use task and alloc copies instead of referencing the original pointer 2018-11-20 13:34:46 -05:00
Nick Ethier
cf69895d31 task_runner: emit event on task exit with exit result details 2018-11-19 22:59:17 -05:00
Nick Ethier
3601e4241d plugins/driver: remove NodeResources from task Resources and use PercentTicks field for docker driver 2018-11-19 22:59:17 -05:00
Nick Ethier
9c154d7135 drivers: added NodeResources to drivers.TaskConfig 2018-11-19 22:59:16 -05:00
Nick Ethier
98b295d617 docker: started work on porting docker driver to new plugin framework 2018-11-19 22:59:15 -05:00
Michael Schurter
31f113ba4d client: support graceful shutdowns
Client.Shutdown now blocks until all AllocRunners and TaskRunners have
exited their Run loops. Tasks are left running.
2018-11-19 16:39:30 -08:00
Mahmood Ali
6b7953af21 Merge pull request #4884 from hashicorp/f-alloc-devices-cli
Report alloc device statistics in API and CLI
2018-11-16 18:04:54 -05:00
Mahmood Ali
fd49039f09 address review comments 2018-11-16 17:13:01 -05:00
Mahmood Ali
58cbafe913 Populate alloc stats API with device stats
This change makes few compromises:

* Looks up the devices associated with tasks at look up time.  Given
that `nomad alloc status` is called rarely generally (compared to stats
telemetry and general job reporting), it seems fine.  However, the
lookup overhead grows bounded by number of `tasks x total-host-devices`,
which can be significant.

* `client.Client` performs the task devices->statistics lookup.  It
passes self to alloc/task runners so they can look up the device statistics
allocated to them.
  * Currently alloc/task runners are responsible for constructing the
entire RPC response for stats
  * The alternatives for making task runners device statistics aware
don't seem appealing (e.g. having task runners contain reference to hostStats)

* On the alloc aggregation resource usage, I did a naive merging of task device statistics.
  * Personally, I question the value of such aggregation, compared to
costs of struct duplication and bloating the response - but opted to be
consistent in the API.
  * With naive concatination, device instances from a single device group used by separate tasks in the alloc, would be aggregated in two separate device group statistics.
2018-11-16 10:26:32 -05:00
Michael Schurter
9189173649 tests: fix tests post-rebase 2018-11-15 17:40:56 -08:00
Michael Schurter
a552a0c42c client/tr: add a bit of context to envbuilder errors 2018-11-15 16:26:25 -08:00
Michael Schurter
2283c8fff3 client: remove old proxy references from comments 2018-11-15 16:26:25 -08:00
Michael Schurter
b878bbf3f7 client: add new nested variables to task's hcl ctx
The error messages are really bad, but it's extremely difficult to
produce good error messages without the original HCL.
2018-11-15 16:26:25 -08:00
Michael Schurter
43b359914b client: interpolate driver configurations
Also add missing SetDriverNetwork calls.
2018-11-15 16:25:57 -08:00
Mahmood Ali
5c906aa085 convert all config durations to strings in tests 2018-11-13 10:21:40 -05:00
Michael Schurter
e58a91b701 client: update alloc status when terminating
Defensively update alloc status whenever killing all tasks.
2018-11-05 15:11:10 -08:00
Michael Schurter
a22205cd8f client: block on context as well as waitCh
For lifecycle operations such as Restart and Kill, the client should not
expect driver plugins to be well behaved and close their waitCh on
context cancelation. Always wait on the passed in context as well as the
waitCh.
2018-11-05 12:32:05 -08:00
Michael Schurter
740ca8e6ca client: fix tr lifecycle logic and shutdown delay
ShutdownDelay must be honored whenever the task is killed or restarted.
Services were not being deregistered prior to restarting.
2018-11-05 12:32:05 -08:00
Michael Schurter
d5c8e5bd26 client: fix ar and tr tests 2018-11-05 12:32:05 -08:00
Michael Schurter
9b82025608 client: do not run terminal allocs 2018-11-05 12:32:05 -08:00
Michael Schurter
fdbe446ea6 client: first pass at implementing task restoring
Task restoring works but dead tasks may be restarted
2018-11-05 12:32:05 -08:00
Nick Ethier
da7563b8c3 Merge pull request #4795 from hashicorp/f-plugin-config
Pass client configuration to plugins through loader
2018-10-29 18:42:27 -07:00
Michael Schurter
d71e7666bd ar: fix leader handling, state restoring, and destroying unrun ARs
* Migrated all of the old leader task tests and got them passing
* Refactor and consolidate task killing code in AR to always kill leader
  tasks first
* Fixed lots of issues with state restoring
* Fixed deadlock in AR.Destroy if AR.Run had never been called
* Added a new in memory statedb for testing
2018-10-19 09:45:45 -07:00