Commit Graph

3257 Commits

Author SHA1 Message Date
Alex Dadgar
3c0b073513 compile on windows 2018-10-16 16:56:56 -07:00
Alex Dadgar
7b7cb382dc more test fixes 2018-10-16 16:56:56 -07:00
Alex Dadgar
3a492bb33f allocrunnerv2 -> allocrunner 2018-10-16 16:56:56 -07:00
Alex Dadgar
f91b269b2a fix test compiling 2018-10-16 16:56:55 -07:00
Alex Dadgar
31d49c72ab skip building deprecated files 2018-10-16 16:56:55 -07:00
Alex Dadgar
2e535aefcc move files around 2018-10-16 16:56:55 -07:00
Michael Schurter
37387bbf6f tests: fix missing logger caused by bad merge 2018-10-16 16:56:55 -07:00
Michael Schurter
3b8da3065e tr: properly comment handle fields 2018-10-16 16:56:55 -07:00
Michael Schurter
8e9289676b ar: AllocState should not mutate ar.state
If ar.state.TaskStates has not been set, set it on the copy of ar.state.
That keeps ar.state manipulations in one location and allows AllocState
to only acquire read-locks.
2018-10-16 16:56:55 -07:00
Michael Schurter
4d1a1ac5bb tests: test logs endpoint against pending task
Although the really exciting change is making WaitForRunning return the
allocations that it started. This should cut down test boilerplate
significantly.
2018-10-16 16:56:55 -07:00
Michael Schurter
01f057e35d tests: make a test client/config easier to generate
Sadly can't move the fingerprint timeout tweak into the helper due to
circular imports.
2018-10-16 16:56:55 -07:00
Michael Schurter
e495a0444b tests: ensure task state is initialized in NewAR
Also expose NoopDB for use in tests.
2018-10-16 16:56:55 -07:00
Michael Schurter
d29d613c02 client: expose task state to client
The interesting decision in this commit was to expose AR's state and not
a fully materialized Allocation struct. AR.clientAlloc builds an Alloc
that contains the task state, so I considered simply memoizing and
exposing that method.

However, that would lead to AR having two awkwardly similar methods:
 - Alloc() - which returns the server-sent alloc
 - ClientAlloc() - which returns the fully materialized client alloc

Since ClientAlloc() could be memoized it would be just as cheap to call
as Alloc(), so why not replace Alloc() entirely?

Replacing Alloc() entirely would require Update() to immediately
materialize the task states on server-sent Allocs as there may have been
local task state changes since the server received an Alloc update.

This quickly becomes difficult to reason about: should Update hooks use
the TaskStates? Are state changes caused by TR Update hooks immediately
reflected in the Alloc? Should AR persist its copy of the Alloc? If so,
are its TaskStates canonical or the TaskStates on TR?

So! Forget that. Let's separate the static Allocation from the dynamic
AR & TR state!

 - AR.Alloc() is for static Allocation access (often for the Job)
 - AR.AllocState() is for the dynamic AR & TR runtime state (deployment
   status, task states, etc).

If code needs to know the status of a task: AllocState()
If code needs to know the names of tasks: Alloc()

It should be very easy for a developer to reason about which method they
should call and what they can do with the return values.
2018-10-16 16:56:55 -07:00
Michael Schurter
737b1d82d2 client: add comment 2018-10-16 16:56:55 -07:00
Michael Schurter
99e2953e23 client: fix potentially dropped streaming errors 2018-10-16 16:56:55 -07:00
Michael Schurter
981acf3f95 tr: remove unneeded lock; chan synchronizes access 2018-10-16 16:56:55 -07:00
Michael Schurter
9f64add14c tr: fix shutdown/destroy/WaitResult handling
Multiple receivers raced for the WaitResult when killing tasks which
could lead to a deadlock if the "wrong" receiver won.

Wrap handlers in an ugly little proxy to avoid this. At first I wanted
to push this into drivers, but the result is tied to the TR's handle
lifecycle -- not the lifecycle of an alloc or task.
2018-10-16 16:56:55 -07:00
Michael Schurter
13f47aa521 client: do not inspect task state to follow logs
"Ask forgiveness, not permission."

Instead of peaking at TaskStates (which are no longer updated on the
AR.Alloc() view of the world) to only read logs for running tasks, just
try to read the logs and improve the error handling if they don't exist.

This should make log streaming less dependent on AR/TR behavior.

Also fixed a race where the log streamer could exit before reading an
error. This caused no logs or errors to be displayed sometimes when an
error occurred.
2018-10-16 16:56:55 -07:00
Michael Schurter
349b827d7d mock_driver: close waitCh after exiting
mock_driver wasn't behaving like other driver handles.
2018-10-16 16:56:55 -07:00
Michael Schurter
9394b989e5 client: fix accessing alloc runners
* GetClientAlloc() gains nothing from using allAllocs()
* getAllocatedResources was calling getAllocRunners() twice
2018-10-16 16:56:55 -07:00
Michael Schurter
95634ae60f tr: remove wip comments 2018-10-16 16:56:55 -07:00
Michael Schurter
1dcde75d30 ar: lock around accessing tasks
Specify that Alloc() does not return updated task states.
2018-10-16 16:56:55 -07:00
Alex Dadgar
e2553a13d4 Fix client reloading and pass the plugin loaders to server and client 2018-10-16 16:56:55 -07:00
Nick Ethier
fc16a5c527 plugin/drivers: plumb in stdout/stderr paths 2018-10-16 16:53:31 -07:00
Nick Ethier
c9f0d2e0b4 driver/raw_exec: port existing raw_exec tests and add some testing utilities 2018-10-16 16:53:31 -07:00
Nick Ethier
650ac5a83e driver/raw_exec: more tests and bug fixes
added wrapper struct for plugin.ReattachConfig to better handle serialization
2018-10-16 16:53:31 -07:00
Nick Ethier
e2bf0a388e clientv2: base driver plugin (#4671)
Driver plugin framework to facilitate development of driver plugins.

Implementing plugins only need to implement the DriverPlugin interface.
The framework proxies this interface to the go-plugin GRPC interface generated
from the driver.proto spec.

A testing harness is provided to allow implementing drivers to test the full
lifecycle of the driver plugin. An example use:

func TestMyDriver(t *testing.T) {
    harness := NewDriverHarness(t, &MyDiverPlugin{})
    // The harness implements the DriverPlugin interface and can be used as such
    taskHandle, err := harness.StartTask(...)
}
2018-10-16 16:53:31 -07:00
Michael Schurter
78c15dcaa5 tr: add comments and cleanup call signature
From review comments on #4649 left post-merge.
2018-10-16 16:53:31 -07:00
Nick Ethier
5b14d24bf4 executor v2 (#4656)
* client/executor: refactor client to remove interpolation

* executor: POC libcontainer based executor

* vendor: use hashicorp libcontainer fork

* vendor: add libcontainer/nsenter dep

* executor: updated executor interface to simplify operations

* executor: implement logging pipe

* logmon: new logmon plugin to manage task logs

* driver/executor: use logmon for log management

* executor: fix tests and windows build

* executor: fix logging key names

* executor: fix test failures

* executor: add config field to toggle between using libcontainer and standard executors

* logmon: use discover utility to discover nomad executable

* executor: only call libcontainer-shim on main in linux

* logmon: use seperate path configs for stdout/stderr fifos

* executor: windows fixes

* executor: created reusable pid stats collection utility that can be used in an executor

* executor: update fifo.Open calls

* executor: fix build

* remove executor from docker driver

* executor: Shutdown func to kill and cleanup executor and its children

* executor: move linux specific universal executor funcs to seperate file

* move logmon initialization to a task runner hook

* client: doc fixes and renaming from code review


* taskrunner: use shared config struct for logmon fifo fields

* taskrunner: logmon only needs to be started once per task
2018-10-16 16:53:31 -07:00
Michael Schurter
da8f053a0d tr: implement stats collection hook
Tested except for the net/rpc specific error case which may need
changing in the gRPC world.
2018-10-16 16:53:31 -07:00
Michael Schurter
796f0ca063 fix build errors post merges 2018-10-16 16:53:31 -07:00
Michael Schurter
d0842e7b00 test: cleanup mock consul service client
Updated to hclog.

It exposed fields that required an unexported lock to access. Created a
getter methodn instead. Only old allocrunner currently used this
feature.
2018-10-16 16:53:31 -07:00
Michael Schurter
b7d3f5948e health_hook: simplify locking; test thoroughly
Use doneCh like @dadgar suggested in the original PR.

Thoroughly test hook as concurrent Update calls make for a tricky
concurrency problem.
2018-10-16 16:53:30 -07:00
Alex Dadgar
9fd0ba1df6 add logger back 2018-10-16 16:53:30 -07:00
Nick Ethier
a203689bbc fifo: add new fifo package for named pipes (#4665)
* fifo: add new fifo package for named pipes
2018-10-16 16:53:30 -07:00
Alex Dadgar
9a2c2a4f68 client uses passed logger and fix fingerprinters 2018-10-16 16:53:30 -07:00
Nick Ethier
e9f3f2cfee Update runc/libcontainer and friends (#4655)
* vendor: bump libcontainer and docker to remove Sirupsen imports

* vendor: fix bad vendoring of archive package

* vendor: fix api changes to cgroups in executor

* vendor: fix docker api changes

* vendor: update github.com/Azure/go-ansiterm to use non capitalized logrus import
2018-10-16 16:53:30 -07:00
Michael Schurter
5cbd01329f health_hook: fix panic and add tests
Still more testing to do, but I want to get this panic fixed ASAP.

All new tests pass with -race
2018-10-16 16:53:30 -07:00
Michael Schurter
15b568c30b Emit events before long operations
Append when there's nothing blocking between appending and sending an
update to the server.
2018-10-16 16:53:30 -07:00
Michael Schurter
61a5a849cd Use a semaphore to block until watcher exits 2018-10-16 16:53:30 -07:00
Michael Schurter
0d7b6e75ee ar: use multierror in update hook loop
Make it match TaskRunner update hook behavior
2018-10-16 16:53:30 -07:00
Michael Schurter
5405313da1 tr: refactor EmitEvents into Emit+Append
* UpdateState: set state, append event, persist, update servers
* EmitEvent: append event, persist, update servers
* AppendEvent: append event, persist

AppendEvent may not even have to persist, but for the sake of
correctness I'm going with that for now.
2018-10-16 16:53:30 -07:00
Michael Schurter
bc26540a0b ar: create health setting shim for health watcher 2018-10-16 16:53:30 -07:00
Michael Schurter
6ba5c7c76a fix detection of task transitioning to running 2018-10-16 16:53:30 -07:00
Michael Schurter
2b5d840eaa arv2: implement alloc health watching
Also remove initial alloc from broadcaster as it just caused useless
extra processing.
2018-10-16 16:53:30 -07:00
Michael Schurter
5f668aa3d3 refactor ar hooks into their own files
minimize passed dependencies to ease testing
2018-10-16 16:53:30 -07:00
Michael Schurter
12171cdc28 make AllocBroadcaster easier to use
And test thoroughly.
2018-10-16 16:53:30 -07:00
Michael Schurter
9da25adc54 client: hclog-ify most of the client
Leaving fingerprinters in case that interface changes with plugins.
2018-10-16 16:53:30 -07:00
Michael Schurter
c95155d45c implement stopping, destroying, and disk migration
* Stopping an alloc is implemented via Updates but update hooks are
  *not* run.
* Destroying an alloc is a best effort cleanup.
* AllocRunner destroy hooks implemented.
* Disk migration and blocking on a previous allocation exiting moved to
  its own package to avoid cycles. Now only depends on alloc broadcaster
  instead of also using a waitch.
* AllocBroadcaster now only drops stale allocations and always keeps the
  latest version.
* Made AllocDir safe for concurrent use

Lots of internal contexts that are currently unused. Unsure if they
should be used or removed.
2018-10-16 16:53:30 -07:00
Michael Schurter
de5426124b lots of comment/log fixes 2018-10-16 16:53:30 -07:00