518 Commits

Author SHA1 Message Date
Alex Dadgar
ad4c26a1e3 review comments 2018-11-07 11:31:52 -08:00
Alex Dadgar
57f40c7e3e Device manager
Introduce a device manager that manages the lifecycle of device plugins
on the client. It fingerprints, collects stats, and forwards Reserve
requests to the correct plugin. The manager also handles device plugin
failures and validates plugin output.
2018-11-07 10:43:15 -08:00
Michael Schurter
05365806ac ar: initialize allocwatcher on restore
Fixes a panic. Left a comment on how the behavior could be improved, but
this is what releases <0.9.0 did.
2018-10-19 09:45:45 -07:00
Michael Schurter
d71e7666bd ar: fix leader handling, state restoring, and destroying unrun ARs
* Migrated all of the old leader task tests and got them passing
* Refactored and consolidated task killing code in AR to always kill
  leader tasks first
* Fixed lots of issues with state restoring
* Fixed deadlock in AR.Destroy if AR.Run had never been called
* Added a new in-memory statedb for testing
2018-10-19 09:45:45 -07:00
Nick Ethier
4f9522dd54 client: review comments and fixup/skip tests 2018-10-16 16:56:56 -07:00
Nick Ethier
ea9ed2282e client: refactor post allocrunnerv2 finalization 2018-10-16 16:56:56 -07:00
Nick Ethier
d335a82859 client: begin driver plugin integration
client: fingerprint driver plugins
2018-10-16 16:56:56 -07:00
Alex Dadgar
3a492bb33f allocrunnerv2 -> allocrunner 2018-10-16 16:56:56 -07:00
Alex Dadgar
2e535aefcc move files around 2018-10-16 16:56:55 -07:00
Michael Schurter
d29d613c02 client: expose task state to client
The interesting decision in this commit was to expose AR's state and not
a fully materialized Allocation struct. AR.clientAlloc builds an Alloc
that contains the task state, so I considered simply memoizing and
exposing that method.

However, that would lead to AR having two awkwardly similar methods:
 - Alloc() - which returns the server-sent alloc
 - ClientAlloc() - which returns the fully materialized client alloc

Since ClientAlloc() could be memoized it would be just as cheap to call
as Alloc(), so why not replace Alloc() entirely?

Replacing Alloc() entirely would require Update() to immediately
materialize the task states on server-sent Allocs as there may have been
local task state changes since the server received an Alloc update.

This quickly becomes difficult to reason about: should Update hooks use
the TaskStates? Are state changes caused by TR Update hooks immediately
reflected in the Alloc? Should AR persist its copy of the Alloc? If so,
are its TaskStates canonical or the TaskStates on TR?

So! Forget that. Let's separate the static Allocation from the dynamic
AR & TR state!

 - AR.Alloc() is for static Allocation access (often for the Job)
 - AR.AllocState() is for the dynamic AR & TR runtime state (deployment
   status, task states, etc).

If code needs to know the status of a task: AllocState()
If code needs to know the names of tasks: Alloc()

It should be very easy for a developer to reason about which method they
should call and what they can do with the return values.
2018-10-16 16:56:55 -07:00
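Note on d29d613c02: the split between static and dynamic state described in that message can be sketched roughly as below. This is a minimal illustration, not the code from the commit. The `Allocation` and `State` types, their fields, and the `setTaskState` and `newAllocRunner` helpers are hypothetical stand-ins; only the method names `Alloc()` and `AllocState()` come from the message.

```go
package main

import (
	"fmt"
	"sync"
)

// Allocation is a hypothetical stand-in for the server-sent allocation.
type Allocation struct {
	ID        string
	TaskNames []string
}

// State is a hypothetical stand-in for the dynamic AR & TR runtime state.
type State struct {
	TaskStates map[string]string // task name -> status
}

type allocRunner struct {
	mu    sync.RWMutex
	alloc *Allocation
	state State
}

func newAllocRunner(a *Allocation) *allocRunner {
	return &allocRunner{alloc: a, state: State{TaskStates: map[string]string{}}}
}

// Alloc returns the static, server-sent allocation (e.g. to read task names).
func (ar *allocRunner) Alloc() *Allocation {
	ar.mu.RLock()
	defer ar.mu.RUnlock()
	return ar.alloc
}

// AllocState returns a copy of the dynamic runtime state, so callers
// cannot observe or cause later mutations through the returned value.
func (ar *allocRunner) AllocState() State {
	ar.mu.RLock()
	defer ar.mu.RUnlock()
	cp := State{TaskStates: make(map[string]string, len(ar.state.TaskStates))}
	for name, status := range ar.state.TaskStates {
		cp.TaskStates[name] = status
	}
	return cp
}

// setTaskState records a local task state change (assumed helper).
func (ar *allocRunner) setTaskState(task, status string) {
	ar.mu.Lock()
	defer ar.mu.Unlock()
	ar.state.TaskStates[task] = status
}

func main() {
	ar := newAllocRunner(&Allocation{ID: "a1", TaskNames: []string{"redis"}})
	ar.setTaskState("redis", "running")
	fmt.Println(ar.Alloc().TaskNames)                // names of tasks: Alloc()
	fmt.Println(ar.AllocState().TaskStates["redis"]) // status of a task: AllocState()
}
```

Returning a copy from `AllocState()` is a design choice of this sketch; it keeps the static allocation and the runtime state from leaking into each other.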
Michael Schurter
9394b989e5 client: fix accessing alloc runners
* GetClientAlloc() gains nothing from using allAllocs()
* getAllocatedResources was calling getAllocRunners() twice
2018-10-16 16:56:55 -07:00
Michael Schurter
da8f053a0d tr: implement stats collection hook
Tested except for the net/rpc-specific error case, which may need
changing in the gRPC world.
2018-10-16 16:53:31 -07:00
Alex Dadgar
9fd0ba1df6 add logger back 2018-10-16 16:53:30 -07:00
Alex Dadgar
9a2c2a4f68 client uses passed logger and fix fingerprinters 2018-10-16 16:53:30 -07:00
Michael Schurter
9da25adc54 client: hclog-ify most of the client
Leaving fingerprinters in case that interface changes with plugins.
2018-10-16 16:53:30 -07:00
Michael Schurter
c95155d45c implement stopping, destroying, and disk migration
* Stopping an alloc is implemented via Updates but update hooks are
  *not* run.
* Destroying an alloc is a best effort cleanup.
* AllocRunner destroy hooks implemented.
* Disk migration and blocking on a previous allocation exiting moved to
  its own package to avoid cycles. Now only depends on alloc broadcaster
  instead of also using a waitch.
* AllocBroadcaster now only drops stale allocations and always keeps the
  latest version.
* Made AllocDir safe for concurrent use

Lots of internal contexts that are currently unused. Unsure if they
should be used or removed.
2018-10-16 16:53:30 -07:00
Michael Schurter
de5426124b lots of comment/log fixes 2018-10-16 16:53:30 -07:00
Michael Schurter
b97bbd9d30 persist alloc state on changes, not periodically
Allow alloc and task runners to persist their own state when something
changes instead of periodically syncing all state.
2018-10-16 16:53:30 -07:00
Michael Schurter
c9e97123e6 Move all encoding and put deduping into state db
Still WIP as it does not handle deletions.
2018-10-16 16:53:30 -07:00
Michael Schurter
63fea0c888 implement all boltdb interactions behind StateDB 2018-10-16 16:53:30 -07:00
Michael Schurter
7ca41a89c4 Implement alloc updates in arv2
Updates are applied asynchronously but sequentially
2018-10-16 16:53:30 -07:00
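Note on 7ca41a89c4: "asynchronously but sequentially" can be illustrated with a single worker goroutine draining a channel. This is a sketch under assumptions, not the arv2 implementation. The `update` and `runner` types, the buffer size, and the `Close` helper are all hypothetical.

```go
package main

import "fmt"

// update is a hypothetical stand-in for a server-sent alloc update.
type update struct{ index int }

type runner struct {
	updates chan update
	done    chan struct{}
	applied []int // touched only by loop() until done is closed
}

func newRunner() *runner {
	r := &runner{
		updates: make(chan update, 16), // buffer size is an arbitrary choice
		done:    make(chan struct{}),
	}
	go r.loop()
	return r
}

// loop applies updates one at a time, in the order they were enqueued.
func (r *runner) loop() {
	defer close(r.done)
	for u := range r.updates {
		r.applied = append(r.applied, u.index)
	}
}

// Update enqueues an update and returns without waiting for it to be
// applied. It blocks only if the buffer is full.
func (r *runner) Update(u update) { r.updates <- u }

// Close stops accepting updates, waits for the worker to drain the queue,
// and returns the indexes in the order they were applied.
func (r *runner) Close() []int {
	close(r.updates)
	<-r.done
	return r.applied
}

func main() {
	r := newRunner()
	for i := 1; i <= 3; i++ {
		r.Update(update{index: i})
	}
	fmt.Println(r.Close()) // prints [1 2 3]
}
```

Because only one goroutine ever applies updates, no two updates run concurrently, while callers of `Update` are not held up by the work itself.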
Michael Schurter
76194c7414 consul service hook
Deregistration works but is difficult to test because terminal updates
are not fully implemented in the new client/ar/tr.
2018-10-16 16:53:29 -07:00
Michael Schurter
bb273896b5 restore vault client 2018-10-16 16:53:29 -07:00
Alex Dadgar
52ae83d4d5 Update state with server 2018-10-16 16:53:29 -07:00
Michael Schurter
b09b552ae5 missed locking around c.allocs access 2018-10-16 16:53:29 -07:00
Michael Schurter
7e60f0ba77 client: implement all-or-nothing alloc restoration
Restoring calls NewAR -> Restore -> Run

NewAR now calls NewTR
AR.Restore calls TR.Restore
AR.Run calls TR.Run
2018-10-16 16:53:29 -07:00
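Note on 7e60f0ba77: the call chain NewAR -> Restore -> Run might look like the sketch below. It reads "all-or-nothing" as "if any task runner fails to restore, the alloc is not run", which is an assumption about the commit's intent. The types, the `saved` map standing in for persisted state, and `restoreAlloc` are hypothetical, and `Run` is synchronous here only to keep the example short.

```go
package main

import "fmt"

type taskRunner struct {
	name     string
	restored bool
	running  bool
}

func newTaskRunner(name string) *taskRunner { return &taskRunner{name: name} }

// Restore loads this task's saved state; saved is a stand-in for a state DB.
func (tr *taskRunner) Restore(saved map[string]bool) error {
	if !saved[tr.name] {
		return fmt.Errorf("no saved state for task %q", tr.name)
	}
	tr.restored = true
	return nil
}

func (tr *taskRunner) Run() { tr.running = true }

type allocRunner struct{ tasks []*taskRunner }

// newAllocRunner creates the task runners up front (NewAR calls NewTR).
func newAllocRunner(taskNames []string) *allocRunner {
	ar := &allocRunner{}
	for _, n := range taskNames {
		ar.tasks = append(ar.tasks, newTaskRunner(n))
	}
	return ar
}

// Restore restores every task runner (AR.Restore calls TR.Restore) and
// stops at the first failure.
func (ar *allocRunner) Restore(saved map[string]bool) error {
	for _, tr := range ar.tasks {
		if err := tr.Restore(saved); err != nil {
			return err
		}
	}
	return nil
}

// Run starts every task runner (AR.Run calls TR.Run).
func (ar *allocRunner) Run() {
	for _, tr := range ar.tasks {
		tr.Run()
	}
}

// restoreAlloc runs the alloc only if every task was restored.
func restoreAlloc(taskNames []string, saved map[string]bool) (*allocRunner, error) {
	ar := newAllocRunner(taskNames)
	if err := ar.Restore(saved); err != nil {
		return nil, err
	}
	ar.Run()
	return ar, nil
}

func main() {
	_, err := restoreAlloc([]string{"web", "db"}, map[string]bool{"web": true})
	fmt.Println(err) // db has no saved state, so nothing is run
}
```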
Alex Dadgar
427eab563a vault hook 2018-10-16 16:53:29 -07:00
Michael Schurter
bfbc95e258 fix hclog level 2018-10-16 16:53:29 -07:00
Michael Schurter
370d92ff2e pass statedb into allocrunnerv2 2018-10-16 16:53:29 -07:00
Michael Schurter
53a4b3fe99 example redis job "runs" on arv2! see below
Tons left to do and lots of churn:
1. No state saving
2. No shutdown or gc
3. Removed AR factory *for now*
4. Made all "Config" structs local to the package they configure
5. Added allocID to GC to avoid a lookup

Really hating how many things use *structs.Allocation. It's not bad
without state saving, but if AllocRunner starts updating its copy, things
get racy fast.
2018-10-16 16:53:29 -07:00
Alex Dadgar
e30b20e65e renames 2018-10-04 14:57:25 -07:00
Alex Dadgar
0f2f4797cb fixing tests 2018-10-04 14:26:19 -07:00
Alex Dadgar
f969298854 Node reserved resources 2018-09-29 18:44:55 -07:00
Alex Dadgar
b310a54aa6 Node resources on client 2018-09-29 17:23:41 -07:00
Alex Dadgar
58c889aa94 yamux 2018-09-17 14:22:40 -07:00
Alex Dadgar
40d095fd1a agent + consul 2018-09-13 10:43:40 -07:00
Michael Schurter
6b50475b92 fix race around error handling 2018-09-05 17:34:17 -07:00
Preetha
e2e60795c6 Merge pull request #3882 from burdandrei/telemetry-add-node-class-tag
Added node class to tagged metrics
2018-06-21 17:04:35 -05:00
Alex Dadgar
6091307b77 Merge pull request #4409 from hashicorp/r-client-packages
Refactor client packages
2018-06-13 17:32:25 -07:00
Alex Dadgar
d94bf14e13 Fix gc tests + parallel destroy + small test fixes 2018-06-12 10:23:45 -07:00
Alex Dadgar
a62e412b88 Refactor - wip 2018-06-12 10:23:45 -07:00
Chelsea Holland Komlo
45ff58e40e add client logic to determine whether TLS RPC connections should reload 2018-06-08 14:38:58 -04:00
Chelsea Holland Komlo
1a854c444e add server join info to server and client 2018-05-31 10:50:03 -07:00
Chelsea Holland Komlo
6733d768f0 refactor NewTLSConfiguration to pass in verifyIncoming/verifyOutgoing
add missing fields to TLS merge method
2018-05-23 18:35:30 -04:00
Chelsea Holland Komlo
0f46208cc1 allow configurable cipher suites
disallow 3DES and RC4 ciphers

add documentation for tls_cipher_suites
2018-05-09 17:15:31 -04:00
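Note on 0f46208cc1: one way to turn a comma-separated cipher suite setting into IDs while rejecting 3DES and RC4 is sketched below using Go's `crypto/tls` (`tls.CipherSuites` and `tls.InsecureCipherSuites`, available since Go 1.14). This is not the code from the commit. `parseCipherSuites`, its name-based rejection rule, and the input format are assumptions for illustration.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"strings"
)

// parseCipherSuites maps a comma-separated list of suite names to IDs.
// Names containing "3DES" or "RC4" are rejected, as are unknown names.
func parseCipherSuites(list string) ([]uint16, error) {
	// Index every suite the standard library knows about by name.
	known := map[string]uint16{}
	for _, cs := range tls.CipherSuites() {
		known[cs.Name] = cs.ID
	}
	for _, cs := range tls.InsecureCipherSuites() {
		known[cs.Name] = cs.ID
	}

	var ids []uint16
	for _, name := range strings.Split(list, ",") {
		name = strings.TrimSpace(name)
		if name == "" {
			continue
		}
		if strings.Contains(name, "3DES") || strings.Contains(name, "RC4") {
			return nil, fmt.Errorf("cipher suite %q is not allowed", name)
		}
		id, ok := known[name]
		if !ok {
			return nil, fmt.Errorf("unknown cipher suite %q", name)
		}
		ids = append(ids, id)
	}
	return ids, nil
}

func main() {
	ids, err := parseCipherSuites(
		"TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256, TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384")
	fmt.Println(len(ids), err)

	_, err = parseCipherSuites("TLS_RSA_WITH_RC4_128_SHA")
	fmt.Println(err)
}
```

The resulting slice could be assigned to the `CipherSuites` field of a `tls.Config`.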
Chelsea Holland Komlo
3ebecb676c fix up comments 2018-04-17 11:53:08 -04:00
Alex Dadgar
cc3f264a35 Cleanup 2018-04-16 15:48:34 -07:00
Alex Dadgar
3cf87e0dc8 Copy the config given to the alloc runner 2018-04-16 15:45:52 -07:00
Alex Dadgar
89fa9a1e10 Fix copying drivers 2018-04-16 15:45:51 -07:00
Alex Dadgar
9929019b9a Operate on copy 2018-04-16 15:45:49 -07:00