Commit Graph

3325 Commits

Author SHA1 Message Date
Mahmood Ali
6b7953af21 Merge pull request #4884 from hashicorp/f-alloc-devices-cli
Report alloc device statistics in API and CLI
2018-11-16 18:04:54 -05:00
Mahmood Ali
fd49039f09 address review comments 2018-11-16 17:13:01 -05:00
Mahmood Ali
58cbafe913 Populate alloc stats API with device stats
This change makes a few compromises:

* Looks up the devices associated with tasks at lookup time.  Given
that `nomad alloc status` is generally called rarely (compared to stats
telemetry and general job reporting), it seems fine.  However, the
lookup overhead is bounded by the number of `tasks x total-host-devices`,
which can be significant.

* `client.Client` performs the task devices->statistics lookup.  It
passes self to alloc/task runners so they can look up the device statistics
allocated to them.
  * Currently alloc/task runners are responsible for constructing the
entire RPC response for stats
  * The alternatives for making task runners device statistics aware
don't seem appealing (e.g. having task runners contain reference to hostStats)

* For the alloc-level resource usage aggregation, I did a naive merge of task device statistics.
  * Personally, I question the value of such aggregation, compared to
costs of struct duplication and bloating the response - but opted to be
consistent in the API.
  * With naive concatenation, device instances from a single device group that are used by separate tasks in the alloc are reported as two separate device group statistics.
2018-11-16 10:26:32 -05:00
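The concatenation trade-off described in 58cbafe913 can be illustrated with a minimal sketch. The `deviceGroupStats` type, the `naiveMerge` function, and the group name below are hypothetical stand-ins, not Nomad's actual structs or API; they only show why two tasks that use instances of the same device group end up with two separate group entries.

```go
package main

import "fmt"

// deviceGroupStats is a hypothetical, simplified stand-in for the
// per-device-group statistics reported for a task.
type deviceGroupStats struct {
	Group       string
	InstanceIDs []string
}

// naiveMerge concatenates the per-task statistics without grouping by
// device group, so a group shared by several tasks appears once per task.
func naiveMerge(perTask ...[]deviceGroupStats) []deviceGroupStats {
	var out []deviceGroupStats
	for _, stats := range perTask {
		out = append(out, stats...)
	}
	return out
}

func main() {
	// Two tasks, each using one instance of the same (made-up) device group.
	task1 := []deviceGroupStats{{Group: "vendor/gpu/model", InstanceIDs: []string{"gpu-0"}}}
	task2 := []deviceGroupStats{{Group: "vendor/gpu/model", InstanceIDs: []string{"gpu-1"}}}

	merged := naiveMerge(task1, task2)
	for _, g := range merged {
		fmt.Println(g.Group, g.InstanceIDs)
	}
}
```

Merging by `Group` instead would yield a single entry with both instances, at the cost of extra bookkeeping.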
Michael Schurter
9189173649 tests: fix tests post-rebase 2018-11-15 17:40:56 -08:00
Michael Schurter
a552a0c42c client/tr: add a bit of context to envbuilder errors 2018-11-15 16:26:25 -08:00
Michael Schurter
2283c8fff3 client: remove old proxy references from comments 2018-11-15 16:26:25 -08:00
Michael Schurter
926e3dc706 client: test more env key variations 2018-11-15 16:26:25 -08:00
Michael Schurter
b878bbf3f7 client: add new nested variables to task's hcl ctx
The error messages are really bad, but it's extremely difficult to
produce good error messages without the original HCL.
2018-11-15 16:26:25 -08:00
Michael Schurter
26211408a6 client: turn env into nested objects for task configs 2018-11-15 16:25:57 -08:00
Michael Schurter
43b359914b client: interpolate driver configurations
Also add missing SetDriverNetwork calls.
2018-11-15 16:25:57 -08:00
Mahmood Ali
04ecb5c72a Track Node Device attributes and serve them in API 2018-11-14 14:42:29 -05:00
Mahmood Ali
ba3fe15f7e Add Client Device Stats structs in api package 2018-11-14 14:41:19 -05:00
Mahmood Ali
5af9296bb4 Expose Device Stats in /client/stats API endpoint 2018-11-14 14:41:19 -05:00
Mahmood Ali
dd47c590f0 Allow nullable fields in StatValues
In stat values, we need to be able to distinguish between zero values
(e.g. `false`) and unset values (e.g. `nil`).

We could alternatively use protobuf `oneof` and a nested map to ensure
consistency of fields that are set together, but the Go
representation does not model that well, and using it would introduce
a mismatch between representations.  Thus, I opted not to use it.
2018-11-14 14:41:19 -05:00
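The zero-versus-unset distinction mentioned in dd47c590f0 is commonly expressed in Go with pointer fields. The sketch below assumes a simplified, hypothetical `statValue` struct; the field names are illustrative and are not taken from the Nomad source.

```go
package main

import "fmt"

// statValue is a hypothetical, simplified struct. Pointer fields let a
// reader tell "explicitly false/zero" apart from "never set" (nil).
type statValue struct {
	BoolVal *bool
	IntVal  *int64
}

// describeBool reports whether a nullable bool is unset, true, or false.
func describeBool(v *bool) string {
	if v == nil {
		return "unset"
	}
	if *v {
		return "true"
	}
	return "false"
}

func main() {
	off := false
	unset := statValue{}              // BoolVal is nil: no value reported
	explicit := statValue{BoolVal: &off} // BoolVal points at false

	fmt.Println(describeBool(unset.BoolVal))
	fmt.Println(describeBool(explicit.BoolVal))
}
```

With a plain `bool` field, both cases would read as `false`.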
Mahmood Ali
2f4c510cb7 Move Stat{Object|Value} to plugins/shared/structs
Moving them as they may be useful for other packages/plugins besides
devices.
2018-11-14 09:01:26 -05:00
Mahmood Ali
df694eb3be Regenerate proto files with protoc-gen-go@v1.2.0 2018-11-14 09:01:26 -05:00
Danielle Tomlinson
f16d96bdd8 Merge pull request #4869 from hashicorp/b-executor-stdout
executor: Fix stdout stderr copy/paste
2018-11-13 19:22:37 -08:00
Mahmood Ali
5c906aa085 convert all config durations to strings in tests 2018-11-13 10:21:40 -05:00
Mahmood Ali
179cdc6277 Address review comments 2018-11-13 10:21:40 -05:00
Mahmood Ali
5fe433efe7 avoid setting resource limit on rkt command
Was accidentally modified in 5b14d24bf4.
2018-11-13 10:21:40 -05:00
Mahmood Ali
9933f4a45c Fix docker log fetching in tests
We no longer use syslog for tracking logs, so track them explicitly
here.
2018-11-13 10:21:40 -05:00
Mahmood Ali
9d8a71dc44 killing should be done with wait client
Incidentally changed in 5b14d24bf4
2018-11-13 10:21:40 -05:00
Mahmood Ali
6b8c6836a9 Prioritize checking consumer context cancellation
Tests expect the eventer to shut down immediately on context
cancellation, but Go does not guarantee priority when multiple
channels are ready in a select statement.
2018-11-13 10:21:40 -05:00
Mahmood Ali
f9295631c4 Set clean config for mock driver
The default job here contains some exec task config (for setting
command and args) that isn't used for the mock driver.  Now, the alloc
runner seems stricter about validating fields and errors on unexpected
fields.

Updating configs in tests so we can have an explicit task config
whenever driver is set explicitly.
2018-11-13 10:21:40 -05:00
Mahmood Ali
73077e36fe Update Docker name parsing lookup
The `ParseNamed` function changed in e9f3f2cfee,
where it became `ParsedNormalizedName` with extra checks.
2018-11-13 10:21:40 -05:00
Danielle Tomlinson
04577f7e15 executor: Fix stdout stderr copy/paste 2018-11-12 22:08:04 -08:00
Alex Dadgar
37f239ea74 fix race 2018-11-07 12:22:07 -08:00
Alex Dadgar
ad4c26a1e3 review comments 2018-11-07 11:31:52 -08:00
Alex Dadgar
a8e95502fe tests 2018-11-07 10:43:15 -08:00
Alex Dadgar
57f40c7e3e Device manager
Introduce a device manager that manages the lifecycle of device plugins
on the client. It fingerprints, collects stats, and forwards Reserve
requests to the correct plugin. The manager also handles failing device
plugins and validates their output.
2018-11-07 10:43:15 -08:00
Michael Schurter
e58a91b701 client: update alloc status when terminating
Defensively update alloc status whenever killing all tasks.
2018-11-05 15:11:10 -08:00
Michael Schurter
a22205cd8f client: block on context as well as waitCh
For lifecycle operations such as Restart and Kill, the client should not
expect driver plugins to be well behaved and close their waitCh on
context cancelation. Always wait on the passed in context as well as the
waitCh.
2018-11-05 12:32:05 -08:00
Michael Schurter
740ca8e6ca client: fix tr lifecycle logic and shutdown delay
ShutdownDelay must be honored whenever the task is killed or restarted.
Services were not being deregistered prior to restarting.
2018-11-05 12:32:05 -08:00
Michael Schurter
d5c8e5bd26 client: fix ar and tr tests 2018-11-05 12:32:05 -08:00
Michael Schurter
9b82025608 client: do not run terminal allocs 2018-11-05 12:32:05 -08:00
Michael Schurter
fdbe446ea6 client: first pass at implementing task restoring
Task restoring works but dead tasks may be restarted
2018-11-05 12:32:05 -08:00
Nick Ethier
4b08ef0534 Merge pull request #4765 from jippi/increase-line-scan-limit
fix: increase log rotator line scan limit
2018-10-29 18:46:30 -07:00
Nick Ethier
da7563b8c3 Merge pull request #4795 from hashicorp/f-plugin-config
Pass client configuration to plugins through loader
2018-10-29 18:42:27 -07:00
Nick Ethier
95d381cff7 rename NomadConfig to ClientAgentConfig 2018-10-29 21:34:34 -04:00
Michael Schurter
16c25b8a60 Merge pull request #4803 from hashicorp/b-leader-fixes
AR Fixes: task leader handling, restoring, state updating, AR.Destroy deadlocks
2018-10-29 17:38:59 -05:00
Michael Schurter
0b4e15c366 tests: more fixes due to api changes 2018-10-29 15:25:22 -07:00
Preetha Appan
4231dc4729 Stat path to binary to handle raw exec driver interpolated binary path 2018-10-26 17:24:05 -05:00
Preetha Appan
af3a62e750 Fix test linting 2018-10-26 10:30:12 -05:00
Michael Schurter
05365806ac ar: initialize allocwatcher on restore
Fixes a panic. Left a comment on how the behavior could be improved, but
this is what releases <0.9.0 did.
2018-10-19 09:45:45 -07:00
Michael Schurter
d71e7666bd ar: fix leader handling, state restoring, and destroying unrun ARs
* Migrated all of the old leader task tests and got them passing
* Refactored and consolidated task killing code in AR to always kill leader
  tasks first
* Fixed lots of issues with state restoring
* Fixed deadlock in AR.Destroy if AR.Run had never been called
* Added a new in memory statedb for testing
2018-10-19 09:45:45 -07:00
Nick Ethier
7f69bcd4cd added driver specific client config struct to plugin configuration 2018-10-18 23:31:01 -04:00
Michael Schurter
2aed3e8527 ar: refactor task killing into 1 method
Update comments and address some PR comments from #4775
2018-10-17 10:06:59 -07:00
Michael Schurter
e029980b25 tests: explicitly cleanup after clients 2018-10-17 10:06:59 -07:00
Michael Schurter
2417ec5621 ar: fix task leader, update, and stop handling 2018-10-17 10:06:59 -07:00
Michael Schurter
e130fcc0c7 tr: cleanup hook logs 2018-10-17 09:42:32 -07:00