Commit Graph

3041 Commits

Author SHA1 Message Date
Nick Ethier
fdef36ae38 client/driver: remove unused const 'dockerPullProgressEmitInterval' 2018-05-07 16:24:48 -04:00
Nick Ethier
e324e86632 client/driver: add waiting layer status count to pull progress status msg 2018-05-07 12:18:20 -04:00
Nick Ethier
6337f8e32e client/driver: add seperate handler for emitting pull progress 2018-05-07 12:17:34 -04:00
Nick Ethier
9ea3080343 client/driver: remove pull timeout due to race condition that can lead to unexpected timeouts
If two jobs are pulling the same image simultaneously, which ever starts the pull first will set the pull timeout.
This can lead to a poor UX where the first job requested a short timeout while the second job requested a longer timeout
causing the pull to potentially timeout much sooner than expected by the second job.
2018-05-07 12:18:11 -04:00
Nick Ethier
6fb6ecdff6 client/driver: do accounting on layer pull progress 2018-05-07 12:17:53 -04:00
Nick Ethier
c44db4ce79 client/driver: emit progress to all allocs pulling same image 2018-05-07 12:17:34 -04:00
Nick Ethier
77d57e8276 client/driver: add image pull progress monitoring 2018-05-07 12:17:38 -04:00
Michael Schurter
bd4e761c29 Merge pull request #4251 from hashicorp/f-grpc-checks
Support Consul gRPC Health Checks
2018-05-04 14:55:16 -07:00
Michael Schurter
905bef8f2d consul: make grpc checks more like http checks 2018-05-04 11:08:11 -07:00
Michael Schurter
93356e7d70 consul: initial grpc implementation
Needs to be more like http.
2018-05-04 11:08:11 -07:00
Michael Schurter
6858c520b2 framer: fix early exit/truncation in framer 2018-05-02 10:46:16 -07:00
Michael Schurter
361db269c2 framer: fix race and remove unused error var
In the old code `sending` in the `send()` method shared the Data slice's
underlying backing array with its caller. Clearing StreamFrame.Data
didn't break the reference from the sent frame to the StreamFramer's
data slice.
2018-05-02 10:46:16 -07:00
Michael Schurter
aad596bb0f client: squelch errors on cleanly closed pipes 2018-05-02 10:46:16 -07:00
Michael Schurter
cafcb89394 client: don't spin on read errors 2018-05-02 10:46:16 -07:00
Michael Schurter
a7c71c1cdc client: reset encoders between uses
According to go/codec's docs, Reset(...) should be called on
Decoders/Encoders before reuse:

https://godoc.org/github.com/ugorji/go/codec

I could find no evidence that *not* calling Reset() caused bugs, but
might as well do what the docs say?
2018-05-02 10:46:16 -07:00
Alex Dadgar
c8f1c98fd7 version bump and remove generated 2018-04-27 11:10:00 -07:00
Alex Dadgar
df9e8e2f27 generated files 2018-04-27 10:45:40 -07:00
Alex Dadgar
3fd1f48136 Remove generated and version bump 2018-04-26 16:49:19 -07:00
Alex Dadgar
9a4295aa9c generated files 2018-04-26 16:28:58 -07:00
Michael Schurter
c79a820042 Merge pull request #4188 from hashicorp/f-rkt-stats
rkt: create parent cgroup to enable stats
2018-04-24 14:54:36 -07:00
Michael Schurter
a3dba1db78 rkt: test Stats() and always run tests
Remove the NOMAD_TEST_RKT flag as a guard for rkt tests. Still require
Linux, root, and rkt to be installed. Only check for rkt installation
once in hopes of speeding up rkt tests a bit.
2018-04-24 11:05:42 -07:00
Javier Palomo Almena
92cbfd0cf3 docker tests: Fix usage of NewDriverContext 2018-04-23 22:51:06 +02:00
Javier Palomo Almena
3e6c3437af DriverContext: Add the TaskGroup and the Job name
Adding this fields to the DriverContext object, will allow us to pass
them to the drivers.

An use case for this, will be to emit tagged metrics in the drivers,
which contain all relevant information:
- Job
- TaskGroup
- Task
- ...

Ref: https://github.com/hashicorp/nomad/pull/4185
2018-04-23 00:15:29 +02:00
Michael Schurter
c06891956d rkt: create parent cgroup to enable stats
Having the Nomad executor create parent cgroups that rkt is launched
within allows the stats collection code used for the exec driver to Just
Work. The only downside is that now the Nomad executor's resource
utilization counts against the cgroups resource limits just as it does
for the exec driver.
2018-04-19 15:14:56 -07:00
Michael Schurter
a4bf901559 run goimports 2018-04-19 11:16:28 -07:00
Michael Schurter
d6de69b6fc Merge pull request #4168 from ninoles/b-2117-windows-group-process
B 2117 windows group process
2018-04-19 11:10:51 -07:00
Michael Schurter
a516dcd0ea Merge pull request #4058 from hashicorp/f-mock-by-default
[Post-0.8] test: build with mock_driver by default
2018-04-18 15:57:00 -07:00
Michael Schurter
29da24b77b test: build with mock_driver by default
`make release` and `make prerelease` set a `release` tag to disable
enabling the `mock_driver`
2018-04-18 14:45:33 -07:00
Michael Schurter
55dcadc132 tests: fix race in alloc_runner_test.go
I could not reproduce the failure locally even with `stress -cpu ...`
eating all the cpu it could on my machine.

But I think the race was in one of two places:
* The task could restart which could create new events
* I think there could be a race between the updater's version of events
  and alloc runners as updates are async

I fixed both. Here's hoping that fixes this flaky test.
2018-04-17 17:14:59 -07:00
Fabien Ninoles
3eba6ca8c2 Merge branch 'master' into b-2117-windows-group-process 2018-04-17 13:47:25 -04:00
Fabien Ninoles
d6cc8895ca Update based on PR request. 2018-04-17 13:43:04 -04:00
Alex Dadgar
30cd227763 Merge pull request #4166 from hashicorp/b-panic-fix-update
Fixes races accessing node and updating it during fingerprinting
2018-04-17 10:02:19 -07:00
Chelsea Holland Komlo
3ebecb676c fix up comments 2018-04-17 11:53:08 -04:00
Alex Dadgar
cc3f264a35 Cleanup 2018-04-16 15:48:34 -07:00
Alex Dadgar
3cf87e0dc8 Copy the config given to the alloc runner 2018-04-16 15:45:52 -07:00
Alex Dadgar
741975657d fix race node access 2018-04-16 15:45:51 -07:00
Alex Dadgar
89fa9a1e10 Fix copying drivers 2018-04-16 15:45:51 -07:00
Alex Dadgar
9929019b9a Operate on copy 2018-04-16 15:45:49 -07:00
Fabien Ninoles
7023b63ffa - Clean up for windows compilation.
- Set CREATE_NEW_PROCESS_GROUP for Windows subprocess.
- Ensure we only kill actual process that need to.
2018-04-14 13:58:42 -04:00
Michael Schurter
1e67780f1e Merge pull request #3572 from emate/master
Create new process group on process startup.
2018-04-13 11:56:38 -07:00
Alex Dadgar
1468985035 Remove generated structs 2018-04-12 16:35:31 -07:00
Alex Dadgar
8a80a5f7c5 Version bump and generated files 2018-04-12 16:21:50 -07:00
Alex Dadgar
e0171acbdd Move where attribute for driver detection is set 2018-04-12 15:50:25 -07:00
Chelsea Holland Komlo
228a2319c2 delete driver name from only health check attributes 2018-04-12 18:24:41 -04:00
Alex Dadgar
d0605c5229 Fix tests 2018-04-12 14:29:30 -07:00
Alex Dadgar
f45b51a138 Driver health detection cleanups
This PR does:

1. Health message based on detection has format "Driver XXX detected"
and "Driver XXX not detected"
2. Set initial health description based on detection status and don't
wait for the first health check.
3. Combine updating attributes on the node, fingerprint and health
checking update for drivers into a single call back.
4. Condensed driver info in `node status` only shows detected drivers
and make the output less wide by removing spaces.
2018-04-12 12:46:40 -07:00
Charlie Voiselle
c728a2feb8 Changed "til" to "until"
Should be "till" or "until"; chose "until" because it is unambiguous as to meaning.
2018-04-11 12:36:28 -05:00
Chelsea Komlo
d32a4822fd Merge pull request #4111 from hashicorp/b-undetected-set-health-to-false
Immediately set driver health status to false when driver moves to undetected
2018-04-10 18:30:31 -04:00
Chelsea Holland Komlo
a40750e596 update comment for when the fingerprinter setting health status 2018-04-10 16:53:00 -04:00
Chelsea Holland Komlo
46ec4633fe fingerprinter should set health check status if health check is not periodic 2018-04-10 15:29:51 -04:00