Commit Graph

13444 Commits

Author SHA1 Message Date
Mahmood Ali
bc6929b8fd fixup! device attributes in nomad node status -verbose 2018-12-12 09:17:31 -05:00
Mahmood Ali
40164b3dc6 Use max 3 precision in displaying floats
When formating floats in `nomad node status`, use a maximum precision of
3.
2018-12-10 12:18:24 -05:00
Mahmood Ali
a802984c9a device attributes in nomad node status -verbose
This reports device attributes like the following:

```
$ nomad node status -self -verbose
ID          = f7adb958-29e1-2a5a-2303-9d61ffaab33a
Name        = mars.local
Class       = <none>
DC          = dc1
Drain       = false
Eligibility = eligible
Status      = ready
Uptime      = 12h40m13s

Drivers
Driver       Detected  Healthy  Message                               Time
docker       true      true     healthy                               2018-12-10T11:47:19-05:00
...

Attributes
cpu.arch                      = amd64
cpu.frequency                 = 2200
cpu.modelname                 = Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
cpu.numcores                  = 12
...

Device Group Attributes
Device Group = nomad/file/mock
block_device = sda1
filesystem   = ext4
size         = 63.2 GB

Meta
```
2018-12-10 12:18:24 -05:00
Mahmood Ali
612f79bd45 Rename helper_stats -> helper_devices 2018-12-10 12:18:24 -05:00
Mahmood Ali
bf17eae7c3 devices/nvidia: memory state as the summary stat 2018-12-10 12:18:24 -05:00
Mahmood Ali
070ed8b654 fix dtestutil.NewDriverHarness ref 2018-12-08 09:58:23 -05:00
Mahmood Ali
2fb5e35012 Merge pull request #4950 from hashicorp/b-exc-libcontainer-kill
executor: kill all container processes
2018-12-08 09:52:42 -05:00
Nick Ethier
3fe18c2f2d Merge pull request #4973 from emate/recover-filerotator-from-io-errors
Recover from any possible io error when invoking Write on FileRotator
2018-12-08 00:05:42 -05:00
Alex Dadgar
9a220c8d40 Merge pull request #4965 from hashicorp/b-gc-running
Don't GC running but desired stop allocations
2018-12-07 13:36:33 -08:00
Marcin Matlaszek
b91fa87d31 Recover from any possible io error when invoking Write on FileRotator
As of now, FileRotator uses bufio.Write under the hood to write data to
configured output file. Due to the way how bufio handles any occurred io
error - saves it into `err` variable never resetting it automatically -
any operation like `Write`, `Flush` etc will become a no-op, returning the very same,
saved error (eg. Out of disk space) even when the problem is fixed (eg. disk
space is available again).

That automatically means that FileRotator will stop writing any logs,
reporting the same error over and over again, even if it's no longer
valid.

This PR fixes it by resetting the bufio Writer, which resets any errors
and tries to write requested data.
2018-12-07 18:22:29 +01:00
Mahmood Ali
c74e2d0243 Merge pull request #4933 from hashicorp/f-mount-device
Mount Devices in container based drivers
2018-12-07 10:32:03 -05:00
Mahmood Ali
379e79ceff Vendor libcontainer/devices 2018-12-07 09:13:27 -05:00
Danielle Tomlinson
e668e55fa5 Merge pull request #4960 from hashicorp/dani/b-gc-tests
Re-enable Client GC tests
2018-12-06 23:18:36 +01:00
Mahmood Ali
aef1c9dc96 Merge pull request #4955 from hashicorp/fix-docker-tests-20181203
Fix docker driver tests
2018-12-06 16:41:33 -05:00
Danielle Tomlinson
f6e2687f5b gc: Fix maxallocs integration test 2018-12-06 21:50:50 +01:00
Mahmood Ali
b78130eaaf Use absolute path in example device plugin
deviceDir is used for specifying mount/device host paths, and those
should be absolute paths.
2018-12-06 15:46:35 -05:00
Mahmood Ali
8c5e2e39a4 driver/rkt: mount plugin devices 2018-12-06 15:46:35 -05:00
Mahmood Ali
bfa2854c8b driver/lxc: mount plugin devices
Also, LXC requires target paths to be relative.  Container paths in LXC
binds should never be absolute paths, so we strip any preceeding `/`,
even if a user sets one.
2018-12-06 15:46:35 -05:00
Mahmood Ali
a0a5847315 fixup: add missed docker utils test 2018-12-06 15:46:35 -05:00
Mahmood Ali
5fd8d3fe68 tests: ensure image is loaded as test setup 2018-12-06 15:36:43 -05:00
Michael Lange
12dc187779 Merge pull request #4967 from hashicorp/b-ui-stat-charts-can-escape-canvas
UI: Keep line charts in their canvases at all times
2018-12-06 10:56:37 -08:00
Danielle Tomlinson
f3c057d7e0 client/gc: Replace GC integration test with unit
The previous integration test was broken during the client refactor, and
it seems to be some sort of race with state updating.

I'm going to try and construct a replacement test as part of work on
performance, but for now, the underlying behaviour is still being
tested.
2018-12-06 12:28:23 +01:00
Danielle Tomlinson
419ae3299c client: Re-enable GC tests 2018-12-06 12:28:23 +01:00
Danielle Tomlinson
62ac40ab09 allocrunner: Basic test alloc runner 2018-12-06 12:28:23 +01:00
Michael Lange
104be1cad6 Grow the default 0 to 1 bounds to the domain of the data when necessary 2018-12-05 22:07:44 -08:00
Alex Dadgar
d7dbf14c46 Merge pull request #4966 from hashicorp/b-failure-event
Fix various bugs with task events
2018-12-05 14:43:50 -08:00
Alex Dadgar
8b624340ad Fix various bugs with task events
Fixes the following:
* Emitting events when the task fails to start
* Don't double emit events on task shutdown (nomad stop)
* Don't emit a OOM kill metric unless actually OOM'd
2018-12-05 14:27:07 -08:00
Alex Dadgar
5a98dfa493 Don't GC running but desired stop allocations
This PR fixes an edge case where we could GC an allocation that was in a
desired stop state but had not terminated yet. This can be hit if the
client hasn't shutdown the allocation yet or if the allocation is still
shutting down (long kill_timeout).

Fixes https://github.com/hashicorp/nomad/issues/4940
2018-12-05 13:01:12 -08:00
Mahmood Ali
04af097008 driver/docker: honor plugin devices 2018-12-04 21:31:28 -05:00
Mahmood Ali
65e9b3b8be refactor device manipulation 2018-12-04 20:55:59 -05:00
Mahmood Ali
7a49e9b68e drivers/exec: refactor stop/kill tests
Simplify the tests to do all assertions within the main goroutine and
account for status propagation delay.
2018-12-04 20:34:43 -05:00
Mahmood Ali
661dc4b386 Merge pull request #4956 from hashicorp/b-vault-client-tweaks-followup
server/vault: Lock Vault expiration tracking
2018-12-04 19:46:59 -05:00
Mahmood Ali
615c525cf6 Merge pull request #4959 from hashicorp/fix-rkt-tests-20181204
tests: fix rkt tests
2018-12-04 19:46:41 -05:00
Mahmood Ali
e9dc31c68f executor: Keep 0.8.6 exit code for wait() failures
0.8.6 uses exit code 1 when `proc.Wait()` fails: https://github.com/hashicorp/nomad/blob/v0.8.6/client/driver/executor/executor.go#L442
2018-12-04 19:38:25 -05:00
Mahmood Ali
92b57036a8 driver/rkt: use rkt environment
The rkt command itself needs an environment with PATH set to find iptables.
2018-12-04 14:00:45 -05:00
Preetha
e5d125e851 Merge pull request #4949 from hashicorp/b-neg-running-summary
Add guards around subtracting summary count
2018-12-04 12:52:58 -06:00
Mahmood Ali
6a5238eb48 tests: stop integration tests tasks explicitly
Also update the new recommended `nomad job` subcommands
2018-12-04 11:50:59 -05:00
Dan Brown
d1cc66ae49 Add Reference Architecture and Deployment Guide (#4768)
* Add Nomad RA

* Add deployment guide and nav

* Deployment Guide update

* Minor typo fixes

* Update diagrams

* Fixes for review

* Link fixes and typo fix

* Edits following review

- Update image text from "zone" to "datacenter" to match Nomad terminology
- Clean up text based on Preetha's feedback

* Text updates

Based on feedback from Rob

* Update diagrams

* fixing spelling

* Add suggestions from Preetha and Omar
2018-12-04 11:49:35 -05:00
Mahmood Ali
2c9b6cd7c4 drivers/rkt: use image isolation for rkt 2018-12-04 11:40:10 -05:00
Mahmood Ali
8f7b78af0c tests: don't assert in WaitForResult
WaitForResult expects body to fail and retries few times before giving
up.  Assertions inside the testfn body causes it to terminate abruptly
without retrying.
2018-12-04 11:40:10 -05:00
Mahmood Ali
2f8b1eb5eb server/nomad: Lock Vault expiration tracking
`currentExpiration` field is accessed in multiple goroutines: Stats and
renewal, so needs locking.

I don't anticipate high contention, so simple mutex suffices.
2018-12-04 09:29:48 -05:00
Mahmood Ali
f6efac6c12 no t.Parallel() in excutor table driven tests (#4948)
When `t.Parallel()` is used inside a `t.Run()` sub-set, the closure
doesn't behave as expected, and some cases effectively get skipped.
More details can be found in
https://gist.github.com/posener/92a55c4cd441fc5e5e85f27bca008721
2018-12-04 09:04:04 -05:00
Mahmood Ali
5f124c7ce3 Update LXC with drivers/testutils changes (#4951) 2018-12-04 08:57:54 -05:00
Mahmood Ali
fc40680163 Fix docker tests
Some tests have containers that die almost immediately, and may die
and cleaned up before `driver.WaitUntilStarted` runs.

The causes for container dying seems special for each test:
* TestDockerDriver_Cleanup: `hello-world` image just emits a message and exits immediately
* TestDockerDriver_ForcePull_RepoDigest: the busybox image in `TestDockerDriver_ForcePull_RepoDigest` test didn't support `-p 0` argument
* TestDockerDriver_Entrypoint: with the entrypoint being `/bin/sh -c`, the command needs to be the entire string; otherwise, it ignores the comments
2018-12-03 23:08:52 -05:00
Mahmood Ali
781d8558f2 Kill all container processes on shutdown
Currently, libcontainer-based executor, upon shutdown, kills the
container initial process.  The children of the killed process remain
running, and the executor is never marked as terminated until they do.

Also, fix a case where we treat processes as successful, when
`proc.Wait()` fails.  In some attempts, I was getting "waitid no child
processes" errors and such error shouldn't get process to be considered
successful.
2018-12-03 20:40:49 -05:00
Mahmood Ali
87c1286172 Test Stopping a multi-process exec
Ensure that exec children processes get killed as well.
2018-12-03 20:40:19 -05:00
Danielle Tomlinson
cd8c5c55bd Merge pull request #4925 from hashicorp/f-driver-plugins-dani
Third Party Driver Plugins Support
2018-12-03 20:48:19 +01:00
Michael Schurter
56bd482880 Merge pull request #4947 from githubfoam/update-demo-vagrantfile
no need to double the work
2018-12-03 11:17:08 -08:00
Preetha Appan
5b39532759 Add guards around subtracting summary count 2018-12-03 11:16:35 -06:00
Mahmood Ali
813f0a2282 libcontainer to manage /dev and /proc (#4945)
libcontainer already manages `/dev`, overriding task_dir - so let's use it for `/proc` as well and remove deadcode.
2018-12-03 10:41:01 -05:00