Commit Graph

347 Commits

Author SHA1 Message Date
Alex Dadgar
da82a6e814 initial watcher 2017-07-07 12:07:08 -07:00
Michael Schurter
2b97f61ac0 Consistently quote alloc ids in client logs 2017-07-06 10:24:52 -07:00
Michael Schurter
4794de99fd Tiny client race condition fix
Plus some logging improvements that may help with #2563
2017-07-05 16:15:19 -07:00
Michael Schurter
e71673e24b Suggest wiping out alloc dir too 2017-07-03 12:29:21 -07:00
Michael Schurter
6b9af8fcc3 Add more logging to restore state errors 2017-07-03 11:58:41 -07:00
Mark Mickan
e418b3e468 Add tests for migrating symlinks in alloc and local directories 2017-06-04 15:56:22 +09:30
Mark Mickan
9e984f429c Include symlinks in snapshots when migrating disks
Fixes #2685
2017-06-04 00:36:18 +09:30
Alex Dadgar
b5fbb1f4cf Fix deadlock 2017-05-31 14:05:47 -07:00
Michael Schurter
236ef21489 Merge pull request #2636 from hashicorp/f-gc-alloc-limit
Add new gc_max_allocs tuneable
2017-05-30 16:14:09 -07:00
Michael Schurter
aac319cd16 Merge pull request #2654 from hashicorp/f-env-consul
Add envconsul-like support and refactor environment handling
2017-05-30 14:40:14 -07:00
Alex Dadgar
f3218378cd Fix perms to just set exec bit 2017-05-25 14:44:13 -07:00
Michael Schurter
a96fb5dbb0 Move task env into execcontext
Also inject PATH into rkt commands since we're no longer appending host
env vars for it.
2017-05-23 13:53:34 -07:00
Michael Schurter
fb72f20bb1 gc_max_allocs should include blocked & migrating 2017-05-12 16:03:22 -07:00
Michael Schurter
cc11d9a563 Add new gc_max_allocs tuneable
More than gc_max_allocs may be running on a node, but terminal allocs
will be garbage collected to try to keep the total number below the
limit.
2017-05-11 17:18:02 -07:00
Alex Dadgar
7fb1b37e09 Fix vet errors 2017-05-11 13:08:08 -07:00
Alex Dadgar
3f1ccf7278 Respond to comments 2017-05-09 10:50:24 -07:00
Alex Dadgar
e22393aeb8 Restore state + upgrade path 2017-05-02 18:21:49 -07:00
Alex Dadgar
85a81f47de Revert "metrics"
This reverts commit 4d6a012c6f.
2017-05-02 09:28:11 -07:00
Alex Dadgar
1d20b11297 Async and sync saving of client state 2017-05-01 16:16:53 -07:00
Alex Dadgar
7dee8ae534 perf 2017-05-01 16:01:50 -07:00
Alex Dadgar
4d6a012c6f metrics 2017-05-01 14:51:27 -07:00
Alex Dadgar
7614feddbd boltDB database for client state 2017-05-01 14:50:34 -07:00
Michael Schurter
10cb924b2c Refactor Consul Syncer into new ServiceClient
Fixes #2478 #2474 #1995 #2294

The new client only handles agent and task service advertisement. Server
discovery is mostly unchanged.

The Nomad client agent now handles all Consul operations instead of the
executor handling task related operations. When upgrading from an
earlier version of Nomad existing executors will be told to deregister
from Consul so that the Nomad agent can re-register the task's services
and checks.

Drivers - other than qemu - now support an Exec method for executing
abritrary commands in a task's environment. This is used to implement
script checks.

Interfaces are used extensively to avoid interacting with Consul in
tests that don't assert any Consul related behavior.
2017-04-19 12:42:47 -07:00
Alex Dadgar
68ba51c600 Hash host ID so its stable and well distributed
This PR takes the host ID and runs it through a hash so that it is well
distributed. This makes it so that machines that report similar host IDs
are easily distinguished.

Instances of similar IDs occur on EC2 where the ID is prefixed and on
motherboards created in the same batch.

Fixes https://github.com/hashicorp/nomad/issues/2534
2017-04-10 11:44:51 -07:00
Alex Dadgar
d212f6fe94 Track task start/finish time & improve logs errors
This PR adds tracking to when a task starts and finishes and the logs
API takes advantage of this and returns better errors when asking for
logs that do not exist.
2017-03-31 16:14:11 -07:00
Alex Dadgar
d698288f3a Merge pull request #2461 from hashicorp/b-groups
Various fixes for setting user/group of task
2017-03-28 11:13:27 -07:00
Alex Dadgar
564367fa71 Proper reference counting through task restarts
This PR fixes an issue in which the reference count on a Docker image
would become inflated through task restarts.
2017-03-25 17:05:53 -07:00
Alex Dadgar
67c07c9932 Various fixes for setting user/group of task
This PR fixes two issues:
* Folder permissions in -dev mode were incorrect and not suitable for
running as a particular user.
* Was not setting the group membership properly for the launched
process.

Fixes https://github.com/hashicorp/nomad/issues/2160
2017-03-20 14:21:13 -07:00
Alex Dadgar
701537e9c5 Limit parallelism during garbage collection
This PR introduces a parallelism limit during garbage collection. This
is used to avoid large resource usage spikes if garbage collecting many
allocations at once.
2017-03-10 16:27:00 -08:00
Alex Dadgar
4fbe182372 Add metrics to show allocations on the client
This PR adds the following metrics to the client:
client.allocations.migrating
client.allocations.blocked
client.allocations.pending
client.allocations.running
client.allocations.terminal

Also adds some missing fields to the API version of the evaluation.
2017-03-09 12:37:41 -08:00
Alex Dadgar
07f7e19578 Fix vet script and fix vet problems
This PR fixes our vet script and fixes all the missed vet changes.

It also fixes pointers being printed in `nomad stop <job>` and `nomad
node-status <node>`.
2017-02-27 16:00:19 -08:00
Alex Dadgar
b6991b3357 Allow random UUID 2017-02-27 13:42:37 -08:00
Alex Dadgar
dca06bd1f5 Add allocated/unallocated metrics to client 2017-02-16 18:28:11 -08:00
Sean Chittenden
c3c44d27fc Unconditionally lowercase the node ID read from disk. 2017-02-06 16:20:17 -08:00
Sean Chittenden
31333eecae Add better verification of a host's HostID. 2017-02-02 16:24:32 -08:00
Sean Chittenden
e4df8042e3 Slight mis-merge: secret-id in dev mode is random and needs to be returned. 2017-02-01 22:20:52 -08:00
Sean Chittenden
6c194a47c2 Generate a durable NodeID if possible, otherwise fall back to a random HostID. 2017-02-01 22:11:33 -08:00
Diptanu Choudhury
6b0a1ebb58 Making the GC related fields tunable 2017-01-31 15:51:20 -08:00
Diptanu Choudhury
c254fbfa4f Locking appropriately before closing the channel to indicate migration 2017-01-23 10:46:57 -08:00
Michael Schurter
78c9c00ba8 Fix index we get allocs by 2017-01-20 16:30:40 -08:00
Diptanu Choudhury
49e6735227 Merge pull request #2159 from hashicorp/b-consul-config
Fixed merging consul config
2017-01-18 16:14:54 -08:00
Diptanu Choudhury
6d669fb48e Moved functions to helper from structs 2017-01-18 15:55:14 -08:00
Alex Dadgar
23e84ecc12 Random wait 2017-01-11 13:24:23 -08:00
Alex Dadgar
925622a851 GetAllocs uses a blocking query
This PR makes GetAllocs use a blocking query as well as adding a sanity
check to the clients watchAllocation code to ensure it gets the correct
allocations.

This PR fixes https://github.com/hashicorp/nomad/issues/2119 and
https://github.com/hashicorp/nomad/issues/2153.

The issue was that the client was talking to two different servers, one
to check which allocations to pull and the other to pull those
allocations.  However the latter call was not with a blocking query and
thus the client would not retreive the allocations it requested.

The logging has been improved to make the problem more clear as well.
2017-01-10 13:30:35 -08:00
Michael Schurter
e25274b775 Put a logger in AllocDir/TaskDir 2017-01-05 16:31:56 -08:00
Diptanu Choudhury
5c6adce720 Unlocking if we return before adding a new alloc runner 2017-01-05 13:18:48 -08:00
Diptanu Choudhury
84bbd3de28 Fixed how alloc lock is held 2017-01-05 13:06:56 -08:00
Michael Schurter
09d4f0795a Fix race when shutting down in dev mode
Client.Shutdown holds the allocLock when destroying alloc runners in dev
mode.

Client.updateAllocStatus can be called during AllocRunner shutdown and
calls getAllocRunners which tries to acquire allocLock.RLock. This
deadlocks since Client.Shutdown already has the write lock.

Switching Client.Shutdown to use getAllocRunners and not hold a lock
during AllocRunner shutdown is the solution.
2017-01-03 17:21:50 -08:00
Michael Schurter
fccf115c56 Merge pull request #2054 from hashicorp/f-prestart
Add Driver.Prestart method
2016-12-20 16:18:56 -08:00
Diptanu Choudhury
6f978dd051 Removing the alloc runner from GC if it is destroyed by the server 2016-12-20 11:14:22 -08:00