
333 Commits

Author SHA1 Message Date
Alex Dadgar
7fb1b37e09 Fix vet errors 2017-05-11 13:08:08 -07:00
Alex Dadgar
3f1ccf7278 Respond to comments 2017-05-09 10:50:24 -07:00
Alex Dadgar
e22393aeb8 Restore state + upgrade path 2017-05-02 18:21:49 -07:00
Alex Dadgar
85a81f47de Revert "metrics"
This reverts commit 4d6a012c6f.
2017-05-02 09:28:11 -07:00
Alex Dadgar
1d20b11297 Async and sync saving of client state 2017-05-01 16:16:53 -07:00
Alex Dadgar
7dee8ae534 perf 2017-05-01 16:01:50 -07:00
Alex Dadgar
4d6a012c6f metrics 2017-05-01 14:51:27 -07:00
Alex Dadgar
7614feddbd boltDB database for client state 2017-05-01 14:50:34 -07:00
Michael Schurter
10cb924b2c Refactor Consul Syncer into new ServiceClient
Fixes #2478 #2474 #1995 #2294

The new client only handles agent and task service advertisement. Server
discovery is mostly unchanged.

The Nomad client agent now handles all Consul operations instead of the
executor handling task-related operations. When upgrading from an
earlier version of Nomad, existing executors will be told to deregister
from Consul so that the Nomad agent can re-register the task's services
and checks.

Drivers other than qemu now support an Exec method for executing
arbitrary commands in a task's environment. This is used to implement
script checks.

Interfaces are used extensively to avoid interacting with Consul in
tests that don't assert any Consul related behavior.
2017-04-19 12:42:47 -07:00
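The commit relies on interfaces so that tests never touch a real Consul agent. Here is a minimal Go sketch of that pattern. `AgentAPI`, `fakeAgent`, `ServiceClient.RegisterTask` and the `<task>-<service>` ID scheme are hypothetical names and choices for illustration. They are not Nomad's actual types.

```go
package main

import (
	"fmt"
	"sort"
)

// AgentAPI is a hypothetical, narrowed-down view of the Consul agent
// operations the client needs. Production code would wrap a real Consul
// client; tests substitute the in-memory fake below.
type AgentAPI interface {
	ServiceRegister(id, name string) error
	ServiceDeregister(id string) error
}

// fakeAgent records registrations in memory so tests never talk to Consul.
type fakeAgent struct {
	services map[string]string // service ID -> service name
}

func newFakeAgent() *fakeAgent {
	return &fakeAgent{services: make(map[string]string)}
}

func (f *fakeAgent) ServiceRegister(id, name string) error {
	f.services[id] = name
	return nil
}

func (f *fakeAgent) ServiceDeregister(id string) error {
	delete(f.services, id)
	return nil
}

// ServiceClient depends only on the interface, not on a concrete client.
type ServiceClient struct {
	agent AgentAPI
}

// RegisterTask advertises each of a task's services under a task-scoped ID.
func (c *ServiceClient) RegisterTask(taskID string, services []string) error {
	for _, name := range services {
		if err := c.agent.ServiceRegister(taskID+"-"+name, name); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	agent := newFakeAgent()
	client := &ServiceClient{agent: agent}
	if err := client.RegisterTask("web-1", []string{"http", "metrics"}); err != nil {
		panic(err)
	}
	ids := make([]string, 0, len(agent.services))
	for id := range agent.services {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	fmt.Println(ids) // [web-1-http web-1-metrics]
}
```

Because `ServiceClient` only sees `AgentAPI`, a test can assert on `fakeAgent.services` directly without starting Consul.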
Alex Dadgar
68ba51c600 Hash host ID so it's stable and well distributed
This PR takes the host ID and runs it through a hash so that it is well
distributed. As a result, machines that report similar host IDs are
easily distinguished.

Instances of similar IDs occur on EC2 where the ID is prefixed and on
motherboards created in the same batch.

Fixes https://github.com/hashicorp/nomad/issues/2534
2017-04-10 11:44:51 -07:00
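Here is a sketch of the hashing step. It assumes SHA-512 and a UUID-style 8-4-4-4-12 hex layout. Both the hash function and the formatting are assumptions, and `hashHostID` is a hypothetical name. It shows the two properties the commit asks for. The output is deterministic (stable), and inputs that share a long prefix map to unrelated outputs (well distributed).

```go
package main

import (
	"crypto/sha512"
	"fmt"
)

// hashHostID runs a raw host ID through SHA-512 (an assumed choice) and
// formats the first 16 bytes as a UUID-shaped string. The same input always
// yields the same output, while similar inputs yield unrelated outputs.
func hashHostID(hostID string) string {
	h := sha512.Sum512([]byte(hostID))
	return fmt.Sprintf("%x-%x-%x-%x-%x", h[0:4], h[4:6], h[6:8], h[8:10], h[10:16])
}

func main() {
	// Two IDs that differ only in the last character, as with prefixed
	// EC2 IDs or motherboards from the same batch.
	fmt.Println(hashHostID("ec2abc12-0000-0000-0000-000000000001"))
	fmt.Println(hashHostID("ec2abc12-0000-0000-0000-000000000002"))
}
```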
Alex Dadgar
d212f6fe94 Track task start/finish time & improve logs errors
This PR tracks when a task starts and finishes. The logs API uses this
to return better errors when asked for logs that do not exist.
2017-03-31 16:14:11 -07:00
Alex Dadgar
d698288f3a Merge pull request #2461 from hashicorp/b-groups
Various fixes for setting user/group of task
2017-03-28 11:13:27 -07:00
Alex Dadgar
564367fa71 Proper reference counting through task restarts
This PR fixes an issue in which the reference count on a Docker image
would become inflated through task restarts.
2017-03-25 17:05:53 -07:00
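A plain counter inflates if every (re)start of a task increments it. Here is a hedged sketch of one fix. Count distinct task IDs per image, so a restarting task that re-acquires the same image is a no-op. `imageRefs` and its methods are hypothetical and are not the Docker driver's real code.

```go
package main

import (
	"fmt"
	"sync"
)

// imageRefs is a hypothetical reference counter for Docker images. It
// stores the set of task IDs using each image rather than a bare integer.
type imageRefs struct {
	mu   sync.Mutex
	refs map[string]map[string]struct{} // image ID -> set of task IDs
}

func newImageRefs() *imageRefs {
	return &imageRefs{refs: make(map[string]map[string]struct{})}
}

// Acquire records that taskID uses imageID. Repeated calls are idempotent.
func (r *imageRefs) Acquire(imageID, taskID string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	set, ok := r.refs[imageID]
	if !ok {
		set = make(map[string]struct{})
		r.refs[imageID] = set
	}
	set[taskID] = struct{}{}
}

// Release drops taskID's reference and reports whether the image is now
// unused and could be removed.
func (r *imageRefs) Release(imageID, taskID string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	set := r.refs[imageID]
	delete(set, taskID)
	if len(set) == 0 {
		delete(r.refs, imageID)
		return true
	}
	return false
}

// Count returns the number of distinct tasks referencing imageID.
func (r *imageRefs) Count(imageID string) int {
	r.mu.Lock()
	defer r.mu.Unlock()
	return len(r.refs[imageID])
}

func main() {
	r := newImageRefs()
	r.Acquire("redis:3.2", "task-a")
	r.Acquire("redis:3.2", "task-a") // restart: same task acquires again
	fmt.Println(r.Count("redis:3.2"))             // 1
	fmt.Println(r.Release("redis:3.2", "task-a")) // true
}
```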
Alex Dadgar
67c07c9932 Various fixes for setting user/group of task
This PR fixes two issues:
* Folder permissions in -dev mode were incorrect and not suitable for
running as a particular user.
* Group membership was not being set properly for the launched
process.

Fixes https://github.com/hashicorp/nomad/issues/2160
2017-03-20 14:21:13 -07:00
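Here is a Unix-only Go sketch of launching a process with an explicit uid, primary gid and supplementary groups through `syscall.Credential`. `commandAs` is a hypothetical helper. In practice the group list would likely come from `os/user` (`User.GroupIds`). With `NoSetGroups` left false, Go calls setgroups with the `Groups` slice, so an empty slice clears the supplementary groups. That is one way group membership can be lost.

```go
package main

import (
	"fmt"
	"os/exec"
	"syscall"
)

// commandAs builds (but does not start) a command that will run with the
// given uid, primary gid and supplementary groups. Unix-only: the
// Credential field does not exist on Windows.
func commandAs(uid, gid uint32, groups []uint32, name string, args ...string) *exec.Cmd {
	cmd := exec.Command(name, args...)
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Credential: &syscall.Credential{
			Uid:    uid,
			Gid:    gid,
			Groups: groups, // supplementary groups; empty would clear them
		},
	}
	return cmd
}

func main() {
	cmd := commandAs(1000, 1000, []uint32{1000, 27}, "id")
	fmt.Println(cmd.SysProcAttr.Credential.Groups) // [1000 27]
}
```

Actually starting such a command requires the parent to have permission to switch credentials (typically root).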
Alex Dadgar
701537e9c5 Limit parallelism during garbage collection
This PR introduces a parallelism limit during garbage collection to
avoid large spikes in resource usage when garbage collecting many
allocations at once.
2017-03-10 16:27:00 -08:00
Alex Dadgar
4fbe182372 Add metrics to show allocations on the client
This PR adds the following metrics to the client:
client.allocations.migrating
client.allocations.blocked
client.allocations.pending
client.allocations.running
client.allocations.terminal

Also adds some missing fields to the API version of the evaluation.
2017-03-09 12:37:41 -08:00
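Here is a sketch of how the five gauges listed above could be computed from the client's allocations. `allocState` and `allocGauges` are hypothetical. Only the metric names come from the commit. The sketch leaves out emitting the values to a metrics sink.

```go
package main

import (
	"fmt"
	"sort"
)

// allocState is a hypothetical simplified view of an allocation's state on
// the client; the values mirror the metric names in the commit.
type allocState string

const (
	stateMigrating allocState = "migrating"
	stateBlocked   allocState = "blocked"
	statePending   allocState = "pending"
	stateRunning   allocState = "running"
	stateTerminal  allocState = "terminal"
)

// allocGauges counts allocations per state, keyed by metric name. Every
// known state is always present so a gauge drops back to 0 instead of
// keeping a stale value when its last allocation leaves that state.
func allocGauges(states []allocState) map[string]int {
	gauges := make(map[string]int)
	for _, s := range []allocState{stateMigrating, stateBlocked, statePending, stateRunning, stateTerminal} {
		gauges["client.allocations."+string(s)] = 0
	}
	for _, s := range states {
		gauges["client.allocations."+string(s)]++
	}
	return gauges
}

func main() {
	g := allocGauges([]allocState{stateRunning, stateRunning, statePending})
	keys := make([]string, 0, len(g))
	for k := range g {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Println(k, g[k])
	}
}
```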
Alex Dadgar
07f7e19578 Fix vet script and fix vet problems
This PR fixes our vet script and fixes all the vet problems it had missed.

It also fixes pointers being printed in `nomad stop <job>` and `nomad
node-status <node>`.
2017-02-27 16:00:19 -08:00
Alex Dadgar
b6991b3357 Allow random UUID 2017-02-27 13:42:37 -08:00
Alex Dadgar
dca06bd1f5 Add allocated/unallocated metrics to client 2017-02-16 18:28:11 -08:00
Sean Chittenden
c3c44d27fc Unconditionally lowercase the node ID read from disk. 2017-02-06 16:20:17 -08:00
Sean Chittenden
31333eecae Add better verification of a host's HostID. 2017-02-02 16:24:32 -08:00
Sean Chittenden
e4df8042e3 Slight mis-merge: secret-id in dev mode is random and needs to be returned. 2017-02-01 22:20:52 -08:00
Sean Chittenden
6c194a47c2 Generate a durable NodeID if possible, otherwise fall back to a random HostID. 2017-02-01 22:11:33 -08:00
Diptanu Choudhury
6b0a1ebb58 Making the GC related fields tunable 2017-01-31 15:51:20 -08:00
Diptanu Choudhury
c254fbfa4f Locking appropriately before closing the channel to indicate migration 2017-01-23 10:46:57 -08:00
Michael Schurter
78c9c00ba8 Fix index we get allocs by 2017-01-20 16:30:40 -08:00
Diptanu Choudhury
49e6735227 Merge pull request #2159 from hashicorp/b-consul-config
Fixed merging consul config
2017-01-18 16:14:54 -08:00
Diptanu Choudhury
6d669fb48e Moved functions to helper from structs 2017-01-18 15:55:14 -08:00
Alex Dadgar
23e84ecc12 Random wait 2017-01-11 13:24:23 -08:00
Alex Dadgar
925622a851 GetAllocs uses a blocking query
This PR makes GetAllocs use a blocking query and adds a sanity
check to the client's watchAllocation code to ensure it gets the correct
allocations.

This PR fixes https://github.com/hashicorp/nomad/issues/2119 and
https://github.com/hashicorp/nomad/issues/2153.

The issue was that the client was talking to two different servers, one
to check which allocations to pull and the other to pull those
allocations. However, the latter call did not use a blocking query, so
the client would not retrieve the allocations it requested.

The logging has been improved to make the problem more clear as well.
2017-01-10 13:30:35 -08:00
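Here is a hedged sketch of the client-side sanity check. It compares what came back against the allocation IDs and modify indexes the first server reported, and it retries on a stale answer. `Alloc`, `fetchFunc` and `pullAllocs` are hypothetical. In the actual fix, the second server also blocks until it reaches the requested index. The fake fetch below does not model that and simply plays a lagging server once.

```go
package main

import "fmt"

// Alloc is a hypothetical, trimmed-down allocation with the index at which
// it was last modified.
type Alloc struct {
	ID          string
	ModifyIndex uint64
}

// fetchFunc stands in for a GetAllocs call. minIndex plays the role of a
// blocking query's minimum index.
type fetchFunc func(ids []string, minIndex uint64) ([]*Alloc, error)

// pullAllocs requests the allocations the client was told to pull and
// checks that every requested ID came back at least as new as the index the
// first server reported; a stale response is retried.
func pullAllocs(fetch fetchFunc, want map[string]uint64, minIndex uint64, maxTries int) ([]*Alloc, error) {
	ids := make([]string, 0, len(want))
	for id := range want {
		ids = append(ids, id)
	}
	for try := 1; try <= maxTries; try++ {
		allocs, err := fetch(ids, minIndex)
		if err != nil {
			return nil, err
		}
		stale := len(allocs) != len(want)
		for _, a := range allocs {
			if idx, ok := want[a.ID]; !ok || a.ModifyIndex < idx {
				stale = true
				break
			}
		}
		if !stale {
			return allocs, nil
		}
	}
	return nil, fmt.Errorf("server returned stale allocations after %d tries", maxTries)
}

func main() {
	calls := 0
	fetch := func(ids []string, minIndex uint64) ([]*Alloc, error) {
		calls++
		if calls == 1 {
			return []*Alloc{{ID: "a1", ModifyIndex: 5}}, nil // lagging server
		}
		return []*Alloc{{ID: "a1", ModifyIndex: 10}}, nil
	}
	allocs, err := pullAllocs(fetch, map[string]uint64{"a1": 10}, 10, 3)
	fmt.Println(len(allocs), err, calls) // 1 <nil> 2
}
```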
Michael Schurter
e25274b775 Put a logger in AllocDir/TaskDir 2017-01-05 16:31:56 -08:00
Diptanu Choudhury
5c6adce720 Unlocking if we return before adding a new alloc runner 2017-01-05 13:18:48 -08:00
Diptanu Choudhury
84bbd3de28 Fixed how alloc lock is held 2017-01-05 13:06:56 -08:00
Michael Schurter
09d4f0795a Fix race when shutting down in dev mode
Client.Shutdown holds the allocLock when destroying alloc runners in dev
mode.

Client.updateAllocStatus can be called during AllocRunner shutdown and
calls getAllocRunners which tries to acquire allocLock.RLock. This
deadlocks since Client.Shutdown already has the write lock.

Switching Client.Shutdown to use getAllocRunners and not hold a lock
during AllocRunner shutdown is the solution.
2017-01-03 17:21:50 -08:00
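The fix can be reproduced in a small Go sketch. `getAllocRunners` snapshots the runners under a read lock, so `Shutdown` can destroy them without holding `allocLock`. The callback into `updateAllocStatus` then no longer deadlocks, because Go's `sync.RWMutex` blocks an `RLock` while the same goroutine holds the write lock. The types are simplified, hypothetical stand-ins, not Nomad's real `Client` or `AllocRunner`.

```go
package main

import (
	"fmt"
	"sync"
)

// AllocRunner calls back into the client on Destroy, as AllocRunner
// shutdown calls Client.updateAllocStatus.
type AllocRunner struct {
	id     string
	client *Client
}

func (r *AllocRunner) Destroy() { r.client.updateAllocStatus(r.id) }

type Client struct {
	allocLock sync.RWMutex
	allocs    map[string]*AllocRunner

	updatesMu sync.Mutex
	updates   []string
}

// getAllocRunners copies the runners under a read lock and releases the
// lock before returning.
func (c *Client) getAllocRunners() map[string]*AllocRunner {
	c.allocLock.RLock()
	defer c.allocLock.RUnlock()
	out := make(map[string]*AllocRunner, len(c.allocs))
	for id, r := range c.allocs {
		out[id] = r
	}
	return out
}

// updateAllocStatus takes allocLock.RLock via getAllocRunners; it would
// hang if the caller still held allocLock.Lock.
func (c *Client) updateAllocStatus(id string) {
	if _, ok := c.getAllocRunners()[id]; ok {
		c.updatesMu.Lock()
		c.updates = append(c.updates, id)
		c.updatesMu.Unlock()
	}
}

// Shutdown destroys runners from a snapshot without holding allocLock.
func (c *Client) Shutdown() {
	for _, r := range c.getAllocRunners() {
		r.Destroy()
	}
}

func main() {
	c := &Client{allocs: make(map[string]*AllocRunner)}
	for _, id := range []string{"a1", "a2"} {
		c.allocs[id] = &AllocRunner{id: id, client: c}
	}
	c.Shutdown()
	fmt.Println(len(c.updates)) // 2
}
```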
Michael Schurter
fccf115c56 Merge pull request #2054 from hashicorp/f-prestart
Add Driver.Prestart method
2016-12-20 16:18:56 -08:00
Diptanu Choudhury
6f978dd051 Removing the alloc runner from GC if it is destroyed by the server 2016-12-20 11:14:22 -08:00
Diptanu Choudhury
7ebe4a6972 Added comments 2016-12-20 10:49:48 -08:00
Diptanu Choudhury
61e534d684 Making the gc allocator understand real disk usage 2016-12-16 18:34:59 -08:00
Diptanu Choudhury
79fdad86c3 Added the stats collector to GC 2016-12-14 15:11:11 -08:00
Diptanu Choudhury
41d7ebc5c5 Refactored hoststats collector 2016-12-14 15:07:42 -08:00
Diptanu Choudhury
a38201a220 GC-ing before we start a new allocation 2016-12-14 15:04:06 -08:00
Diptanu Choudhury
615fbbe17a Added a garbage collector for allocations 2016-12-14 15:01:12 -08:00
Alex Dadgar
8b6fcd3483 Merge pull request #2096 from hashicorp/b-addAlloc
Fix race and remove panic
2016-12-13 13:50:17 -08:00
Diptanu Choudhury
76794f037f cancelling waiting for remote allocation if the alloc doesn't need migration 2016-12-13 13:06:33 -08:00
Alex Dadgar
da3e51a72c Fix race and remove panic 2016-12-13 12:34:23 -08:00
Christoffer Kylvåg
c3df9dd73f #1680: Continue after not being able to stat a mountpoint 2016-12-13 12:28:57 +01:00
Diptanu Choudhury
33e7d12d70 Setting the appropriate file permissions while un-archiving compressed alloc dir 2016-12-05 17:04:43 -08:00
Diptanu Choudhury
7c0978ea30 Merge pull request #2017 from hashicorp/b-sticky
Not moving alloc data when sticky is turned off
2016-12-05 14:11:45 -08:00
Diptanu Choudhury
0182b859e8 Not moving alloc data when sticky is turned off 2016-12-05 14:00:01 -08:00
Michael Schurter
ee17940dfe Add Driver.Prestart method
The Driver.Prestart method currently does very little but lays the
foundation for where lifecycle plugins can interleave execution _after_
task environment setup but _before_ the task starts.

Currently Prestart does two things:

* Any driver specific task environment building
* Download Docker images

This change also attaches a TaskEvent emitter to Drivers, so they can
emit events during task initialization.
2016-12-02 11:03:48 -08:00
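Here is a sketch of the ordering Prestart introduces, using a driver that emits task events. The `Driver` interface, the `TaskEvent` fields, the event type strings and `runTask` are hypothetical simplifications, not Nomad's actual driver API. The sketch only illustrates that Prestart runs after environment setup and before Start, and that it can report progress through an attached emitter.

```go
package main

import "fmt"

// TaskEvent is a hypothetical event a driver emits while preparing a task.
type TaskEvent struct {
	Type    string
	Message string
}

// EventEmitter is attached to a driver so it can report progress.
type EventEmitter func(TaskEvent)

// Driver is a hypothetical reduction of the driver interface.
type Driver interface {
	Prestart(task string) error // after env setup, before the task starts
	Start(task string) error
}

type dockerLikeDriver struct {
	emit EventEmitter
}

func (d *dockerLikeDriver) Prestart(task string) error {
	// Driver-specific environment building and image download would go here.
	d.emit(TaskEvent{Type: "Downloading Image", Message: "pulling image for " + task})
	return nil
}

func (d *dockerLikeDriver) Start(task string) error {
	d.emit(TaskEvent{Type: "Started", Message: task})
	return nil
}

// runTask enforces the ordering: Start only runs if Prestart succeeded.
func runTask(d Driver, task string) error {
	if err := d.Prestart(task); err != nil {
		return fmt.Errorf("prestart failed: %w", err)
	}
	return d.Start(task)
}

func main() {
	var events []TaskEvent
	d := &dockerLikeDriver{emit: func(e TaskEvent) { events = append(events, e) }}
	if err := runTask(d, "redis"); err != nil {
		panic(err)
	}
	for _, e := range events {
		fmt.Println(e.Type)
	}
}
```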