365 Commits

Author SHA1 Message Date
Alex Dadgar
a9e3a41407 Enable more linters 2017-09-26 15:26:33 -07:00
Chelsea Holland Komlo
8943a29428 Move setGaugeForAllocationStats to emitClientMetrics 2017-09-25 16:05:49 +00:00
Alex Dadgar
98c47c72d0 changelog and feedback 2017-09-14 14:08:58 -07:00
Alex Dadgar
f23ac5f083 Non-locked accessors to common Node fields
This PR removes locking around commonly accessed node attributes that do
not need to be locked. The locking could cause nodes to TTL as the
heartbeat code path was acquiring a lock that could be held for an
excessively long time. An example of this is when Vault is inaccessible,
since the fingerprint is run with a lock held but the Vault
fingerprinter makes the API calls with a large timeout.

Fixes https://github.com/hashicorp/nomad/issues/2689
2017-09-14 14:08:26 -07:00
Chelsea Holland Komlo
1ecfb687bf fix panic in emitting tagged metrics 2017-09-11 15:32:37 +00:00
Chelsea Holland Komlo
68686cd69a final code review fixups 2017-09-05 18:47:44 +00:00
Chelsea Holland Komlo
681a3f337a fixups from code review 2017-09-05 14:13:34 +00:00
Chelsea Holland Komlo
3c0710074c labels depend on full setup of client beforehand 2017-09-05 14:13:34 +00:00
Chelsea Holland Komlo
fce72a1bc9 refactor to use baseLabels 2017-09-05 14:13:34 +00:00
Chelsea Holland Komlo
a6eeede7e2 pass in commonly used values 2017-09-05 14:13:34 +00:00
Chelsea Holland Komlo
50ab667799 create base labels to be used in every metric 2017-09-05 14:13:34 +00:00
Chelsea Holland Komlo
7a96f92290 emit metrics using labels, add option for backwards compatibility 2017-09-05 14:12:57 +00:00
Armon Dadgar
bda7b36da3 Address @dadgar feedback 2017-09-04 13:05:53 -07:00
Armon Dadgar
fb118b2dfb client: adding token cache for ACL resolution 2017-09-04 13:05:36 -07:00
Armon Dadgar
1da443f29a client: create ACL and Policy cache 2017-09-04 13:05:35 -07:00
Michael Schurter
85b9dd9cce Move migrating state into prevAllocWatcher 2017-08-14 16:02:28 -07:00
Michael Schurter
8c1811911e switch from alloc blocker to new interface
interface has 3 implementations:

1. local for blocking and moving data locally
2. remote for blocking and moving data from another node
3. noop for allocs that don't need to block
2017-08-11 16:21:35 -07:00
Michael Schurter
0f584a0143 initial attempt at refactoring blocked/migrating 2017-08-11 16:21:35 -07:00
Alex Dadgar
da82a6e814 initial watcher 2017-07-07 12:07:08 -07:00
Michael Schurter
2b97f61ac0 Consistently quote alloc ids in client logs 2017-07-06 10:24:52 -07:00
Michael Schurter
4794de99fd Tiny client race condition fix
Plus some logging improvements that may help with #2563
2017-07-05 16:15:19 -07:00
Michael Schurter
e71673e24b Suggest wiping out alloc dir too 2017-07-03 12:29:21 -07:00
Michael Schurter
6b9af8fcc3 Add more logging to restore state errors 2017-07-03 11:58:41 -07:00
Mark Mickan
e418b3e468 Add tests for migrating symlinks in alloc and local directories 2017-06-04 15:56:22 +09:30
Mark Mickan
9e984f429c Include symlinks in snapshots when migrating disks
Fixes #2685
2017-06-04 00:36:18 +09:30
Alex Dadgar
b5fbb1f4cf Fix deadlock 2017-05-31 14:05:47 -07:00
Michael Schurter
236ef21489 Merge pull request #2636 from hashicorp/f-gc-alloc-limit
Add new gc_max_allocs tuneable
2017-05-30 16:14:09 -07:00
Michael Schurter
aac319cd16 Merge pull request #2654 from hashicorp/f-env-consul
Add envconsul-like support and refactor environment handling
2017-05-30 14:40:14 -07:00
Alex Dadgar
f3218378cd Fix perms to just set exec bit 2017-05-25 14:44:13 -07:00
Michael Schurter
a96fb5dbb0 Move task env into execcontext
Also inject PATH into rkt commands since we're no longer appending host
env vars for it.
2017-05-23 13:53:34 -07:00
Michael Schurter
fb72f20bb1 gc_max_allocs should include blocked & migrating 2017-05-12 16:03:22 -07:00
Michael Schurter
cc11d9a563 Add new gc_max_allocs tuneable
More than gc_max_allocs may be running on a node, but terminal allocs
will be garbage collected to try to keep the total number below the
limit.
2017-05-11 17:18:02 -07:00
Alex Dadgar
7fb1b37e09 Fix vet errors 2017-05-11 13:08:08 -07:00
Alex Dadgar
3f1ccf7278 Respond to comments 2017-05-09 10:50:24 -07:00
Alex Dadgar
e22393aeb8 Restore state + upgrade path 2017-05-02 18:21:49 -07:00
Alex Dadgar
85a81f47de Revert "metrics"
This reverts commit 4d6a012c6f.
2017-05-02 09:28:11 -07:00
Alex Dadgar
1d20b11297 Async and sync saving of client state 2017-05-01 16:16:53 -07:00
Alex Dadgar
7dee8ae534 perf 2017-05-01 16:01:50 -07:00
Alex Dadgar
4d6a012c6f metrics 2017-05-01 14:51:27 -07:00
Alex Dadgar
7614feddbd boltDB database for client state 2017-05-01 14:50:34 -07:00
Michael Schurter
10cb924b2c Refactor Consul Syncer into new ServiceClient
Fixes #2478 #2474 #1995 #2294

The new client only handles agent and task service advertisement. Server
discovery is mostly unchanged.

The Nomad client agent now handles all Consul operations instead of the
executor handling task related operations. When upgrading from an
earlier version of Nomad existing executors will be told to deregister
from Consul so that the Nomad agent can re-register the task's services
and checks.

Drivers other than qemu now support an Exec method for executing
arbitrary commands in a task's environment. This is used to implement
script checks.

Interfaces are used extensively to avoid interacting with Consul in
tests that don't assert any Consul related behavior.
2017-04-19 12:42:47 -07:00
Alex Dadgar
68ba51c600 Hash host ID so it's stable and well distributed
This PR takes the host ID and runs it through a hash so that it is well
distributed. This makes it so that machines that report similar host IDs
are easily distinguished.

Instances of similar IDs occur on EC2 where the ID is prefixed and on
motherboards created in the same batch.

Fixes https://github.com/hashicorp/nomad/issues/2534
2017-04-10 11:44:51 -07:00
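One way to get the property this commit describes is shown in the Go sketch below. It assumes SHA-256 and a UUID-style layout; the commit does not say which hash Nomad uses, and `hashHostID` is a hypothetical name. The output is deterministic for a given input, while two host IDs that share a long prefix produce unrelated-looking results.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// hashHostID hashes the raw host ID with SHA-256 and formats the first 16
// bytes in the 8-4-4-4-12 UUID layout. The same input always yields the same
// output, so the derived ID is stable across restarts.
func hashHostID(hostID string) string {
	sum := sha256.Sum256([]byte(hostID))
	b := sum[:16]
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16])
}

func main() {
	// Host IDs sharing a long prefix, as on EC2 or same-batch motherboards.
	fmt.Println(hashHostID("ec2a1b2c-0000-0000-0000-000000000001"))
	fmt.Println(hashHostID("ec2a1b2c-0000-0000-0000-000000000002"))
}
```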
Alex Dadgar
d212f6fe94 Track task start/finish time & improve logs errors
This PR tracks when a task starts and finishes. The logs API uses this
to return better errors when asked for logs that do not exist.
2017-03-31 16:14:11 -07:00
Alex Dadgar
d698288f3a Merge pull request #2461 from hashicorp/b-groups
Various fixes for setting user/group of task
2017-03-28 11:13:27 -07:00
Alex Dadgar
564367fa71 Proper reference counting through task restarts
This PR fixes an issue in which the reference count on a Docker image
would become inflated through task restarts.
2017-03-25 17:05:53 -07:00
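The commit does not describe how the inflated count was fixed. As one possible approach, sketched here under that caveat, a coordinator can record which tasks hold an image instead of keeping a bare counter, so a restarted task re-acquiring the same image is not counted twice. The `imageRefs` type and its methods are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// imageRefs is a hypothetical coordinator that records which tasks use an
// image rather than a bare counter, so re-acquiring on restart is idempotent.
type imageRefs struct {
	mu    sync.Mutex
	users map[string]map[string]struct{} // image ID -> set of task IDs
}

func newImageRefs() *imageRefs {
	return &imageRefs{users: make(map[string]map[string]struct{})}
}

// Acquire adds task as a user of image; repeating it has no further effect.
func (r *imageRefs) Acquire(image, task string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.users[image] == nil {
		r.users[image] = make(map[string]struct{})
	}
	r.users[image][task] = struct{}{}
}

// Release drops the task's reference and reports whether the image is now
// unused and therefore a candidate for removal.
func (r *imageRefs) Release(image, task string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.users[image], task)
	if len(r.users[image]) == 0 {
		delete(r.users, image)
		return true
	}
	return false
}

// Count returns the number of distinct tasks referencing image.
func (r *imageRefs) Count(image string) int {
	r.mu.Lock()
	defer r.mu.Unlock()
	return len(r.users[image])
}

func main() {
	refs := newImageRefs()
	refs.Acquire("redis:3.2", "task-a")
	refs.Acquire("redis:3.2", "task-a")              // task restart re-acquires the image
	fmt.Println(refs.Count("redis:3.2"))             // 1, not 2
	fmt.Println(refs.Release("redis:3.2", "task-a")) // true: image can be removed
}
```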
Alex Dadgar
67c07c9932 Various fixes for setting user/group of task
This PR fixes two issues:
* Folder permissions in -dev mode were incorrect and not suitable for
running as a particular user.
* Was not setting the group membership properly for the launched
process.

Fixes https://github.com/hashicorp/nomad/issues/2160
2017-03-20 14:21:13 -07:00
Alex Dadgar
701537e9c5 Limit parallelism during garbage collection
This PR introduces a parallelism limit during garbage collection. This
is used to avoid large resource usage spikes if garbage collecting many
allocations at once.
2017-03-10 16:27:00 -08:00
Alex Dadgar
4fbe182372 Add metrics to show allocations on the client
This PR adds the following metrics to the client:
client.allocations.migrating
client.allocations.blocked
client.allocations.pending
client.allocations.running
client.allocations.terminal

Also adds some missing fields to the API version of the evaluation.
2017-03-09 12:37:41 -08:00
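The five gauges listed above can be derived by bucketing the client's allocations, as in the Go sketch below. The `allocState` type, `allocGauges` function and the precedence order (migrating, then blocked, then terminal, then client status) are assumptions for illustration. A plain map stands in for the metrics sink that would receive each value as a gauge.

```go
package main

import (
	"fmt"
	"sort"
)

// allocState is a hypothetical summary of what the client knows about an alloc.
type allocState struct {
	Migrating, Blocked, Terminal bool
	ClientStatus                 string // e.g. "pending", "running"
}

// allocGauges buckets allocs into the five gauges named in the commit,
// checking migrating, blocked and terminal before the client status.
func allocGauges(allocs []allocState) map[string]float32 {
	g := map[string]float32{
		"client.allocations.migrating": 0,
		"client.allocations.blocked":   0,
		"client.allocations.pending":   0,
		"client.allocations.running":   0,
		"client.allocations.terminal":  0,
	}
	for _, a := range allocs {
		switch {
		case a.Migrating:
			g["client.allocations.migrating"]++
		case a.Blocked:
			g["client.allocations.blocked"]++
		case a.Terminal:
			g["client.allocations.terminal"]++
		case a.ClientStatus == "pending":
			g["client.allocations.pending"]++
		case a.ClientStatus == "running":
			g["client.allocations.running"]++
		}
	}
	return g
}

func main() {
	g := allocGauges([]allocState{
		{ClientStatus: "running"},
		{ClientStatus: "running"},
		{ClientStatus: "pending"},
		{Blocked: true, ClientStatus: "pending"},
		{Terminal: true, ClientStatus: "complete"},
	})
	keys := make([]string, 0, len(g))
	for k := range g {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		// A real client would hand each value to its metrics sink as a gauge.
		fmt.Printf("%s = %v\n", k, g[k])
	}
}
```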
Alex Dadgar
07f7e19578 Fix vet script and fix vet problems
This PR fixes our vet script and fixes all the missed vet changes.

It also fixes pointers being printed in `nomad stop <job>` and `nomad
node-status <node>`.
2017-02-27 16:00:19 -08:00
Alex Dadgar
b6991b3357 Allow random UUID 2017-02-27 13:42:37 -08:00