Commit Graph

2644 Commits

Author SHA1 Message Date
Alex Dadgar
ea0eba6e46 Merge pull request #3559 from hashicorp/b-metrics
Don't emit metrics for non-running tasks
2017-11-17 10:33:23 -08:00
Michael Schurter
96f56cebf8 Merge pull request #3562 from hashicorp/b-3561-rkt-rm
Remove rkt pods when exiting
2017-11-16 17:30:21 -08:00
Michael Schurter
d1548863d4 Merge pull request #3551 from hashicorp/b-3419-docker-409-bug
Fix Docker name conflict bug by updating dockerclient
2017-11-16 16:38:54 -08:00
Michael Schurter
9929ac2382 Improve rktRemove error message 2017-11-16 15:45:14 -08:00
Michael Schurter
ce3fbb62b7 Remove rkt pods when exiting
Fixes #3561
2017-11-16 14:33:44 -08:00
Charlie Voiselle
24c8d2c439 Merge pull request #3556 from angrycub/f-fingerprint-log-level
Dropped loglevel for AWS fingerprinter env read misses to DEBUG
2017-11-16 16:27:25 -05:00
Charlie Voiselle
83aa6251c1 Lowered to DEBUG from AD feedback 2017-11-16 14:13:03 -05:00
Alex Dadgar
d427ab70c1 Only publish metric when the task is running and dev mode publishes metrics 2017-11-15 13:21:06 -08:00
Alex Dadgar
167c81ab6c Merge pull request #3546 from hashicorp/f-heuristic
Better interface selection heuristic
2017-11-15 12:51:21 -08:00
Alex Dadgar
00557afb81 Use interface attached to default route 2017-11-15 11:32:32 -08:00
Michael Schurter
0de0e1d342 Handle leader task being dead in RestoreState
Fixes the panic mentioned in
https://github.com/hashicorp/nomad/issues/3420#issuecomment-341666932

While a leader task dying serially stops all follower tasks, the
synchronizing of state is asynchronous. Nomad can shut down before all
follower tasks have updated their state to dead, thus saving the state
necessary to hit this panic: *have a non-terminal alloc with a dead
leader.*

The actual fix is a simple nil check so that we do not assume a
non-terminal alloc's leader has a TaskRunner.
2017-11-15 10:36:13 -08:00
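The nil check described in the commit above can be illustrated with a minimal sketch. This is not the code from the commit: `AllocRunner`, `TaskRunner`, and `leaderIsDead` are simplified, hypothetical stand-ins, and the only point shown is that a restored alloc's leader may have no runner at all.

```go
package main

// TaskRunner is a hypothetical, stripped-down stand-in for a per-task runner.
type TaskRunner struct {
	dead bool
}

// AllocRunner is a hypothetical stand-in for restored allocation state.
// tasks only contains entries for tasks that were restored with a runner.
type AllocRunner struct {
	leader string // name of the leader task, "" if the alloc has none
	tasks  map[string]*TaskRunner
}

// leaderIsDead reports whether the alloc has a leader that is dead.
// A missing runner is treated as a dead leader instead of being dereferenced.
func (r *AllocRunner) leaderIsDead() bool {
	if r.leader == "" {
		return false
	}
	tr := r.tasks[r.leader]
	if tr == nil {
		// Assumption for this sketch: no runner was restored because the
		// leader had already died before the agent shut down.
		return true
	}
	return tr.dead
}
```

Without the `tr == nil` branch, a lookup that finds no runner would lead to a nil pointer dereference on `tr.dead`, which is the kind of panic the commit message describes.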
Charlie Voiselle
3a88718b2e Dropped loglevel for AWS fingerprinter env reads
Certain environments use WARN for serious logging; however, it's very
possible to have machines without some of the fingerprinted keys
(public-ipv4 and public-hostname specifically). Setting log level to
INFO seems more consistent with this possibility.
2017-11-15 18:20:59 +00:00
Chelsea Komlo
fa9fd4422c Nomad agent reload TLS configuration on SIGHUP (#3479)
* Allow server TLS configuration to be reloaded via SIGHUP

* dynamic tls reloading for nomad agents

* code cleanup and refactoring

* ensure keyloader is initialized, add comments

* allow downgrading from TLS

* initialize keyloader if necessary

* integration test for tls reload

* fix up test to assert success on reloaded TLS configuration

* failure in loading a new TLS config should remain at current

Reload only the config if agent is already using TLS

* reload agent configuration before specific server/client

lock keyloader before loading/caching a new certificate

* introduce a get-or-set method for keyloader

* fixups from code review

* fix up linting errors

* fixups from code review

* add lock for config updates; improve copy of tls config

* GetCertificate only reloads certificates dynamically for the server

* config updates/copies should be on agent

* improve http integration test

* simplify agent reloading storing a local copy of config

* reuse the same keyloader when reloading

* Test that server and client get reloaded but keep keyloader

* Keyloader exposes GetClientCertificate as well for outgoing connections

* Fix spelling

* correct changelog style
2017-11-14 17:53:23 -08:00
Michael Schurter
b37a03a458 Add a test demonstrating the bug
Fails on Docker 17.09, passes on Docker 17.06 and earlier
2017-11-14 15:25:52 -08:00
Alex Dadgar
35cb143965 Better interface selection heuristic
This PR introduces a better interface selection heuristic such that we
select interfaces with globally routable unicast addresses over link
local addresses.

Fixes https://github.com/hashicorp/nomad/issues/3487
2017-11-13 15:13:43 -08:00
Preetha Appan
8e70fd812a Make device mounting unit test verify configuration via docker inspect 2017-11-13 09:56:54 -06:00
Preetha Appan
b2eeab1b8c Unit test (linux only) that tests mounting a device in the docker driver 2017-11-13 09:56:54 -06:00
Preetha Appan
929a781ae4 Add default value for cgroup permissions for device if not set 2017-11-13 09:56:54 -06:00
Preetha Appan
b0c03e45ff Remove unnecessary check since validate method already checks this 2017-11-13 09:56:54 -06:00
Preetha Appan
85c5218b78 Add support for passing device into docker driver 2017-11-13 09:56:54 -06:00
Alex Dadgar
da852ea653 always load all templates 2017-11-10 12:35:51 -08:00
Alex Dadgar
6f0c9696ec Handle multiple environment templates
Fixes https://github.com/hashicorp/nomad/issues/3498
2017-11-10 11:08:19 -08:00
Alex Dadgar
80b434d467 Merge pull request #3411 from cheeseprocedure/f-qemu-graceful-shutdown
Qemu driver: graceful shutdown feature
2017-11-03 16:41:34 -07:00
Michael Schurter
e93d625d44 Remove noisy log line
Didn't mean to commit this
2017-11-03 16:00:30 -07:00
Matt Mercer
f734d842f5 Qemu driver: clean up logging; fail unsupported features on Windows 2017-11-03 15:40:20 -07:00
Alex Dadgar
a94bab6491 fix spelling mistake 2017-11-03 15:04:59 -07:00
Alex Dadgar
11184c7514 Merge pull request #3459 from multani/docker-oom-notification
docker: log that a container has been killed by the OOM killer
2017-11-03 13:24:03 -07:00
Matt Mercer
66f9840dd0 Qemu driver: tweaks in response to PR feedback
Remove attribute for long qemu monitor path; misc cleanup; update tests
2017-11-03 11:28:56 -07:00
Preetha Appan
b3631f3d32 Remove event GenericSource, and address other code review comments. Also added deprecation info in comments. 2017-11-03 10:10:06 -05:00
Preetha Appan
d63e693679 Move logic for determining event display message to task_runner, added two new fields DisplayMessage and Details. 2017-11-03 09:13:01 -05:00
Alex Dadgar
c15f49ae8d Alloc Runner doesn't panic on restoration. 2017-11-02 16:14:13 -07:00
Alex Dadgar
52598bff7e Merge pull request #3493 from hashicorp/f-remove-atlas
Remove Atlas and Scada from codebase
2017-11-02 16:00:44 -07:00
Michael Schurter
4bca2cd669 Merge pull request #3490 from hashicorp/f-gc-logging
Make unable-to-gc log level adaptive
2017-11-02 14:32:40 -07:00
Diptanu Choudhury
5d36408475 Added the node_id as a tag 2017-11-02 13:29:10 -07:00
Alex Dadgar
53dbc4f127 remove atlas 2017-11-02 11:27:21 -07:00
Michael Schurter
cb3a03c829 Make unable-to-gc log level adaptive
WARNing when someone has over 50 non-terminal allocs was just too
confusing.

Tested manually with `gc_max_allocs = 10` and bumping a job from `count
= 19` to `count = 21`:

```
2017/11/02 17:54:21.076132 [INFO] client.gc: garbage collection due to number of allocations (19) is over the limit (10) skipped because no terminal allocations
...
2017/11/02 17:54:48.634529 [WARN] client.gc: garbage collection due to number of allocations (21) is over the limit (10) skipped because no terminal allocations
```
2017-11-02 10:57:42 -07:00
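The adaptive log level in the commit above can be illustrated with a small sketch. The commit message does not state the threshold. The rule below (warn only once the allocation count exceeds twice the limit) is an assumption chosen because it matches the two sample log lines, where 19 allocations with a limit of 10 log at INFO and 21 log at WARN. `gcSkipLevel` is a hypothetical name.

```go
package main

// gcSkipLevel picks the log level for a "garbage collection skipped" message.
// Assumption: WARN only when the allocation count is more than twice the
// configured limit; the real threshold is not given in the commit message.
func gcSkipLevel(allocs, limit int) string {
	if allocs > 2*limit {
		return "WARN"
	}
	return "INFO"
}
```

Under this assumption, a client that is modestly over `gc_max_allocs` with no terminal allocations logs quietly, and the louder level is reserved for cases that are far over the limit.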
Diptanu Choudhury
103ff5526e Added support for tagged metrics 2017-11-02 10:07:57 -07:00
Diptanu Choudhury
9593e12672 Incrementing the start counter when we are actually starting a container 2017-11-02 09:51:20 -07:00
Diptanu Choudhury
0bade76fd5 Recording counter for dead allocs properly 2017-11-02 09:51:20 -07:00
Diptanu Choudhury
45583d757e Added metrics to track task/alloc start/restarts/dead events 2017-11-02 09:51:20 -07:00
Matt Mercer
185658507f Qemu driver: defer cleanup sooner 2017-11-01 17:37:43 -07:00
Matt Mercer
15d7565931 Qemu driver: clean up test logging; retry integration test for longer 2017-11-01 17:21:56 -07:00
Matt Mercer
60030d89d1 Use strings.Replace() instead of custom function 2017-11-01 15:31:35 -07:00
Matt Mercer
2924bada55 Qemu driver: basic testing of graceful shutdown feature 2017-11-01 15:31:30 -07:00
Matt Mercer
1ff97035f0 Qemu driver: include PIDs in log output 2017-11-01 15:31:24 -07:00
Matt Mercer
200a12cbcc Qemu driver: ensure proper cleanup of resources 2017-11-01 15:31:20 -07:00
Matt Mercer
22f390d75a Qemu driver: minor logging fixes 2017-11-01 15:31:14 -07:00
Matt Mercer
3f6fdfcb9b Standardize driver.qemu logging prefix 2017-11-01 15:30:44 -07:00
Matt Mercer
00e3cc869d Qemu driver: add graceful shutdown feature 2017-11-01 15:30:36 -07:00
Michael Schurter
ec43315e13 Fix regression by returning error on unknown alloc 2017-11-01 15:16:38 -05:00