Commit Graph

3765 Commits

Author SHA1 Message Date
Mahmood Ali
6708a0ccc9 Merge pull request #5726 from hashicorp/b-plugins-via-init
Use init() to handle plugin invocation
2019-06-18 21:09:03 -04:00
Mahmood Ali
8fb9d25041 comment on use of init() for plugin handlers 2019-06-18 20:54:55 -04:00
Chris Baker
03500494b2 cleanup test 2019-06-18 14:15:25 +00:00
Chris Baker
7050e14eb5 formatting and clarity 2019-06-18 14:00:57 +00:00
Chris Baker
240d68765c metrics: add namespace label to allocation metrics 2019-06-17 20:50:26 +00:00
Mahmood Ali
eeaa95ddf9 Use init to handle plugin invocation
Currently, nomad "plugin" processes (e.g. executor, logmon, docker_logger) are started as CLI
commands to be handled by command CLI framework.  Plugin launchers use
`discover.NomadBinary()` to identify the binary and start it.

This has few downsides: The trivial one is that when running tests, one
must re-compile the nomad binary as the tests need to invoke the nomad
executable to start plugin.  This is frequently overlooked, resulting in
puzzlement.

The more significant issue with `executor` in particular is in relation
to external driver:

* Plugin must identify the path of invoking nomad binary, which is not
trivial; `discvoer.NomadBinary()` now returns the path to the plugin
rather than to nomad, preventing external drivers from launching
executors.

* The external driver may get a different version of executor than it
expects (specially if we make a binary incompatible change in future).

This commit addresses both downside by having the plugin invocation
handling through an `init()` call, similar to how libcontainer init
handler is done in [1] and recommened by libcontainer [2].  `init()`
will be invoked and handled properly in tests and external drivers.

For external drivers, this change will cause external drivers to launch
the executor that's compiled against.

There a are a couple of downsides to this approach:
* These specific packages (i.e executor, logmon, and dockerlog) need to
be careful in use of `init()`, package initializers.  Must avoid having
command execution rely on any other init in the package.  I prefixed
files with `z_` (golang processes files in lexical order), but ensured
we don't depend on order.
* The command handling is spread in multiple packages making it a bit
less obvious how plugin starts are handled.

[1] drivers/shared/executor/libcontainer_nsenter_linux.go
[2] eb4aeed24f/libcontainer (using-libcontainer)
2019-06-13 16:48:01 -04:00
Jasmine Dahilig
ce55bf5fba Merge pull request #5664 from hashicorp/f-http-hcl-region
backfill region from hcl for jobUpdate and jobPlan
2019-06-13 12:25:01 -07:00
Jasmine Dahilig
c467a94e2b backfill region from job hcl in jobUpdate and jobPlan endpoints
- updated region in job metadata that gets persisted to nomad datastore
- fixed many unrelated unit tests that used an invalid region value
(they previously passed because hcl wasn't getting picked up and
the job would default to global region)
2019-06-13 08:03:16 -07:00
Mahmood Ali
b70cf3f664 Prepare for 0.9.4 dev cycle 2019-06-12 18:47:50 +00:00
Nomad Release bot
c5e8b66c37 Generate files for 0.9.3 release 2019-06-12 16:11:16 +00:00
Danielle
f6757ff106 Merge pull request #5821 from hashicorp/dani/b-5770
trhooks: Add TaskStopHook interface to services
2019-06-12 17:30:49 +02:00
Danielle Lancashire
1aa6bc80d8 trt: Fix test 2019-06-12 17:06:11 +02:00
Danielle Lancashire
5933528404 trhooks: Add TaskStopHook interface to services
We currently only run cleanup Service Hooks when a task is either
Killed, or Exited. However, due to the implementation of a task runner,
tasks are only Exited if they every correctly started running, which is
not true when you recieve an error early in the task start flow, such as
not being able to pull secrets from Vault.

This updates the service hook to also call consul deregistration
routines during a task Stop lifecycle event, to ensure that any
registered checks and services are cleared in such cases.

fixes #5770
2019-06-12 16:00:21 +02:00
Mahmood Ali
7a01a96cc2 Fallback to alloc.TaskResources for old allocs
When a client is running against an old server (e.g. running 0.8),
`alloc.AllocatedResources` may be nil, and we need to check the
deprecated `alloc.TaskResources` instead.

Fixes https://github.com/hashicorp/nomad/issues/5810
2019-06-11 10:32:53 -04:00
Mahmood Ali
41a7fe8530 client/allocrunner: depend on internal task state
Alloc runner already tracks tasks associated with alloc.  Here, we
become defensive by relying on the alloc runner tracked tasks, rather
than depend on server never updating the job unexpectedly.
2019-06-10 18:42:51 -04:00
Mahmood Ali
6ccfd47cdd Merge pull request #5747 from hashicorp/b-test-fixes-20190521-1
More test fixes
2019-06-05 19:09:18 -04:00
Mahmood Ali
d9ac7c25d5 Merge pull request #5737 from fwkz/fix-restart-attempts
Fix restart attempts of `restart` stanza in `delay` mode.
2019-06-05 19:05:07 -04:00
Mahmood Ali
613235d586 Prepare for 0.9.3 dev cycle 2019-06-05 14:54:00 +00:00
Nomad Release bot
028326684b Generate files for 0.9.2 release 2019-06-05 11:59:27 +00:00
Mahmood Ali
5a597f4947 client config flag to disable remote exec
This exposes a client flag to disable nomad remote exec support in
environments where access to tasks ought to be restricted.

I used `disable_remote_exec` client flag that defaults to allowing
remote exec. Opted for a client config that can be used to disable
remote exec globally, or to a subset of the cluster if necessary.
2019-06-03 15:31:39 -04:00
Mahmood Ali
cdc0e04516 remove 0.9.2-rc1 generated code 2019-05-23 11:14:24 -04:00
Nomad Release bot
98e7525131 Generate files for 0.9.2-rc1 release 2019-05-22 19:29:30 +00:00
Michael Schurter
a52c7c2cbf Merge pull request #5731 from hashicorp/b-ignore-dc
client: drop unused DC field from servers list
2019-05-22 08:42:15 -07:00
Mahmood Ali
d1f12fd3cb client: synchronize client.invalidAllocs access
invalidAllocs may be accessed and manipulated from different goroutines,
so must be locked.
2019-05-22 09:37:49 -04:00
Danielle Lancashire
92527c6b4e client: Pass servers contacted ch to allocrunner
This fixes an issue where batch and service workloads would never be
restarted due to indefinitely blocking on a nil channel.

It also raises the restoration logging message to `Info` to simplify log
analysis.
2019-05-22 13:47:35 +02:00
Mahmood Ali
d06aeab087 tests: fix data race in client/allocrunner/taskrunner/template TestTaskTemplateManager_Rerender_Signal
Given that Signal may be called multiple times, blocking for `SignalCh`
isn't sufficient to synchornizing access to Signals field.
2019-05-21 13:56:58 -04:00
Mahmood Ali
bedf483d37 Merge pull request #5739 from hashicorp/r-rm-logmon-syslog-deadcode
logmon: remove syslog server deadcode
2019-05-21 11:46:48 -04:00
Mahmood Ali
fcceaee9b4 Merge pull request #5742 from hashicorp/b-test-fixes-20190520
Grab bag of (primarily race) test fixes
2019-05-21 11:46:36 -04:00
Mahmood Ali
ae1b110864 Merge pull request #5740 from hashicorp/b-nomad-exec-term-race
exec: allow drivers to handle stream termination
2019-05-21 11:24:12 -04:00
Mahmood Ali
af48f7d3dc client: synchronize access to ar.alloc
`allocRunner.alloc` is protected by `allocRunner.allocLock`, so let's
use `allocRunner.Alloc()` helper function to access it.
2019-05-21 09:55:05 -04:00
Mahmood Ali
ea2f96e585 tests: fix fifo lib race
Accidentally accessed outer `err` variable inside a goroutine
2019-05-21 09:49:56 -04:00
Mahmood Ali
56f564fa75 tests: fix data race in client TestDriverManager_Fingerprint_Periodic 2019-05-21 09:49:56 -04:00
Mahmood Ali
5e17907a0c tests: fix client TestFS_Stream data race
Close is invoked in a different goroutine from test
2019-05-21 09:49:56 -04:00
Mahmood Ali
1492d0c49a exec: allow drivers to handle stream termination
Without this change, alloc_endpoint cancel the context passed to handler
when we detect EOF.  This races driver in setting exit code; and we run
into a case where the exec process terminates cleanly yet we attempt to
mark it as failed with context error.

Here, we rely on the driver to handle errors returned from Stream and
without racing to set an error.
2019-05-21 09:40:25 -04:00
Mahmood Ali
42f0099bc2 logmon: remove syslog server deadcode
Remove unused syslog server related code that got replaced by the docker
logger in Nomad 0.9
2019-05-21 09:36:43 -04:00
fwkz
e2d793cfd9 Fix restart attempts of restart stanza.
Number of restarts during 2nd interval is off by one.
2019-05-21 13:27:19 +02:00
Michael Schurter
edd972519f client: drop unused DC field from servers list
See #5730 for details.
2019-05-20 14:19:15 -07:00
Michael Schurter
abd809d60a docs: changelog entry for #5669 and fix comment 2019-05-14 10:54:00 -07:00
Michael Schurter
796c05b9b8 client: register before restoring
Registration and restoring allocs don't share state or depend on each
other in any way (syncing allocs with servers is done outside of
registration).

Since restoring is synchronous, start the registration goroutine first.

For nodes with lots of allocs to restore or close to their heartbeat
deadline, this could be the difference between becoming "lost" or not.
2019-05-14 10:53:27 -07:00
Michael Schurter
6a2792ad90 client: do not restart dead tasks until server is contacted (try 2)
Refactoring of 104067bc2b2002a4e45ae7b667a476b89addc162

Switch the MarkLive method for a chan that is closed by the client.
Thanks to @notnoop for the idea!

The old approach called a method on most existing ARs and TRs on every
runAllocs call. The new approach does a once.Do call in runAllocs to
accomplish the same thing with less work. Able to remove the gate
abstraction that did much more than was needed.
2019-05-14 10:53:27 -07:00
Michael Schurter
e7042b674b client: do not restart dead tasks until server is contacted
Fixes #1795

Running restored allocations and pulling what allocations to run from
the server happen concurrently. This means that if a client is rebooted,
and has its allocations rescheduled, it may restart the dead allocations
before it contacts the server and determines they should be dead.

This commit makes tasks that fail to reattach on restore wait until the
server is contacted before restarting.
2019-05-14 10:53:27 -07:00
Michael Schurter
bb809ea259 client: log when server list changes
Stop logging in the happy path when nothing has changed.
2019-05-13 15:42:55 -07:00
Michael Schurter
4e3468babd Merge pull request #5492 from hashicorp/f-allocated-mem
client: expose allocated memory per task
2019-05-13 13:31:22 -07:00
Lang Martin
a732cd1f06 Merge pull request #5642 from hashicorp/b-network-fingerprinting-ipv4
network fingerprinting multiple IPs on the configured network device
2019-05-13 11:46:53 -04:00
Michael Schurter
3c19af4bfd client: expose allocated memory per task
Related to #4280

This PR adds
`client.allocs.<job>.<group>.<alloc>.<task>.memory.allocated` as a gauge
in bytes to metrics to ease calculating how close a task is to OOMing.

```
'nomad.client.allocs.memory.allocated.example.cache.6d98cbaf-d6bc-2a84-c63f-bfff8905a9d8.redis.rusty': 268435456.000
'nomad.client.allocs.memory.cache.example.cache.6d98cbaf-d6bc-2a84-c63f-bfff8905a9d8.redis.rusty': 5677056.000
'nomad.client.allocs.memory.kernel_max_usage.example.cache.6d98cbaf-d6bc-2a84-c63f-bfff8905a9d8.redis.rusty': 0.000
'nomad.client.allocs.memory.kernel_usage.example.cache.6d98cbaf-d6bc-2a84-c63f-bfff8905a9d8.redis.rusty': 0.000
'nomad.client.allocs.memory.max_usage.example.cache.6d98cbaf-d6bc-2a84-c63f-bfff8905a9d8.redis.rusty': 8908800.000
'nomad.client.allocs.memory.rss.example.cache.6d98cbaf-d6bc-2a84-c63f-bfff8905a9d8.redis.rusty': 876544.000
'nomad.client.allocs.memory.swap.example.cache.6d98cbaf-d6bc-2a84-c63f-bfff8905a9d8.redis.rusty': 0.000
'nomad.client.allocs.memory.usage.example.cache.6d98cbaf-d6bc-2a84-c63f-bfff8905a9d8.redis.rusty': 8208384.000
```
2019-05-10 11:12:12 -07:00
Lang Martin
c7071a12e3 client improve a comment in updateNetworks 2019-05-10 11:25:04 -04:00
Mahmood Ali
5abbee5d39 Merge pull request #5632 from hashicorp/f-nomad-exec-parts-01-base
nomad exec part 1: plumbing and docker driver
2019-05-09 18:09:27 -04:00
Mahmood Ali
979a6a1778 implement client endpoint of nomad exec
Add a client streaming RPC endpoint for processing nomad exec tasks, by invoking
the relevant task handler for execution.
2019-05-09 16:49:08 -04:00
Preetha
eb7a3bc616 Merge pull request #5654 from hashicorp/b-hearbeat-lockfix
Remove unnecessary locking and serverlist syncing in heartbeats
2019-05-08 13:36:39 -05:00
Preetha Appan
5bfa35ab81 fix typo and add one more test scenario 2019-05-08 10:54:22 -05:00