Commit Graph

208 Commits

Author SHA1 Message Date
Nick Ethier
387b016ac4 client: improve group service stanza interpolation and check_re… (#6586)
* client: improve group service stanza interpolation and check_restart support

Interpolation can now be done on group service stanzas. Note that some task runtime specific information
that was previously available when the service was registered poststart of a task is no longer available.

The check_restart stanza for checks defined on group services will now properly restart the allocation upon
check failures if configured.
2019-11-18 13:04:01 -05:00
Michael Schurter
0fcb0d4016 client: fix panic from 0.8 -> 0.10 upgrade
makeAllocTaskServices did not do a nil check on AllocatedResources
which causes a panic when upgrading directly from 0.8 to 0.10. While
skipping 0.9 is not supported we intend to fix serious crashers caused
by such upgrades to prevent cluster outages.

I did a quick audit of the client package and everywhere else that
accesses AllocatedResources appears to be properly guarded by a nil
check.
2019-11-01 07:47:03 -07:00
Seth Hoenig
b7e83591b4 connect: enable setting tags on consul connect sidecar service in jobspec (#6415) 2019-10-17 19:25:20 +00:00
Tim Gross
4f687cfc49 client/connect: ConsulProxy LocalServicePort/Address (#6358)
Without a `LocalServicePort`, Connect services will try to use the
mapped port even when delivering traffic locally. A user can override
this behavior by pinning the port value in the `service` stanza but
this prevents us from using the Consul service name to reach the
service.

This commits configures the Consul proxy with its `LocalServicePort`
and `LocalServiceAddress` fields.
2019-09-23 14:30:48 -04:00
Tim Gross
6a9911d9aa remove resolved TODO from UpdateTTL docstring (#6336) 2019-09-16 16:26:06 -04:00
Tim Gross
40368d2c63 support script checks for task group services (#6197)
In Nomad prior to Consul Connect, all Consul checks work the same
except for Script checks. Because the Task being checked is running in
its own container namespaces, the check is executed by Nomad in the
Task's context. If the Script check passes, Nomad uses the TTL check
feature of Consul to update the check status. This means in order to
run a Script check, we need to know what Task to execute it in.

To support Consul Connect, we need Group Services, and these need to
be registered in Consul along with their checks. We could push the
Service down into the Task, but this doesn't work if someone wants to
associate a service with a task's ports, but do script checks in
another task in the allocation.

Because Nomad is handling the Script check and not Consul anyways,
this moves the script check handling into the task runner so that the
task runner can own the script check's configuration and
lifecycle. This will allow us to pass the group service check
configuration down into a task without associating the service itself
with the task.

When tasks are checked for script checks, we walk back through their
task group to see if there are script checks associated with the
task. If so, we'll spin off script check tasklets for them. The
group-level service and any restart behaviors it needs are entirely
encapsulated within the group service hook.
2019-09-03 15:09:04 -04:00
Evan Ercolano
859861817d Remove unused canary param from MakeTaskServiceID 2019-08-31 16:53:23 -04:00
Michael Schurter
c5023d2cdb consul: ignore connect services when syncing
Consul registers Connect services automatically, however Nomad thinks it
owns them due to the _nomad prefix. Since the services are managed by
Consul, Nomad needs to explicitly ignore them or otherwies they will be
removed.
2019-08-30 11:53:41 -07:00
Jerome Gravel-Niquet
25e38c8257 Consul service meta (#6193)
* adds meta object to service in job spec, sends it to consul

* adds tests for service meta

* fix tests

* adds docs

* better hashing for service meta, use helper for copying meta when registering service

* tried to be DRY, but looks like it would be more work to use the
helper function
2019-08-23 12:49:02 -04:00
Michael Schurter
43d89f864e connect: task hook for bootstrapping envoy sidecar
Fixes #6041

Unlike all other Consul operations, boostrapping requires Consul be
available. This PR tries Consul 3 times with a backoff to account for
the group services being asynchronously registered with Consul.
2019-08-22 08:15:32 -07:00
Michael Schurter
eeacb87f3b connect: register group services with Consul
Fixes #6042

Add new task group service hook for registering group services like
Connect-enabled services.

Does not yet support checks.
2019-08-20 12:25:10 -07:00
Michael Schurter
858d18df04 Merge pull request #5975 from hashicorp/b-check-watcher-deadlock
consul: fix deadlock in check-based restarts
2019-07-18 13:13:40 -07:00
Michael Schurter
17636727e1 consul: add test for check watcher deadlock 2019-07-18 08:24:09 -07:00
Michael Schurter
b4b2b42e28 Update command/agent/consul/check_watcher.go
Co-Authored-By: Mahmood Ali <mahmood@hashicorp.com>
2019-07-18 07:08:27 -07:00
Michael Schurter
9c418c224b consul: fix deadlock in check-based restarts
Fixes #5395
Alternative to #5957

Make task restarting asynchronous when handling check-based restarts.
This matches the pre-0.9 behavior where TaskRunner.Restart was an
asynchronous signal. The check-based restarting code was not designed
to handle blocking in TaskRunner.Restart. 0.9 made it reentrant and
could easily overwhelm the buffered update chan and deadlock.

Many thanks to @byronwolfman for his excellent debugging, PR, and
reproducer!

I created this alternative as changing the functionality of
TaskRunner.Restart has a much larger impact. This approach reverts to
old known-good behavior and minimizes the number of places changes are
made.
2019-07-17 15:22:21 -07:00
Mahmood Ali
121c97475d address review feedback 2019-07-17 10:43:13 +07:00
Mahmood Ali
90d488bcb5 Avoid de-registering slowly restored services
When a nomad client restarts/upgraded, nomad restores state from running
task and starts the sync loop.  If sync loop runs early, it may
deregister services from Consul prematurely even when Consul has the
running service as healthy.

This is not ideal, as re-registering the service means potentially
waiting a whole service health check interval before declaring the
service healthy.

We attempt to mitigate this by introducing an initialization probation
period.  During this time, we only deregister services and checks that
were explicitly deregistered, and leave unrecognized ones alone.  This
serves as a grace period for restoring to complete, or for operators to
restore should they recognize they restored with the wrong nomad data
directory.
2019-06-14 11:15:21 -04:00
Danielle Lancashire
efdfef868f consul: Include port-label in service registration
It is possible to provide multiple identically named services with
different port assignments in a Nomad configuration.

We introduced a regression when migrating to stable service identifiers where
multiple services with the same name would conflict, and the last definition
would take precedence.

This commit includes the port label in the stable service identifier to
allow the previous behaviour where this was supported, for example
providing:

```hcl
service {
  name = "redis-cache"
  tags = ["global", "cache"]
  port = "db"
  check {
    name     = "alive"
    type     = "tcp"
    interval = "10s"
    timeout  = "2s"
  }
}

service {
  name = "redis-cache"
  tags = ["global", "foo"]
  port = "foo"

  check {
    name     = "alive"
    type     = "tcp"
    port     = "db"
    interval = "10s"
    timeout  = "2s"
  }
}

service {
  name = "redis-cache"
  tags = ["global", "bar"]
  port = "bar"

  check {
    name     = "alive"
    type     = "tcp"
    port     = "db"
    interval = "10s"
    timeout  = "2s"
  }
}
```

in a nomad task definition is now completely valid. Each service
definition with the same name must still have a unique port label however.
2019-06-13 15:24:54 +02:00
Mahmood Ali
12114354ab Merge pull request #5668 from hashicorp/flaky-test-20190430
fix flaky test by allowing for call invocation overhead
2019-05-13 12:33:44 -04:00
Danielle Lancashire
3d3b2ca239 consul: Document example check id 2019-05-09 13:22:22 +02:00
Mahmood Ali
538b5c3eba fix flaky test by allowing for call invocation overhead 2019-05-08 18:04:37 -04:00
Danielle Lancashire
14c231aeb9 consul: Do not deregister external checks
This commit causes sync to skip deregistering checks that are not
managed by nomad, such as service maintenance mode checks.  This is
handled in the same way as service registrations - by doing a Nomad
specific prefix match.
2019-05-02 16:54:18 +02:00
Danielle Lancashire
195529efd4 consul: Use a stable identifier for services
The current implementation of Service Registration uses a hash of the
nomad-internal state of a service to register it with Consul, this means that
any update to the service invalidates this name and we then deregister, and
recreate the service in Consul.

While this behaviour slightly simplifies reasoning about service registration,
this becomes problematic when we add consul health checks to a service. When
the service is re-registered, so are the checks, which default to failing for
at least one check period.

This commit migrates us to using a stable identifier based on the
allocation, task, and service identifiers, and uses the difference
between the remote and local state to decide when to push updates.

It uses the existing hashing mechanic to decide when UpdateTask should
regenerate service registrations for providing to Sync, but this should
be removable as part of a future refactor.

It additionally introduces the _nomad-check- prefix for check
definitions, to allow for future allowing of consul features like
maintenance mode.
2019-05-02 16:54:18 +02:00
Michael Schurter
3e52a6aacf consul: squelch noisy useless logs
Only log when syncing actually did something.
2019-02-04 11:07:57 -08:00
Michael Schurter
ceee126241 Remove old comment; it's been fixed! 2019-01-14 09:56:53 -08:00
Mahmood Ali
c0162fab35 move cstructs.DeviceNetwork to drivers pkg 2019-01-08 09:11:47 -05:00
Nick Ethier
39ca1b00dd client/drivermananger: add driver manager
The driver manager is modeled after the device manager and is started by the client.
It's responsible for handling driver lifecycle and reattachment state, as well as
processing the incomming fingerprint and task events from each driver. The mananger
exposes a method for registering event handlers for task events that is used by the
task runner to update the server when a task has been updated with an event.

Since driver fingerprinting has been implemented by the driver manager, it is no
longer needed in the fingerprint mananger and has been removed.
2018-12-18 22:55:18 -05:00
Michael Schurter
383c85ae6f consul: add ScriptExecutor context wrapper
Since d335a82859 ScriptExecutors now take
a timeout duration instead of a context. This broke the script check
removal code which used context cancelation propagation to remove
script checks while they were executing.

This commit adds a wrapper around ScriptExecutors that obeys context
cancelation again. The only downside is that it leaks a goroutine until
the underlying Exec call completes or timeouts.

Since check removal is relatively rare, check timeouts usually low, and
scripts usually fast, the risk of leaking a goroutine seems very small.
2018-12-03 20:26:31 -08:00
Michael Schurter
104bbf78d9 consul: fix script checks exiting after 1 run
Fixes a regression caused in d335a82859

The removal of the inner context made the remaining cancels cancel the
outer context and cause script checks to exit prematurely.
2018-12-03 18:50:02 -08:00
Alex Dadgar
429c5bb885 Device hook and devices affect computed node class
This PR introduces a device hook that retrieves the device mount
information for an allocation. It also updates the computed node class
computation to take into account devices.

TODO Fix the task runner unit test. The environment variable is being
lost even though it is being properly set in the prestart hook.
2018-11-27 17:25:33 -08:00
Preetha Appan
3cf22d2903 Pass service metadata "external-source" for consul UI integration 2018-11-16 11:28:56 -06:00
Mahmood Ali
2357e886ce mark and skip failing consul failing tests 2018-11-13 10:21:40 -05:00
Michael Schurter
d2e48e35c0 tests: get consul integration tests building 2018-11-05 12:32:05 -08:00
Michael Schurter
0b4e15c366 tests: more fixes due to api changes 2018-10-29 15:25:22 -07:00
Michael Schurter
2361c1904b tests: get tests building if not yet passing 2018-10-16 16:56:57 -07:00
Nick Ethier
ea9ed2282e client: refactor post allocrunnerv2 finalization 2018-10-16 16:56:56 -07:00
Nick Ethier
d335a82859 client: begin driver plugin integration
client: fingerprint driver plugins
2018-10-16 16:56:56 -07:00
Alex Dadgar
627e20801d Fix lints 2018-10-16 16:56:56 -07:00
Michael Schurter
76194c7414 consul service hook
Deregistration works but difficult to test due to terminal updates not
being fully implemented in the new client/ar/tr.
2018-10-16 16:53:29 -07:00
Alex Dadgar
0f2f4797cb fixing tests 2018-10-04 14:26:19 -07:00
Alex Dadgar
32f9da9e07 small fixes 2018-09-15 16:42:38 -07:00
Alex Dadgar
40d095fd1a agent + consul 2018-09-13 10:43:40 -07:00
Alex Dadgar
98c7abe541 Tests only use testlog package logger 2018-06-13 15:40:56 -07:00
Alex Dadgar
d94bf14e13 Fix gc tests + parallel destroy + small test fixes 2018-06-12 10:23:45 -07:00
Preetha Appan
a7668cd4ec Fix tests and move isClient to constructor 2018-06-01 15:59:53 -05:00
Preetha Appan
e27caadca6 Fix unnecessary deregistration in consul sync
This commit fixes an issue where if a nomad client and server shared the same consul instance, the server would deregister any services and checks registered by clients for running tasks.
2018-06-01 14:48:25 -05:00
Alex Dadgar
796e623dde Use Tags when CanaryTags isn't specified
This PR fixes a bug where we weren't defaulting to `tags` when
`canary_tags` was empty and adds documentation.
2018-05-23 13:07:47 -07:00
Michael Schurter
435a6bddce consul: remove services with/without canary tags
Guard against Canary being set to false at the same time as an
allocation is being stopped: this could cause RemoveTask to be called
with the wrong Canary value and leaking a service.

Deleting both Canary values is the safest route.
2018-05-07 14:55:01 -05:00
Michael Schurter
17c6eb8629 consul: support canary tags for services
Also refactor Consul ServiceClient to take a struct instead of a massive
set of arguments. Meant updating a lot of code but it should be far
easier to extend in the future as you will only need to update a single
struct instead of every single call site.

Adds an e2e test for canary tags.
2018-05-07 14:55:01 -05:00
Michael Schurter
905bef8f2d consul: make grpc checks more like http checks 2018-05-04 11:08:11 -07:00