Commit Graph

737 Commits

Author SHA1 Message Date
Lang Martin
15ffae2798 csi: server-side plugin state tracking and api (#6966)
* structs: CSIPlugin indexes jobs acting as plugins and node updates

* schema: csi_plugins table for CSIPlugin

* nomad: csi_endpoint use vol.Denormalize, plugin requests

* nomad: csi_volume_endpoint: rename to csi_endpoint

* agent: add CSI plugin endpoints

* state_store_test: use generated ids to avoid t.Parallel conflicts

* contributing: add note about registering new RPC structs

* command: agent http register plugin lists

* api: CSI plugin queries, ControllerHealthy -> ControllersHealthy

* state_store: copy on write for volumes and plugins

* structs: copy on write for volumes and plugins

* state_store: CSIVolumeByID returns an unhealthy volume, denormalize

* nomad: csi_endpoint use CSIVolumeDenormalizePlugins

* structs: remove struct errors for missing objects

* nomad: csi_endpoint return nil for missing objects, not errors

* api: return meta from Register to avoid EOF error

* state_store: CSIVolumeDenormalize keep allocs in their own maps

* state_store: CSIVolumeDeregister error on missing volume

* state_store: CSIVolumeRegister set indexes

* nomad: csi_endpoint use CSIVolumeDenormalizePlugins tests
2020-03-23 13:58:29 -04:00
Lang Martin
44a86bc7d1 api: csi 2020-03-23 13:58:29 -04:00
Danielle Lancashire
603a0099b3 api: Include CSI metadata on nodes 2020-03-23 13:58:29 -04:00
Danielle Lancashire
d296efd2c6 CSI Plugin Registration (#6555)
This changeset implements the initial registration and fingerprinting
of CSI Plugins as part of #5378. At a high level, it introduces the
following:

* A `csi_plugin` stanza as part of a Nomad task configuration, to
  allow a task to expose that it is a plugin.

* A new task runner hook: `csi_plugin_supervisor`. This hook does two
  things. When the `csi_plugin` stanza is detected, it will
  automatically configure the plugin task to receive bidirectional
  mounts to the CSI intermediary directory. At runtime, it will then
  perform an initial heartbeat of the plugin and handle submitting it to
  the new `dynamicplugins.Registry` for further use by the client, and
  then run a lightweight heartbeat loop that will emit task events
  when health changes.

* The `dynamicplugins.Registry` for handling plugins that run
  as Nomad tasks, in contrast to the existing catalog that requires
  `go-plugin` type plugins and to know the plugin configuration in
  advance.

* The `csimanager` which fingerprints CSI plugins, in a similar way to
  `drivermanager` and `devicemanager`. It currently only fingerprints
  the NodeID from the plugin, and assumes that all plugins are
  monolithic.

Missing features

* We do not use the live updates of the `dynamicplugin` registry in
  the `csimanager` yet.

* We do not deregister the plugins from the client when they shutdown
  yet, they just become indefinitely marked as unhealthy. This is
  deliberate until we figure out how we should manage deploying new
  versions of plugins/transitioning them.
2020-03-23 13:58:28 -04:00
Jasmine Dahilig
6c1474398f change jobspec lifecycle stanza to use sidecar attribute instead of
block_until status
2020-03-21 17:52:57 -04:00
Jasmine Dahilig
43fab7d46d remove deadline code for now 2020-03-21 17:52:56 -04:00
Jasmine Dahilig
f46f96d8d5 remove api package dependency on structs package 2020-03-21 17:52:55 -04:00
Jasmine Dahilig
4be7d056ac put lifecycle nil and empty checks in api Canonicalize 2020-03-21 17:52:50 -04:00
Jasmine Dahilig
92ef35b41f remove api dependency on structs package, copy lifecycle defaults to api package 2020-03-21 17:52:49 -04:00
Jasmine Dahilig
ae2a4bc796 add canonicalize in the right place 2020-03-21 17:52:41 -04:00
Jasmine Dahilig
8fac2b5094 change TaskLifecycle RunLevel to Hook and add Deadline time duration 2020-03-21 17:52:37 -04:00
Mahmood Ali
a556c0d923 add lifecycle to api and parser 2020-03-21 17:52:36 -04:00
James Rasell
5d5469e6fa Merge pull request #5970 from jrasell/bug-gh-5506
Fix returned EOF error when calling Nodes GC/GcAlloc API
2020-03-12 10:04:17 +01:00
Michael Schurter
d145b395e8 jobspec: fixup vault_grace deprecation
Followup to #7170

- Moved canonicalization of VaultGrace back into `api/` package.
- Fixed tests.
- Made docs styling consistent.
2020-03-10 14:58:49 -07:00
Michael Schurter
64c40af018 Merge pull request #7170 from fredrikhgrelland/consul_template_upgrade
Update consul-template to v0.24.1 and remove deprecated vault grace
2020-03-10 14:15:47 -07:00
Michael Schurter
fac5f9c8e8 Merge pull request #7231 from hashicorp/b-alloc-dev-panic
api: fix panic when displaying devices w/o stat
2020-03-09 07:34:59 -07:00
Mahmood Ali
c50f295629 api: alloc exec recovers from bad client connection
If alloc exec fails to connect to the nomad client associated with the
alloc, fail over to using a server.

The code attempted to special case `net.Error` for failover to rule out
other permanent non-networking errors, by reusing a pattern in the
logging handling.

But this pattern does not apply here.  `net/http.Http` wraps all errors
as `*url.Error` that is net.Error.  The websocket doesn't, and instead
returns the raw error.  If the raw error isn't a `net.Error`, like in
the case of TLS handshake errors, the api package would fail immediately
rather than failover.
2020-03-04 17:43:00 -05:00
Michael Schurter
ab4950b684 api: fix panic when displaying devices w/o stat
"<none>" mathces `node status -verbose` output
2020-02-26 21:24:31 -05:00
Fredrik Hoem Grelland
26cca14f27 Update consul-template to v0.24.1 and remove deprecated vault_grace (#7170) 2020-02-23 16:24:53 +01:00
James Rasell
e1545d718f Fix panic when canonicalizing a jobspec with incorrect job type.
When canonicalizing the ReschedulePolicy a panic was possible if
the passed job type was not valid. This change protects against
this possibility, in a verbose way to ensure the code path is
clear.
2020-02-21 09:14:36 +01:00
James Rasell
d890ddbfd9 api: check response content length before decoding.
The API decodeBody function will now check the content length
before attempting to decode. If the length is zero, and the out
interface is nil then it is safe to assume the API call is not
returning any data to the user. This allows us to better handle
passing nil to API calls in a single place.
2020-02-20 10:07:44 +01:00
Mahmood Ali
1d9ffa640b implement MinQuorum 2020-02-16 16:04:59 -06:00
Seth Hoenig
6bfd86b1f8 client: enable configuring enable_tag_override for services
Consul provides a feature of Service Definitions where the tags
associated with a service can be modified through the Catalog API,
overriding the value(s) configured in the agent's service configuration.

To enable this feature, the flag enable_tag_override must be configured
in the service definition.

Previously, Nomad did not allow configuring this flag, and thus the default
value of false was used. Now, it is configurable.

Because Nomad itself acts as a state machine around the the service definitions
of the tasks it manages, it's worth describing what happens when this feature
is enabled and why.

Consider the basic case where there is no Nomad, and your service is provided
to consul as a boring JSON file. The ultimate source of truth for the definition
of that service is the file, and is stored in the agent. Later, Consul performs
"anti-entropy" which synchronizes the Catalog (stored only the leaders). Then
with enable_tag_override=true, the tags field is available for "external"
modification through the Catalog API (rather than directly configuring the
service definition file, or using the Agent API). The important observation
is that if the service definition ever changes (i.e. the file is changed &
config reloaded OR the Agent API is used to modify the service), those
"external" tag values are thrown away, and the new service definition is
once again the source of truth.

In the Nomad case, Nomad itself is the source of truth over the Agent in
the same way the JSON file was the source of truth in the example above.
That means any time Nomad sets a new service definition, any externally
configured tags are going to be replaced. When does this happen? Only on
major lifecycle events, for example when a task is modified because of an
updated job spec from the 'nomad job run <existing>' command. Otherwise,
Nomad's periodic re-sync's with Consul will now no longer try to restore
the externally modified tag values (as long as enable_tag_override=true).

Fixes #2057
2020-02-10 08:00:55 -06:00
Seth Hoenig
0040c75e8e command, docs: create and document consul token configuration for connect acls (gh-6716)
This change provides an initial pass at setting up the configuration necessary to
enable use of Connect with Consul ACLs. Operators will be able to pass in a Consul
Token through `-consul-token` or `$CONSUL_TOKEN` in the `job run` and `job revert`
commands (similar to Vault tokens).

These values are not actually used yet in this changeset.
2020-01-31 19:02:53 -06:00
Drew Bailey
2dbcad3f45 fix tests, update changelog 2020-01-29 13:55:39 -05:00
Nick Ethier
64f4e9e691 consul: add support for canary meta 2020-01-27 09:53:30 -05:00
Drew Bailey
a58b8a5e9c refactor api profile methods
comment why we ignore errors parsing params
2020-01-09 15:15:12 -05:00
Drew Bailey
ad86438fc0 adds qc param, address pr feedback 2020-01-09 15:15:11 -05:00
Drew Bailey
549045fcbb Rename profile package to pprof
Address pr feedback, rename profile package to pprof to more accurately
describe its purpose. Adds gc param for heap lookup profiles.
2020-01-09 15:15:10 -05:00
Drew Bailey
1776458956 address pr feedback 2020-01-09 15:15:09 -05:00
Drew Bailey
cd7652fed8 comments for api usage of agent profile 2020-01-09 15:15:09 -05:00
Drew Bailey
328075591f region forwarding; prevent recursive forwards for impossible requests
prevent region forwarding loop, backfill tests

fix failing test
2020-01-09 15:15:06 -05:00
Drew Bailey
b0410a4792 api agent endpoints
helper func to return serverPart based off of serverID
2020-01-09 15:15:05 -05:00
Drew Bailey
240c0ee0ec agent pprof endpoints
wip, agent endpoint and client endpoint for pprof profiles

agent endpoint test
2020-01-09 15:15:02 -05:00
Mahmood Ali
792fe74fc0 Merge pull request #6831 from hashicorp/add_inmemory_certificate
Add option to set certificate in-memory
2019-12-19 08:54:32 -05:00
Drew Bailey
672b76056b shutdown delay for task groups
copy struct values

ensure groupserviceHook implements RunnerPreKillhook

run deregister first

test that shutdown times are delayed

move magic number into variable
2019-12-16 11:38:16 -05:00
Michel Vocks
8439654c0c Add raw field for ClientCert and ClientKey 2019-12-16 14:30:00 +01:00
Michel Vocks
1ca70ac86c Update go mod 2019-12-16 12:47:10 +01:00
Michel Vocks
3d9701f6f1 Add option to set certificate in-memory via SDK 2019-12-16 10:59:27 +01:00
Michael Schurter
9d571322d3 Merge pull request #6370 from pmcatominey/tls-server-name
command: add -tls-server-name flag
2019-11-20 08:44:54 -08:00
Michael Schurter
75d6d4ec5e core: add semver constraint
The existing version constraint uses logic optimized for package
managers, not schedulers, when checking prereleases:

- 1.3.0-beta1 will *not* satisfy ">= 0.6.1"
- 1.7.0-rc1 will *not* satisfy ">= 1.6.0-beta1"

This is due to package managers wishing to favor final releases over
prereleases.

In a scheduler versions more often represent the earliest release all
required features/APIs are available in a system. Whether the constraint
or the version being evaluated are prereleases has no impact on
ordering.

This commit adds a new constraint - `semver` - which will use Semver
v2.0 ordering when evaluating constraints. Given the above examples:

- 1.3.0-beta1 satisfies ">= 0.6.1" using `semver`
- 1.7.0-rc1 satisfies ">= 1.6.0-beta1" using `semver`

Since existing jobspecs may rely on the old behavior, a new constraint
was added and the implicit Consul Connect and Vault constraints were
updated to use it.
2019-11-19 08:40:19 -08:00
Luiz Aoqui
10241039d4 api: add StartedAt in Node.DrainStrategy 2019-11-13 17:54:40 -05:00
Mahmood Ali
f118def827 api: go-uuid is no longer needed 2019-11-12 11:02:33 -05:00
Mahmood Ali
7f027a68ea api: avoid depending on helper internal package 2019-11-12 11:02:33 -05:00
Chris Raborg
ddfa9a8ad5 Update MonitorDrain comment to indicate channel is closed on errors (#6671)
Fixes #6645
2019-11-11 14:15:17 -05:00
Drew Bailey
d91a5e619f update test 2019-11-08 15:49:04 -05:00
Drew Bailey
8c891fcb94 switch to uuid helper package 2019-11-08 09:28:06 -05:00
Drew Bailey
03a4f59a05 Remove response body from websocket error
If a websocket connection errors we currently return the error with a
copy of the response body. The response body from the websocket can
often times be completely illegible so remove it from the error string.

make alloc id empty for more reliable failure

un-gzip if content encoding header present
2019-11-08 09:28:02 -05:00
Ben Barnard
a56b880570 Escape job ID in API requests (#2411)
Jobs can be created with user-provided IDs containing any character
except spaces. The jobId needs to be escaped when used in a request
path, otherwise jobs created with names such as "why?" can't be managed
after they are created.
2019-11-07 08:35:39 -05:00
James Rasell
c01e495aa3 Remove trailing dot on drain message to ensure better consistency. (#5956) 2019-11-05 16:53:38 -05:00