Commit Graph

19162 Commits

Author SHA1 Message Date
Lang Martin
9e7d044fcf debug command archive content changes (#8462)
* command/debug: print interval data so the operator knows it's waiting

* command/debug: use the Consul/Vault env for queries

* command/debug: capture the operator endpoints

* command/debug: capture API errors in the archive bundle
2020-08-11 13:14:28 -04:00
Lang Martin
c0bf46da1e CSI RPC Token (#8626)
* client/allocrunner/csi_hook: use the Node SecretID
* client/allocrunner/csi_hook: include the namespace for Claim
2020-08-11 13:08:39 -04:00
Lang Martin
8a095fca90 CSI: volume and plugin allocations in the API (#8590)
* command/agent/csi_endpoint: explicitly convert to API structs, and convert allocs for single object get endpoints
2020-08-11 12:24:41 -04:00
Tim Gross
8888ab380b msgpack-rpc errors cannot be wrapped (#8633)
Our RPC calls mangle the errors we get, which prevents us from using wrapped
errors and `errors.Is`.

Also fixes log message fields.
2020-08-11 10:25:43 -04:00
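
The point about `errors.Is` can be illustrated with a minimal Go sketch. It assumes an RPC layer that transmits only the error text; the `overTheWire` helper and the `ErrPluginShutdown` sentinel are hypothetical names made up for this example and are not Nomad's actual code.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrPluginShutdown is a hypothetical sentinel error used only for illustration.
var ErrPluginShutdown = errors.New("plugin shutdown")

// overTheWire simulates an RPC layer that carries only the error text,
// so the receiving side rebuilds a brand-new error value from a string.
func overTheWire(err error) error {
	if err == nil {
		return nil
	}
	return errors.New(err.Error())
}

// survivesLocally reports whether the sentinel is detectable in-process.
func survivesLocally() bool {
	wrapped := fmt.Errorf("controller publish: %w", ErrPluginShutdown)
	return errors.Is(wrapped, ErrPluginShutdown)
}

// survivesRPC reports whether the sentinel is detectable after the simulated hop.
func survivesRPC() bool {
	wrapped := fmt.Errorf("controller publish: %w", ErrPluginShutdown)
	return errors.Is(overTheWire(wrapped), ErrPluginShutdown)
}

func main() {
	fmt.Println("local:", survivesLocally()) // local: true
	fmt.Println("after RPC:", survivesRPC()) // after RPC: false
}
```

Once the error has been flattened to text, the receiver can only match on the message string, which is why wrapped errors are of no use across such a boundary.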
Tim Gross
fbefdb98c3 csi: nomad volume detach command (#8584)
The soundness guarantees of the CSI specification leave a little to be desired
in our ability to provide a 100% reliable automated solution for managing
volumes. This changeset provides a new command to bridge this gap by providing
the operator the ability to intervene.

The command doesn't take an allocation ID so that the operator doesn't have to
keep track of alloc IDs that may have been GC'd. Handle this case in the
unpublish RPC by sending the client RPC for all the terminal/nil allocs on the
selected node.
2020-08-11 10:18:54 -04:00
Tim Gross
d21ef34cbc RPC errors must be wrapped in order to wrap internal errors (#8632)
The CSI client RPC uses error wrapping to detect the type of error bubbling up
from plugins, but if the errors we get aren't wrapped at each layer, we can't
unwrap the inner error.

Also eliminates some unused args.
2020-08-11 09:13:52 -04:00
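
A short Go sketch of why every layer has to wrap: a single layer that formats the inner error with `%v` instead of `%w` breaks the chain for every caller above it. The function names and the `ErrVolumeNotFound` sentinel are hypothetical and only stand in for the real layers.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrVolumeNotFound is a hypothetical error raised by the innermost layer.
var ErrVolumeNotFound = errors.New("volume not found")

// pluginCall stands in for the innermost (plugin) layer.
func pluginCall() error { return ErrVolumeNotFound }

// middleWrapped keeps the chain intact by wrapping with %w.
func middleWrapped() error {
	return fmt.Errorf("node unpublish: %w", pluginCall())
}

// middleFlattened breaks the chain: %v only copies the message text.
func middleFlattened() error {
	return fmt.Errorf("node unpublish: %v", pluginCall())
}

// outer is the top layer; it always wraps whatever it receives.
func outer(inner error) error {
	return fmt.Errorf("client RPC: %w", inner)
}

// detectable reports whether the innermost error can still be identified.
func detectable(err error) bool {
	return errors.Is(err, ErrVolumeNotFound)
}

func main() {
	fmt.Println(detectable(outer(middleWrapped())))   // true
	fmt.Println(detectable(outer(middleFlattened()))) // false
}
```

Both results print the same message text, so the broken chain is easy to miss unless a test asserts on `errors.Is`.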
James Rasell
55821da83b docs: fix minor formatting issues on docker driver page. (#8578)
Co-authored-by: Chris Baker <1675087+cgbaker@users.noreply.github.com>
2020-08-11 12:41:50 +02:00
Lang Martin
083ec181f8 nomad/state/state_store: two cases of incorrect CSIPlugin in-place (#8630) 2020-08-10 18:15:29 -04:00
Tim Gross
c097c40776 changelog entries for 0.12.2 CSI work (#8620) 2020-08-10 16:47:26 -04:00
Tim Gross
297bef8295 e2e: spread CSI controller plugins across multiple DCs (#8629)
Controller plugins that land on the same node will collide over their CSI
`mount_dir`, so give them enough room in our tests that they don't land on the
same host.

Also, version bump the EBS node plugins to match the controllers.
2020-08-10 16:41:39 -04:00
Seth Hoenig
b515fe89eb Merge pull request #8614 from hashicorp/f-consul-passfail
consul: able to set pass/fail thresholds on consul service checks
2020-08-10 14:25:22 -05:00
Seth Hoenig
a22a7caf71 consul: clarify consecutive checks in docs 2020-08-10 14:08:09 -05:00
Seth Hoenig
9a49740230 consul: validate script type when using check thresholds 2020-08-10 14:08:09 -05:00
Seth Hoenig
6d7476cd3d consul: grrrrr hclfmt test resource file 2020-08-10 14:08:09 -05:00
Seth Hoenig
e664f9b69a consul: able to set pass/fail thresholds on consul service checks
This change adds the ability to set the fields `success_before_passing` and
`failures_before_critical` on Consul service check definitions. This is a
feature added to Consul v1.7.0 and later.
  https://www.consul.io/docs/agent/checks#success-failures-before-passing-critical

Nomad doesn't do much besides pass the fields through to Consul.

Fixes #6913
2020-08-10 14:08:09 -05:00
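
The idea behind consecutive-result thresholds can be sketched as a small state machine in Go. This is a simplified model for illustration only, not Consul's or Nomad's implementation: the `thresholdCheck` type, its methods, and the assumption that a check starts out as critical are all made up for the example. Only the two field names come from the commit message.

```go
package main

import "fmt"

// thresholdCheck is a hypothetical model of a check whose status changes
// only after enough consecutive results of the same kind.
type thresholdCheck struct {
	successBeforePassing   int // mirrors success_before_passing
	failuresBeforeCritical int // mirrors failures_before_critical
	status                 string
	successes, failures    int // current consecutive streaks
}

// newThresholdCheck starts in "critical"; that starting state is an assumption.
func newThresholdCheck(success, failures int) *thresholdCheck {
	return &thresholdCheck{
		successBeforePassing:   success,
		failuresBeforeCritical: failures,
		status:                 "critical",
	}
}

// observe records one check result and returns the resulting status.
func (c *thresholdCheck) observe(ok bool) string {
	if ok {
		c.successes++
		c.failures = 0 // a success resets the failure streak
		if c.successes >= c.successBeforePassing {
			c.status = "passing"
		}
	} else {
		c.failures++
		c.successes = 0 // a failure resets the success streak
		if c.failures >= c.failuresBeforeCritical {
			c.status = "critical"
		}
	}
	return c.status
}

func main() {
	c := newThresholdCheck(2, 3)
	for _, ok := range []bool{true, true, false, false, false} {
		fmt.Println(ok, "->", c.observe(ok))
	}
}
```

With thresholds of 2 and 3, a single success or one or two isolated failures do not change the reported status, which is the flapping protection the thresholds are meant to provide.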
Tim Gross
9e9ca50842 changelog entry for MRD bugfix (#8625) 2020-08-10 14:29:30 -04:00
Seth Hoenig
e5c314811b Merge pull request #8624 from hashicorp/docs-cl-cnbridge
docs: add cl entry for connect native bridge mode
2020-08-10 13:24:16 -05:00
Drew Bailey
c08555247f update vault integration docs (#8543)
* update vault integration docs

docs/integrations/vault-integration was a copy of the learn guide. Remove that and move /docs/vault-integration to this location instead

fix link

Update website/pages/docs/integrations/vault-integration.mdx

Co-authored-by: Michael Schurter <mschurter@hashicorp.com>

* revert accidental deletion

Co-authored-by: Charlie Voiselle <464492+angrycub@users.noreply.github.com>
2020-08-10 14:23:43 -04:00
Seth Hoenig
b822e6ee54 docs: add cl entry for connect native bridge mode 2020-08-10 13:21:26 -05:00
Tim Gross
bf67737f56 csi: missing plugins during node delete are not an error (#8619)
When deregistering a client, CSI plugins running on that client may not get a
chance to fingerprint before being stopped. Account for the case where a
plugin allocation is the last instance of the plugin and has been deleted from
the state store to avoid errors during node deregistration.
2020-08-10 11:02:01 -04:00
Tim Gross
7dd307dc8c e2e: CSI EBS test should expect 2 controllers (#8617) 2020-08-10 09:41:21 -04:00
Tim Gross
0f878159e4 e2e: CSI EBS version bump to 0.6.0 (#8618) 2020-08-10 09:41:13 -04:00
Mahmood Ali
867ad1d8d8 Merge pull request #8613 from alrs/state-test-errs
nomad/state: fix dropped scaling_policy test errors
2020-08-10 08:14:19 -04:00
Lars Lehtonen
3ee54e3920 nomad/state: fix dropped scaling_policy test errors 2020-08-07 23:05:33 -07:00
Michael Schurter
5da78b72da Merge pull request #8601 from hashicorp/build-go1.14.7
build: update from Go 1.14.6 to Go 1.14.7
2020-08-07 15:30:33 -07:00
Kent 'picat' Gruber
387426350b Merge pull request #8451 from hashicorp/getting-started-on-gcp
Add Getting Started with Nomad on GCP Documentation
2020-08-07 18:20:36 -04:00
Charlie Voiselle
9f051ff33c Merge pull request #8612 from hashicorp/docs-cherrypick-be0ed7d
Removed version tag
2020-08-07 17:58:52 -04:00
Charlie Voiselle
94c500c893 Removed version tag
In order to prevent staleness, changed driver links to point to releases page rather than a specific version.

Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
2020-08-07 17:53:36 -04:00
Charlie Voiselle
4bb52a09f5 [docs] Updating install instructions to add packages. (#8534)
* Updated install, added packages
* Apply suggestions from code review

Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
2020-08-07 16:27:17 -04:00
Tim Gross
33e7d16139 CSI: fix missing ACL tokens for leader-driven RPCs (#8607)
The volumewatcher and GC job in the leader can't make CSI RPCs when ACLs are
enabled without the leader ACL token being passed through.
2020-08-07 15:37:27 -04:00
Michael Schurter
680dcd620d Merge pull request #8608 from hashicorp/docs-release-0.11.4
docs: add v0.11.4 release to master changelog
2020-08-07 12:16:13 -07:00
Michael Schurter
79a8f01c41 docs: add v0.11.4 release to master changelog 2020-08-07 12:11:57 -07:00
Michael Lange
db363cc3e3 Merge pull request #8593 from hashicorp/f-ui/scaling-events-chart
UI: Task group scaling timeline
2020-08-07 10:35:00 -07:00
Kevin Pruett
7834e022d7 Merge pull request #8592 from pruett/pruett.search
Implement search via Algolia
2020-08-07 12:58:12 -04:00
Kevin Pruett
e3fcdd1e98 Implement search via Algolia 2020-08-07 12:36:38 -04:00
Tim Gross
135041375e docs: always use -ignore-system on node drain with CSI (#8606)
Postrun hooks for allocation runners don't currently block the registration of
terminal health with the servers, which is what allows system jobs to be
drained. So draining nodes with jobs that claim CSI volumes requires the
`-ignore-system` flag to ensure that the postrun hook for service jobs gets a
chance to execute.
2020-08-07 11:22:28 -04:00
Tim Gross
079f60cd63 csi: client RPCs should return wrapped errors for checking (#8605)
When the client-side actions of a CSI client RPC succeed but we get
disconnected during the RPC or we fail to checkpoint the claim state, we want
to be able to retry the client RPC without getting blocked by the client-side
state (ex. mount points) already having been cleaned up in previous calls.
2020-08-07 11:01:36 -04:00
Tim Gross
a264dd29ad csi: controller unpublish should check current alloc count (#8604)
Using the count of node claims from earlier in the `CSIVolume.Unpublish` RPC
doesn't correctly account for cases where the RPC was interrupted but
checkpointed. Instead, we'll check the current allocation count and status to
determine whether we need to send a controller unpublish.
2020-08-07 10:43:45 -04:00
Seth Hoenig
d3e9707119 Merge pull request #8603 from hashicorp/f-upgrade-consul-api
deps: upgrade import of consul/api
2020-08-07 08:46:19 -05:00
Seth Hoenig
7bdbef2597 deps: upgrade import of consul/api
Upgrade our consul/api import to the equivalent of consul@v1.8.1 which includes
a bug fix necessary for #6913. If consul would publish a proper api/ submodule tag
we could reference that.
2020-08-06 21:02:33 -05:00
Michael Lange
950c2bd39d Make eq-by helper resilient to a lack of prop since handlebars doesn't short-circuit evaluation 2020-08-06 17:59:26 -07:00
Michael Lange
7dde9ab100 Key the annotations each loop by annotationKey for stable dom nodes 2020-08-06 17:58:43 -07:00
Michael Lange
40401960cb Add integration test for line-chart annotation staggering 2020-08-06 17:37:09 -07:00
Michael Lange
339bccbeb0 Add missing word "two" to test name
Co-authored-by: Buck Doyle <buck@hashicorp.com>
2020-08-06 15:43:29 -07:00
Tim Gross
9384b1f77e csi: release claims via csi_hook postrun unpublish RPC (#8580)
Add a Postrun hook to send the `CSIVolume.Unpublish` RPC to the server. This
may forward client RPCs to the node plugins or to the controller plugins,
depending on whether other allocations on this node have claims on this
volume.

By making clients responsible for running the `CSIVolume.Unpublish` RPC (and
making the RPC available to a `nomad volume detach` command), the
volumewatcher becomes only used by the core GC job and we no longer need
async volume GC from job deregister and node update.
2020-08-06 14:51:46 -04:00
Michael Schurter
30c6df8efc build: update from Go 1.14.6 to Go 1.14.7
Go 1.14.7 fixes CVE-2020-16845 which is not believed to impact Nomad.
2020-08-06 11:50:29 -07:00
Michael Schurter
713b5f5007 Merge pull request #8597 from hashicorp/b-vault-revoke-log-line
vault: log once per interval if batching revocation
2020-08-06 11:32:47 -07:00
Tim Gross
46d6f68b4e csi: update volumewatcher to use unpublish RPC (#8579)
This changeset updates `nomad/volumewatcher` to take advantage of the
`CSIVolume.Unpublish` RPC. This lets us eliminate a bunch of code and
associated tests. The raft batching code can be safely dropped, as the
characteristic times of the CSI RPCs are on the order of seconds or even
minutes, so batching up raft RPCs added complexity without any real world
performance wins.

Includes refactor w/ test cleanup and dead code elimination in volumewatcher
2020-08-06 14:31:18 -04:00
Tim Gross
acc1c0b751 csi: add unpublish RPC (#8572)
This changeset is plumbing for a `nomad volume detach` command that will be
reused by the volumewatcher claim GC as well.
2020-08-06 13:51:29 -04:00
Tim Gross
07ff0b94dc csi: retry controller client RPCs on next controller (#8561)
The documentation encourages operators to run multiple controller plugin
instances for HA, but the client RPCs don't take advantage of this by retrying
when the RPC fails in cases when the plugin is unavailable (because the node
has drained or the alloc has failed but we haven't received an updated
fingerprint yet).

This changeset tries all known controllers on ready nodes before giving up,
and adds tests that exercise the client RPC routing and retries.
2020-08-06 13:24:24 -04:00
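
The retry behavior described in this last entry can be sketched in Go as a loop over candidate controllers. This is a minimal illustration under stated assumptions, not the code from the changeset: the `controller` type, `callWithRetry`, and the demo helpers are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// controller is a hypothetical stand-in for one controller plugin instance.
type controller struct {
	nodeID string
	ready  bool         // whether the hosting node is ready
	call   func() error // the RPC to attempt against this instance
}

var errNoControllers = errors.New("no controllers available")

// callWithRetry tries each controller on a ready node in order and returns
// the node ID of the first one that succeeds. If all fail, it returns the
// last error, wrapped so callers can still inspect the cause.
func callWithRetry(controllers []controller) (string, error) {
	lastErr := errNoControllers
	for _, c := range controllers {
		if !c.ready {
			continue // skip instances on nodes that are not ready
		}
		if err := c.call(); err != nil {
			lastErr = fmt.Errorf("controller on node %s: %w", c.nodeID, err)
			continue // move on to the next known controller
		}
		return c.nodeID, nil
	}
	return "", lastErr
}

// demoControllers builds a sample list: one on a non-ready node, one that
// fails, and one that succeeds.
func demoControllers() []controller {
	fail := func() error { return errors.New("plugin unavailable") }
	succeed := func() error { return nil }
	return []controller{
		{nodeID: "node-a", ready: false, call: succeed},
		{nodeID: "node-b", ready: true, call: fail},
		{nodeID: "node-c", ready: true, call: succeed},
	}
}

func main() {
	id, err := callWithRetry(demoControllers())
	fmt.Println(id, err)
}
```

Returning the last error rather than a generic failure keeps the most recent cause visible to the caller when every controller has been tried.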