Commit Graph

22582 Commits

Author SHA1 Message Date
Ashlee M Boyer
2f14ceef05 docs: Fixing path for autoscaling/agent/source nav item (#12166) 2022-03-01 17:24:12 -05:00
Luiz Aoqui
b246227869 api: paginated results with different ordering (#12128)
The paginator logic was built when go-memdb iterators would return items
ordered lexicographically by their ID prefixes, but #12054 added the
option for some tables to return results ordered by their `CreateIndex`
instead, which invalidated the previous paginator assumption.

The iterator used for pagination must still return results in some order
so that the paginator can properly handle requests where the next_token
value is not present in the results anymore (e.g., the eval was GC'ed).

In these situations, the paginator will start the returned page in the
first element right after where the requested token should've been.

This commit moves the logic to generate pagination tokens from the
elements being paginated to the iterator itself so that callers can have
more control over the token format to make sure they are properly
ordered and stable.

It also allows configuring the paginator as being ordered in ascending
or descending order, which is relevant when looking for a token that may
not be present anymore.
2022-03-01 15:36:49 -05:00
Tim Gross
9cf99ce5ec csi: subcommand for volume snapshot (#12152) 2022-03-01 13:30:30 -05:00
Tim Gross
907c795874 CSI: set plugin socket path on restore (#12149)
The Prestart hook for task runner hooks doesn't get called when we
restore a task, because the task is already running. The Postrun hook
for CSI plugin supervisors needs the socket path to have been
populated so that the client has a valid path.
2022-03-01 10:22:52 -05:00
Tim Gross
03a8d72dba CSI: implement support for topology (#12129) 2022-03-01 10:15:46 -05:00
Tim Gross
3fd968310d CSI: use HTTP headers for passing CSI secrets (#12144) 2022-03-01 08:47:01 -05:00
Tim Gross
8ccb9a3271 csi: fix redaction of volume status mount flags (#12150)
The `volume status` command and associated API redacts the entire
mount options instead of just the `MountFlags` field that can contain
sensitive data. Return a redacted value so that the return value makes
sense to operators who have set this field.
2022-03-01 08:34:03 -05:00
Tim Gross
c06f31eef0 CSI: sort capabilities in plugin status (#12154)
Also fix `LIST_SNAPSHOTS` capability name
2022-03-01 07:59:31 -05:00
Tim Gross
05034a6cd0 docs: clarify that plugin commands are for CSI only (#12151) 2022-03-01 07:57:41 -05:00
Tim Gross
8c8b997f1e csi: respect -verbose flag for allocs in volume status (#12153) 2022-03-01 07:57:29 -05:00
Kevin Wang
636345a167 fix(website): hide version select on /plugins & /tools (#12145)
* fix(website/plugins): display version select

* fix: hide version select on `/tools` + `/plugins`
2022-02-28 12:44:08 -05:00
Tim Gross
1cb00e8998 CI: increase test run timeout (#12143) 2022-02-28 11:30:59 -05:00
Seth Hoenig
0b3ba5c77c Merge pull request #12137 from hashicorp/rpc-advertise-docs
docs: clairfy advertise.rpc effect
2022-02-28 08:15:28 -06:00
Seth Hoenig
3a6c824ea0 docs: clairfy advertise.rpc effect
The advertise.rpc config option is not intuitive. At first glance you'd
assume it works like advertise.http or advertise.serf, but it does not.

The current behavior is working as intended, but the documentation is
very hard to parse and doesn't draw a clear picture of what the setting
actually does.

Closes https://github.com/hashicorp/nomad/issues/11075
2022-02-25 16:02:29 -06:00
Jai
b76e04dc51 Merge pull request #12134 from hashicorp/b-ui/target-link
ui:  external links open in new tabs
2022-02-25 10:29:04 -05:00
Seth Hoenig
5683fdf75a Merge pull request #12130 from hashicorp/flakey-serf-non-voter
tests: deflake test that joins a server with non-voting servers to form quorum
2022-02-25 09:12:42 -06:00
Jai Bhagat
1d90cbff8c ui: external links open in new tabs 2022-02-25 09:24:37 -05:00
Seth Hoenig
bd03d2548f tests: deflake test that joins a server with non-voting servers to form qourum
This PR
 - upgrades the serf library
 - has the test start the join process using the un-joined server first
 - disables schedulers on the servers
 - uses the WaitForLeader and wantPeers helpers

Not sure which, if any of these actually improves the flakiness of this test.
2022-02-24 17:02:58 -06:00
Zachary Shilton
a10af1bce7 chore: bump docs-page for code-block fix (#12117)
* chore: bump to latest docs-page

* fix: bump to react-consent-manager patch

* chore: bump to consent-manager with events dep

* chore: bump to stable consent-manager release
2022-02-24 15:34:54 -05:00
Tim Gross
21aa764154 CSI: ensure all fields are mapped from structs to api response (#12124)
In PR #12108 we added missing fields to the plugin response, but we
didn't include the manual serialization steps that we need until
issue #10470 is resolved.
2022-02-24 14:17:15 -05:00
Tim Gross
59f6c75361 CSI: display plugin capabilities in verbose status (#12116)
The behaviors of CSI plugins are governed by their capabilities as
defined by the CSI specification. When debugging plugin issues, it's
useful to know which behaviors are expected so they can be matched
against RPC calls made to the plugin allocations.

Expose the plugin capabilities as named in the CSI spec in the `nomad
plugin status -verbose` output.
2022-02-24 13:51:38 -05:00
Luiz Aoqui
48184772b0 docs: add docs for the autoscaler on_error and on_check_error configuration (#12083) 2022-02-24 12:12:29 -05:00
James Rasell
d8352186d1 Merge pull request #12122 from hashicorp/b-api-remove-namespace-test-ent-tag
api: remove ent build tag on namespace test file.
2022-02-24 17:13:15 +01:00
James Rasell
e7d3220d40 api: remove ent build tag on namespace test file. 2022-02-24 16:40:04 +01:00
Tim Gross
649f1e3967 CSI: retry claims from client when max claims are reached (#12113)
When the alloc runner claims a volume, an allocation for a previous
version of the job may still have the volume claimed because it's
still shutting down. In this case we'll receive an error from the
server. Retry this error until we succeed or until a very long timeout
expires, to give operators a chance to recover broken plugins.

Make the alloc runner hook tolerant of temporary RPC failures.
2022-02-24 10:39:07 -05:00
Tim Gross
6b6b82799c CSI: enforce usage at claim time (#12112)
* Remove redundant schedulable check in `FreeWriteClaims`. If a volume
  has been created but not yet claimed, its capabilities will be checked
  in `WriteSchedulable` at both scheduling time and claim time. We don't
  need to also check them in the `FreeWriteClaims` method.

* Enforce maximum volume claims for writers.

  When the scheduler checks feasibility for CSI volumes, the check is
  fairly loose: earlier versions of the same job are not counted as
  active claims. This allows the scheduler to place new allocations
  for the new version of a job, under the assumption that we'll replace
  the existing allocations and their volume claims.

  But when the alloc runner claims the volume, we need to enforce the
  active claims even if they're for allocations of an earlier version of
  the job. Otherwise we'll try to mount a volume that's currently being
  unmounted, and this will cause replacement allocations to frequently
  fail.

* Enforce single-node reader check for read-only volumes. When the
  alloc runner makes a claim for a read-only volume, we only check that
  the volume is potentially schedulable and not that it actually has
  free read claims.
2022-02-24 09:37:37 -05:00
Sander Mol
0ae76b1af4 add go-sockaddr templating support to nomad consul address (#12084) 2022-02-24 09:34:54 -05:00
Florian Apolloner
b84f70aef6 namespaces: allow enabling/disabling allowed drivers per namespace 2022-02-24 09:27:32 -05:00
Seth Hoenig
5b65c97cec Merge pull request #12107 from hashicorp/use-bbolt
core: swap bolt impl and enable configuring raft freelist sync behavior
2022-02-24 08:25:54 -06:00
Seth Hoenig
96a6f2c956 docs: emphasize snapshot before upgrading 2022-02-24 08:22:41 -06:00
Tim Gross
bfbb6509a3 csi: tolerate missing plugins on job delete (#12114)
If a plugin job fails before successfully fingerprinting the plugins,
the plugin will not exist when we try to delete the job. Tolerate
missing plugins.
2022-02-24 08:53:15 -05:00
Seth Hoenig
42c6d5a5c5 command: switch from raft-boltdb to raft-boltdb/v2 2022-02-23 14:43:59 -06:00
Seth Hoenig
a6cc062c14 client: resolve rebase conflict 2022-02-23 14:32:32 -06:00
Seth Hoenig
615d08bba0 build: disallow old boltdb during build 2022-02-23 14:28:31 -06:00
Seth Hoenig
b2fe196e42 agent: switch to go.etc.io/bbolt for state store
This PR modifies the server and client agents to use `go.etc.io/bbolt` as the
implementation for their state stores.
2022-02-23 14:28:31 -06:00
Seth Hoenig
16efcf4e71 core: switch to go.etc.io/bbolt
This PR swaps the underlying BoltDB implementation from boltdb/bolt
to go.etc.io/bbolt.

In addition, the Server has a new configuration option for disabling
NoFreelistSync on the underlying database.

Freelist option: https://github.com/etcd-io/bbolt/blob/master/db.go#L81
Consul equivelent PR: https://github.com/hashicorp/consul/pull/11720
2022-02-23 14:26:41 -06:00
Tim Gross
7bcf0afd81 CSI: allow for concurrent plugin allocations (#12078)
The dynamic plugin registry assumes that plugins are singletons, which
matches the behavior of other Nomad plugins. But because dynamic
plugins like CSI are implemented by allocations, we need to handle the
possibility of multiple allocations for a given plugin type + ID, as
well as behaviors around interleaved allocation starts and stops.

Update the data structure for the dynamic registry so that more recent
allocations take over as the instance manager singleton, but we still
preserve the previous running allocations so that restores work
without racing.

Multiple allocations can run on a client for the same plugin, even if
only during updates. Provide each plugin task a unique path for the
control socket so that the tasks don't interfere with each other.
2022-02-23 15:23:07 -05:00
Tim Gross
822285fa0c CSI: add missing plugin capabilities to api response (#12108)
Detection of the full set of plugin capabilities was added in Nomad
1.1 for the volume creation workflow, but these were not added to the
API response for plugins.
2022-02-23 15:22:29 -05:00
Tim Gross
85fb42fb61 csi: fix broken test (#12110) 2022-02-23 13:48:39 -05:00
Charlie Voiselle
53d55ee98a Fixed scheduler config examples (#12049) 2022-02-23 12:58:29 -05:00
Tim Gross
7f5a0c541c CSI: minor refactoring (#12105)
* rename method checking that free write claims are available
* use package-level variables for claim errors
* semgrep fix for testify
2022-02-23 11:13:51 -05:00
Tim Gross
88a80828c6 csi: fix mocked modes in volumewatcher test (#12104)
The volumewatcher test incorrectly represents the change in attachment
and access modes introduced in Nomad 1.1.0 to support volume
creation. This leads to a test that happens to pass but only
accidentally.

Update the test to correctly represent the volume modes set by the
existing claims on the test volumes.
2022-02-23 09:51:20 -05:00
Mike Nomitch
950ccaf177 Merge pull request #12065 from hashicorp/docs-add-form-link
Adding link to interview form
2022-02-22 11:05:20 -08:00
Tim Gross
89ca3d9d75 csi: don't wait to fire initial unmount RPC (#12102)
In PR #11892 we updated the `csi_hook` to unmount the volume locally
via the CSI node RPCs before releasing the claim from the server. The
timer for this hook was initialized with the retry time, forcing us to
wait 1s before making the first unmount RPC calls.

Use the new helper for timers to ensure we clean up the timer nicely.
2022-02-22 13:43:06 -05:00
Luiz Aoqui
a9407111aa docs: update link to mount in Docker task driver (#12101) 2022-02-22 13:39:49 -05:00
Michael Schurter
85abc2dea6 Merge pull request #11600 from hashicorp/f-remove-unused-version
core: remove all traces of unused protocol version
2022-02-22 09:51:42 -08:00
Michael Schurter
62ea60d02f docs: add changelog for #11600 2022-02-18 16:16:19 -08:00
Michael Schurter
2411d3afd2 core: remove all traces of unused protocol version
Nomad inherited protocol version numbering configuration from Consul and
Serf, but unlike those projects Nomad has never used it. Nomad's
`protocol_version` has always been `1`.

While the code is effectively unused and therefore poses no runtime
risks to leave, I felt like removing it was best because:

1. Nomad's RPC subsystem has been able to evolve extensively without
   needing to increment the version number.
2. Nomad's HTTP API has evolved extensively without increment
   `API{Major,Minor}Version`. If we want to version the HTTP API in the
   future, I doubt this is the mechanism we would choose.
3. The presence of the `server.protocol_version` configuration
   parameter is confusing since `server.raft_protocol` *is* an important
   parameter for operators to consider. Even more confusing is that
   there is a distinct Serf protocol version which is included in `nomad
   server members` output under the heading `Protocol`. `raft_protocol`
   is the *only* protocol version relevant to Nomad developers and
   operators. The other protocol versions are either deadcode or have
   never changed (Serf).
4. If we were to need to version the RPC, HTTP API, or Serf protocols, I
   don't think these configuration parameters and variables are the best
   choice. If we come to that point we should choose a versioning scheme
   based on the use case and modern best practices -- not this 6+ year
   old dead code.
2022-02-18 16:12:36 -08:00
Adrián López
1ad08c2eed Update autoscaler AWS ASG target docs: AWS keypair can be empty (#11977) 2022-02-18 17:29:19 -05:00
James Rasell
36cc17024c docs: add autoscaler hcloud target plugin link. (#12087) 2022-02-18 17:28:38 -05:00