Commit Graph

22713 Commits

Author SHA1 Message Date
James Rasell
e89dd5dcf7 test: move remaining tests to use ci.Parallel. 2022-03-24 08:45:13 +01:00
James Rasell
bd377c9ade Merge pull request #12330 from hashicorp/f-gh-263
cli: add service commands for list, info, and delete.
2022-03-23 15:49:29 +01:00
James Rasell
6d002e9698 core: remove node service registrations when node is down.
When a node fails its heart beating a number of actions are taken
to ensure state is cleaned. Service registrations a loosely tied
to nodes, therefore we should remove these from state when a node
is considered terminally down.
2022-03-23 09:42:46 +01:00
James Rasell
d49cf2388a Merge branch 'main' into f-1.3-boogie-nights 2022-03-23 09:41:25 +01:00
Tim Gross
73afdea08d csi: fix handling of garbage collected node in node unpublish (#12350)
When a node is garbage collected, we assume that the volume is no
longer attached to it and ignore the `ErrUnknownNode` error. But we
used `errors.Is` to check for a wrapped error, and RPC flattens the
errors during serialization. This results in an error check that works
in automated testing but not in real clusters. Use a string contains
check instead.
2022-03-22 15:40:24 -04:00
Luiz Aoqui
717053adbb core: use the new Raft API when removing peers (#12340)
Raft v3 introduced a new API for adding and removing peers that takes
the peer ID instead of the address.

Prior to this change, Nomad would use the remote peer Raft version for
deciding which API to use, but this would not work in the scenario where
a Raft v3 server tries to remove a Raft v2 server; the code running uses
v3 so it's unable to call the v2 API.

This change uses the Raft version of the server running the code to
decide which API to use. If the remote peer is a Raft v2, it uses the
server address as the ID.
2022-03-22 15:07:31 -04:00
Luiz Aoqui
bf5006ec01 set raft v3 as the default config (#12341) 2022-03-22 15:06:25 -04:00
Tim Gross
879e137457 drainer: defer CSI plugins until last (#12324)
When a node is drained, system jobs are left until last so that
operators can rely on things like log shippers running even as their
applications are getting drained off. Include CSI plugins in this set
so that Controller plugins deployed as services can be handled as
gracefully as Node plugins that are running as system jobs.
2022-03-22 10:26:56 -04:00
Jonathan Tey
65e41fdb54 demo: add missing file for Kadalu CSI demo (#12336) 2022-03-22 09:50:50 -04:00
Tim Gross
70b752736a CSI: presentation improvements (#12325)
* Fix plugin capability sorting.
  The `sort.StringSlice` method in the stdlib doesn't actually sort, but
  instead constructs a sorting type which you call `Sort()` on.
* Sort allocations for plugins by modify index.
  Present allocations in modify index order so that newest allocations
  show up at the top of the list. This results in sorted allocs in
  `nomad plugin status :id`, just like `nomad job status :id`.
* Sort allocations for volumes in HTTP response.
  Present allocations in modify index order so that newest allocations
  show up at the top of the list. This results in sorted allocs in
  `nomad volume status :id`, just like `nomad job status :id`.
  This is implemented in the HTTP response and not in the state store
  because the state store maintains two separate lists of allocs that
  are merged before sending over the API.
* Fix length of alloc IDs in `nomad volume status` output
2022-03-22 09:48:38 -04:00
James Rasell
b7d83ae527 Merge pull request #12332 from hashicorp/b-node-fixup-drainupdate-msg-spelling
core: fixup node drain update message spelling.
2022-03-22 10:44:03 +01:00
James Rasell
b6f1e9d656 Merge pull request #12329 from hashicorp/f-gh-266-touchdown
client: hookup service wrapper for use in clients
2022-03-22 10:42:05 +01:00
Tim Gross
02d26ceb1a CSI: set plugin CSI_ENDPOINT env var only if unset by user (#12257)
* Use unix:// prefix for CSI_ENDPOINT variable by default
* Some plugins have strict validation over the format of the
  `CSI_ENDPOINT` variable, and unfortunately not all plugins
  agree. Allow the user to override the `CSI_ENDPOINT` to workaround
  those cases.
* Update all demos and tests with CSI_ENDPOINT
2022-03-21 11:48:47 -04:00
Tim Gross
c3f53fa8d0 E2E: ensure ConnectACLsE2ETest has clean state before starting (#12334)
The `ConnectACLsE2ETest` checks that the SI tokens have been properly
cleaned up between tests, but following the change to use HCP the
previous `Connect` test suite will often have SI tokens that haven't
been cleaned up by the time this test suite runs. Wait for the SI
tokens to be cleaned up at the start of the test to ensure we have a
clean state.
2022-03-21 11:05:02 -04:00
James Rasell
0a2981fcba core: fixup node drain update message spelling. 2022-03-21 13:37:08 +01:00
James Rasell
9944777919 cli: add service commands for list, info, and delete. 2022-03-21 11:59:03 +01:00
James Rasell
f0be952cb5 client: hookup service wrapper for use within client hooks. 2022-03-21 10:29:57 +01:00
James Rasell
131cda2824 client: modify service wrapper to accomodate restore behaviour. 2022-03-21 09:49:39 +01:00
James Rasell
3668bacf5a Merge pull request #12298 from hashicorp/f-gh-266-nomad
client: add Nomad service registration implementation.
2022-03-21 08:58:45 +01:00
Seth Hoenig
7ff71ad19c Merge pull request #12322 from hashicorp/ci-gha
ci: turn on testing in github actions
2022-03-18 12:58:09 -05:00
Seth Hoenig
665967cc31 ci: scope to push, ignore more dirs, update go update script 2022-03-18 12:47:38 -05:00
Seth Hoenig
5c2aa93d6e ci: turn on testing in github actions 2022-03-18 11:12:24 -05:00
Seth Hoenig
bd71f20979 Merge pull request #12321 from hashicorp/ci-less-logging
ci: limit gotestsum to circle ci
2022-03-18 10:02:13 -05:00
Seth Hoenig
a44c55ae84 ci: limit gotestsum to circle ci
Part 2 of breaking up https://github.com/hashicorp/nomad/pull/12255

This PR makes it so gotestsum is invoked only in CircleCI. Also the
HCLogger(t) is plumbed more correctly in TestServer and TestAgent so
that they respect NOMAD_TEST_LOG_LEVEL.

The reason for these is we'll want to disable logging in GHA,
where spamming the disk with logs really drags performance.
2022-03-18 09:15:01 -05:00
Tim Gross
09cdaca5ba api: fix ENT-only test imports for moved testutil package (#12320)
The `api/testutil` package was moved to `api/internal/testutil` but
this wasn't caught in the ENT tests because they're not run here in
the OSS repo.
2022-03-18 10:12:28 -04:00
Tim Gross
020fa6f8ba E2E with HCP Consul/Vault (#12267)
Use HCP Consul and HCP Vault for the Consul and Vault clusters used in E2E testing. This has the following benefits:

* Without the need to support mTLS bootstrapping for Consul and Vault, we can simplify the mTLS configuration by leaning on Terraform instead of janky bash shell scripting.
* Vault bootstrapping is no longer required, so we can eliminate even more janky shell scripting
* Our E2E exercises HCP, which is important to us as an organization
* With the reduction in configurability, we can simplify the Terraform configuration and drop the complicated `provision.sh`/`provision.ps1` scripts we were using previously. We can template Nomad configuration files and upload them with the `file` provisioner.
* Packer builds for Linux and Windows become much simpler.

tl;dr way less janky shell scripting!
2022-03-18 09:27:28 -04:00
Seth Hoenig
4bf0decbdb Merge pull request #12313 from hashicorp/purge-parallel-2
ci: more parallel removal
2022-03-17 13:48:37 -05:00
Seth Hoenig
fec8d6e030 ci: do not exclude Parallel semgrep rule 2022-03-17 13:45:56 -05:00
Luiz Aoqui
eca4ac67f3 cli: display Raft version in server members (#12317)
The previous output of the `nomad server members` command would output a
column named `Protocol` that displayed the Serf protocol being currently
used by servers.

This is not a configurable option, so it holds very little value to
operators. It is also easy to confuse it with the Raft Protocol version,
which is configurable and highly relevant to operators.

This commit replaces the previous `Protocol` column with the new `Raft
Version`. It also updates the `-detailed` flag to be called `-verbose`
so it matches other commands. The detailed output now also outputs the
same information as the standard output with the addition of the
previous `Protocol` column and `Tags`.
2022-03-17 14:15:10 -04:00
Luiz Aoqui
81687c1ce5 api: add related evals to eval details (#12305)
The `related` query param is used to indicate that the request should
return a list of related (next, previous, and blocked) evaluations.

Co-authored-by: Jasmine Dahilig <jasmine@hashicorp.com>
2022-03-17 13:56:14 -04:00
Luiz Aoqui
dfe520a9d8 server: transfer leadership in case of error (#12293)
When a Nomad server becomes the Raft leader, it must perform several
actions defined in the establishLeadership function. If any of these
actions fail, Raft will think the node is the leader, but it will not
actually be able to act as a Nomad leader.

In this scenario, leadership must be revoked and transferred to another
server if possible, or the node should retry the establishLeadership
steps.
2022-03-17 11:10:57 -04:00
Seth Hoenig
bee4c07fe2 ci: missing import for nomad09upgrade 2022-03-17 08:49:15 -05:00
Seth Hoenig
ae21af4f9b ci: semgrep rule for parallel tests
Adds a semgrep rule warning about using ci.Parallel instead of t.Parallel
2022-03-17 08:43:37 -05:00
Seth Hoenig
7e6767ddf8 e2e: have e2e use ci.Parallel
This is a followup to having tests run in serial in CI.

The e2e package isn't in CI, but lets use the helper anyway
so we can setup semgrep rules covering the entire repository.
2022-03-17 08:37:34 -05:00
Seth Hoenig
3695901441 ci: use serial testing for api in CI
This is a followup to running tests in serial in CI.
Since the API package cannot import anything outside of api/,
copy the ci.Parallel function into api/internal/testutil, and
have api tests use that.
2022-03-17 08:35:01 -05:00
James Rasell
9185316857 client: add Nomad service registration implementation. 2022-03-17 09:17:13 +01:00
James Rasell
93e902472d Merge pull request #12295 from hashicorp/f-gh-266-wrapper
client: add service registration wrapper to handle providers.
2022-03-17 08:52:45 +01:00
James Rasell
74c886064f Merge pull request #12307 from hashicorp/b-groupservices-avoid-double-tg-lookup
client: avoid double group lookup within groupservice hook setup.
2022-03-16 17:00:01 +01:00
Luiz Aoqui
8f2b21ab95 tests: move state store namespace tests from ENT (#12308) 2022-03-16 11:56:11 -04:00
Seth Hoenig
ea91dbd114 Merge pull request #12299 from hashicorp/ci-parallel
ci: trade test parallelization for unconstrained gomaxprocs
2022-03-16 08:55:39 -05:00
Seth Hoenig
bca014b2cb ci: explain why ci runs tests in serial now 2022-03-16 08:38:42 -05:00
James Rasell
98e7430086 client: avoid double group lookup within groupservice hook setup. 2022-03-16 09:42:57 +01:00
Seth Hoenig
b242957990 ci: swap ci parallelization for unconstrained gomaxprocs 2022-03-15 12:58:52 -05:00
Luiz Aoqui
dfd0beead7 fix alloc list test (#12297)
The alloc list test with pagination was creating allocs before the
target namespace existed. This works in OSS but fails in ENT because
quotas are checked before the alloc can be created, so the namespace
must exist beforehand.
2022-03-15 10:41:07 -04:00
James Rasell
47229473b9 client: add service registration wrapper to handle providers.
The service registration wrapper handles sending requests to
backend providers without the caller needing to know this
information. This will be used within the task and alloc runner
service hooks when performing service registration activities.
2022-03-15 12:43:52 +01:00
James Rasell
3fcec92dd8 Merge pull request #12272 from hashicorp/f-gh-266-refactor
client: refactor common service registration objects from Consul.
2022-03-15 10:26:33 +01:00
James Rasell
6e8f32a290 client: refactor common service registration objects from Consul.
This commit performs refactoring to pull out common service
registration objects into a new `client/serviceregistration`
package. This new package will form the base point for all
client specific service registration functionality.

The Consul specific implementation is not moved as it also
includes non-service registration implementations; this reduces
the blast radius of the changes as well.
2022-03-15 09:38:30 +01:00
Tim Gross
d371f456dc docs: clarify restart inheritance and add examples (#12275)
Clarify the behavior of `restart` inheritance with respect to Connect
sidecar tasks. Remove incorrect language about the scheduler being
involved in restart decisions. Try to make the `delay` mode
documentation more clear, and provide examples of delay vs fail.
2022-03-14 15:49:08 -04:00
Luiz Aoqui
ebbbedd984 docs: initial docs for the new API features (#12094) 2022-03-14 10:58:42 -04:00
Lars Lehtonen
8c8a8d2e23 scheduler: fix unused dstate variable (#12268) 2022-03-14 10:00:59 -04:00