22795 Commits

Author SHA1 Message Date
Derek Strickland
786180601d reconciler: support disconnected clients (#12058)
* Add merge helper for string maps
* structs: add statuses, MaxClientDisconnect, and helper funcs
* taintedNodes: Include disconnected nodes
* upsertAllocsImpl: don't use existing ClientStatus when upserting unknown
* allocSet: update filterByTainted and add delayByMaxClientDisconnect
* allocReconciler: support disconnecting and reconnecting allocs
* GenericScheduler: upsert unknown and queue reconnecting

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2022-04-05 17:10:37 -04:00
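The first bullet of 786180601d mentions a merge helper for string maps. A minimal sketch of what such a helper might look like follows; the name `mergeStringMaps` and its "overrides win" semantics are assumptions for illustration, not the function added in the commit.

```go
package main

import "fmt"

// mergeStringMaps returns a new map containing every entry of base,
// overlaid with every entry of overrides. Neither input is modified.
// Hypothetical helper: the name and semantics are illustrative only.
func mergeStringMaps(base, overrides map[string]string) map[string]string {
	merged := make(map[string]string, len(base)+len(overrides))
	for k, v := range base {
		merged[k] = v
	}
	for k, v := range overrides {
		merged[k] = v
	}
	return merged
}

func main() {
	a := map[string]string{"region": "global", "dc": "dc1"}
	b := map[string]string{"dc": "dc2"}
	fmt.Println(mergeStringMaps(a, b))
}
```

Returning a fresh map rather than mutating an argument keeps the helper safe to call on maps that are shared between structs.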
Shishir
4042c28223 cli: add -quiet to nomad node status command. (#12426) 2022-04-05 15:53:43 -04:00
Jai
1d28553786 ui: eval filter (#12243)
* ui:  add triggeredBy filter

* add namespace filter

* fix:  namespace is a reserved keyword

* ui: filter by type and search

* fix:  rename closure action to

* chore:  fix data-test-attr
2022-04-05 15:30:36 -04:00
Jai
3e0a1e19ad Epic: Evaluation Detail Sidebar (#12370)
* chore: prettify gutter-menu

* chore:  add portal packages

* styling:  add styles sidebar and portal behavior

* ui:  sidebar component

* ui:  create and implement statechart for evals

* ui:  actor-relationship service and provider component

* ui:  d3 hierarchy computation

* chore:  add render-modifiers and curved arrows

* ui:  create evaluation actor div

* fix related evaluations schema

* ui:  register/deregister evaluation divs

* ui:  handle resize behavior

* bug:  infinite re-render cycle

* fix:  conditional logic to prevent infinite render of flex resizing

* ui: related evaluations schema and request param

* ui: fix testing for evaluations

* refact: make related-evals a proper has-many

* chore: don't pauseTest

* temp:  debug d3 hierarchy

* ui:  move derived state logic into backing component class for detail

* ui:  deprecated related evaluations logic in statechart

* ui:  update evaluation models

* ui:  update logic to paint svg in non-viewable scroll region

* ui:  update styling

* ui:  testing for eval detail view

* ui:  delete detail from template directory

* ui:  break detail component down

* ui:  static data for /evaluation/:id endpoint

* ui:  fix styling of d3 viz

* ui:  add query parameter adapter for evals

* ui:  last minute design requests

* wip:  address browser updating detail view behavior

* refact: handle query-state change in statechart

* conditional class looking for currentEval equality (#12411)

* F UI/evaluation detail sidebar rel evals (#12415)

* ui:  remove busy id alias from statechart

* ui: edit related evaluations viz error message

* ui:  bug fixes on related evaluations view (#12423)

* ui:  remove busy id alias from statechart

* ui: edit related evaluations viz error message

* ui:  update error state

* ui:  related evaluation outline styling

* Related evaluation stylefile and non-link if it matches the active sidebar (#12428)

* Adds tabbable and keyboard pressable evaluation table rows (#12433)

* ui:  fix failing eval list tests (#12437)

* ui:  move styling into classes (#12438)

* fix test failures (#12444)

* ui:  move styling into classes

* ui:  eslint disable

* ui:  allocations have evaluations as async relationships

* ui:  fix evaluation refresh button (#12447)

* ui:  move styling into classes

* ui:  eslint disable

* ui:  allocations have evaluations as async relationships

* ui:  refresh bug

* ui:  final touches on sidebar (#12462)

* chore: turn off template linting rules

Temporarily turning off template linting because we don't have a set CSS convention and the release needs to go out ASAP.

* doc:  deprecate out of date comments and vars

* ui:  edit mirage server fetch logic

* ui:  style sidebar relative

* Modification to mocked related evals and manually set 100% height on svg (#12460)

* F UI/evaluation detail sidebar final touches (#12463)

* chore: turn off template linting rules

Temporarily turning off template linting because we don't have a set CSS convention and the release needs to go out ASAP.

* doc:  deprecate out of date comments and vars

* ui:  edit mirage server fetch logic

* ui:  style sidebar relative

* ui:  account for new related eval added to chain

Co-authored-by: Michael Klein <michael@firstiwaslike.com>
Co-authored-by: Phil Renaud <phil@riotindustries.com>
2022-04-05 14:34:37 -04:00
Luiz Aoqui
d412f7b497 Support Vault entity aliases (#12449)
Move some common Vault API data struct decoding out of the Vault client
so it can be reused in other situations.

Make Vault job validation its own function so it's easier to expand it.

Rename the `Job.VaultPolicies` method to just `Job.Vault` since it
returns the full Vault block, not just its policies.

Set `ChangeMode` on `Vault.Canonicalize`.

Add some missing tests.

Allows specifying an entity alias that will be used by Nomad when
deriving the task Vault token.

An entity alias assigns an identity to a token, allowing better control
and management of Vault clients since all tokens with the same identity
alias will now be considered the same client. This helps track Nomad
activity in Vault's audit logs and gives better control over Vault billing.

Add support for a new Nomad server configuration to define a default
entity alias to be used when deriving Vault tokens. This default value
will be used if the task doesn't have an entity alias defined.
2022-04-05 14:18:10 -04:00
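The fallback rule described at the end of d412f7b497 (use the task's entity alias if set, otherwise the server default) can be sketched as below. The function and parameter names are hypothetical and do not come from the Nomad codebase.

```go
package main

import "fmt"

// effectiveEntityAlias picks the alias to use when deriving a task's
// Vault token: the task-level value wins, and the server-wide default
// is used only when the task does not define one.
// Hypothetical helper for illustration.
func effectiveEntityAlias(taskAlias, serverDefault string) string {
	if taskAlias != "" {
		return taskAlias
	}
	return serverDefault
}

func main() {
	fmt.Println(effectiveEntityAlias("", "nomad-default"))
	fmt.Println(effectiveEntityAlias("billing-team", "nomad-default"))
}
```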
Tim Gross
a8d5e5e7a3 CSI: don't block client shutdown for node unmount (#12457)
When we unmount a volume we need to be able to recover from cases
where the plugin has been shut down before the allocation that needs
it, so in #11892 we blocked shutting down the alloc runner hook. But
this blocks client shutdown if we're in the middle of unmounting. The
client won't be able to communicate with the plugin or send the
unpublish RPC anyway, so we should cancel the context and assume that
we'll resume the unmounting process when the client restarts.

For `-dev` mode we don't send the graceful `Shutdown()` method and
instead destroy all the allocations. In this case, we'll never be able
to communicate with the plugin but also never close the context we
need to prevent the hook from blocking. To fix this, move the retries
into their own goroutine that doesn't block the main `Postrun`.
2022-04-05 13:05:10 -04:00
James Rasell
b7d19a60b8 Merge pull request #12454 from hashicorp/f-rename-service-event-stream
events: add service API logic and rename topic to service from serviceregistration
2022-04-05 16:19:14 +02:00
Grant Griffiths
a2859059ff CSI: Add secrets flag support for delete volume (#11245) 2022-04-05 08:59:11 -04:00
James Rasell
cebe704572 events: add API helpers for service events stream topics. 2022-04-05 08:26:02 +01:00
James Rasell
85baf8f5ae events: fixup service events and rename topic to service. 2022-04-05 08:25:22 +01:00
Danish Prakash
ff6ae5fad2 command/operator_debug: add pprof interval (#11938) 2022-04-04 15:24:12 -04:00
Michael Schurter
5de999d21a Merge pull request #12442 from hashicorp/f-sd-add-mixed-auth-read-endpoints
service-disco: add mixed auth to list and read RPC endpoints.
2022-04-04 12:19:29 -07:00
Tim Gross
f718c132b4 CSI: volume watcher shutdown fixes (#12439)
The volume watcher design was based on deploymentwatcher and drainer,
but has an important difference: we don't want to maintain a goroutine
for the lifetime of the volume. So we stop the volumewatcher goroutine
for a volume when that volume has no more claims to free. But the
shutdown races with updates on the parent goroutine, and it's possible
to drop updates. Fortunately these updates are picked up on the next
core GC job, but we're most likely to hit this race when we're
replacing an allocation and that's the time we least want to wait.

Wait until the volume has "settled" before stopping this goroutine so
that the race between shutdown and the parent goroutine sending on
`<-updateCh` is pushed to after the window in which we most care about
quickly freeing claims.

* Fixes a resource leak when volumewatchers are no longer needed. The
  volume is nil and can't ever be started again, so the volume's
  `watcher` should be removed from the top-level `Watcher`.

* De-flakes the GC job test: the test throws an error because the
  claimed node doesn't exist and is unreachable. This flaked instead of
  failing because we didn't correctly wait for the first pass through the
  volumewatcher.

  Make the GC job wait for the volumewatcher to reach the quiescent
  timeout window state before running the GC eval under test, so that
  we're sure the GC job's work isn't being picked up by processing one
  of the earlier claims. Update the claims used so that we're sure the
  GC pass won't hit a node unpublish error.

* Adds trace logging to unpublish operations
2022-04-04 10:46:45 -04:00
Seth Hoenig
f8d693b079 Merge pull request #12403 from hashicorp/dependabot/go_modules/github.com/creack/pty-1.1.18
build(deps): bump github.com/creack/pty from 1.1.17 to 1.1.18
2022-04-04 09:43:09 -05:00
dependabot[bot]
1ce6bc5bbf build(deps): bump github.com/creack/pty from 1.1.17 to 1.1.18
Bumps [github.com/creack/pty](https://github.com/creack/pty) from 1.1.17 to 1.1.18.
- [Release notes](https://github.com/creack/pty/releases)
- [Commits](https://github.com/creack/pty/compare/v1.1.17...v1.1.18)

---
updated-dependencies:
- dependency-name: github.com/creack/pty
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-04-04 14:25:02 +00:00
Seth Hoenig
1764a4da4c Merge pull request #12446 from shoenig/no-pkg-err
cleanup: purge github.com/pkg/errors
2022-04-04 09:22:44 -05:00
Tim Gross
b91d0e73cb E2E: ensure that CSI EBS tests are isolated from each other (#12443)
Tear down the volume-consuming job between subtests, rather than after
all the tests are complete. For good measure, use a different ID for
the volume-consuming job as well.
2022-04-04 09:44:55 -04:00
James Rasell
e839640d15 service-disco: add mixed auth to list and read RPC endpoints.
In the same manner as the delete RPC, the list and read service
registration endpoints can be called either by external operators
or Nomad nodes. The latter occurs when a template is being
rendered which includes Nomad API template funcs. In this case,
the auth token is looked up as the node secret ID for auth.
2022-04-04 13:45:43 +01:00
James Rasell
245dd801dd Merge pull request #12304 from th0m/tlefebvre/fix-wrong-drivernetworkmanager-interface
fix: update incorrect DriverNetworkManager interface implementation
2022-04-04 11:29:22 +02:00
Seth Hoenig
6f37b28b87 cleanup: purge github.com/pkg/errors 2022-04-01 19:24:02 -05:00
Tim Gross
7c589fd773 Test lint touchup (#12434)
* lint: require should not be aliased in core_sched_test
* lint: require should not be aliased in volumes_watcher_test
* testing: don't alias state package in core_sched_test
2022-04-01 15:17:58 -04:00
Seth Hoenig
054ccc6050 Merge pull request #12432 from hashicorp/ci-gha-ignore-subpaths
ci: correctly ignore subpaths in gha
2022-04-01 09:58:23 -05:00
Seth Hoenig
cdc0258913 Merge pull request #12431 from hashicorp/docs-sysbatch-exists-typo
docs: fix typo in system batch description
2022-04-01 09:58:06 -05:00
Seth Hoenig
64fd4781a4 ci: correctly ignore subpaths in gha 2022-04-01 09:49:40 -05:00
Seth Hoenig
b54c9c82ae docs: fix typo in system batch description 2022-04-01 09:46:03 -05:00
Bryce Kalow
afd460758a website: redirect /api to api-docs and update internal links (#12410) 2022-03-31 11:33:27 -05:00
Tim Gross
6668ce022a docs: remove deprecated client options parameters docs (#12416)
The client configuration options for drivers have been deprecated
since 0.9. We haven't torn them out completely but because they're
deprecated it's been hard to guarantee correct behavior. Remove the
documentation so that users aren't misled about their viability.
2022-03-31 11:45:51 -04:00
Seth Hoenig
845ab88847 Merge pull request #12417 from hashicorp/tests-remove-08-groups-services
tests: remove update 08 groups services test
2022-03-31 10:40:50 -05:00
Seth Hoenig
efb978e678 tests: remove update 08 groups services test
This is a test around upgrading from Nomad 0.8, which is long since
no longer supported. The test is slow, flaky, and imports consul/sdk.

Remove this test as it is no longer relevant.
2022-03-31 10:14:22 -05:00
Seth Hoenig
83d3817359 Merge pull request #12414 from hashicorp/tests-docker-dns-sadness
tests: create fresh harness for each docker dns test
2022-03-31 10:05:30 -05:00
Seth Hoenig
2e5e7428b1 tests: create fresh harness for each docker dns test
Not actually sure this fixes the flaky tests, but seems
like it could be related.
2022-03-31 08:17:34 -05:00
Seth Hoenig
f7bfba3e43 Merge pull request #12404 from hashicorp/tests-client-waits
tests: wait on client in a couple of tests
2022-03-30 15:23:47 -05:00
Seth Hoenig
0c7d260ffc tests: wait on client in a couple of tests
These tend to fail on GHA, where I believe the client is not
starting up fast enough before making requests. So wait on
the client agent first.

```
=== RUN   TestDebug_CapturedFiles
    operator_debug_test.go:422: serverName: TestDebug_CapturedFiles.global, clientID, 1afb00e6-13f2-d8d6-d0f9-745a3fd6e8e4
    operator_debug_test.go:492:
        	Error Trace:	operator_debug_test.go:492
        	Error:      	Should be empty, but was No node(s) with prefix "1afb00e6-13f2-d8d6-d0f9-745a3fd6e8e4" found
        	            	Failed to retrieve clients, 0 nodes found in list: 1afb00e6-13f2-d8d6-d0f9-745a3fd6e8e4
        	Test:       	TestDebug_CapturedFiles
--- FAIL: TestDebug_CapturedFiles (0.08s)
```
2022-03-30 08:48:23 -05:00
Seth Hoenig
d2b69b61da Merge pull request #12405 from hashicorp/ci-format-release-metadata-file
ci: hcl format release metadata file
2022-03-30 08:13:15 -05:00
Seth Hoenig
fb760d154b ci: add trailing newline to release metadata 2022-03-30 08:12:55 -05:00
Tim Gross
43eba61e4f E2E disconnected clients test refactor (#12402)
* Wait longer for node to go down in disconnected clients test.
  The existing helper only waits 10s, but there's a jitter on heartbeats
  that we need to account for. Wait 30s for the node to go down to give
  us plenty of room.
* Port disconnected clients to stdlib-style test
2022-03-30 09:12:44 -04:00
Seth Hoenig
1b06e01d0f ci: hcl format release metadata file 2022-03-30 08:02:55 -05:00
Michele Degges
e4b52a6e63 [RelAPI Onboarding] Add release API metadata file (#12353) 2022-03-29 15:38:50 -07:00
Michael Schurter
dfc1519b0a Merge pull request #12312 from hashicorp/f-writeToFile
template: disallow `writeToFile` by default
2022-03-29 13:41:59 -07:00
Tim Gross
b075e0a6a8 csi: allow namespace field to be passed in volume spec (#12400)
Use the volume spec's `namespace` field to override the value of the
`-namespace` flag and the `NOMAD_NAMESPACE` environment variable, just as
we do with job spec.
2022-03-29 14:46:39 -04:00
Michael Schurter
3ca38ee4ed template: fix comments and docs
Review notes from @lgfa29

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
2022-03-29 09:25:23 -07:00
Tim Gross
e8da15cae5 E2E: test exercising node drain behavior for CSI volumes (#12384) 2022-03-29 11:19:23 -04:00
dependabot[bot]
71cc8b4122 build(deps): bump github.com/mitchellh/hashstructure from 1.0.0 to 1.1.0 (#12399)
Bumps [github.com/mitchellh/hashstructure](https://github.com/mitchellh/hashstructure) from 1.0.0 to 1.1.0.
- [Release notes](https://github.com/mitchellh/hashstructure/releases)
- [Commits](https://github.com/mitchellh/hashstructure/compare/v1.0.0...v1.1.0)

---
updated-dependencies:
- dependency-name: github.com/mitchellh/hashstructure
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-03-29 11:17:09 -04:00
Tim Gross
98e122c7e2 CSI: reorder controller volume detachment (#12387)
In #12112 and #12113 we solved for the problem of races in releasing
volume claims, but there was a case that we missed. During a node
drain with a controller attach/detach, we can hit a race where we call
controller publish before the unpublish has completed. This is
discouraged in the spec but plugins are supposed to handle it
safely. But if the storage provider's API is slow enough and the
plugin doesn't handle the case safely, the volume can get "locked"
into a state where the provider's API won't detach it cleanly.

Check the claim before making any external controller publish RPC
calls so that Nomad is responsible for the canonical information about
whether a volume is currently claimed.

This has a couple of side effects that also had to be fixed here:

* Changing the order means that the volume will have a past claim
  without a valid external node ID because it came from the client, and
  this uncovered a separate bug where we didn't assert the external node
  ID was valid before returning it. Fall through to getting the ID from
  the plugins in the state store in this case. We avoided this
  originally because of concerns around plugins getting lost during node
  drain but now that we've fixed that we may want to revisit it in
  future work.
* We should make sure we're handling `FailedPrecondition` cases from
  the controller plugin the same way we handle other retryable cases.
* Several tests had to be updated because they were assuming we fail
  in a particular order that we're no longer doing.
2022-03-29 09:44:00 -04:00
Michael Schurter
f87ec7e64e template: disallow writeToFile by default
Resolves #12095 by WONTFIXing it.

This approach disables `writeToFile` as it allows arbitrary host
filesystem writes and is only a small quality of life improvement over
multiple `template` stanzas.

This approach has the significant downside of leaving people who have
altered their `template.function_denylist` *still vulnerable!* I added
an upgrade note, but we should have implemented the denylist as a
`map[string]bool` so that new funcs could be denied without overriding
custom configurations.

This PR also fixes a bug that broke enabling all consul-template
funcs. We repeatedly failed to differentiate between a nil (unset)
denylist and an empty (allow all) one.
2022-03-28 17:05:42 -07:00
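The nil-versus-empty distinction called out in f87ec7e64e is easy to lose in Go, because `len` is zero for both. A small sketch of the intended behaviour; the function name and the default list contents are illustrative assumptions, not copied from Nomad's configuration code.

```go
package main

import "fmt"

// defaultDenylist is an illustrative default, used only when the
// operator has not configured a denylist at all.
var defaultDenylist = []string{"plugin", "writeToFile"}

// effectiveDenylist treats a nil slice as "unset" (use the defaults)
// and a non-nil empty slice as "explicitly allow every function".
func effectiveDenylist(configured []string) []string {
	if configured == nil {
		return defaultDenylist
	}
	return configured
}

func main() {
	fmt.Println(effectiveDenylist(nil))        // unset: defaults apply
	fmt.Println(effectiveDenylist([]string{})) // empty: nothing denied
}
```

Checking `len(configured) == 0` instead of `configured == nil` would collapse the two cases, which is the kind of mistake the commit message describes.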
Ryo Nakao
97dc6875e0 Ensure to close StreamFrame channel (#12248) 2022-03-28 10:28:23 -04:00
Tim Gross
167cfcdf9e docs: changelog entry (#12393) 2022-03-28 09:44:58 -04:00
Shishir
eea1b1f27c Display OS name in nomad node status command. (#12388)
Signed-off-by: Shishir Mahajan <smahajan@roblox.com>
2022-03-28 09:28:14 -04:00
Seth Hoenig
43b64b749a Merge pull request #12381 from hashicorp/ci-gha-off
ci: set test log level off in gha
2022-03-25 15:13:42 -05:00
Tim Gross
37d831712f E2E: namespace HCP vault and consul policies to avoid collisions (#12386)
Concurrent E2E runs can collide when provisioning policies on HCP
Consul and HCP Vault. Namespace these by the test run name, as we do
for most everything else.
2022-03-25 16:05:59 -04:00