22977 Commits

Author SHA1 Message Date
James Rasell
8f331fe7b3 changelog: add entry for #12368; native service discovery. (#12474) 2022-04-06 18:21:34 +02:00
claire labry
0becc4a9b7 [Main] Onboard to CRT (#12276) 2022-04-06 11:47:02 -04:00
Derek Strickland
4190388646 disconnected clients: Add changelog entry (#12477) 2022-04-06 11:44:26 -04:00
Phil Renaud
f2bd3d0c90 Inlines related evaluations flexbox (#12475) 2022-04-06 11:35:25 -04:00
Benjamin Buzbee
6e1270dd08 Use cleanhttp.DefaultPooledTransport for the default API client (#12409)
The only difference is DefaultTransport sets DisableKeepAlives

This doesn't make much sense to me - every http connection from the
nomad client goes to the same NOMAD_ADDR so it's a great case for keep
alive. Except round robin DNS and anycast perhaps.

Consul does this already
1e47e3c82b/api/api.go (L397)
2022-04-06 11:34:55 -04:00
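The keep-alive difference described in 6e1270dd08 above can be sketched with the standard library alone. This is a minimal illustration, not the cleanhttp source: the helper names `pooledTransport` and `unpooledTransport` and the pool sizes are assumptions chosen for the example.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// pooledTransport is a hypothetical helper: it keeps idle connections open so
// repeated requests to the same address can reuse them. The limits below are
// illustrative values, not taken from the commit.
func pooledTransport() *http.Transport {
	return &http.Transport{
		Proxy:               http.ProxyFromEnvironment,
		MaxIdleConns:        100,
		MaxIdleConnsPerHost: 10,
		IdleConnTimeout:     90 * time.Second,
	}
}

// unpooledTransport is the same transport with keep-alives switched off, so
// every request opens a fresh connection.
func unpooledTransport() *http.Transport {
	t := pooledTransport()
	t.DisableKeepAlives = true
	t.MaxIdleConnsPerHost = -1
	return t
}

func main() {
	// A client that talks to one address repeatedly benefits from pooling.
	client := &http.Client{Transport: pooledTransport()}
	_ = client

	fmt.Println("pooled, keep-alives disabled:", pooledTransport().DisableKeepAlives)
	fmt.Println("unpooled, keep-alives disabled:", unpooledTransport().DisableKeepAlives)
}
```

Since every request from a client goes to the same address, the pooled variant avoids repeated connection setup; the commit notes round robin DNS and anycast as possible exceptions.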
Jorge Marey
cf6ca95f79 Fix in-place updates over ineligible nodes (#12264) 2022-04-06 11:30:40 -04:00
Derek Strickland
12b7647220 Merge pull request #12476 from hashicorp/f-disconnected-client-allocation-handling
disconnected clients: Feature branch merge
2022-04-06 10:11:57 -04:00
Derek Strickland
8863d1e45a disconnected clients: Support operator manual interventions (#12436)
* allocrunner: Remove Shutdown call in Reconnect
* Node.UpdateAlloc: Stop orphaned allocs.
* reconciler: Stop failed reconnects.
* Apply feedback from code review. Handle rebase conflict.
* Apply suggestions from code review

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2022-04-06 09:33:32 -04:00
James Rasell
bca64ad988 Merge pull request #12459 from hashicorp/b-fix-service-delete-cli-flake
cli: fixup service test delete by using atomic actions.
2022-04-06 15:22:08 +02:00
Mike Nomitch
84937300c3 Add max client disconnect docs (#12467)
Co-authored-by: Derek Strickland <1111455+DerekStrickland@users.noreply.github.com>
2022-04-06 08:54:14 -04:00
Phil Renaud
5740642fa8 Merge pull request #12473 from hashicorp/f-ui/evals-unshown-copy-change
Copy change, simplifies explanation for no related eval chart
2022-04-06 08:13:28 -04:00
James Rasell
3909253f6c cli: fixup service test delete by using more atomic actions. 2022-04-06 08:36:23 +01:00
Luiz Aoqui
c24c1bf07f ui: hide triggered by and status filters for now (#12472) 2022-04-05 21:14:16 -04:00
Seth Hoenig
133471282b Merge pull request #12419 from hashicorp/exec-cleanup
raw_exec: make raw exec driver work with cgroups v2
2022-04-05 16:42:01 -05:00
Derek Strickland
6791147254 disconnected clients: TaskGroup validation (#12418)
* TaskGroup: Validate that max_client_disconnect and stop_after_client_disconnect are mutually exclusive.
2022-04-05 17:14:50 -04:00
Tim Gross
ca14fb0cc8 docs: updates for CSI plugin improvements for 1.3.0 (#12466) 2022-04-05 17:13:51 -04:00
Derek Strickland
8ac3e642e6 reconciler: 2 phase reconnects and tests (#12333)
* structs: Add alloc.Expired & alloc.Reconnected functions. Add Reconnect eval trigger by.

* node_endpoint: Emit new eval for reconnecting unknown allocs.

* filterByTainted: handle 2 phase commit filtering rules.

* reconciler: Append AllocState on disconnect. Logic updates from testing and 2 phase reconnects.

* allocs: Set reconnect timestamp. Destroy if not DesiredStatusRun. Watch for unknown status.
2022-04-05 17:13:10 -04:00
Derek Strickland
9a82b63686 comments: update some stale comments referencing deprecated config name (#12271)
* comments: update some stale comments referencing deprecated config name
2022-04-05 17:12:23 -04:00
Derek Strickland
bab317300e Add description for allocs stopped due to reconnect (#12270) 2022-04-05 17:12:23 -04:00
Derek Strickland
b317aaa8fe Add unknown to TaskGroupSummary (#12269) 2022-04-05 17:12:23 -04:00
Derek Strickland
6329f44148 disconnected clients: ensure servers meet minimum required version (#12202)
* planner: expose ServerMeetsMinimumVersion via Planner interface
* filterByTainted: add flag indicating disconnect support
* allocReconciler: accept and pass disconnect support flag
* tests: update dependent tests
2022-04-05 17:12:23 -04:00
Derek Strickland
83dd636bf1 MaxClientDisconnect Jobspec checklist (#12177)
* api: Add struct, conversion function, and tests
* TaskGroup: Add field, validation, and tests
* diff: Add diff handler and test
* docs: Update docs
2022-04-05 17:12:23 -04:00
Derek Strickland
5b5c853597 disconnected clients: Observability plumbing (#12141)
* Add disconnects/reconnect to log output and emit reschedule metrics

* TaskGroupSummary: Add Unknown, update StateStore logic, add to metrics
2022-04-05 17:12:23 -04:00
Derek Strickland
b3fb9430bb Fix client test reconnect test; Remove guard test (#12173)
* Update reconnect test to new algorithm and interface; remove guard test
2022-04-05 17:12:23 -04:00
Derek Strickland
35752655b0 disconnected clients: Add reconnect task event (#12133)
* Add TaskClientReconnectedEvent constant
* Add allocRunner.Reconnect function to manage task state manually
* Removes server-side push
2022-04-05 17:12:23 -04:00
DerekStrickland
97ce949f0e reconciler: fix loop control bug 2022-04-05 17:12:22 -04:00
DerekStrickland
2cea9920ab evaluateNodePlan: validate plans for disconnected nodes 2022-04-05 17:12:22 -04:00
DerekStrickland
d06155e701 NodeStatusDisconnected: support state transitions for new node status 2022-04-05 17:12:18 -04:00
DerekStrickland
042a07bcaf client: reconnect unknown allocations and sync state 2022-04-05 17:10:41 -04:00
Derek Strickland
786180601d reconciler: support disconnected clients (#12058)
* Add merge helper for string maps
* structs: add statuses, MaxClientDisconnect, and helper funcs
* taintedNodes: Include disconnected nodes
* upsertAllocsImpl: don't use existing ClientStatus when upserting unknown
* allocSet: update filterByTainted and add delayByMaxClientDisconnect
* allocReconciler: support disconnecting and reconnecting allocs
* GenericScheduler: upsert unknown and queue reconnecting

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2022-04-05 17:10:37 -04:00
Phil Renaud
34760eed39 Copy change, simplifies explanation for no related eval chart 2022-04-05 16:59:50 -04:00
Seth Hoenig
1d2e2c0d3c raw_exec: fixup review comments 2022-04-05 15:21:28 -05:00
Shishir
4042c28223 cli: add -quiet to nomad node status command. (#12426) 2022-04-05 15:53:43 -04:00
Jai
1d28553786 ui: eval filter (#12243)
* ui:  add triggeredBy filter

* add namespace filter

* fix:  namespace is a reserved keyword

* ui: filter by type and search

* fix:  rename closure action to

* chore:  fix data-test-attr
2022-04-05 15:30:36 -04:00
Jai
3e0a1e19ad Epic: Evaluation Detail Sidebar (#12370)
* chore: prettify gutter-menu

* chore:  add portal packages

* styling:  add styles sidebar and portal behavior

* ui:  sidebar component

* ui:  create and implement statechart for evals

* ui:  actor-relationship service and provider component

* ui:  d3 hierarchy computation

* chore:  add render-modifiers and curved arrows

* ui:  create evaluation actor div

* fix related evaluations schema

* ui:  register/deregister evaluation divs

* ui:  handle resize behavior

* bug:  infinite re-render cycle

* fix:  conditional logic to prevent infinite render of flex resizing

* ui: related evaluations schema and request param

* ui: fix testing for evaluations

* refact: make related-evals a proper has-many

* chore: don't pauseTest

* temp:  debug d3 hierarchy

* ui:  move derived state logic into backing component class for detail

* ui:  deprecated related evaluations logic in statechart

* ui:  update evaluation models

* ui:  update logic to paint svg in non-viewable scroll region

* ui:  update styling

* ui:  testing for eval detail view

* ui:  delete detail from template directory

* ui:  break detail component down

* ui:  static data for /evaluation/:id endpoint

* ui:  fix styling of d3 viz

* ui:  add query parameter adapter for evals

* ui:  last minute design requests

* wip:  address browser updating detail view behavior

* refact: handle query-state change in statechart

* conditional class looking for currentEval equality (#12411)

* F UI/evaluation detail sidebar rel evals (#12415)

* ui:  remove busy id alias from statechart

* ui: edit related evaluations viz error message

* ui:  bug fixes on related evaluations view (#12423)

* ui:  remove busy id alias from statechart

* ui: edit related evaluations viz error message

* ui:  update error state

* ui:  related evaluation outline styling

* Related evaluation stylefile and non-link if it matches the active sidebar (#12428)

* Adds tabbable and keyboard pressable evaluation table rows (#12433)

* ui:  fix failing eval list tests (#12437)

* ui:  move styling into classes (#12438)

* fix test failures (#12444)

* ui:  move styling into classes

* ui:  eslint disable

* ui:  allocations have evaluations as async relationships

* ui:  fix evaluation refresh button (#12447)

* ui:  move styling into classes

* ui:  eslint disable

* ui:  allocations have evaluations as async relationships

* ui:  refresh bug

* ui:  final touches on sidebar (#12462)

* chore: turn off template linting rules

Temporarily turning off template linting because we don't have a set CSS convention and the release needs to go out ASAP.

* doc:  deprecate out of date comments and vars

* ui:  edit mirage server fetch logic

* ui:  style sidebar relative

* Modification to mocked related evals and manually set 100% height on svg (#12460)

* F UI/evaluation detail sidebar final touches (#12463)

* chore: turn off template linting rules

Temporarily turning off template linting because we don't have a set CSS convention and the release needs to go out ASAP.

* doc:  deprecate out of date comments and vars

* ui:  edit mirage server fetch logic

* ui:  style sidebar relative

* ui:  account for new related eval added to chain

Co-authored-by: Michael Klein <michael@firstiwaslike.com>
Co-authored-by: Phil Renaud <phil@riotindustries.com>
2022-04-05 14:34:37 -04:00
Luiz Aoqui
d412f7b497 Support Vault entity aliases (#12449)
Move some common Vault API data struct decoding out of the Vault client
so it can be reused in other situations.

Make Vault job validation its own function so it's easier to expand it.

Rename the `Job.VaultPolicies` method to just `Job.Vault` since it
returns the full Vault block, not just their policies.

Set `ChangeMode` on `Vault.Canonicalize`.

Add some missing tests.

Allows specifying an entity alias that will be used by Nomad when
deriving the task Vault token.

An entity alias assigns an identity to a token, allowing better control
and management of Vault clients since all tokens with the same identity
alias will now be considered the same client. This helps track Nomad
activity in Vault's audit logs and gives better control over Vault billing.

Add support for a new Nomad server configuration to define a default
entity alias to be used when deriving Vault tokens. This default value
will be used if the task doesn't have an entity alias defined.
2022-04-05 14:18:10 -04:00
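The fallback rule described in d412f7b497 above (use the task's entity alias if it has one, otherwise the server's default) can be sketched as below. The function name `entityAliasFor` and the example alias strings are hypothetical; the commit does not show the actual code or configuration key.

```go
package main

import "fmt"

// entityAliasFor is a hypothetical helper: it returns the alias defined on the
// task when there is one, and the server-wide default otherwise.
func entityAliasFor(taskAlias, serverDefault string) string {
	if taskAlias != "" {
		return taskAlias
	}
	return serverDefault
}

func main() {
	fmt.Println(entityAliasFor("billing-team", "nomad-default"))
	fmt.Println(entityAliasFor("", "nomad-default"))
}
```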
Tim Gross
a8d5e5e7a3 CSI: don't block client shutdown for node unmount (#12457)
When we unmount a volume we need to be able to recover from cases
where the plugin has been shut down before the allocation that needs
it, so in #11892 we blocked shutting down the alloc runner hook. But
this blocks client shutdown if we're in the middle of unmounting. The
client won't be able to communicate with the plugin or send the
unpublish RPC anyway, so we should cancel the context and assume that
we'll resume the unmounting process when the client restarts.

For `-dev` mode we don't send the graceful `Shutdown()` method and
instead destroy all the allocations. In this case, we'll never be able
to communicate with the plugin but also never close the context we
need to prevent the hook from blocking. To fix this, move the retries
into their own goroutine that doesn't block the main `Postrun`.
2022-04-05 13:05:10 -04:00
James Rasell
b7d19a60b8 Merge pull request #12454 from hashicorp/f-rename-service-event-stream
events: add service API logic and rename topic to service from serviceregistration
2022-04-05 16:19:14 +02:00
Grant Griffiths
a2859059ff CSI: Add secrets flag support for delete volume (#11245) 2022-04-05 08:59:11 -04:00
James Rasell
cebe704572 events: add API helpers for service events stream topics. 2022-04-05 08:26:02 +01:00
James Rasell
85baf8f5ae events: fixup service events and rename topic to service. 2022-04-05 08:25:22 +01:00
Seth Hoenig
be7ec8de3e raw_exec: make raw exec driver work with cgroups v2
This PR adds support for the raw_exec driver on systems with only cgroups v2.

The raw exec driver is able to use cgroups to manage processes. This happens
only on Linux, when exec_driver is enabled, and the no_cgroups option is not
set. The driver uses the freezer controller to freeze processes of a task,
issue a sigkill, then unfreeze. Previously the implementation assumed cgroups
v1, and now it also supports cgroups v2.

There is a bit of refactoring in this PR, but the fundamental design remains
the same.

Closes #12351 #12348
2022-04-04 16:11:38 -05:00
Danish Prakash
ff6ae5fad2 command/operator_debug: add pprof interval (#11938) 2022-04-04 15:24:12 -04:00
Michael Schurter
5de999d21a Merge pull request #12442 from hashicorp/f-sd-add-mixed-auth-read-endpoints
service-disco: add mixed auth to list and read RPC endpoints.
2022-04-04 12:19:29 -07:00
Tim Gross
f718c132b4 CSI: volume watcher shutdown fixes (#12439)
The volume watcher design was based on deploymentwatcher and drainer,
but has an important difference: we don't want to maintain a goroutine
for the lifetime of the volume. So we stop the volumewatcher goroutine
for a volume when that volume has no more claims to free. But the
shutdown races with updates on the parent goroutine, and it's possible
to drop updates. Fortunately these updates are picked up on the next
core GC job, but we're most likely to hit this race when we're
replacing an allocation and that's the time we least want to wait.

Wait until the volume has "settled" before stopping this goroutine so
that the race between shutdown and the parent goroutine sending on
`<-updateCh` is pushed to after the window in which we most care about
quickly freeing claims.

* Fixes a resource leak when volumewatchers are no longer needed. The
  volume is nil and can't ever be started again, so the volume's
  `watcher` should be removed from the top-level `Watcher`.

* De-flakes the GC job test: the test throws an error because the
  claimed node doesn't exist and is unreachable. This flaked instead of
  failed because we didn't correctly wait for the first pass through the
  volumewatcher.

  Make the GC job wait for the volumewatcher to reach the quiescent
  timeout window state before running the GC eval under test, so that
  we're sure the GC job's work isn't being picked up by processing one
  of the earlier claims. Update the claims used so that we're sure the
  GC pass won't hit a node unpublish error.

* Adds trace logging to unpublish operations
2022-04-04 10:46:45 -04:00
Seth Hoenig
f8d693b079 Merge pull request #12403 from hashicorp/dependabot/go_modules/github.com/creack/pty-1.1.18
build(deps): bump github.com/creack/pty from 1.1.17 to 1.1.18
2022-04-04 09:43:09 -05:00
dependabot[bot]
1ce6bc5bbf build(deps): bump github.com/creack/pty from 1.1.17 to 1.1.18
Bumps [github.com/creack/pty](https://github.com/creack/pty) from 1.1.17 to 1.1.18.
- [Release notes](https://github.com/creack/pty/releases)
- [Commits](https://github.com/creack/pty/compare/v1.1.17...v1.1.18)

---
updated-dependencies:
- dependency-name: github.com/creack/pty
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-04-04 14:25:02 +00:00
Seth Hoenig
1764a4da4c Merge pull request #12446 from shoenig/no-pkg-err
cleanup: purge github.com/pkg/errors
2022-04-04 09:22:44 -05:00
Tim Gross
b91d0e73cb E2E: ensure that CSI EBS tests are isolated from each other (#12443)
Tear down the volume-consuming job between subtests, rather than after
all the tests are complete. For good measure, use a different ID for
the volume-consuming job as well.
2022-04-04 09:44:55 -04:00
James Rasell
e839640d15 service-disco: add mixed auth to list and read RPC endpoints.
In the same manner as the delete RPC, the list and read service
registration endpoints can be called either by external operators
or Nomad nodes. The latter occurs when a template is being
rendered which includes Nomad API template funcs. In this case,
the auth token is looked up as the node secret ID for auth.
2022-04-04 13:45:43 +01:00
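The mixed-auth lookup described in e839640d15 above can be sketched as a two-step resolution: treat the token as an operator token first, then as a node secret ID. This is a sketch under assumptions: the `caller` type, `resolveCaller`, the map-based stores, and the lookup order are all hypothetical and not taken from the Nomad RPC code.

```go
package main

import (
	"errors"
	"fmt"
)

// caller describes who made the request in this hypothetical model.
type caller struct {
	Kind string // "operator" or "node"
	ID   string
}

// resolveCaller looks the token up as an operator ACL token first and, if
// that fails, as a node secret ID. The order is an assumption of the sketch.
func resolveCaller(token string, aclTokens, nodeSecrets map[string]string) (caller, error) {
	if name, ok := aclTokens[token]; ok {
		return caller{Kind: "operator", ID: name}, nil
	}
	if nodeID, ok := nodeSecrets[token]; ok {
		return caller{Kind: "node", ID: nodeID}, nil
	}
	return caller{}, errors.New("permission denied")
}

func main() {
	aclTokens := map[string]string{"op-secret": "alice"}
	nodeSecrets := map[string]string{"node-secret": "node-1"}

	for _, tok := range []string{"op-secret", "node-secret", "unknown"} {
		c, err := resolveCaller(tok, aclTokens, nodeSecrets)
		fmt.Println(tok, c, err)
	}
}
```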