Commit Graph

23664 Commits

Author SHA1 Message Date
Phil Renaud
c83e52e469 Merge pull request #14495 from hashicorp/ui-services-checks-uniq-by-alloc
[ui] Service health checks unique by allocation
2022-09-08 17:00:04 -04:00
Phil Renaud
7b31757ed6 Reflect alloc-id-having mocks in tests 2022-09-08 16:03:56 -04:00
Charlie Voiselle
61a6dbcfcb Add client scheduling eligibility to heartbeat (#14483) 2022-09-08 14:31:36 -04:00
Phil Renaud
b69219d228 Lintfixes 2022-09-08 13:54:50 -04:00
Tim Gross
f2186be02c CSI: failed allocation should not block its own controller unpublish (#14484)
A Nomad user reported problems with CSI volumes associated with failed
allocations, where the Nomad server did not send a controller unpublish RPC.

The controller unpublish is skipped if other non-terminal allocations on the
same node claim the volume. The check has a bug where the allocation belonging
to the claim being freed was included in the check incorrectly. During a normal
allocation stop for job stop or a new version of the job, the allocation is
terminal. But allocations that fail are not yet marked terminal at the point in
time when the client sends the unpublish RPC to the server.

For CSI plugins that support controller attach/detach, this means that the
controller will not be able to detach the volume from the allocation's host and
the replacement claim will fail until a GC is run. This changeset fixes the
conditional so that the claim's own allocation is not included, and makes the
logic easier to read. Include a test case covering this path.
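A minimal Go sketch of the corrected conditional. It assumes simplified, hypothetical `Alloc` and `Claim` types rather than Nomad's real structs. The claim's own allocation is skipped before checking for other non-terminal allocations on the same node, so a failed (not yet terminal) allocation can no longer block its own controller unpublish.

```go
package main

import "fmt"

// Alloc is a hypothetical, trimmed-down allocation record for this sketch.
type Alloc struct {
	ID       string
	NodeID   string
	Terminal bool
}

// Claim is a hypothetical volume claim being freed by an allocation.
type Claim struct {
	AllocID string
	NodeID  string
}

// otherActiveClaimsOnNode reports whether any non-terminal allocation on the
// claim's node, other than the claim's own allocation, still uses the volume.
// The claim's own allocation is skipped even when it is not yet terminal,
// which is the case for failed allocations at unpublish time.
func otherActiveClaimsOnNode(claim Claim, volumeAllocs []Alloc) bool {
	for _, a := range volumeAllocs {
		if a.ID == claim.AllocID {
			continue // never let a claim block its own unpublish
		}
		if a.NodeID == claim.NodeID && !a.Terminal {
			return true
		}
	}
	return false
}

func main() {
	failed := Alloc{ID: "a1", NodeID: "n1", Terminal: false}
	claim := Claim{AllocID: "a1", NodeID: "n1"}
	// Only the failed allocation itself claims the volume, so the
	// controller unpublish should proceed.
	fmt.Println(otherActiveClaimsOnNode(claim, []Alloc{failed})) // false
}
```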

Also includes two minor extra bugfixes:

* Entities we get from the state store should always be copied before
altering. Ensure that we copy the volume in the top-level unpublish workflow
before handing off to the steps.

* The list stub object for volumes in `nomad/structs` did not match the stub
object in `api`. The `api` package also did not include the current
readers/writers fields that are expected by the UI. True up the two objects and
add the previously undocumented fields to the docs.
2022-09-08 13:30:05 -04:00
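The copy-before-altering rule from f2186be02c can be sketched as follows. `Volume`, `Copy`, and `unpublish` are hypothetical stand-ins, not Nomad's actual volume struct or unpublish workflow; the point is that the workflow copies once at the top, mutates only the copy, and leaves the object held by the state store unchanged.

```go
package main

import "fmt"

// Volume is a hypothetical stand-in for a volume read from the state store.
type Volume struct {
	ID          string
	WriteAllocs map[string]bool
}

// Copy returns a deep copy so callers can mutate it without touching the
// shared object held by the state store.
func (v *Volume) Copy() *Volume {
	if v == nil {
		return nil
	}
	nv := *v
	nv.WriteAllocs = make(map[string]bool, len(v.WriteAllocs))
	for k, val := range v.WriteAllocs {
		nv.WriteAllocs[k] = val
	}
	return &nv
}

// unpublish copies the volume once in the top-level workflow, then hands the
// copy to each step (here a single step that releases the claim).
func unpublish(stored *Volume, allocID string) *Volume {
	vol := stored.Copy()
	delete(vol.WriteAllocs, allocID)
	return vol
}

func main() {
	stored := &Volume{ID: "vol0", WriteAllocs: map[string]bool{"a1": true}}
	updated := unpublish(stored, "a1")
	fmt.Println(len(stored.WriteAllocs), len(updated.WriteAllocs)) // 1 0
}
```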
Phil Renaud
2a8c8c2bac Index check only using alloc-specific ones too 2022-09-08 12:03:08 -04:00
Phil Renaud
f81faa9c48 Move uniquing to the sidebar itself 2022-09-08 11:31:38 -04:00
James Rasell
6a6e4a3634 client: fix RPC forwarding when querying checks for alloc. (#14498)
When querying the checks for an allocation, the request must be
forwarded to the agent that is running the allocation. If the
initial request is made to a server agent, the request can be made
directly to the client agent running the allocation. If the
request is made to a client agent not running the alloc, the
request needs to be forwarded to a server and then the correct
client.
2022-09-08 16:55:23 +02:00
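The forwarding rule described in 6a6e4a3634 amounts to a small routing decision. The sketch below is illustrative only: `routeChecksRequest`, its parameters, and the `Route` values are hypothetical names, and it assumes the client running the allocation answers the request itself.

```go
package main

import "fmt"

// Route is a hypothetical description of where a checks request goes next.
type Route string

const (
	RouteLocal  Route = "handle locally"
	RouteClient Route = "forward directly to the client running the alloc"
	RouteServer Route = "forward to a server, which forwards to the right client"
)

// routeChecksRequest decides the next hop for a checks request:
// a server can reach the owning client directly, the owning client answers
// locally, and any other client must go through a server.
func routeChecksRequest(isServer bool, localNodeID, allocNodeID string) Route {
	switch {
	case isServer:
		return RouteClient
	case localNodeID == allocNodeID:
		return RouteLocal
	default:
		return RouteServer
	}
}

func main() {
	fmt.Println(routeChecksRequest(false, "node-a", "node-b"))
}
```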
Seth Hoenig
1d9c9964f9 Merge pull request #14497 from hashicorp/b-guard-random-stagger
helper: guard against negative inputs into random stagger
2022-09-08 09:44:56 -05:00
Tim Gross
3dab0249b4 test: fix concurrent map access in TestStatsFetcher (#14496)
The map of in-flight RPCs gets cleared by a goroutine in the test without first
locking it to make sure that it's not being accessed concurrently by the stats
fetcher itself. This can cause a panic in tests.
2022-09-08 10:41:15 -04:00
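The fix in 3dab0249b4 follows the usual Go pattern: every reader and writer of a shared map, including the test goroutine that clears it, takes the same mutex. `inflightTracker` below is a hypothetical stand-in for the stats fetcher's in-flight map, not its real type.

```go
package main

import (
	"fmt"
	"sync"
)

// inflightTracker is a hypothetical stand-in for a map of in-flight RPCs
// guarded by a mutex.
type inflightTracker struct {
	mu       sync.Mutex
	inflight map[string]struct{}
}

func (t *inflightTracker) add(id string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.inflight[id] = struct{}{}
}

// clear takes the same lock as add, so a test goroutine can reset the map
// without racing the code that populates it.
func (t *inflightTracker) clear() {
	t.mu.Lock()
	defer t.mu.Unlock()
	for id := range t.inflight {
		delete(t.inflight, id)
	}
}

func (t *inflightTracker) size() int {
	t.mu.Lock()
	defer t.mu.Unlock()
	return len(t.inflight)
}

func main() {
	tr := &inflightTracker{inflight: make(map[string]struct{})}
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(2)
		go func(i int) { defer wg.Done(); tr.add(fmt.Sprint(i)) }(i)
		go func() { defer wg.Done(); tr.clear() }()
	}
	wg.Wait()
	tr.clear()
	fmt.Println(tr.size()) // 0
}
```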
Seth Hoenig
75b30c2210 helper: guard against negative inputs into random stagger
This PR modifies RandomStagger to protect against negative input
values. If the given interval is negative, the value returned will
be somewhere in the stratosphere. Instead, treat negative inputs
like zero, returning zero.
2022-09-08 09:17:48 -05:00
James Rasell
22403d357c e2e: fixup token expiration test to account for longer forced GC. (#14491) 2022-09-08 14:43:04 +02:00
Michael Schurter
cde9f6c594 docs: add quota panic fix changelog entry (#14485)
See https://github.com/hashicorp/nomad-enterprise/pull/839 for original
(Enterprise only)
2022-09-07 17:04:46 -07:00
Phil Renaud
dd44596c36 Midway point on only showing alloc-specific checks 2022-09-07 15:35:58 -04:00
Phil Renaud
0029de8a5f Merge pull request #14408 from hashicorp/f-ui/service-discovery
Service discovery in the Nomad UI
2022-09-07 13:40:09 -04:00
Phil Renaud
e9169e1b07 Remove an unused internal test comment 2022-09-07 13:39:04 -04:00
Phil Renaud
9a1d72109c Further unused test deletion 2022-09-07 10:34:09 -04:00
Phil Renaud
5674dd4295 Changelog added and unused tests removed 2022-09-07 10:31:39 -04:00
Phil Renaud
e68f07e924 [ui] Service Discovery: Allocation Service fly-out (#14389)
* Bones of a new flyout section

* Basic sidebar behaviour and style edits

* Concept of a refID for service fragments to disambiguate task and group

* A11y audit etc

* Moves health check aggregation to serviceFragment model and retains history

* Has to be a getter

* flyout populated

* Sidebar styling

* Sidebar table and details added

* Mirage fixture

* Active status and table styles

* Unit test mock updated

* Acceptance tests for alloc services table and flyout

* Chart styles closer to mock

* Without a paused test

* Consul and Nomad icons in services table

* Alloc services test updates in light of new column changes

* without using an inherited scenario
2022-09-07 10:24:34 -04:00
Jai
9f0d9c923b ui: add to collection (#14472) 2022-09-07 10:24:34 -04:00
Phil Renaud
e39246f164 [ui] Jobs/jobid/services table sorting (#14410)
* Sortable job service table

* sorting removed from specific service page
2022-09-07 10:24:34 -04:00
Phil Renaud
bf73a51adb Job Services: fixtures and acceptance tests (#14319)
* Added to subnav and basic table implemented

* Existing services become service fragments, and services tab aggregated beneath job route

* Index page within jobs/job/services

* Watchable services

* Lintfixes

* Links to clients and individual services set up

* Child service route

* Keyboard shortcuts on service page

* Model that shows consul services as well, plus level and provider cols

* lintfix

* Level as query param

* Watch job for service name changes too

* Group level service fixtures established

* Progress at task level and job-linked services

* Task and group services on update

* Fixture side-effect cleanup

* Basic acceptance tests for job services

* Testmodel cleanup

* Disabled mirage logging

* New cluster type specifically for services

* Without explicit job-model binding

* Trying to isolate a tostring error

* Account for new tab in keyboardnav

* More test isolation attempts

* Remove skipped tests and link task to parent group by id

ui: add service health viz to table (#14369)

* ui: add service-status-bar

* test: service-status-bar

* refact: update component api for new data struct

* ui: format service health struct

* ui: add service health viz to table

* temp: add placeholder to remind conditional watcher

* test: write tests for transformation algorithm

* refact: update transformation algo

* ui: conditionally long poll checks endpoint

* refact: add conditional logic for nomad provider

refact: update service-fragment model to include owner info

ui: differentiate between task and group-level in derived state comp

test: add test to document behavior

refact: update tests for api change

refact: update integration test for API change

chore: remove unused vars

chore: elvis operator to protect mirage

refact: create refId instead of internalModel

refact: update algo

refact: update conditional template logic

refact: update test for api change

chore: can't use if and not in hbs conditional
2022-09-07 10:24:33 -04:00
Jai
08fe41fee3 ui: long poll /checks endpoint (#14354)
* chore: add lodash isEqual package

* ui: fetch non ember-data records

* ui: create watcher to poll non ember-data records
2022-09-07 10:24:33 -04:00
Phil Renaud
90fbaa15b4 Nomad Services: job routes, model, and serializer updates (#14226)
* Added to subnav and basic table implemented

* Existing services become service fragments, and services tab aggregated beneath job route

* Index page within jobs/job/services

* Watchable services

* Lintfixes

* Links to clients and individual services set up

* Child service route

* Keyboard shortcuts on service page

* Model that shows consul services as well, plus level and provider cols

* lintfix

* Level as query param

* Watch job for service name changes too

* Lintfix

* Testfixes

* Placeholder mirage route
2022-09-07 10:24:33 -04:00
Tim Gross
534869eb67 autopilot: deflake tests (#14475)
Includes:

* Remove leader upgrade raft version test, as older versions of raft are now
  incompatible with our autopilot library.

* Remove attempt to assert initial non-voter status on the `PromoteNonVoter`
  test, as this happens too quickly to reliably detect.

* Unskip some previously-skipped tests which we should make stable.

* Remove the `consul/sdk` retry helper for these tests; this uses panic recovery
  in a kind of a clever/gross way to reduce LoC but it seems to introduce some
  timing issues in the process.

* Add more test step logging and reduce logging noise from the scheduler
  goroutines to make it easier to debug failing tests.

* Be more consistent about using the `waitForStableLeadership` helper so that we
  can assert the cluster is fully stable and not just that we've added peers.
2022-09-07 09:35:01 -04:00
Luiz Aoqui
05908e3b1a ui: remove extra space in menu footer (#14457) 2022-09-06 16:53:17 -04:00
James Rasell
11496d1816 hcl2: add strlen function and update docs. (#14463) 2022-09-06 18:42:40 +02:00
James Rasell
e6e7afdd75 core: clarify ACL token expiry GC messages to show global param. (#14466) 2022-09-06 15:42:45 +02:00
Tim Gross
bbd87eed66 cli: remove network from quota status output (#14468)
Network quotas were removed in Nomad 1.0.4. Remove the fields no longer in use
from the `quota status` output.
2022-09-06 09:37:16 -04:00
Kellen Fox
08de94b0c0 Add a log line to help track node eligibility (#14125)
Co-authored-by: James Rasell <jrasell@hashicorp.com>
2022-09-06 14:03:33 +02:00
Yan
d33f1eac71 warn destructive update only when count > 1 (#13103) 2022-09-02 15:30:06 -04:00
Giovani Avelar
fe9e4532d8 [ui] Show a different message when there are no tasks in a job (#14071)
Different message when there are no tasks in a job
2022-09-02 15:20:45 -04:00
Tiernan
df043d747c Fix error handling in Client consulDiscoveryImpl (#14431)
Added a missing `continue` on non-nil error to avoid accidentally using a bad peer.
2022-09-02 15:13:03 -04:00
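The bug class fixed in df043d747c is easy to show in isolation: when a lookup inside a loop fails, the loop must `continue` so the zero-value result is never used. `resolvePeer` and `collectPeers` are hypothetical helpers, not Nomad's `consulDiscoveryImpl`; port 4647 is Nomad's default RPC port.

```go
package main

import (
	"errors"
	"fmt"
)

// resolvePeer is a hypothetical lookup standing in for per-service address
// resolution during discovery.
func resolvePeer(name string) (string, error) {
	if name == "" {
		return "", errors.New("empty service address")
	}
	return name + ":4647", nil
}

// collectPeers skips any entry that fails to resolve. Without the
// `continue`, the empty address from a failed lookup would be appended.
func collectPeers(names []string) []string {
	var peers []string
	for _, n := range names {
		addr, err := resolvePeer(n)
		if err != nil {
			continue // skip the bad peer instead of using addr
		}
		peers = append(peers, addr)
	}
	return peers
}

func main() {
	fmt.Println(collectPeers([]string{"10.0.0.1", "", "10.0.0.2"})) // [10.0.0.1:4647 10.0.0.2:4647]
}
```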
Luiz Aoqui
7d88937751 connect: interpolate task env in config values (#14445)
When configuring Consul Service Mesh, it's sometimes necessary to
provide dynamic values that are only known to Nomad at runtime. By
interpolating configuration values (in addition to configuration keys),
users are able to pass these dynamic values to Consul from their Nomad
jobs.
2022-09-02 15:00:28 -04:00
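A simplified sketch of interpolating both keys and values, as described in 7d88937751. It assumes a flat `map[string]string` config (real Connect config blocks are nested) and uses the standard library's `os.Expand` in place of Nomad's own interpolation; `example_key` is a made-up key, while `NOMAD_ALLOC_ID` is a standard Nomad task environment variable.

```go
package main

import (
	"fmt"
	"os"
)

// interpolateConfig replaces ${VAR} references in both the keys and the
// values of a config map using the task environment.
func interpolateConfig(cfg, env map[string]string) map[string]string {
	lookup := func(k string) string { return env[k] }
	out := make(map[string]string, len(cfg))
	for k, v := range cfg {
		out[os.Expand(k, lookup)] = os.Expand(v, lookup)
	}
	return out
}

func main() {
	cfg := map[string]string{"example_key": "${NOMAD_ALLOC_ID}"}
	env := map[string]string{"NOMAD_ALLOC_ID": "abc123"}
	fmt.Println(interpolateConfig(cfg, env)) // map[example_key:abc123]
}
```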
James Rasell
c9b6d0e71b e2e: add test to exercise ACL tokens with role and policy links. (#14432) 2022-09-02 08:56:00 +02:00
Luiz Aoqui
4f9beb0b39 docs: add warning about changing region config (#14443) 2022-09-01 16:47:06 -04:00
Tim Gross
1815517a19 migrate autopilot implementation to raft-autopilot (#14441)
Nomad's original autopilot was importing from a private package in Consul. It
has been moved out to a shared library. Switch Nomad to use this library so that
we can eliminate the import of Consul, which is necessary to build Nomad ENT
with the current version of the Consul SDK. This also will let us pick up
autopilot improvements shared with Consul more easily.
2022-09-01 14:27:10 -04:00
Luiz Aoqui
99ebd0ab26 cli: set -hcl2-strict to false if -hcl1 is defined (#14426)
These options are mutually exclusive but, since `-hcl2-strict` defaults
to `true`, users had to explicitly set it to `false` when using `-hcl1`.

Also return `255` when job plan fails validation, as this is the expected
code in this situation.
2022-09-01 10:42:08 -04:00
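The flag behaviour from 99ebd0ab26 can be sketched with the standard `flag` package. Only the flag names `-hcl1` and `-hcl2-strict` come from the commit; `parseJobFlags` and the help strings are hypothetical, and this is not Nomad's CLI code.

```go
package main

import (
	"flag"
	"fmt"
)

// parseJobFlags: -hcl2-strict defaults to true, but is forced to false
// whenever -hcl1 is given, since the two options are mutually exclusive.
func parseJobFlags(args []string) (hcl1, hcl2Strict bool, err error) {
	fs := flag.NewFlagSet("job plan", flag.ContinueOnError)
	fs.BoolVar(&hcl1, "hcl1", false, "parse the job file as HCL1")
	fs.BoolVar(&hcl2Strict, "hcl2-strict", true, "enable strict HCL2 parsing")
	if err = fs.Parse(args); err != nil {
		return false, false, err
	}
	if hcl1 {
		hcl2Strict = false
	}
	return hcl1, hcl2Strict, nil
}

func main() {
	fmt.Println(parseJobFlags([]string{"-hcl1"})) // true false <nil>
	fmt.Println(parseJobFlags(nil))               // false true <nil>
}
```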
Tim Gross
3a4345ef32 docs: clarify CSI plugin compatibility (#14434)
Nomad is generally compliant with the CSI specification for Container
Orchestrators (CO), except for unimplemented features. However, some storage
vendors have built CSI plugins that are not compliant with the specification or
which expect that they're only deployed on Kubernetes. Nomad cannot vouch for
the compatibility of any particular plugin, so clarify this in the docs.

Co-authored-by: Derek Strickland <1111455+DerekStrickland@users.noreply.github.com>
2022-09-01 10:06:44 -04:00
Luiz Aoqui
774101d54c cli: ignore VaultToken when generating job diff (#14424) 2022-09-01 10:01:53 -04:00
James Rasell
25e7c2ffa4 chore: remove use of "err" as a log line context key for errors. (#14433)
Log lines which include an error should use the full term "error"
as the context key. This provides consistency across the codebase
and avoids a Go style which operators might not be aware of.
2022-09-01 15:06:10 +02:00
dependabot[bot]
1a59a0f5fc build(deps): bump github.com/hashicorp/go-version from 1.4.0 to 1.6.0 (#14364)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: James Rasell <jrasell@hashicorp.com>
2022-09-01 11:55:42 +02:00
James Rasell
785b4dfad7 e2e: add acl test for token expiration. (#14418)
In order to add an E2E test to cover token expiration, the server
config has been updated to include a low minimum allowed TTL
value. For ease of reading, the max value is also set.
2022-09-01 09:36:09 +02:00
Brett Larson
baa3b05d00 Update ephemeral_disk.mdx (#14356)
It is really unclear how to use this feature. It took me a while to find this, so I thought I would propose how to use it.
2022-08-31 20:17:41 -04:00
Derek Strickland
79c08ae577 Merge release 1.3.5 files (#14425)
* Merge release 1.3.5 files

* Generate files for 1.3.5 release

* Prepare for next release

Co-authored-by: hc-github-team-nomad-core <github-team-nomad-core@hashicorp.com>
2022-08-31 18:31:56 -04:00
Luiz Aoqui
2c2f6dc4a9 changelog: add entry for #14374 (#14419) 2022-08-31 10:59:19 -04:00
Luiz Aoqui
b4a053ddc7 changelog: add entry for #14381 (#14416) 2022-08-31 10:41:48 -04:00
James Rasell
20f71cf3e4 docs: add documentation for ACL token expiration and ACL roles. (#14332)
The ACL command docs are now found within a sub-dir like the
operator command docs. Updates to the ACL token commands to
accommodate token expiry have also been added.

The ACL API docs are now found within a sub-dir like the operator
API docs. The ACL docs now include the ACL roles endpoint as well
as updated ACL token endpoints for token expiration.

The configuration section is also updated to accommodate the new
ACL and server parameters for the new ACL features.
2022-08-31 16:13:47 +02:00
James Rasell
91e7d8497b e2e: add ACL test suite with ACL Role test. (#14398)
This adds a new ACL test suite to the e2e framework which includes
an initial test for ACL roles. The ACL test includes a helper to
track and clean created Nomad resources which keeps the test
cluster clean no matter if the test fails early or not.
2022-08-31 10:11:28 +02:00
Luiz Aoqui
53da285c25 ci: fix TestNomad_BootstrapExpect_NonVoter test (#14407)
PR #12130 refactored the test to use the `wantPeers` helper, but this
function only returns the number of voting peers, which in this test
should be equal to 2.

I think the tests were passing back then because of a bug in Raft
(https://github.com/hashicorp/raft/pull/483) where a non-voting server
was able to transition to candidate state.

One possible piece of evidence for this is that a successful test run would have
the following log line:

```
raft@v1.3.5/raft.go:1058: nomad.raft: updating configuration: command=AddVoter server-id=127.0.0.1:9101 server-addr=127.0.0.1:9101 servers="[{Suffrage:Voter ID:127.0.0.1:9107 Address:127.0.0.1:9107} {Suffrage:Voter ID:127.0.0.1:9105 Address:127.0.0.1:9105} {Suffrage:Voter ID:127.0.0.1:9103 Address:127.0.0.1:9103} {Suffrage:Voter ID:127.0.0.1:9101 Address:127.0.0.1:9101}]"
```

This commit reverts the test logic to check for peer count, regardless
of voting status.
2022-08-30 16:32:54 -04:00
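The distinction in 53da285c25 between counting voting peers and counting all peers is shown below. `Suffrage` and `Server` are local types that loosely mirror the voter/non-voter idea in hashicorp/raft, not the library's own types, and the three-server layout is illustrative rather than the test's actual topology.

```go
package main

import "fmt"

// Suffrage loosely mirrors a Raft server's voting status.
type Suffrage int

const (
	Voter Suffrage = iota
	Nonvoter
)

// Server is a hypothetical entry from a Raft configuration.
type Server struct {
	ID       string
	Suffrage Suffrage
}

// countVoters counts only voting servers.
func countVoters(servers []Server) int {
	n := 0
	for _, s := range servers {
		if s.Suffrage == Voter {
			n++
		}
	}
	return n
}

// countPeers counts every server regardless of voting status, which is what
// the reverted test logic checks.
func countPeers(servers []Server) int {
	return len(servers)
}

func main() {
	servers := []Server{
		{ID: "127.0.0.1:9101", Suffrage: Voter},
		{ID: "127.0.0.1:9103", Suffrage: Voter},
		{ID: "127.0.0.1:9105", Suffrage: Nonvoter},
	}
	fmt.Println(countVoters(servers), countPeers(servers)) // 2 3
}
```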