22895 Commits

Author SHA1 Message Date
Michael Schurter
19bac3caa8 docs: add plan for node rejected details and more (#12564)
- Moved federation docs to the bottom since *everyone* is potentially
  affected by the other sections on the page, but only users of
  federation are affected by it.
- Added section on the plan for node rejected bug since it is fairly
  easy to diagnose and removing affected nodes is a fairly reliable
  workaround.
- Mention 5s cliff for wait_for_index.
- Remove the lie that we do not have job status metrics! How old was
  that?!
- Reinforce the importance of monitoring basic system resources
2022-04-14 16:09:33 -07:00
Tim Gross
4ca980311c E2E: add debugging outputs for disconnected clients test (#12572)
This test has a failure that's happening only occasionally and not
very reproducibly. Print out the allocation status on test failure so
that we can do some post-mortem debugging of the test on nightly.
2022-04-14 17:03:57 -04:00
Tim Gross
33cc69cdda ui: remove beta tag from gutter menu for CSI (#12570) 2022-04-14 14:56:04 -04:00
Tim Gross
d2aab5d53d fix data race in dynamic plugin registry tests (#12554)
These tests have a data race where the test assertion is reading a
value that's being set in the `listenFunc` goroutines that are
subscribing to registry update events. Move the assertion into the
subscribing goroutine to remove the race. This bug was discovered
in #12098 but does not impact production Nomad code.
2022-04-14 14:55:56 -04:00
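The pattern described in the commit above (asserting inside the subscribing goroutine instead of reading shared state from the test goroutine) can be sketched roughly as follows. This is a minimal illustration, not the code from #12554: `publish` and `checkEvents` are hypothetical helpers, and events are plain strings for brevity.

```go
package main

import "fmt"

// publish is a hypothetical stand-in for a registry that emits update
// events. It returns a closed, pre-filled channel of events.
func publish(events ...string) <-chan string {
	ch := make(chan string, len(events))
	for _, ev := range events {
		ch <- ev
	}
	close(ch)
	return ch
}

// checkEvents runs the comparison inside the subscribing goroutine.
// The received values never leave that goroutine; only the final
// verdict crosses to the caller, over errCh.
func checkEvents(events <-chan string, want []string) error {
	errCh := make(chan error, 1)
	go func() {
		var got []string // owned by this goroutine only
		for ev := range events {
			got = append(got, ev)
		}
		if len(got) != len(want) {
			errCh <- fmt.Errorf("got %d events, want %d", len(got), len(want))
			return
		}
		for i := range want {
			if got[i] != want[i] {
				errCh <- fmt.Errorf("event %d: got %q, want %q", i, got[i], want[i])
				return
			}
		}
		errCh <- nil
	}()
	return <-errCh
}

func main() {
	want := []string{"registered", "deregistered"}
	err := checkEvents(publish("registered", "deregistered"), want)
	fmt.Println(err) // prints "<nil>"
}
```

Because the channel receive is the only synchronization point, a run with `go test -race` has no unsynchronized read to report in a sketch like this.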
Seth Hoenig
6e0e423b98 Merge pull request #12543 from idrennanvmware/add-allocid-to-sidecar
Add alloc_id to sidecar bootstrap
2022-04-14 13:27:09 -05:00
Luiz Aoqui
6cb520cee0 ci: fix backport target branch pattern (#12571) 2022-04-14 14:12:41 -04:00
Seth Hoenig
f2ea1fab5a connect: prefix tag with nomad.; merge into envoy_stats_tags; update docs
This PR expands on the work done in #12543 to
- prefix the tag, so it is now "nomad.alloc_id" to be more consistent with Consul tags
- merge into pre-existing envoy_stats_tags fields
- update the upgrade guide docs
- update changelog
2022-04-14 12:52:52 -05:00
Ian Drennan
5ca35cf49d Add alloc_id to sidecar bootstrap 2022-04-14 11:46:06 -05:00
Michael Schurter
29af9891f8 test: test the buffered pipe used by nsd (#12563)
Nomad Service Discovery uses an in-memory buffered pipe implementation
to connect consul-template to the Nomad API.

This adds a basic test for that helper functionality.
2022-04-14 08:38:25 -07:00
James Rasell
281ce5ed21 jobspec: add max_client_disconnect to hcl1 group parsing. (#12568) 2022-04-14 14:56:58 +02:00
Derek Strickland
8f7abae89f Update E2E terraform output command (#12561) 2022-04-13 16:46:09 -04:00
James Rasell
281a0fb38e service discovery: add pagination and filtering support to info requests (#12552)
* services: add pagination and filter support to info RPC.
* cli: add filter flag to service info command.
* docs: add pagination and filter details to services info API.
* paginator: minor updates to comment and func signature.
2022-04-13 07:41:44 +02:00
claire labry
36c89f61bb updates for backport assistant (#12311) 2022-04-12 14:01:19 -04:00
Tim Gross
9d5b3bcc53 CSI: fix data race in plugin manager (#12553)
The plugin manager for CSI hands out instances of a plugin for callers
that need to mount a volume. The `MounterForPlugin` method accesses
the internal instances map without a lock, and can be called
concurrently from outside the plugin manager's main run-loop.

The original commit for the instances map included a warning that it
needed to be accessed only from the main loop but that comment was
unfortunately ignored shortly thereafter, so this bug has existed in
the code for a couple years without being detected until we ran tests
with `-race` in #12098. Lesson learned here: comments make for lousy
enforcement of invariants!
2022-04-12 12:18:04 -04:00
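The lesson in the commit above (enforce the invariant in code rather than in a comment) might look something like this. It is a simplified sketch under stated assumptions: `instanceManager`, `register`, and `lookup` are hypothetical names, not Nomad's plugin manager, and whether the real fix used a `sync.RWMutex` is not stated in the message.

```go
package main

import (
	"fmt"
	"sync"
)

// instanceManager is a hypothetical, simplified stand-in for a plugin
// manager. The field and method names are illustrative only.
type instanceManager struct {
	mu        sync.RWMutex
	instances map[string]string // plugin ID -> instance handle (a string here)
}

func newInstanceManager() *instanceManager {
	return &instanceManager{instances: make(map[string]string)}
}

// register is what the manager's run-loop would call when a plugin appears.
func (m *instanceManager) register(id, handle string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.instances[id] = handle
}

// lookup may be called from any goroutine; the read lock makes that safe
// without relying on callers to respect a comment.
func (m *instanceManager) lookup(id string) (string, bool) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	handle, ok := m.instances[id]
	return handle, ok
}

func main() {
	m := newInstanceManager()
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			m.register(fmt.Sprintf("plugin-%d", n), "handle")
			m.lookup("plugin-0") // concurrent read while others write
		}(i)
	}
	wg.Wait()
	_, ok := m.lookup("plugin-3")
	fmt.Println(ok) // prints "true"
}
```

Keeping the map unexported and reachable only through locked methods means a future caller cannot bypass the lock by accident.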
Luiz Aoqui
8dec033bd6 add some godocs for the API pagination tokenizer options (#12547) 2022-04-12 10:27:22 -04:00
Tim Gross
247e20e10b scripts: fix interpreter for bash (#12549)
Many of our scripts have a non-portable interpreter line for bash and
use bash-specific variables like `BASH_SOURCE`. Update the interpreter
line to be portable between various Linuxes and macOS without
complaint from posix shell users.
2022-04-12 10:08:21 -04:00
Tim Gross
86ca8f7e73 E2E: fix flaky event stream test (#12548)
This changeset fixes two sources of flakiness in the event stream test.

First, the stream request gets the event *closest* to the index, not
the exact match. Although events are written before raft entries
they're written asynchronously, so it's possible to race and get a
raft index from this query higher than the current head of the event
buffer. Ensure the job is running before we try to get the index, so
that we've given the event enough time to land in the buffer.

Second, the assertion that the found index is greater than the start
index is only true if the `PlanResult` event manages to land before we
do the second registration. Although it should now with the first fix
above, it's not a correct assertion for what we're testing.
2022-04-12 08:35:39 -04:00
Luiz Aoqui
8bde164eaa ci: change notification channel to feed-nomad-releases (#12550) 2022-04-11 19:12:58 -04:00
claire labry
5a0a8f606f move nomad.service out of etc (#12541) 2022-04-11 18:26:10 -04:00
Seth Hoenig
24eb703e74 Merge pull request #12532 from greut/feat/remove-consul-lib
feat: remove dependency to consul/lib
2022-04-11 13:52:05 -05:00
Karan Sharma
210a45718e feat: add nomctx and nomad-events-sink (#12542) 2022-04-11 14:47:03 -04:00
Yoan Blanc
148fe61cc5 fix: use NewSafeTimer
Signed-off-by: Yoan Blanc <yoan@dosimple.ch>
2022-04-11 19:37:14 +02:00
Tim Gross
1c13deec86 E2E: oversubscription assertion needs to wait for stats (#12540)
The oversubscription test expects an output that requires the client
has polled the task for stats at least once. Wait long enough to
ensure that we've polled the stats before failing the test.
2022-04-11 11:40:51 -04:00
Tim Gross
62f2cd77fa E2E: test for nodes disconnected by netsplit (#12407) 2022-04-11 11:34:27 -04:00
Tim Gross
69cdc80984 allocs without max_client_disconnect should be lost on disconnect (#12529)
In the reconciler's filtering for tainted nodes, we use whether the
server supports disconnected clients as a gate to a bunch of our
logic, but this doesn't account for cases where the job doesn't have
`max_client_disconnect`. The only real consequence of this appears to
be that allocs on disconnected nodes are marked "complete" instead of
"lost".
2022-04-11 11:24:49 -04:00
Seth Hoenig
a550ed9bb3 Merge pull request #12527 from fynxiu/plugins/drivers/ctxdone
fix(plugins): should return when ctx.Done
2022-04-11 07:46:39 -05:00
James Rasell
bd415dfd85 e2e: add initial service discovery tests. (#12512)
Some tests may choose to deregister jobs to check Nomad cleanup
logic, however, it is still possible for the test to fail and exit
before this is hit. This therefore adds a cancellable cleanup func
which can be deferred, using context to control whether it gets
run or not.
2022-04-11 11:12:24 +02:00
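One way to read the "cancellable cleanup func" described in the commit above is sketched here. `cancellableCleanup` is a hypothetical helper written for illustration; the actual e2e helper and its signature are not shown in the message.

```go
package main

import (
	"context"
	"fmt"
)

// cancellableCleanup returns a cleanup function suitable for defer, plus
// a cancel function. cleanup runs fn unless cancel was called first.
// (Hypothetical helper; the name and signature are assumptions.)
func cancellableCleanup(fn func()) (func(), context.CancelFunc) {
	ctx, cancel := context.WithCancel(context.Background())
	cleanup := func() {
		select {
		case <-ctx.Done():
			// The test already performed its own cleanup; do nothing.
		default:
			fn()
			cancel() // release the context and prevent a second run
		}
	}
	return cleanup, cancel
}

func main() {
	cleanup, cancel := cancellableCleanup(func() {
		fmt.Println("deregistering job")
	})
	defer cleanup()

	// A test that deregisters the job itself would call cancel() once it
	// gets that far, so the deferred cleanup becomes a no-op. If the test
	// fails before this point, the deferred cleanup still runs.
	cancel()
}
```

In the run above nothing is printed, because `cancel` is reached before the deferred `cleanup` fires.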
Yoan Blanc
bda7b1ece0 feat: remove dependency to consul/lib
Signed-off-by: Yoan Blanc <yoan@dosimple.ch>
2022-04-09 13:22:44 +02:00
Tim Gross
6a49a0fb81 set minimum version for disconnected client mode to 1.3.0 (#12530) 2022-04-08 16:48:37 -04:00
Luiz Aoqui
d4f8263483 changelog: update #12476 entry to highlight the feature (#12528) 2022-04-08 13:28:23 -04:00
Luiz Aoqui
5722aa80fc Merge pull request #12506 from hashicorp/merge-release-1.3.0-beta.1-branch 2022-04-08 13:21:33 -04:00
fyn
b6ec83b59b fix(plugins): should return when ctx.Done 2022-04-09 01:04:29 +08:00
Seth Hoenig
dba098ec34 Merge pull request #12524 from hashicorp/docs-cleanup-up-docs
docs: fixup title formatting in upgrade guide
2022-04-08 11:58:49 -05:00
Seth Hoenig
be80a63584 docs: fixup title formatting in upgrade guide 2022-04-08 11:50:54 -05:00
Luiz Aoqui
c3e36bb367 docs: fix upgrade specific broken link and conflict tag (#12521) 2022-04-08 12:36:47 -04:00
Luiz Aoqui
4066321ac1 add Nomad v1.3.0-beta.1 download box (#12517) 2022-04-08 12:04:14 -04:00
James Rasell
52acfcd867 docs: add nomad services template jobspec example. (#12514) 2022-04-08 17:29:19 +02:00
Luiz Aoqui
9849ceb0bf ci: add semgrep rule to catch usage of invalid string extensions (#12509) 2022-04-08 10:58:32 -04:00
Seth Hoenig
53eb6ed817 Merge pull request #12508 from twunderlich-grapl/custom-variable-validation
Add custom variable validation to docs
2022-04-08 08:53:03 -05:00
Seth Hoenig
7e0e4a86fd docs: tweak hcl2 validation example 2022-04-08 08:43:42 -05:00
Thomas Wunderlich
12126efe83 Add custom variable validation to docs
Custom variable validation is a useful feature that is supported by
Nomad and not just Terraform. As such it should be documented on the
input variable page.
I've cribbed the content from the Terraform docs, so this should be
consistent across projects.
2022-04-07 19:06:06 -04:00
Luiz Aoqui
4dac8e97c8 remove generated files and prepare for next release 2022-04-07 18:51:18 -04:00
Luiz Aoqui
211f6e694b Merge remote-tracking branch 'origin/release/1.3.0-beta.1' into merge-release-1.3.0-beta.1-branch 2022-04-07 18:46:18 -04:00
Jasmine Dahilig
ccaaadf493 docs: add token_last_renewal and token_next_renewal to server metrics and key metrics #12435 (#12505) 2022-04-07 15:12:41 -07:00
hc-github-team-nomad-core
2eba643965 Generate files for release 2022-04-07 20:21:26 +00:00
Luiz Aoqui
fcfb8d9e37 update ci.hcl, version.go and CHANGELOG to v1.3.0-beta.1 2022-04-07 16:13:49 -04:00
Luiz Aoqui
e5de3c4643 ci: skip prerelease if triggered by the generate assets workflow (#12504) 2022-04-07 16:04:53 -04:00
Phil Renaud
f04fc21761 Importing string methods directly from @ember/string (#12499)
* Capitalize methods

* Let ESLint yell at us again

* Dasherize
2022-04-07 15:51:41 -04:00
Tim Gross
ab6f13db1d Fix flaky operator debug test (#12501)
We introduced a `pprof-interval` argument to `operator debug` in #11938, and unfortunately this has resulted in a lot of test flakes. The actual command in use is mostly fine (although I've fixed some quirks here), so what's really happened is that the change has revealed some existing issues in the tests. Summary of changes:

* Make first pprof collection synchronous to preserve the existing
  behavior for the common case where the pprof interval matches the
  duration.

* Clamp `operator debug` pprof timing to that of the command. The
  `pprof-duration` should be no more than `duration` and the
  `pprof-interval` should be no more than `pprof-duration`. Clamp the
  values rather than throwing errors, which could change the commands
  that existing users might already have in debugging scripts

* Testing: remove test parallelism

  The `operator debug` tests that stand up servers can't be run in
  parallel, because we don't have a way of canceling the API calls for
  pprof. The agent will still be running the last pprof when we exit,
  and that breaks the next test that talks to that same agent.
  (Because you can only run one pprof at a time on any process!)

  We could split off each subtest into its own server, but this test
  suite is already very slow. In future work we should fix this "for
  real" by making the API call cancelable.


* Testing: assert against unexpected errors in `operator debug` tests.

  If we assert there are no unexpected error outputs, it's easier for
  the developer to debug when something is going wrong with the tests
  because the error output will be presented as a failing test, rather
  than just a failing exit code check. Or worse, no failing exit code
  check!

  This also forces us to be explicit about which tests will return 0
  exit codes but still emit (presumably ignorable) error outputs.

Additional minor bug fixes (mostly in tests) and test refactorings:

* Fix text alignment on pprof Duration in `operator debug` output

* Remove "done" channel from `operator debug` event stream test. The
  goroutine we're blocking for here already tells us it's done by
  sending a value, so block on that instead of an extraneous channel

* Event stream test timer should start at current time, not zero

* Remove noise from `operator debug` test log output. The `t.Logf`
  calls already are picked out from the rest of the test output by
  being prefixed with the filename.

* Remove explicit pprof args so we use the defaults clamped from
  duration/interval
2022-04-07 15:00:07 -04:00
Seth Hoenig
2c6e84c521 Merge pull request #12496 from hashicorp/f-cores-env
client: set environment variable indicating set of reserved cpu cores
2022-04-07 12:07:57 -05:00