Commit Graph

22855 Commits

Author SHA1 Message Date
James Rasell
52acfcd867 docs: add nomad services template jobspec example. (#12514) 2022-04-08 17:29:19 +02:00
Luiz Aoqui
9849ceb0bf ci: add semgrep rule to catch usage of invalid string extensions (#12509) 2022-04-08 10:58:32 -04:00
Seth Hoenig
53eb6ed817 Merge pull request #12508 from twunderlich-grapl/custom-variable-validation
Add custom variable validation to docs
2022-04-08 08:53:03 -05:00
Seth Hoenig
7e0e4a86fd docs: tweak hcl2 validation example 2022-04-08 08:43:42 -05:00
Thomas Wunderlich
12126efe83 Add custom variable validation to docs
Custom variable validation is a useful feature that is supported by
Nomad and not just Terraform. As such it should be documented on the
input variable page.
I've cribbed the content from the terraform docs so this should be
consistent across projects
2022-04-07 19:06:06 -04:00
Jasmine Dahilig
ccaaadf493 docs: add token_last_renewal and token_next_renewal to server metrics and key metrics #12435 (#12505) 2022-04-07 15:12:41 -07:00
Luiz Aoqui
e5de3c4643 ci: skip prerelease if triggered by the generate assets workflow (#12504) 2022-04-07 16:04:53 -04:00
Phil Renaud
f04fc21761 Importing string methods directly from @ember/string (#12499)
* Capitalize methods

* Let ESLint yell at us again

* Dasherize
2022-04-07 15:51:41 -04:00
Tim Gross
ab6f13db1d Fix flaky operator debug test (#12501)
We introduced a `pprof-interval` argument to `operator debug` in #11938, and unfortunately this has resulted in a lot of test flakes. The actual command in use is mostly fine (although I've fixed some quirks here), so what's really happened is that the change has revealed some existing issues in the tests. Summary of changes:

* Make first pprof collection synchronous to preserve the existing
  behavior for the common case where the pprof interval matches the
  duration.

* Clamp `operator debug` pprof timing to that of the command. The
  `pprof-duration` should be no more than `duration` and the
  `pprof-interval` should be no more than `pprof-duration`. Clamp the
  values rather than throwing errors, which could change the commands
  that existing users might already have in debugging scripts.

* Testing: remove test parallelism

  The `operator debug` tests that stand up servers can't be run in
  parallel, because we don't have a way of canceling the API calls for
  pprof. The agent will still be running the last pprof when we exit,
  and that breaks the next test that talks to that same agent.
  (Because you can only run one pprof at a time on any process!)

  We could split off each subtest into its own server, but this test
  suite is already very slow. In future work we should fix this "for
  real" by making the API call cancelable.


* Testing: assert against unexpected errors in `operator debug` tests.

  If we assert there are no unexpected error outputs, it's easier for
  the developer to debug when something is going wrong with the tests
  because the error output will be presented as a failing test, rather
  than just a failing exit code check. Or worse, no failing exit code
  check!

  This also forces us to be explicit about which tests will return 0
  exit codes but still emit (presumably ignorable) error outputs.

Additional minor bug fixes (mostly in tests) and test refactorings:

* Fix text alignment on pprof Duration in `operator debug` output

* Remove "done" channel from `operator debug` event stream test. The
  goroutine we're blocking for here already tells us it's done by
  sending a value, so block on that instead of an extraneous channel

* Event stream test timer should start at current time, not zero

* Remove noise from `operator debug` test log output. The `t.Logf`
  calls already are picked out from the rest of the test output by
  being prefixed with the filename.

* Remove explicit pprof args so we use the defaults clamped from
  duration/interval
2022-04-07 15:00:07 -04:00
Seth Hoenig
2c6e84c521 Merge pull request #12496 from hashicorp/f-cores-env
client: set environment variable indicating set of reserved cpu cores
2022-04-07 12:07:57 -05:00
James Rasell
a30c3f36bf e2e: fix eventual consistency failure within consultemplate suite. (#12494) 2022-04-07 17:03:10 +02:00
Seth Hoenig
5c5607a000 docs: update cl 2022-04-07 10:02:00 -05:00
Lars Lehtonen
7596ec9225 nomad/state: fix dropped test errors (#12406) 2022-04-07 10:48:10 -04:00
Seth Hoenig
8cfe123a3a client: set environment variable indicating set of reserved cpu cores
This PR injects the 'NOMAD_CPU_CORES' environment variable into
tasks that have been allocated reserved cpu cores. The value uses
normal cpuset notation, as found in cpuset.cpus cgroup interface files.

Note this value is not necessarily the same as the content of the actual
cpuset.cpus interface file, which will also include shared cpu cores when
using cgroups v2. This variable is a workaround for users who used to be
able to read the reserved cgroup cpuset file, but lose the information
about distinct reserved cores when using cgroups v2.

Side discussion in: https://github.com/hashicorp/nomad/issues/12374
2022-04-07 09:09:35 -05:00
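As an illustration of how a task might consume the `NOMAD_CPU_CORES` value from 8cfe123a3a, here is a minimal sketch that expands cpuset list notation into individual core IDs. The function name `parseCPUSet` and the sample value `0-2,5` are assumptions made for this example. A real task would read the value from its environment instead.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCPUSet is a hypothetical helper that expands cpuset list notation
// (comma-separated core IDs and inclusive ranges, e.g. "0-2,5") into a
// slice of core IDs.
func parseCPUSet(s string) ([]int, error) {
	s = strings.TrimSpace(s)
	if s == "" {
		return nil, nil
	}
	var cores []int
	for _, part := range strings.Split(s, ",") {
		lo, hi, isRange := strings.Cut(part, "-")
		start, err := strconv.Atoi(lo)
		if err != nil {
			return nil, fmt.Errorf("invalid core %q: %w", lo, err)
		}
		end := start
		if isRange {
			end, err = strconv.Atoi(hi)
			if err != nil {
				return nil, fmt.Errorf("invalid core %q: %w", hi, err)
			}
		}
		if end < start {
			return nil, fmt.Errorf("invalid range %q", part)
		}
		for c := start; c <= end; c++ {
			cores = append(cores, c)
		}
	}
	return cores, nil
}

func main() {
	// Sample value only; a task would use os.Getenv("NOMAD_CPU_CORES").
	cores, err := parseCPUSet("0-2,5")
	if err != nil {
		panic(err)
	}
	fmt.Println(cores) // prints: [0 1 2 5]
}
```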
Derek Strickland
92999966e3 plan_apply: Add missing unit test for validating plans for disconnected clients (#12495) 2022-04-07 09:58:09 -04:00
Tim Gross
daa982425e api: use cleanhttp.DefaultPooledTransport for default API client (#12492)
We expect every Nomad API client to use a single connection to any
given agent, so take advantage of keep-alive by switching the default
transport to `DefaultPooledClient`. Provide a facility to close idle
connections for testing purposes.

Restores the previously reverted #12409


Co-authored-by: Ben Buzbee <bbuzbee@cloudflare.com>
2022-04-06 16:14:53 -04:00
Luiz Aoqui
fb6da72a25 changelog: make breaking change note for raft v3 (#12493) 2022-04-06 16:00:38 -04:00
Luiz Aoqui
3d29c6ffb1 changelog: add entry for #12435 (#12491) 2022-04-06 14:22:09 -04:00
Seth Hoenig
0b7a2175a2 Merge pull request #12484 from hashicorp/tests-handler-exec-failure
exec: fix exec handler test
2022-04-06 13:13:07 -05:00
Luiz Aoqui
b221ac1940 changelog: minor fixes (#12487) 2022-04-06 14:05:10 -04:00
James Rasell
a024b15796 client: account for service provider namespace updates in hooks. (#12479)
When a service is updated, the service hooks update a number of
internal fields which help generate the new workload. This also
needs to update the namespace for the service provider. It is
possible for these to be different, and in the case of Nomad and
Consul running OSS, this is to be expected.
2022-04-06 19:26:22 +02:00
James Rasell
9dc0b88cb5 client: add Nomad template service functionality to runner. (#12458)
This change modifies the template task runner to utilise the
new consul-template which includes Nomad service lookup template
funcs.

In order to provide security and auth to consul-template, we use
a custom HTTP dialer which is passed to consul-template when
setting up the runner. This method follows the Vault implementation.

Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
2022-04-06 19:17:05 +02:00
Seth Hoenig
c7836c6c8a exec: fix exec handler test
Fixup this test to handle cgroups v2, as well as the :misc: cgroup
2022-04-06 12:11:37 -05:00
Jasmine Dahilig
cab30e5423 metrics: emit stats for vault token next_renewal & last_renewal #5222 (#12435) 2022-04-06 10:03:11 -07:00
Jasmine Dahilig
c6583b27c8 docs: update vault-token note in job run command #8040 (#12385) 2022-04-06 10:01:38 -07:00
Tim Gross
f217185992 Revert "Use cleanhttp.DefaultPooledTransport for the default API client (#12409)" (#12480)
This reverts commit 6e1270dd08.
2022-04-06 12:58:51 -04:00
Luiz Aoqui
a573dec05e ci: make version script match ENT to avoid unnecessary merge conflicts (#12482) 2022-04-06 12:56:52 -04:00
James Rasell
9e20a34d75 website: add initial website docs for Nomad service discovery. (#12456) 2022-04-06 18:51:14 +02:00
James Rasell
8f331fe7b3 changelog: add entry for #12368; native service discovery. (#12474) 2022-04-06 18:21:34 +02:00
claire labry
0becc4a9b7 [Main] Onboard to CRT (#12276) 2022-04-06 11:47:02 -04:00
Derek Strickland
4190388646 disconnected clients: Add changelog entry (#12477) 2022-04-06 11:44:26 -04:00
Phil Renaud
f2bd3d0c90 Inlines related evaluations flexbox (#12475) 2022-04-06 11:35:25 -04:00
Benjamin Buzbee
6e1270dd08 Use cleanhttp.DefaultPooledTransport for the default API client (#12409)
The only difference is that DefaultTransport sets DisableKeepAlives.

This doesn't make much sense to me: every HTTP connection from the
nomad client goes to the same NOMAD_ADDR, so it's a great case for
keep-alive. The possible exceptions are round-robin DNS and anycast.

Consul does this already
1e47e3c82b/api/api.go (L397)
2022-04-06 11:34:55 -04:00
Jorge Marey
cf6ca95f79 Fix in-place updates over ineligible nodes (#12264) 2022-04-06 11:30:40 -04:00
Derek Strickland
12b7647220 Merge pull request #12476 from hashicorp/f-disconnected-client-allocation-handling
disconnected clients: Feature branch merge
2022-04-06 10:11:57 -04:00
Derek Strickland
8863d1e45a disconnected clients: Support operator manual interventions (#12436)
* allocrunner: Remove Shutdown call in Reconnect
* Node.UpdateAlloc: Stop orphaned allocs.
* reconciler: Stop failed reconnects.
* Apply feedback from code review. Handle rebase conflict.
* Apply suggestions from code review

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2022-04-06 09:33:32 -04:00
James Rasell
bca64ad988 Merge pull request #12459 from hashicorp/b-fix-service-delete-cli-flake
cli: fixup service test delete by using atomic actions.
2022-04-06 15:22:08 +02:00
Mike Nomitch
84937300c3 Add max client disconnect docs (#12467)
Co-authored-by: Derek Strickland <1111455+DerekStrickland@users.noreply.github.com>
2022-04-06 08:54:14 -04:00
Phil Renaud
5740642fa8 Merge pull request #12473 from hashicorp/f-ui/evals-unshown-copy-change
Copy change, simplifies explanation for no related eval chart
2022-04-06 08:13:28 -04:00
James Rasell
3909253f6c cli: fixup service test delete by using more atomic actions. 2022-04-06 08:36:23 +01:00
Luiz Aoqui
c24c1bf07f ui: hide triggered by and status filters for now (#12472) 2022-04-05 21:14:16 -04:00
Seth Hoenig
133471282b Merge pull request #12419 from hashicorp/exec-cleanup
raw_exec: make raw exec driver work with cgroups v2
2022-04-05 16:42:01 -05:00
Derek Strickland
6791147254 disconnected clients: TaskGroup validation (#12418)
* TaskGroup: Validate that max_client_disconnect and stop_after_client_disconnect are mutually exclusive.
2022-04-05 17:14:50 -04:00
Tim Gross
ca14fb0cc8 docs: updates for CSI plugin improvements for 1.3.0 (#12466) 2022-04-05 17:13:51 -04:00
Derek Strickland
8ac3e642e6 reconciler: 2 phase reconnects and tests (#12333)
* structs: Add alloc.Expired & alloc.Reconnected functions. Add Reconnect eval trigger by.

* node_endpoint: Emit new eval for reconnecting unknown allocs.

* filterByTainted: handle 2 phase commit filtering rules.

* reconciler: Append AllocState on disconnect. Logic updates from testing and 2 phase reconnects.

* allocs: Set reconnect timestamp. Destroy if not DesiredStatusRun. Watch for unknown status.
2022-04-05 17:13:10 -04:00
Derek Strickland
9a82b63686 comments: update some stale comments referencing deprecated config name (#12271)
* comments: update some stale comments referencing deprecated config name
2022-04-05 17:12:23 -04:00
Derek Strickland
bab317300e Add description for allocs stopped due to reconnect (#12270) 2022-04-05 17:12:23 -04:00
Derek Strickland
b317aaa8fe Add unknown to TaskGroupSummary (#12269) 2022-04-05 17:12:23 -04:00
Derek Strickland
6329f44148 disconnected clients: ensure servers meet minimum required version (#12202)
* planner: expose ServerMeetsMinimumVersion via Planner interface
* filterByTainted: add flag indicating disconnect support
* allocReconciler: accept and pass disconnect support flag
* tests: update dependent tests
2022-04-05 17:12:23 -04:00
Derek Strickland
83dd636bf1 MaxClientDisconnect Jobspec checklist (#12177)
* api: Add struct, conversion function, and tests
* TaskGroup: Add field, validation, and tests
* diff: Add diff handler and test
* docs: Update docs
2022-04-05 17:12:23 -04:00