Commit Graph

23988 Commits

Author SHA1 Message Date
James Rasell
847c2cc528 client: accommodate Consul 1.14.0 gRPC and agent self changes. (#15309)
* client: accommodate Consul 1.14.0 gRPC and agent self changes.

Consul 1.14.0 changed the way in which gRPC listeners are
configured, particularly when using TLS. Prior to the change, a
single listener was responsible for handling plain-text and
encrypted gRPC requests. In 1.14.0 and beyond, separate listeners
will be used for each, defaulting to 8502 and 8503 for plain-text
and TLS respectively.

The change means that Nomad’s Consul Connect integration would not
work when integrated with Consul clusters using TLS and running
1.14.0 or greater.

The Nomad Consul fingerprinter identifies the gRPC port Consul has
exposed using the "DebugConfig.GRPCPort" value from Consul’s
“/v1/agent/self” endpoint. In Consul 1.14.0 and greater, this only
represents the plain-text gRPC port which is likely to be disbaled
in clusters running TLS. In order to fix this issue, Nomad now
takes into account the Consul version and configured scheme to
optionally use “DebugConfig.GRPCTLSPort” value from Consul’s agent
self return.

The “consul_grcp_socket” allocrunner hook has also been updated so
that the fingerprinted gRPC port attribute is passed in. This
provides a better fallback method, when the operator does not
configure the “consul.grpc_address” option.

* docs: modify Consul Connect entries to detail 1.14.0 changes.

* changelog: add entry for #15309

* fixup: tidy tests and clean version match from review feedback.

* fixup: use strings tolower func.
2022-11-21 09:19:09 -06:00
Jai
2aff20e894 respect casing on service tags (#15329)
* styles: add service tag style

* refact: update service tag on alloc

* refact: update service tag in component
2022-11-21 10:18:15 -05:00
Jai
34004b0917 style: wrap secret value in tag (#15331) 2022-11-21 10:18:02 -05:00
Seth Hoenig
3b14db4b83 consul: add trace logging around service registrations (#15311)
This PR adds trace logging around the differential done between a Nomad service
registration and its corresponding Consul service registration, in an effort
to shed light on why a service registration request is being made.
2022-11-21 08:03:56 -06:00
Piotr Kazmierczak
b7ddd5bf62 acl: sso auth method RPC endpoints (#15221)
This PR implements RPC endpoints for SSO auth methods.

This PR is part of the SSO work captured under ☂️ ticket #13120.
2022-11-21 10:15:39 +01:00
Piotr Kazmierczak
fee85dac79 acl: sso auth method event stream (#15280)
This PR implements SSO auth method support in the event stream.

This PR is part of the SSO work captured under ☂️ ticket #13120.
2022-11-21 10:06:05 +01:00
Phil Renaud
4703f55d6d [ui] Show Consul Connect upstreams / on update info in sidebar (#15324)
* Added consul connect icon and sidebar info

* Show icon to the right of name
2022-11-18 22:49:10 -05:00
Seth Hoenig
78593daaee e2e: jammy image needs latest java lts (#15323) 2022-11-18 14:36:36 -06:00
James Rasell
faabc2b2c2 api: ensure ACL role upsert decode error returns a 400 status code. (#15253) 2022-11-18 17:47:43 +01:00
James Rasell
c495cd99bf api: ensure all request body decode error return a 400 status code. (#15252) 2022-11-18 17:04:33 +01:00
Luiz Aoqui
329807bd7f docs: add cpu-allocated and memory-allocated (#15299)
Document the Autoscaler Nomad APM paramemeters `cpu-allocated` and
`memory-allocated` that were implemented in
https://github.com/hashicorp/nomad-autoscaler/pull/324 and
https://github.com/hashicorp/nomad-autoscaler/pull/334
2022-11-18 10:55:17 -05:00
Tim Gross
991e9a27cb make eval cancelation really async with Eval.Ack (#15298)
Ensure we never block in the `Eval.Ack`
2022-11-18 08:38:17 -05:00
Luiz Aoqui
6a3cf74f32 scheduler: log stack in case of panic (#15303) 2022-11-17 18:59:33 -05:00
stswidwinski
5ce42fe8f2 Add mount propagation to protobuf definition of mounts (#15096)
* Add mount propagation to protobuf definition of mounts

* Fix formatting

* Add mount propagation to the simple roundtrip test.

* changelog: add entry for #15096

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
2022-11-17 18:14:59 -05:00
Tim Gross
eb1507c86f make eval cancelation async with Eval.Ack (#15294)
In #14621 we added an eval canelation reaper goroutine with a channel that
allowed us to wake it up. But we forgot to actually send on this channel from
`Eval.Ack` and are still committing the cancelations synchronously. Fix this by
sending on the buffered channel to wake up the reaper instead.
2022-11-17 16:40:41 -05:00
Tim Gross
6eb1f99fb3 autopilot: include only servers from the same region (#15290)
When we migrated to the updated autopilot library in Nomad 1.4.0, the interface
for finding servers changed. Previously autopilot would get the serf members and
call `IsServer` on each of them, leaving it up to the implementor to filter out
clients (and in Nomad's case, other regions). But in the "new" autopilot
library, the equivalent interface is `KnownServers` for which we did not filter
by region. This causes spurious attempts for the cross-region stats fetching,
which results in TLS errors and a lot of log noise.

Filter the member set by region to fix the regression.
2022-11-17 12:09:36 -05:00
Tim Gross
21c2d1593a remove deprecated AllocUpdateRequestType raft entry (#15285)
After Deployments were added in Nomad 0.6.0, the `AllocUpdateRequestType` raft
log entry was no longer in use. Mark this as deprecated, remove the associated
dead code, and remove references to the metrics it emits from the docs. We'll
leave the entry itself just in case we encounter old raft logs that we need to
be able to safely load.
2022-11-17 12:08:04 -05:00
Seth Hoenig
7c254ccdb8 e2e: disable systemd stub dns in jammy image (#15286) 2022-11-17 09:50:44 -06:00
stswidwinski
d16a2c9467 Fix goroutine leakage (#15180)
* Fix goroutine leakage

* cl: add cl entry

Co-authored-by: Seth Hoenig <shoenig@duck.com>
2022-11-17 09:47:11 -06:00
Seth Hoenig
732adae999 ci: use hashicorp/setup-golang for setting up go compiler, cache (#15271)
This PR changes test-core to make use of

https://github.com/hashicorp/setup-golang

to consolidate the setting up of the Go compiler and the Go modules cache
used for the CI job.

Fixes: #14905
2022-11-17 07:50:45 -06:00
Tim Gross
f54a50bb0b keyring: update handle to state inside replication loop (#15227)
* keyring: update handle to state inside replication loop

When keyring replication starts, we take a handle to the state store. But
whenever a snapshot is restored, this handle is invalidated and no longer points
to a state store that is receiving new keys. This leaks a bunch of memory too!

In addition to operator-initiated restores, when fresh servers are added to
existing clusters with large-enough state, the keyring replication can get
started quickly enough that it's running before the snapshot from the existing
clusters have been restored.

Fix this by updating the handle to the state store on each pass.
2022-11-17 08:40:12 -05:00
Ayrat Badykov
322c6b3dce fix create snapshot request docs (#15242) 2022-11-17 08:43:40 +01:00
Tim Gross
1c4307b829 eval broker: shed all but one blocked eval per job after ack (#14621)
When an evaluation is acknowledged by a scheduler, the resulting plan is
guaranteed to cover up to the `waitIndex` set by the worker based on the most
recent evaluation for that job in the state store. At that point, we no longer
need to retain blocked evaluations in the broker that are older than that index.

Move all but the highest priority / highest `ModifyIndex` blocked eval into a
canceled set. When the `Eval.Ack` RPC returns from the eval broker it will
signal a reap of a batch of cancelable evals to write to raft. This paces the
cancelations limited by how frequently the schedulers are acknowledging evals;
this should reduce the risk of cancelations from overwhelming raft relative to
scheduler progress. In order to avoid straggling batches when the cluster is
quiet, we also include a periodic sweep through the cancelable list.
2022-11-16 16:10:11 -05:00
Seth Hoenig
0e3606afa0 e2e: swap bionic image for jammy (#15220) 2022-11-16 10:37:18 -06:00
Tim Gross
460f19b608 test: ensure leader is still valid in reelection test (#15267)
The `TestLeader_Reelection` test waits for a leader to be elected and then makes
some other assertions. But it implcitly assumes that there's no failure of
leadership before shutting down the leader, which can lead to a panic in the
tests. Assert there's still a leader before the shutdown.
2022-11-16 11:04:02 -05:00
Jai
3743e913f7 feat: add tooltip to storage volumes (#15245)
* feat: add tooltip to storage volumes

* chore: move Tooltip into td to preserve style

* styling: add overflow-x to section (#15246)

* styling: add overflow-x to section

* refact: use media query with display block
2022-11-15 14:13:57 -05:00
Jai
22f9c554e0 refact: remove unused API (#15244) 2022-11-15 14:13:14 -05:00
James Rasell
a3f3018227 agent: ensure all HTTP Server methods are pointer receivers. (#15250) 2022-11-15 16:31:44 +01:00
Nikita Beletskii
b55ab6318e Fix variable create API example in docs (#15248) 2022-11-15 16:04:11 +01:00
Tim Gross
65b3d01aab eval delete: move batching of deletes into RPC handler and state (#15117)
During unusual outage recovery scenarios on large clusters, a backlog of
millions of evaluations can appear. In these cases, the `eval delete` command can
put excessive load on the cluster by listing large sets of evals to extract the
IDs and then sending larges batches of IDs. Although the command's batch size
was carefully tuned, we still need to be JSON deserialize, re-serialize to
MessagePack, send the log entries through raft, and get the FSM applied.

To improve performance of this recovery case, move the batching process into the
RPC handler and the state store. The design here is a little weird, so let's
look a the failed options first:

* A naive solution here would be to just send the filter as the raft request and
  let the FSM apply delete the whole set in a single operation. Benchmarking with
  1M evals on a 3 node cluster demonstrated this can block the FSM apply for
  several minutes, which puts the cluster at risk if there's a leadership
  failover (the barrier write can't be made while this apply is in-flight).

* A less naive but still bad solution would be to have the RPC handler filter
  and paginate, and then hand a list of IDs to the existing raft log
  entry. Benchmarks showed this blocked the FSM apply for 20-30s at a time and
  took roughly an hour to complete.

Instead, we're filtering and paginating in the RPC handler to find a page token,
and then passing both the filter and page token in the raft log. The FSM apply
recreates the paginator using the filter and page token to get roughly the same
page of evaluations, which it then deletes. The pagination process is fairly
cheap (only abut 5% of the total FSM apply time), so counter-intuitively this
rework ends up being much faster. A benchmark of 1M evaluations showed this
blocked the FSM apply for 20-30ms at a time (typical for normal operations) and
completes in less than 4 minutes.

Note that, as with the existing design, this delete is not consistent: a new
evaluation inserted "behind" the cursor of the pagination will fail to be
deleted.
2022-11-14 14:08:13 -05:00
Douglas Jose
1217a96edf Fix wrong reference to vault (#15228) 2022-11-14 10:49:09 +01:00
Kyle Root
263ed6f9c6 Fix broken URL to nvidia device plugin (#15234) 2022-11-14 10:37:06 +01:00
Charlie Voiselle
9ad90290e2 [bug] Return a spec on reconnect (#15214)
client: fixed a bug where non-`docker` tasks with network isolation would leak network namespaces and iptables rules if the client was restarted while they were running
2022-11-11 13:27:36 -05:00
Seth Hoenig
5f3f52156e client: avoid unconsumed channel in timer construction (#15215)
* client: avoid unconsumed channel in timer construction

This PR fixes a bug introduced in #11983 where a Timer initialized with 0
duration causes an immediate tick, even if Reset is called before reading the
channel. The fix is to avoid doing that, instead creating a Timer with a non-zero
initial wait time, and then immediately calling Stop.

* pr: remove redundant stop
2022-11-11 09:31:34 -06:00
Tim Gross
11a5f79084 exec: allow running commands from host volume (#14851)
The exec driver and other drivers derived from the shared executor check the
path of the command before handing off to libcontainer to ensure that the
command doesn't escape the sandbox. But we don't check any host volume mounts,
which should be safe to use as a source for executables if we're letting the
user mount them to the container in the first place.

Check the mount config to verify the executable lives in the mount's host path,
but then return an absolute path within the mount's task path so that we can hand
that off to libcontainer to run.

Includes a good bit of refactoring here because the anchoring of the final task
path has different code paths for inside the task dir vs inside a mount. But
I've fleshed out the test coverage of this a good bit to ensure we haven't
created any regressions in the process.
2022-11-11 09:51:15 -05:00
Seth Hoenig
106dce9c9f docs: clarify how to access task meta values in templates (#15212)
This PR updates template and meta docs pages to give examples of accessing
meta values in templates. To do so one must use the environment variable form
of the meta key name, which isn't obvious and wasn't yet documented.
2022-11-10 16:11:53 -06:00
Luiz Aoqui
a2fed26ffa ci: notify on backport-assistant errors (#15203) 2022-11-10 16:11:26 -05:00
Luiz Aoqui
e20af3cf20 ci: re-enable tests on main (#15204)
Now that the tests are grouped more tightly we don't use as many runners
as before, so we can re-enable these without clogging the queue.
2022-11-10 13:51:37 -05:00
Piotr Kazmierczak
02253e6f19 acl: sso auth method schema and store functions (#15191)
This PR implements ACLAuthMethod type, acl_auth_methods table schema and crud state store methods. It also updates nomadSnapshot.Persist and nomadSnapshot.Restore methods in order for them to work with the new table, and adds two new Raft messages: ACLAuthMethodsUpsertRequestType and ACLAuthMethodsDeleteRequestType

This PR is part of the SSO work captured under ☂️ ticket #13120.
2022-11-10 19:42:41 +01:00
Seth Hoenig
00c8cd37bb template: protect use of template manager with a lock (#15192)
This PR protects access to `templateHook.templateManager` with its lock. So
far we have not been able to reproduce the panic - but it seems either Poststart
is running without a Prestart being run first (should be impossible), or the
Update hook is running concurrently with Poststart, nil-ing out the templateManager
in a race with Poststart.

Fixes #15189
2022-11-10 08:30:27 -06:00
Seth Hoenig
72d58fcf72 make: add target cl for create changelog entry (#15186)
* make: add target cl for create changelog entry

This PR adds `tools/cl-entry` and the `make cl` Makefile target for
conveniently creating correctly formatted Changelog entries.

* Update tools/cl-entry/main.go

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>

* Update tools/cl-entry/main.go

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
2022-11-08 09:43:32 -06:00
Derek Strickland
7e8306e476 api: remove mapstructure tags fromPort struct (#12916)
This PR solves a defect in the deserialization of api.Port structs when returning structs from theEventStream.

Previously, the api.Port struct's fields were decorated with both mapstructure and hcl tags to support the network.port stanza's use of the keyword static when posting a static port value. This works fine when posting a job and when retrieving any struct that has an embedded api.Port instance as long as the value is deserialized using JSON decoding. The EventStream, however, uses mapstructure to decode event payloads in the api package. mapstructure expects an underlying field named static which does not exist. The result was that the Port.Value field would always be set to 0.

Upon further inspection, a few things became apparent.

The struct already has hcl tags that support the indirection during job submission.
Serialization/deserialization with both the json and hcl packages produce the desired result.
The use of of the mapstructure tags provided no value as the Port struct contains only fields with primitive types.
This PR:

Removes the mapstructure tags from the api.Port structs
Updates the job parsing logic to use hcl instead of mapstructure when decoding Port instances.
Closes #11044

Co-authored-by: DerekStrickland <dstrickland@hashicorp.com>
Co-authored-by: Piotr Kazmierczak <470696+pkazmierczak@users.noreply.github.com>
2022-11-08 11:26:28 +01:00
twunderlich-grapl
1b5eedc07a Fix s3 example URLs in the artifacts docs (#15123)
* Fix s3 URLs so that they work

Unfortunately, s3 urls prefixed with https:// do NOT work with the underlying go-getter library. As such, this fixes the examples so that they are working examples that won't cause problems for people reading the docs.
See discussion in https://github.com/hashicorp/nomad/issues/1113 circa 2016.

* Use s3:// protocol schema for artifact examples

Per the discussion in https://github.com/hashicorp/nomad/pull/15123,
we're going to use the explicit s3 protocol in the examples since that
is the likeliest to work in all scenarios
2022-11-07 14:14:57 -05:00
Drew Gonzales
1a1ce363e7 server: add git revision to serf tags (#9159) 2022-11-07 10:34:33 -05:00
Phil Renaud
6a75882967 [ui] Remove animation from task logs sidebar (#15146)
* Remove animation from task logs sidebar

* changelog
2022-11-07 10:11:18 -05:00
Tim Gross
ce0e0768ff API for Eval.Count (#15147)
Add a new `Eval.Count` RPC and associated HTTP API endpoints. This API is
designed to support interactive use in the `nomad eval delete` command to get a
count of evals expected to be deleted before doing so.

The state store operations to do this sort of thing are somewhat expensive, but
it's cheaper than serializing a big list of evals to JSON. Note that although it
seems like this could be done as an extra parameter and response field on
`Eval.List`, having it as its own endpoint avoids having to change the response
body shape and lets us avoid handling the legacy filter params supported by
`Eval.List`.
2022-11-07 08:53:19 -05:00
dependabot[bot]
406db7f258 build(deps): bump github.com/zclconf/go-cty-yaml from 1.0.2 to 1.0.3 (#15165)
Bumps [github.com/zclconf/go-cty-yaml](https://github.com/zclconf/go-cty-yaml) from 1.0.2 to 1.0.3.
- [Release notes](https://github.com/zclconf/go-cty-yaml/releases)
- [Changelog](https://github.com/zclconf/go-cty-yaml/blob/master/CHANGELOG.md)
- [Commits](https://github.com/zclconf/go-cty-yaml/compare/v1.0.2...v1.0.3)

---
updated-dependencies:
- dependency-name: github.com/zclconf/go-cty-yaml
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-11-07 06:57:23 -06:00
dependabot[bot]
171ca522e0 build(deps): bump github.com/hashicorp/go-plugin from 1.4.3 to 1.4.5 (#15166)
Bumps [github.com/hashicorp/go-plugin](https://github.com/hashicorp/go-plugin) from 1.4.3 to 1.4.5.
- [Release notes](https://github.com/hashicorp/go-plugin/releases)
- [Changelog](https://github.com/hashicorp/go-plugin/blob/master/CHANGELOG.md)
- [Commits](https://github.com/hashicorp/go-plugin/compare/v1.4.3...v1.4.5)

---
updated-dependencies:
- dependency-name: github.com/hashicorp/go-plugin
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-11-07 06:39:41 -06:00
dependabot[bot]
05a2eb970f build(deps): bump github.com/shirou/gopsutil/v3 from 3.22.9 to 3.22.10 (#15162)
Bumps [github.com/shirou/gopsutil/v3](https://github.com/shirou/gopsutil) from 3.22.9 to 3.22.10.
- [Release notes](https://github.com/shirou/gopsutil/releases)
- [Commits](https://github.com/shirou/gopsutil/compare/v3.22.9...v3.22.10)

---
updated-dependencies:
- dependency-name: github.com/shirou/gopsutil/v3
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-11-07 06:37:42 -06:00
dependabot[bot]
ee2f3e4e7c build(deps): bump github.com/zclconf/go-cty from 1.11.0 to 1.12.0 (#15161)
Bumps [github.com/zclconf/go-cty](https://github.com/zclconf/go-cty) from 1.11.0 to 1.12.0.
- [Release notes](https://github.com/zclconf/go-cty/releases)
- [Changelog](https://github.com/zclconf/go-cty/blob/main/CHANGELOG.md)
- [Commits](https://github.com/zclconf/go-cty/compare/v1.11.0...v1.12.0)

---
updated-dependencies:
- dependency-name: github.com/zclconf/go-cty
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-11-07 06:36:44 -06:00