nomad

mirror of https://github.com/kemko/nomad.git synced 2026-01-06 02:15:43 +03:00

Author	SHA1	Message	Date
James Rasell	84b79aa87d	sso: add ACL auth-method HTTP API CRUD endpoints (#15338 ) * core: remove custom auth-method TTLs and use ACL token TTLs. * agent: add ACL auth-method HTTP endpoints for CRUD actions. * api: add ACL auth-method client.	2022-11-23 09:38:02 +01:00
Phil Renaud	142043382e	Task sub row alignment changes (#15363 )	2022-11-22 15:49:50 -05:00
Lance Haig	8667dc2607	Add command "nomad tls" (#14296 )	2022-11-22 14:12:07 -05:00
Tim Gross	612f0a68f9	Merge pull request #15361 from hashicorp/post-1.4.3-release Post 1.4.3 release	2022-11-22 13:20:43 -05:00
Tim Gross	91e1af1f37	changelogs for 1.3.8 and 1.2.15	2022-11-22 12:57:55 -05:00
hc-github-team-nomad-core	8c6a5354c6	Prepare for next release	2022-11-22 12:56:29 -05:00
hc-github-team-nomad-core	87f3565fc6	Generate files for 1.4.3 release	2022-11-22 12:56:29 -05:00
Seth Hoenig	6e4410a9b1	e2e: fix 1 of 4 client disconnect tests (#15357 ) This PR modifies the disconnect helper job to run as root, which is necessary for manipulating iptables as it does. Also re-organizes the final test logic to wait for client re-connect before looking for the replacement (3rd) allocation in case that client was needed to run the alloc (also giving the scheduler more time to do its thing). Skips the other 3 tests, which fail and I cannot yet figure out what is going on.	2022-11-22 08:51:53 -06:00
Jai	6276a6b4ba	refact: add conditional table logic (#15330 )	2022-11-22 09:19:16 -05:00
Phil Renaud	7723e64fc4	Conditional CSS and an awaiter to help screenshot consistency (#15355 )	2022-11-21 14:55:13 -05:00
Tim Gross	0235280bd0	ensure engineering has merge authority on build pipeline (#15350 ) Adds @hashicorp/nomad-eng to the codeowners list for the build and release workflow files, so that we can fix problems that arise without being bottlenecked on another team.	2022-11-21 14:30:02 -05:00
Tim Gross	ba81ae18e1	pin build/release pipeline to ubuntu 20.04 (#15348 ) The `ubuntu-latest` runner has been migrated to Ubuntu 22.04, which doesn't have all the same multilib packages as 20.04. Although we'll probably want to migrate eventually, we should ship Nomad 1.4.3 with the same toolchain as we did previously so that we're not introducing new issues.	2022-11-21 14:08:45 -05:00
Seth Hoenig	2372c6d20c	e2e: fixup oversubscription test case for jammy (#15347 ) * e2e: fixup oversubscription test case for jammy jammy uses cgroups v2, need to lookup the max memory limit from the unified hierarchy format * e2e: set constraint to require cgroups v2 on oversub docker test	2022-11-21 12:41:55 -06:00
James Rasell	847c2cc528	client: accommodate Consul 1.14.0 gRPC and agent self changes. (#15309 ) * client: accommodate Consul 1.14.0 gRPC and agent self changes. Consul 1.14.0 changed the way in which gRPC listeners are configured, particularly when using TLS. Prior to the change, a single listener was responsible for handling plain-text and encrypted gRPC requests. In 1.14.0 and beyond, separate listeners will be used for each, defaulting to 8502 and 8503 for plain-text and TLS respectively. The change means that Nomad’s Consul Connect integration would not work when integrated with Consul clusters using TLS and running 1.14.0 or greater. The Nomad Consul fingerprinter identifies the gRPC port Consul has exposed using the "DebugConfig.GRPCPort" value from Consul’s “/v1/agent/self” endpoint. In Consul 1.14.0 and greater, this only represents the plain-text gRPC port which is likely to be disabled in clusters running TLS. In order to fix this issue, Nomad now takes into account the Consul version and configured scheme to optionally use “DebugConfig.GRPCTLSPort” value from Consul’s agent self return. The “consul_grpc_socket” allocrunner hook has also been updated so that the fingerprinted gRPC port attribute is passed in. This provides a better fallback method, when the operator does not configure the “consul.grpc_address” option. * docs: modify Consul Connect entries to detail 1.14.0 changes. * changelog: add entry for #15309 * fixup: tidy tests and clean version match from review feedback. * fixup: use strings tolower func.	2022-11-21 09:19:09 -06:00
Jai	2aff20e894	respect casing on service tags (#15329 ) * styles: add service tag style * refact: update service tag on alloc * refact: update service tag in component	2022-11-21 10:18:15 -05:00
Jai	34004b0917	style: wrap secret value in tag (#15331 )	2022-11-21 10:18:02 -05:00
Seth Hoenig	3b14db4b83	consul: add trace logging around service registrations (#15311 ) This PR adds trace logging around the differential done between a Nomad service registration and its corresponding Consul service registration, in an effort to shed light on why a service registration request is being made.	2022-11-21 08:03:56 -06:00
Piotr Kazmierczak	b7ddd5bf62	acl: sso auth method RPC endpoints (#15221 ) This PR implements RPC endpoints for SSO auth methods. This PR is part of the SSO work captured under ☂️ ticket #13120.	2022-11-21 10:15:39 +01:00
Piotr Kazmierczak	fee85dac79	acl: sso auth method event stream (#15280 ) This PR implements SSO auth method support in the event stream. This PR is part of the SSO work captured under ☂️ ticket #13120.	2022-11-21 10:06:05 +01:00
Phil Renaud	4703f55d6d	[ui] Show Consul Connect upstreams / on update info in sidebar (#15324 ) * Added consul connect icon and sidebar info * Show icon to the right of name	2022-11-18 22:49:10 -05:00
Seth Hoenig	78593daaee	e2e: jammy image needs latest java lts (#15323 )	2022-11-18 14:36:36 -06:00
James Rasell	faabc2b2c2	api: ensure ACL role upsert decode error returns a 400 status code. (#15253 )	2022-11-18 17:47:43 +01:00
James Rasell	c495cd99bf	api: ensure all request body decode errors return a 400 status code. (#15252 )	2022-11-18 17:04:33 +01:00
Luiz Aoqui	329807bd7f	docs: add cpu-allocated and memory-allocated (#15299 ) Document the Autoscaler Nomad APM parameters `cpu-allocated` and `memory-allocated` that were implemented in https://github.com/hashicorp/nomad-autoscaler/pull/324 and https://github.com/hashicorp/nomad-autoscaler/pull/334	2022-11-18 10:55:17 -05:00
Tim Gross	991e9a27cb	make eval cancelation really async with `Eval.Ack` (#15298 ) Ensure we never block in the `Eval.Ack`	2022-11-18 08:38:17 -05:00
Luiz Aoqui	6a3cf74f32	scheduler: log stack in case of panic (#15303 )	2022-11-17 18:59:33 -05:00
stswidwinski	5ce42fe8f2	Add mount propagation to protobuf definition of mounts (#15096 ) * Add mount propagation to protobuf definition of mounts * Fix formatting * Add mount propagation to the simple roundtrip test. * changelog: add entry for #15096 Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>	2022-11-17 18:14:59 -05:00
Tim Gross	eb1507c86f	make eval cancelation async with `Eval.Ack` (#15294 ) In #14621 we added an eval cancelation reaper goroutine with a channel that allowed us to wake it up. But we forgot to actually send on this channel from `Eval.Ack` and are still committing the cancelations synchronously. Fix this by sending on the buffered channel to wake up the reaper instead.	2022-11-17 16:40:41 -05:00
Tim Gross	6eb1f99fb3	autopilot: include only servers from the same region (#15290 ) When we migrated to the updated autopilot library in Nomad 1.4.0, the interface for finding servers changed. Previously autopilot would get the serf members and call `IsServer` on each of them, leaving it up to the implementor to filter out clients (and in Nomad's case, other regions). But in the "new" autopilot library, the equivalent interface is `KnownServers` for which we did not filter by region. This causes spurious attempts for the cross-region stats fetching, which results in TLS errors and a lot of log noise. Filter the member set by region to fix the regression.	2022-11-17 12:09:36 -05:00
Tim Gross	21c2d1593a	remove deprecated `AllocUpdateRequestType` raft entry (#15285 ) After Deployments were added in Nomad 0.6.0, the `AllocUpdateRequestType` raft log entry was no longer in use. Mark this as deprecated, remove the associated dead code, and remove references to the metrics it emits from the docs. We'll leave the entry itself just in case we encounter old raft logs that we need to be able to safely load.	2022-11-17 12:08:04 -05:00
Seth Hoenig	7c254ccdb8	e2e: disable systemd stub dns in jammy image (#15286 )	2022-11-17 09:50:44 -06:00
stswidwinski	d16a2c9467	Fix goroutine leakage (#15180 ) * Fix goroutine leakage * cl: add cl entry Co-authored-by: Seth Hoenig <shoenig@duck.com>	2022-11-17 09:47:11 -06:00
Seth Hoenig	732adae999	ci: use hashicorp/setup-golang for setting up go compiler, cache (#15271 ) This PR changes test-core to make use of https://github.com/hashicorp/setup-golang to consolidate the setting up of the Go compiler and the Go modules cache used for the CI job. Fixes: #14905	2022-11-17 07:50:45 -06:00
Tim Gross	f54a50bb0b	keyring: update handle to state inside replication loop (#15227 ) * keyring: update handle to state inside replication loop When keyring replication starts, we take a handle to the state store. But whenever a snapshot is restored, this handle is invalidated and no longer points to a state store that is receiving new keys. This leaks a bunch of memory too! In addition to operator-initiated restores, when fresh servers are added to existing clusters with large-enough state, the keyring replication can get started quickly enough that it's running before the snapshot from the existing cluster has been restored. Fix this by updating the handle to the state store on each pass.	2022-11-17 08:40:12 -05:00
Ayrat Badykov	322c6b3dce	fix create snapshot request docs (#15242 )	2022-11-17 08:43:40 +01:00
Tim Gross	1c4307b829	eval broker: shed all but one blocked eval per job after ack (#14621 ) When an evaluation is acknowledged by a scheduler, the resulting plan is guaranteed to cover up to the `waitIndex` set by the worker based on the most recent evaluation for that job in the state store. At that point, we no longer need to retain blocked evaluations in the broker that are older than that index. Move all but the highest priority / highest `ModifyIndex` blocked eval into a canceled set. When the `Eval.Ack` RPC returns from the eval broker it will signal a reap of a batch of cancelable evals to write to raft. This paces the cancelations limited by how frequently the schedulers are acknowledging evals; this should reduce the risk of cancelations from overwhelming raft relative to scheduler progress. In order to avoid straggling batches when the cluster is quiet, we also include a periodic sweep through the cancelable list.	2022-11-16 16:10:11 -05:00
Seth Hoenig	0e3606afa0	e2e: swap bionic image for jammy (#15220 )	2022-11-16 10:37:18 -06:00
Tim Gross	460f19b608	test: ensure leader is still valid in reelection test (#15267 ) The `TestLeader_Reelection` test waits for a leader to be elected and then makes some other assertions. But it implicitly assumes that there's no failure of leadership before shutting down the leader, which can lead to a panic in the tests. Assert there's still a leader before the shutdown.	2022-11-16 11:04:02 -05:00
Jai	3743e913f7	feat: add tooltip to storage volumes (#15245 ) * feat: add tooltip to storage volumes * chore: move Tooltip into td to preserve style * styling: add overflow-x to section (#15246) * styling: add overflow-x to section * refact: use media query with display block	2022-11-15 14:13:57 -05:00
Jai	22f9c554e0	refact: remove unused API (#15244 )	2022-11-15 14:13:14 -05:00
James Rasell	a3f3018227	agent: ensure all HTTP Server methods are pointer receivers. (#15250 )	2022-11-15 16:31:44 +01:00
Nikita Beletskii	b55ab6318e	Fix variable create API example in docs (#15248 )	2022-11-15 16:04:11 +01:00
Tim Gross	65b3d01aab	eval delete: move batching of deletes into RPC handler and state (#15117 ) During unusual outage recovery scenarios on large clusters, a backlog of millions of evaluations can appear. In these cases, the `eval delete` command can put excessive load on the cluster by listing large sets of evals to extract the IDs and then sending large batches of IDs. Although the command's batch size was carefully tuned, we still need to JSON deserialize, re-serialize to MessagePack, send the log entries through raft, and get the FSM applied. To improve performance of this recovery case, move the batching process into the RPC handler and the state store. The design here is a little weird, so let's look at the failed options first: * A naive solution here would be to just send the filter as the raft request and let the FSM apply delete the whole set in a single operation. Benchmarking with 1M evals on a 3 node cluster demonstrated this can block the FSM apply for several minutes, which puts the cluster at risk if there's a leadership failover (the barrier write can't be made while this apply is in-flight). * A less naive but still bad solution would be to have the RPC handler filter and paginate, and then hand a list of IDs to the existing raft log entry. Benchmarks showed this blocked the FSM apply for 20-30s at a time and took roughly an hour to complete. Instead, we're filtering and paginating in the RPC handler to find a page token, and then passing both the filter and page token in the raft log. The FSM apply recreates the paginator using the filter and page token to get roughly the same page of evaluations, which it then deletes. The pagination process is fairly cheap (only about 5% of the total FSM apply time), so counter-intuitively this rework ends up being much faster. A benchmark of 1M evaluations showed this blocked the FSM apply for 20-30ms at a time (typical for normal operations) and completes in less than 4 minutes. Note that, as with the existing design, this delete is not consistent: a new evaluation inserted "behind" the cursor of the pagination will fail to be deleted.	2022-11-14 14:08:13 -05:00
Douglas Jose	1217a96edf	Fix wrong reference to `vault` (#15228 )	2022-11-14 10:49:09 +01:00
Kyle Root	263ed6f9c6	Fix broken URL to nvidia device plugin (#15234 )	2022-11-14 10:37:06 +01:00
Charlie Voiselle	9ad90290e2	[bug] Return a spec on reconnect (#15214 ) client: fixed a bug where non-`docker` tasks with network isolation would leak network namespaces and iptables rules if the client was restarted while they were running	2022-11-11 13:27:36 -05:00
Seth Hoenig	5f3f52156e	client: avoid unconsumed channel in timer construction (#15215 ) * client: avoid unconsumed channel in timer construction This PR fixes a bug introduced in #11983 where a Timer initialized with 0 duration causes an immediate tick, even if Reset is called before reading the channel. The fix is to avoid doing that, instead creating a Timer with a non-zero initial wait time, and then immediately calling Stop. * pr: remove redundant stop	2022-11-11 09:31:34 -06:00
Tim Gross	11a5f79084	exec: allow running commands from host volume (#14851 ) The exec driver and other drivers derived from the shared executor check the path of the command before handing off to libcontainer to ensure that the command doesn't escape the sandbox. But we don't check any host volume mounts, which should be safe to use as a source for executables if we're letting the user mount them to the container in the first place. Check the mount config to verify the executable lives in the mount's host path, but then return an absolute path within the mount's task path so that we can hand that off to libcontainer to run. Includes a good bit of refactoring here because the anchoring of the final task path has different code paths for inside the task dir vs inside a mount. But I've fleshed out the test coverage of this a good bit to ensure we haven't created any regressions in the process.	2022-11-11 09:51:15 -05:00
Seth Hoenig	106dce9c9f	docs: clarify how to access task meta values in templates (#15212 ) This PR updates template and meta docs pages to give examples of accessing meta values in templates. To do so one must use the environment variable form of the meta key name, which isn't obvious and wasn't yet documented.	2022-11-10 16:11:53 -06:00
Luiz Aoqui	a2fed26ffa	ci: notify on backport-assistant errors (#15203 )	2022-11-10 16:11:26 -05:00


24001 Commits