nomad

mirror of https://github.com/kemko/nomad.git synced 2026-01-01 16:05:42 +03:00

Author	SHA1	Message	Date
stswidwinski	2285432424	GC: ensure no leakage of evaluations for batch jobs. (#15097 ) Prior to `2409f72` the code compared the modification index of a job to itself. Afterwards, the code compared the creation index of the job to itself. In either case there should never be a case of re-parenting of allocs causing the evaluation to trivially always result in false, which leads to unreclaimable memory. Prior to this change allocations and evaluations for batch jobs were never garbage collected until the batch job was explicitly stopped. The new `batch_eval_gc_threshold` server configuration controls how often they are collected. The default threshold is `24h`.	2023-01-31 13:32:14 -05:00
Jorge Marey	340ad2db58	Rename fields on proxyConfig (#15541 ) * Change api Fields for expose and paths * Add changelog entry * changelog: add deprecation notes about connect fields * api: minor style tweaks --------- Co-authored-by: Seth Hoenig <shoenig@duck.com>	2023-01-30 09:31:16 -06:00
Piotr Kazmierczak	949a6f60c7	renamed stanza to block for consistency with other projects (#15941 )	2023-01-30 15:48:43 +01:00
James Rasell	166aee71f3	cli: separate auth method config output for easier reading. (#15892 )	2023-01-30 11:44:26 +01:00
Seth Hoenig	ce71d2dc0d	consul: check for acceptable service identity on consul tokens (#15928 ) When registering a job with a service and 'consul.allow_unauthenticated=false', we scan the given Consul token for an acceptable policy or role with an acceptable policy, but did not scan for an acceptable service identity (which is backed by an acceptable virtual policy). This PR updates our consul token validation to also accept a matching service identity when registering a service into Consul. Fixes #15902	2023-01-27 18:15:51 -06:00
Tim Gross	e53b591582	metrics: Add remaining server RPC rate metrics (#15901 )	2023-01-27 08:29:53 -05:00
Piotr Kazmierczak	0abadb6804	acl: make auth method default across all types (#15869 )	2023-01-26 14:17:11 +01:00
James Rasell	14fb036473	sso: allow binding rules to create management ACL tokens. (#15860 ) * sso: allow binding rules to create management ACL tokens. * docs: update binding rule docs to detail management type addition.	2023-01-26 09:57:44 +01:00
Tim Gross	bcd5bbdad7	add metric for count of RPC requests (#15515 ) Implement a metric for RPC requests with labels on the identity, so that administrators can monitor the source of requests within the cluster. This changeset demonstrates the change with the new `ACL.WhoAmI` RPC, and we'll wire up the remaining RPCs once we've threaded the new pre-forwarding authentication through them all. Note that metrics are measured after we forward but before we return any authentication error. This ensures that we only emit metrics on the server that actually serves the request. We'll perform rate limiting at the same place. Includes telemetry configuration to omit identity labels.	2023-01-24 11:54:20 -05:00
Tim Gross	b79d00abf3	implement pre-forwarding auth on select RPCs (#15513 ) In #15417 we added a new `Authenticate` method to the server that returns an `AuthenticatedIdentity` struct. This changeset implements this method for a small number of RPC endpoints that together represent all the various ways in which RPCs are sent, so that we can validate that we're happy with this approach.	2023-01-24 10:52:07 -05:00
Karl Johann Schubert	588392cabc	client: add disk_total_mb and disk_free_mb config options (#15852 )	2023-01-24 09:14:22 -05:00
Charlie Voiselle	85f67d4a83	Add raft snapshot configuration options (#15522 ) * Add config elements * Wire in snapshot configuration to raft * Add hot reload of raft config * Add documentation for new raft settings * Add changelog	2023-01-20 14:21:51 -05:00
James Rasell	6a8728d00a	cli: use localhost for default login callback address. (#15820 )	2023-01-19 16:46:17 +01:00
James Rasell	859cb6e3fb	Merge branch 'main' into sso/gh-13120-oidc-login	2023-01-18 10:05:31 +00:00
Phil Renaud	d57b805780	[sso] OIDC Updates for the UI (#15804 ) * Updated UI to handle OIDC method changes * Remove redundant store unload call	2023-01-17 17:01:47 -05:00
Dao Thanh Tung	af56eb8b7f	fix bug in nomad fmt -check does not return error code (#15797 )	2023-01-17 09:15:34 -05:00
James Rasell	531bada034	cli: add login command to allow OIDC provider SSO login.	2023-01-13 13:16:09 +00:00
James Rasell	0279d95b55	api: add OIDC HTTP API endpoints and SDK.	2023-01-13 13:15:58 +00:00
Seth Hoenig	4698d8da79	consul/connect: support for proxy upstreams opaque config (#15761 ) This PR adds support for configuring `proxy.upstreams[].config` for Consul Connect upstreams. This is an opaque config value to Nomad - the data is passed directly to Consul and is unknown to Nomad.	2023-01-12 08:20:54 -06:00
Anthony Davis	abe088954e	Fix rejoin_after_leave behavior (#15552 )	2023-01-11 16:39:24 -05:00
Dao Thanh Tung	30b235345d	cli: Add a nomad operator client state command (#15469 ) Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg>	2023-01-11 10:03:31 -05:00
Dao Thanh Tung	f89ac80801	agent: Make agent syslog log level inherit from Nomad agent log (#15625 )	2023-01-04 09:38:06 -05:00
Tim Gross	e23b3a350e	csi: Fix parsing of '=' in secrets at command line and HTTP (#15670 ) The command line flag parsing and the HTTP header parsing for CSI secrets incorrectly split at more than one '=' rune, making it impossible to use secrets that included that rune.	2023-01-03 16:28:38 -05:00
Seth Hoenig	dab4d7ed7a	ci: swap freeport for portal in packages (#15661 )	2023-01-03 11:25:20 -06:00
Seth Hoenig	e2f912046b	command: fixup parsing of stale query parameter (#15631 ) In #15605 we fixed the bug where the presence of the "stale" query parameter was meant to imply stale, even if the value of the parameter was "false" or malformed. In parsing, we missed the case where the slice of values would be nil, which led to a failing test case that was missed because CI didn't run against the original PR.	2023-01-03 08:21:20 -06:00
Seth Hoenig	2609f6a137	cleanup: remove usage of consul/sdk/testutil/retry (#15609 ) This PR removes usages of `consul/sdk/testutil/retry`, as part of the ongoing effort to remove use of any non-API module from Consul. There is one remaining usage in the helper/freeport package, but that will get removed as part of #15589	2023-01-02 08:06:20 -06:00
Dao Thanh Tung	1584496d96	fix: `stale` querystring parameter value as boolean (#15605 ) * Add changes to make stale querystring param boolean Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg> * Make error message more consistent Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg> * Changes from code review + Adding CHANGELOG file Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg> * Changes from code review to use github.com/shoenig/test package Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg> * Change must.Nil() to must.NoError() Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg> * Minor fix on the import order Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg> * Fix existing code format too Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg> * Minor changes addressing code review feedbacks Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg> * swap must.EqOp() order of param provided Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg> Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg>	2023-01-01 13:04:14 -06:00
Seth Hoenig	5380a944ad	command: fixup tests concerning multi job stop (#15606 ) * command: fixup job multi-stop test This PR refactors the StopCommand test that runs 10 jobs and then passes them all to one invocation of 'job stop'. * test: swap use of assert for must * test: cleanup job files we create * command: fixup job stop failure tests Now that JobStop works on concurrent jobs, the error messages are different. * cleanup: use multiple post scripts	2022-12-21 16:21:48 -06:00
Seth Hoenig	3bb144c43f	tests: do not return error from testagent shutdown (#15595 )	2022-12-21 08:23:58 -06:00
Danish Prakash	16401b864e	command/job_stop: accept multiple jobs, stop concurrently (#12582 ) * command/job_stop: accept multiple jobs, stop concurrently Signed-off-by: danishprakash <grafitykoncept@gmail.com> * command/job_stop_test: add test for multiple job stops Signed-off-by: danishprakash <grafitykoncept@gmail.com> * improve output, add changelog and docs Signed-off-by: danishprakash <grafitykoncept@gmail.com> Co-authored-by: Michael Schurter <mschurter@hashicorp.com>	2022-12-16 15:46:58 -08:00
James Rasell	13c1275057	cli: add ACL binding rule commands for CRUD actions. (#15554 )	2022-12-15 16:57:44 +01:00
James Rasell	4d60dd3dbb	ACL: add ACL binding rule RPC and HTTP API handlers. (#15529 ) This change adds the RPC ACL binding rule handlers. These handlers are responsible for the creation, updating, reading, and deletion of binding rules. The write handlers are feature gated so that they can only be used when all federated servers are running the required version. The HTTP API handlers and API SDK have also been added where required. This allows the endpoints to be called from the API by users and clients.	2022-12-15 09:18:55 +01:00
Piotr Kazmierczak	627debc14b	acl: numerous small bugfixes for acl auth methods CLI (#15539 ) This PR contains a number of small bugfixes discovered during #15538 work.	2022-12-14 13:25:40 +01:00
Piotr Kazmierczak	9dbe34ac05	bugfix: acl sso auth methods test failures (#15512 ) This PR fixes unit test failures introduced in `f4e89e2`	2022-12-09 18:47:32 +01:00
Piotr Kazmierczak	7ee82dc21a	acl: added type to ACL Auth Method stub (#15480 )	2022-12-06 14:47:05 +01:00
Piotr Kazmierczak	2c196c3de4	bugfix: corrected indentation for ACL auth method create CLI command (#15481 )	2022-12-06 14:45:24 +01:00
Seth Hoenig	86479386f8	consul: fixup expected consul tagged_addresses when using ipv6 (#15411 ) This PR is a continuation of #14917, where we missed the ipv6 cases. Consul auto-inserts tagged_addresses for keys - lan_ipv4 - wan_ipv4 - lan_ipv6 - wan_ipv6 even though the service registration coming from Nomad does not contain such elements. When doing the differential between services Nomad expects to be registered vs. the services actually registered into Consul, we must first purge these automatically inserted tagged_addresses if they do not exist in the Nomad view of the Consul service.	2022-12-01 07:38:30 -06:00
Piotr Kazmierczak	fe1ff602f8	acl: sso auth methods RPC/API/CLI should return created or updated objects (#15410 ) Currently CRUD code that operates on SSO auth methods does not return created or updated object upon creation/update. This is bad UX and inconsistent behavior compared to other ACL objects like roles, policies or tokens. This PR fixes it. Relates to #13120	2022-11-29 07:36:36 +01:00
Piotr Kazmierczak	2d83ce7b15	acl: sso auth methods cli commands (#15322 ) This PR implements CLI commands to interact with SSO auth methods. This PR is part of the SSO work captured under ☂️ ticket #13120.	2022-11-28 10:51:45 +01:00
Piotr Kazmierczak	ecd454e15d	bugfix: typos in acl role commands (#15382 ) Co-authored-by: James Rasell <jrasell@users.noreply.github.com>	2022-11-25 10:28:33 +01:00
Luiz Aoqui	d439fe0d90	cli: improve errors for multiregion deployments (#15326 ) Co-authored-by: Piotr Kazmierczak <470696+pkazmierczak@users.noreply.github.com>	2022-11-23 16:40:13 -05:00
Jack	66c61e4fd2	cli: `wait` flag for use with `deployment status -monitor` (#15262 )	2022-11-23 16:36:13 -05:00
James Rasell	84b79aa87d	sso: add ACL auth-method HTTP API CRUD endpoints (#15338 ) * core: remove custom auth-method TTLS and use ACL token TTLS. * agent: add ACL auth-method HTTP endpoints for CRUD actions. * api: add ACL auth-method client.	2022-11-23 09:38:02 +01:00
Lance Haig	8667dc2607	Add command "nomad tls" (#14296 )	2022-11-22 14:12:07 -05:00
hc-github-team-nomad-core	87f3565fc6	Generate files for 1.4.3 release	2022-11-22 12:56:29 -05:00
Seth Hoenig	3b14db4b83	consul: add trace logging around service registrations (#15311 ) This PR adds trace logging around the differential done between a Nomad service registration and its corresponding Consul service registration, in an effort to shed light on why a service registration request is being made.	2022-11-21 08:03:56 -06:00
James Rasell	faabc2b2c2	api: ensure ACL role upsert decode error returns a 400 status code. (#15253 )	2022-11-18 17:47:43 +01:00
James Rasell	c495cd99bf	api: ensure all request body decode error return a 400 status code. (#15252 )	2022-11-18 17:04:33 +01:00
James Rasell	a3f3018227	agent: ensure all HTTP Server methods are pointer receivers. (#15250 )	2022-11-15 16:31:44 +01:00
Tim Gross	65b3d01aab	eval delete: move batching of deletes into RPC handler and state (#15117 ) During unusual outage recovery scenarios on large clusters, a backlog of millions of evaluations can appear. In these cases, the `eval delete` command can put excessive load on the cluster by listing large sets of evals to extract the IDs and then sending large batches of IDs. Although the command's batch size was carefully tuned, we still need to JSON deserialize, re-serialize to MessagePack, send the log entries through raft, and get the FSM applied. To improve performance of this recovery case, move the batching process into the RPC handler and the state store. The design here is a little weird, so let's look at the failed options first: * A naive solution here would be to just send the filter as the raft request and let the FSM apply delete the whole set in a single operation. Benchmarking with 1M evals on a 3 node cluster demonstrated this can block the FSM apply for several minutes, which puts the cluster at risk if there's a leadership failover (the barrier write can't be made while this apply is in-flight). * A less naive but still bad solution would be to have the RPC handler filter and paginate, and then hand a list of IDs to the existing raft log entry. Benchmarks showed this blocked the FSM apply for 20-30s at a time and took roughly an hour to complete. Instead, we're filtering and paginating in the RPC handler to find a page token, and then passing both the filter and page token in the raft log. The FSM apply recreates the paginator using the filter and page token to get roughly the same page of evaluations, which it then deletes. The pagination process is fairly cheap (only about 5% of the total FSM apply time), so counter-intuitively this rework ends up being much faster. A benchmark of 1M evaluations showed this blocked the FSM apply for 20-30ms at a time (typical for normal operations) and completes in less than 4 minutes.
Note that, as with the existing design, this delete is not consistent: a new evaluation inserted "behind" the cursor of the pagination will fail to be deleted.	2022-11-14 14:08:13 -05:00
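
Commit e23b3a350e in the table above describes CSI secrets being split at more than one '=' rune, which made values containing '=' unusable. The sketch below shows one way to split a `key=value` argument at the first '=' only. The helper name `parseSecret` and its error text are hypothetical and are not taken from the Nomad source; only the standard library `strings.Cut` is assumed.

```go
package main

import (
	"fmt"
	"strings"
)

// parseSecret is a hypothetical helper (not Nomad's actual code).
// It splits "key=value" at the FIRST '=' only, so the value itself
// may contain any number of '=' characters.
func parseSecret(arg string) (string, string, error) {
	key, value, found := strings.Cut(arg, "=")
	if !found || key == "" {
		return "", "", fmt.Errorf("secret %q must be in key=value form", arg)
	}
	return key, value, nil
}

func main() {
	// A base64-like value with trailing padding keeps its '=' runes.
	key, value, err := parseSecret("token=abc=123==")
	fmt.Println(key, value, err)
}
```

Splitting on every '=' would discard or mangle the padding in a value like `abc=123==`; cutting once keeps the value intact.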

3453 Commits