nomad

mirror of https://github.com/kemko/nomad.git synced 2026-01-01 16:05:42 +03:00

Author	SHA1	Message	Date
Tim Gross	5c909213ce	scheduler: add reconciler annotations to completed evals (#26188) The output of the reconciler stage of scheduling is only visible via debug-level logs, typically accessible only to the cluster admin. We can give job authors better ability to understand what's happening to their jobs if we expose this information to them in the `eval status` command. Add the reconciler's desired updates to the evaluation struct so it can be exposed in the API. This increases the size of evals by roughly 15% in the state store, or a bit more when there are preemptions (but we expect this will be a small minority of evals). Ref: https://hashicorp.atlassian.net/browse/NMD-818 Fixes: https://github.com/hashicorp/nomad/issues/15564	2025-07-07 09:40:21 -04:00
Tim Gross	aa3c08d069	eval status: enrich with related evals and placed allocs tables (#26156) When debugging an evaluation, you almost always want to know about all the related evaluations and what allocations were placed by that evaluation (and where), not just failed placements. We can enrich the command by adding the `related` query parameter to the API, and having the command query for the evaluation's allocations automatically. Emit this data as a pair of new tables and expose fields like quota limits and previous/next/blocked evals without the `-verbose` flag. Update the docs to include the full output and remove references to long-removed behavior of the `-json` flag. Ref: https://hashicorp.atlassian.net/browse/NMD-818 Ref: https://go.hashi.co/rfc/nmd-212	2025-06-30 09:23:36 -04:00
hashicorp-copywrite[bot]	f005448366	[COMPLIANCE] Add Copyright and License Headers	2023-04-10 15:36:59 +00:00
Tim Gross	65b3d01aab	eval delete: move batching of deletes into RPC handler and state (#15117) During unusual outage recovery scenarios on large clusters, a backlog of millions of evaluations can appear. In these cases, the `eval delete` command can put excessive load on the cluster by listing large sets of evals to extract the IDs and then sending large batches of IDs. Although the command's batch size was carefully tuned, we still need to JSON-deserialize, re-serialize to MessagePack, send the log entries through raft, and get the FSM applied. To improve performance of this recovery case, move the batching process into the RPC handler and the state store. The design here is a little weird, so let's look at the failed options first: * A naive solution here would be to just send the filter as the raft request and let the FSM apply delete the whole set in a single operation. Benchmarking with 1M evals on a 3-node cluster demonstrated this can block the FSM apply for several minutes, which puts the cluster at risk if there's a leadership failover (the barrier write can't be made while this apply is in-flight). * A less naive but still bad solution would be to have the RPC handler filter and paginate, and then hand a list of IDs to the existing raft log entry. Benchmarks showed this blocked the FSM apply for 20-30s at a time and took roughly an hour to complete. Instead, we're filtering and paginating in the RPC handler to find a page token, and then passing both the filter and page token in the raft log. The FSM apply recreates the paginator using the filter and page token to get roughly the same page of evaluations, which it then deletes. The pagination process is fairly cheap (only about 5% of the total FSM apply time), so counter-intuitively this rework ends up being much faster. A benchmark of 1M evaluations showed this blocked the FSM apply for 20-30ms at a time (typical for normal operations) and completes in less than 4 minutes. 
Note that, as with the existing design, this delete is not consistent: a new evaluation inserted "behind" the cursor of the pagination will fail to be deleted.	2022-11-14 14:08:13 -05:00
Tim Gross	ce0e0768ff	API for `Eval.Count` (#15147) Add a new `Eval.Count` RPC and associated HTTP API endpoints. This API is designed to support interactive use in the `nomad eval delete` command to get a count of evals expected to be deleted before doing so. The state store operations to do this sort of thing are somewhat expensive, but it's cheaper than serializing a big list of evals to JSON. Note that although it seems like this could be done as an extra parameter and response field on `Eval.List`, having it as its own endpoint avoids having to change the response body shape and lets us avoid handling the legacy filter params supported by `Eval.List`.	2022-11-07 08:53:19 -05:00
James Rasell	581390bed1	cli: do not import structs, use API package only. (#13938)	2022-08-02 16:33:08 +02:00
James Rasell	11cb4c6d82	core: allow deleting of evaluations (#13492) * core: add eval delete RPC and core functionality. * agent: add eval delete HTTP endpoint. * api: add eval delete API functionality. * cli: add eval delete command. * docs: add eval delete website documentation.	2022-07-06 16:30:11 +02:00
Luiz Aoqui	81687c1ce5	api: add related evals to eval details (#12305) The `related` query param is used to indicate that the request should return a list of related (next, previous, and blocked) evaluations. Co-authored-by: Jasmine Dahilig <jasmine@hashicorp.com>	2022-03-17 13:56:14 -04:00
Jasmine Dahilig	1bdb111127	add create and modify timestamps to evaluations (#5881)	2019-08-07 09:50:35 -07:00
Preetha Appan	d4056c4489	Rename DelayCeiling to MaxDelay	2018-03-14 16:10:32 -05:00
Alex Dadgar	f6fbb36054	sync	2017-10-13 14:36:02 -07:00
Alex Dadgar	ac1539d5d9	Sync namespace changes	2017-09-07 17:04:21 -07:00
Alex Dadgar	53f4952c56	initial impl	2017-07-07 12:03:11 -07:00
Alex Dadgar	4fbe182372	Add metrics to show allocations on the client. This PR adds the following metrics to the client: client.allocations.migrating client.allocations.blocked client.allocations.pending client.allocations.running client.allocations.terminal Also adds some missing fields to the API version of the evaluation.	2017-03-09 12:37:41 -08:00
Alex Dadgar	bf34379235	Add QueuedAllocations to api.Evaluation	2017-01-06 11:32:14 -08:00
Alex Dadgar	92bddbc3a5	rename SpawnedBlockedEval and simplify map safety check	2016-05-24 18:12:59 -07:00
Alex Dadgar	6deadf1ccd	Evals track blocked evals they create	2016-05-19 13:09:52 -07:00
Alex Dadgar	96ab783b3f	Scheduler no longer produces failed allocations; failed alloc metrics stored in evaluation	2016-05-18 18:11:40 -07:00
Ivo Verberk	905742249e	Refactoring continued * Refactor other cli commands to new design * Add PrefixList method to api package * Add more tests	2015-12-24 20:53:37 +01:00
Ryan Uber	06aa0119e4	api: sort all list responses	2015-09-17 13:10:20 -07:00
Ryan Uber	d2597982ad	api: use stub structs	2015-09-13 20:02:22 -07:00
Ryan Uber	d013410adf	api: working on evaluations	2015-09-09 13:48:56 -07:00
Ryan Uber	689fa2bb11	api: finishing jobs	2015-09-08 18:42:34 -07:00

23 Commits