In the spirit of #25909, this PR removes testify dependencies from the scheduler
package and removes `reflect.DeepEqual` usage along the way. This is again a
combination of semgrep and hx editing magic.
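For reference, a minimal sketch of what the swap looks like, using the `must`
package from `github.com/shoenig/test` (the function under test here is
hypothetical):
```go
package scheduler

import (
	"testing"

	"github.com/shoenig/test/must"
)

// TestSwapExample shows the post-swap style: must.NoError and must.Eq
// replace testify's require.NoError and require.Equal, and must.Eq's
// deep comparison also stands in for the removed reflect.DeepEqual.
func TestSwapExample(t *testing.T) {
	got, err := someFunc() // someFunc is a hypothetical function under test
	must.NoError(t, err)
	must.Eq(t, "expected", got)
}
```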
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Some of our scheduler tests use the `AllocName` function from the structs
package incorrectly. This function should always receive the `Job.ID`, not
the `Job.Name`. Fix this to prevent the incorrect usage from being
copy-pasted into future bugs.
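A minimal sketch of the corrected call, assuming `AllocName`'s
`(job, group string, idx uint)` signature from the structs package:
```go
package scheduler

import "github.com/hashicorp/nomad/nomad/structs"

// allocName shows the corrected usage: always pass Job.ID, never
// Job.Name. For a job with ID "example" and task group "cache" this
// yields "example.cache[0]".
func allocName(job *structs.Job, group string, idx uint) string {
	return structs.AllocName(job.ID, group, idx)
}
```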
* tests: swap testify for test in plugins/csi/client_test.go
* tests: swap testify for test in testutil/
* tests: swap testify for test in host_test.go
* tests: swap testify for test in plugin_test.go
* tests: swap testify for test in utils_test.go
* tests: swap testify for test in scheduler/
* tests: swap testify for test in parse_test.go
* tests: swap testify for test in attribute_test.go
* tests: swap testify for test in plugins/drivers/
* tests: swap testify for test in command/
* tests: fixup some test usages
* go: run go mod tidy
* windows: cpuset test only on linux
Host volume checks were run as regular feasibility checks. This had two
unintended consequences.
The first happened when scheduling an allocation with a host volume on a
set of nodes with the same computed class but where only some of them
had the desired host volume.
If the first node evaluated did not have the host volume, the entire
node class was considered ineligible for the task group.
```go
// Run the job feasibility checks.
for _, check := range w.jobCheckers {
	feasible := check.Feasible(option)
	if !feasible {
		// If the job hasn't escaped, set it to be ineligible since it
		// failed a job check.
		if !jobEscaped {
			evalElig.SetJobEligibility(false, option.ComputedClass)
		}
		continue OUTER
	}
}
```
This causes all nodes with the same computed class to be skipped,
even if they do have the desired host volume.
```go
switch evalElig.JobStatus(option.ComputedClass) {
case EvalComputedClassIneligible:
	// Fast path the ineligible case
	metrics.FilterNode(option, "computed class ineligible")
	continue
```
The second consequence is somewhat the opposite. When an allocation has
a host volume with `per_alloc = true`, the node must have a host volume
that matches the allocation index, so each allocation is likely to be
placed on a different node.
But when the first allocation found a node match, it registered the node
class as eligible for the task group.
```go
// Set the task group eligibility if the constraints weren't escaped and
// it hasn't been set before.
if !tgEscaped && tgUnknown {
	evalElig.SetTaskGroupEligibility(true, w.tg, option.ComputedClass)
}
```
This could cause other allocations to be placed on nodes without the
expected host volume because of the computed node class fast path. The
node feasibility for the volume was never checked.
```go
case EvalComputedClassEligible:
	// Fast path the eligible case
	if w.available(option) {
		return option
	}

	// We match the class but are temporarily unavailable
	continue OUTER
```
These problems did not happen with CSI volumes, somewhat by accident.
Since the `CSIVolumeChecker` was not placed in the `tgCheckers` list, it
did not cause the node class to be considered ineligible on failure
(avoiding the first problem).
And, as illustrated in the code snippet above, the eligible node class
fast path checks `tgAvailable` (where `CSIVolumeChecker` is placed)
before returning the option (avoiding the second problem).
By placing `HostVolumeChecker` in the `tgAvailable` list instead of
`tgCheckers`, we avoid these problems for host volume feasibility as well.
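A minimal, self-contained sketch of that split (simplified names; not the
actual `FeasibilityWrapper` implementation): checkers in `tgCheckers` feed the
computed-class eligibility cache, while checkers in `tgAvailable` run against
every candidate node and never taint a class.
```go
package scheduler

// feasibilityChecker is a pared-down stand-in for the scheduler's
// FeasibilityChecker interface.
type feasibilityChecker interface {
	Feasible(node string) bool
}

// wrapper sketches the split: tgCheckers results are cached per
// computed class, tgAvailable results are re-evaluated for every node.
type wrapper struct {
	tgCheckers  []feasibilityChecker
	tgAvailable []feasibilityChecker
	classElig   map[string]bool // computed class -> cached eligibility, initialized by the caller
}

func (w *wrapper) feasible(node, computedClass string) bool {
	if elig, ok := w.classElig[computedClass]; ok && !elig {
		return false // fast path: class already known ineligible
	}
	for _, c := range w.tgCheckers {
		if !c.Feasible(node) {
			// A failure here taints the entire computed class.
			w.classElig[computedClass] = false
			return false
		}
	}
	// Availability checks (where HostVolumeChecker now lives, next to
	// CSIVolumeChecker) run per node and never touch the class cache.
	for _, c := range w.tgAvailable {
		if !c.Feasible(node) {
			return false
		}
	}
	w.classElig[computedClass] = true
	return true
}
```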
The CSI `CreateVolume` RPC is idempotent provided that the topology,
capabilities, and parameters are unchanged. CSI volumes have many
user-defined fields that are immutable once set, and many fields that
are not user-settable.
Update the `Register` RPC so that updating a volume via the API merges
onto any existing volume without touching Nomad-controlled fields,
while validating it with the same strict requirements expected for
idempotent `CreateVolume` RPCs.
Also, clarify that this state store method is used for everything, not just
for the `Register` RPC.
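A hypothetical sketch of those merge semantics, using a pared-down volume
shape rather than the real `structs.CSIVolume`:
```go
package state

import (
	"fmt"
	"maps"
)

// csiVolume is an illustrative subset; the real structs.CSIVolume
// carries many more fields.
type csiVolume struct {
	ID          string
	Name        string            // user-settable, mutable
	Parameters  map[string]string // user-settable, immutable once set
	CreateIndex uint64            // Nomad-controlled
	ModifyIndex uint64            // Nomad-controlled
}

// mergeOnRegister sketches the Register semantics: the update merges
// onto the existing volume, immutable fields must match (mirroring the
// idempotency rules for CreateVolume), and Nomad-controlled fields are
// never taken from the request.
func mergeOnRegister(existing, update *csiVolume) (*csiVolume, error) {
	if existing == nil {
		return update, nil // brand-new volume: accept the request as-is
	}
	if len(existing.Parameters) > 0 && !maps.Equal(existing.Parameters, update.Parameters) {
		return nil, fmt.Errorf("volume %s: parameters are immutable once set", existing.ID)
	}
	merged := *existing
	merged.Name = update.Name // mutable user field
	if merged.Parameters == nil {
		merged.Parameters = update.Parameters
	}
	// CreateIndex and ModifyIndex stay untouched here; the state store
	// assigns them inside the transaction.
	return &merged, nil
}
```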
This PR implements a new "System Batch" scheduler type. Jobs can
make use of this new scheduler by setting their type to 'sysbatch'.
As the name implies, sysbatch can be thought of as a hybrid between
system and batch jobs - it is for running short-lived jobs intended to
run on every compatible node in the cluster.
As with batch jobs, sysbatch jobs can also be periodic and/or parameterized
dispatch jobs. A sysbatch job is considered complete when it has run to a
terminal state (success, or failed after retries) on all compatible nodes.
Feasibility and preemption are governed the same as with system jobs. In
this PR, the update stanza is not yet supported. The update stanza is still
limited in functionality for the underlying system scheduler, and is
not yet useful for sysbatch jobs. Further work in #4740 will improve
support for the update stanza and deployments.
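Opting in takes only the job type, sketched below assuming the
`structs.JobTypeSysBatch` constant this PR introduces:
```go
package scheduler

import "github.com/hashicorp/nomad/nomad/structs"

// asSysbatch switches a job to the new scheduler; periodic and
// parameterized dispatch configuration carry over as with batch jobs.
func asSysbatch(job *structs.Job) {
	job.Type = structs.JobTypeSysBatch // "sysbatch"
}
```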
Closes #2527
Add a `PerAlloc` field to volume requests that directs the scheduler to test
feasibility for volumes with a source ID that includes the allocation index
suffix (ex. `[0]`), rather than the exact source ID.
Read the `PerAlloc` field when making the volume claim at the client to
determine if the allocation index suffix (ex. `[0]`) should be added to the
volume source ID.
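A minimal sketch of the suffixing rule (the helper is hypothetical; the `[n]`
format matches the allocation-name suffix):
```go
package scheduler

import "fmt"

// perAllocSource is a hypothetical helper showing the rule: with
// per_alloc = true, allocation index 0 of volume source "data"
// requests the volume "data[0]"; otherwise the source is used as-is.
func perAllocSource(source string, perAlloc bool, allocIndex uint) string {
	if !perAlloc {
		return source
	}
	return fmt.Sprintf("%s[%d]", source, allocIndex)
}
```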
* use msgtype in upsert node
adds message type to signature for upsert node, update tests, remove placeholder method
* UpsertAllocs msg type test setup
* use upsertallocs with msg type in signature
update test usage of delete node
delete placeholder msgtype method
* add msgtype to upsert evals signature, update test call sites with test setup msg type
handle snapshot upsert eval outside of FSM and ignore eval event
remove placeholder upsertevalsmsgtype
handle job plan rpc and prevent event creation for plan
msgtype cleanup upsertnodeevents
updatenodedrain msgtype
msg type 0 is a node registration event, so set the default to the ignore type
* fix named import
* fix signature ordering on upsertnode to match
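For reference, the rough shape of the signature change these commits describe
(exact parameter lists in Nomad may differ):
```go
package state

import "github.com/hashicorp/nomad/nomad/structs"

// StateStore is elided to its relevant surface here.
type StateStore struct{}

// UpsertNode now leads with the raft message type so the FSM can
// decide which events to emit, or suppress them (e.g. for plan applies).
func (s *StateStore) UpsertNode(msgType structs.MessageType, index uint64, node *structs.Node) error {
	// ... persist the node, then publish an event keyed on msgType
	return nil
}
```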
* state_store: csi volumes/plugins store the index in the txn
* nomad: csi_endpoint_test require index checks need uint64()
* nomad: other tests using int 0 not uint64(0)
* structs: pass index into New, but not other struct methods
* state_store: csi plugin indexes, use new struct interface
* nomad: csi_endpoint_test check index/query meta (on explicit 0)
* structs: NewCSIVolume takes an index arg now
* scheduler/test: NewCSIVolume takes an index arg now
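In test code, the constructor change looks roughly like this (index value
illustrative):
```go
package scheduler

import "github.com/hashicorp/nomad/nomad/structs"

// newTestVolume reflects the new constructor: NewCSIVolume now takes
// the raft index up front instead of having it set separately.
func newTestVolume() *structs.CSIVolume {
	return structs.NewCSIVolume("volume-id", 1000)
}
```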
This brings test code and mocks up to date with the fingerprinter. This was a slightly larger change than I anticipated, but I think it's good for two reasons:
1. More semantically correct. `os.name` is something like "Windows 10 Pro" or "Ubuntu", while `kernel.name` is "windows" or "linux". `os.version` and `kernel.version` match these semantics.
2. `kernel.name` is much easier to grep for than `os`, which is helpful because oracle can't help us with strings.
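Illustrative attribute values under the new semantics (exact strings vary by
platform):
```go
package fingerprint

// os.* describes the product; kernel.* describes the platform.
var exampleAttributes = map[string]string{
	"os.name":        "Ubuntu", // or "Windows 10 Pro"
	"os.version":     "20.04",
	"kernel.name":    "linux", // or "windows"
	"kernel.version": "5.4.0-66-generic",
}
```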