nomad

mirror of https://github.com/kemko/nomad.git synced 2026-01-01 16:05:42 +03:00

Author	SHA1	Message	Date
Piotr Kazmierczak	648bacda77	testing: migrate nomad/scheduler off of testify (#25968) In the spirit of #25909, this PR removes testify dependencies from the scheduler package, along with reflect.DeepEqual removal. This is again a combination of semgrep and hx editing magic. --------- Co-authored-by: Tim Gross <tgross@hashicorp.com>	2025-06-04 09:29:28 +02:00
tehut	55523ecf8e	Add NodeMaxAllocations to client configuration (#25785) * Set MaxAllocations in client config Add NodeAllocationTracker struct to Node struct Evaluate MaxAllocations in AllocsFit function Set up cli config parsing Integrate maxAllocs into AllocatedResources view Co-authored-by: Tim Gross <tgross@hashicorp.com> --------- Co-authored-by: Tim Gross <tgross@hashicorp.com>	2025-05-22 12:49:27 -07:00
Tim Gross	456d95a19e	scheduler: account for affinity value of zero in score normalization (#25800) If there are no affinities on a job, we don't want to count an affinity score of zero in the number of scores we divide the normalized score by. This is how we handle other scoring components like node reschedule penalties on nodes that weren't running the previous allocation. But we also exclude counting the affinity in the case where we have affinity but the value is zero. In pathological cases, this can result in a node with a low affinity being picked over a node with no affinity, because the denominator is 1 larger. Include zero-value affinities in the count of scores if the job has affinities but the value just happens to be zero. Fixes: https://github.com/hashicorp/nomad/issues/25621	2025-05-19 14:10:00 -04:00
Seth Hoenig	8b093a6a5d	scheduler: support for device-aware numa scheduling (#1760) (#23837) (CE backport of ENT 59433d56c7215c0b8bf33764f41b57d9bd30160f (without ent files)) * scheduler: enhance numa aware scheduling with support for devices * cr: add comments	2024-08-20 07:53:04 -05:00
Tim Gross	b25f1b66ce	resources: allow job authors to configure size of secrets tmpfs (#23696) On supported platforms, the secrets directory is a 1MiB tmpfs. But some tasks need larger space for downloading large secrets. This is especially the case for tasks using `templates`, which need extra room to write a temporary file to the secrets directory that gets renamed to the old file atomically. This changeset allows increasing the size of the tmpfs in the `resources` block. Because this is a memory resource, we need to include it in the memory we allocate for scheduling purposes. The task is already prevented from using more memory in the tmpfs than the `resources.memory` field allows, but can bypass that limit by writing to the tmpfs via `template` or `artifact` blocks. Therefore, we need to account for the size of the tmpfs in the allocation resources. Simply adding it to the memory needed when we create the allocation allows it to be accounted for in all downstream consumers, and then we'll subtract that amount from the memory resources just before configuring the task driver. For backwards compatibility, the default value of 1MiB is "free" and ignored by the scheduler. Otherwise we'd be increasing the allocated resources for every existing alloc, which could cause problems across upgrades. If a user explicitly sets `resources.secrets = 1` it will no longer be free. Fixes: https://github.com/hashicorp/nomad/issues/2481 Ref: https://hashicorp.atlassian.net/browse/NET-10070	2024-08-05 16:06:58 -04:00
Tim Gross	7d73065066	numa: fix scheduler panic due to topology serialization bug (#23284) The NUMA topology struct field `NodeIDs` is an `idset.Set`, which has no public members. As a result, this field is never serialized via msgpack and persisted in state. When `numa.affinity = "prefer"`, the scheduler dereferences this nil field and panics the scheduler worker. Ideally we would fix this by adding a msgpack serialization extension, but because the field already exists and is just always empty, this breaks RPC wire compatibility across upgrades. Instead, create a new field that's populated at the same time we populate the more useful `idset.Set`, and repopulate the set on demand. Fixes: https://hashicorp.atlassian.net/browse/NET-9924	2024-06-11 08:55:00 -04:00
Gabi	ca22f34373	fix exhausted node metrics reporting in preemption (#20346)	2024-04-11 14:49:56 -04:00
Seth Hoenig	0020139440	core: port common code changes from ENT for numa scheduling (#18818) Some additional changes were made in the ENT PR to the common code in support of numa scheduling; this PR copies those changes back to CE.	2023-10-20 13:19:02 -05:00
Seth Hoenig	83720740f5	core: plumbing to support numa aware scheduling (#18681) * core: plumbing to support numa aware scheduling * core: apply node resources compatibility upon fsm restore Handle the case where an upgraded server dequeues an evaluation before a client triggers a new fingerprint - which would be needed to cause the compatibility fix to run. By running the compat fix on restore the server will immediately have the compatible pseudo topology to use. * lint: learn how to spell pseudo	2023-10-19 15:09:30 -05:00
hashicorp-copywrite[bot]	a9d61ea3fd	Update copyright file headers to BUSL-1.1	2023-08-10 17:27:29 -05:00
Luiz Aoqui	f4c7182873	node pools: apply node pool scheduler configuration (#17598)	2023-06-21 20:31:50 -04:00
hashicorp-copywrite[bot]	f005448366	[COMPLIANCE] Add Copyright and License Headers	2023-04-10 15:36:59 +00:00
Michael Schurter	f998a2b77b	core: merge reserved_ports into host_networks (#13651) Fixes #13505 This fixes #13505 by treating reserved_ports like we treat a lot of jobspec settings: merging settings from more global stanzas (client.reserved.reserved_ports) "down" into more specific stanzas (client.host_networks[].reserved_ports). As discussed in #13505 there are other options, and since it's totally broken right now we have some flexibility: Treat overlapping reserved_ports on addresses as invalid and refuse to start agents. However, I'm not sure there's a cohesive model we want to publish right now since so much 0.9-0.12 compat code still exists! We would have to explain to folks that if their -network-interface and host_network addresses overlapped, they could only specify reserved_ports in one place or the other?! It gets ugly. Use the global client.reserved.reserved_ports value as the default and treat host_network[].reserved_ports as overrides. My first suggestion in the issue, but @groggemans made me realize the addresses on the agent's interface (as configured by -network-interface) may overlap with host_networks, so you'd need to remove the global reserved_ports from addresses shared with a shared network?! This seemed really confusing and subtle for users to me. So I think "merging down" creates the most expressive yet understandable approach. I've played around with it a bit, and it doesn't seem too surprising. The only frustrating part is how difficult it is to observe the available addresses and ports on a node! However that's a job for another PR.	2022-07-12 14:40:25 -07:00
Luiz Aoqui	8a427a470a	scheduler: detect and log unexpected scheduling collisions (#11793)	2022-01-14 20:09:14 -05:00
Mahmood Ali	28b8767b27	Allow configuring memory oversubscription (#10466) Cluster operators want to have better control over memory oversubscription and may want to enable/disable it based on their experience. This PR adds a scheduler configuration field to control memory oversubscription. It's an additional field that can be set in the [API via Scheduler Config](https://www.nomadproject.io/api-docs/operator/scheduler), or [the agent server config](https://www.nomadproject.io/docs/configuration/server#configuring-scheduler-config). I opted to have the memory oversubscription be an opt-in, but happy to change it. To enable it, operators should call the API with: ```json { "MemoryOversubscriptionEnabled": true } ``` If memory oversubscription is disabled, submitting jobs specifying `memory_max` will get a "Memory oversubscription is not enabled" warning, but the jobs will be accepted without them accessing the additional memory. The warning message is like: ``` $ nomad job run /tmp/j Job Warnings: 1 warning(s): * Memory oversubscription is not enabled; Task cache.redis memory_max value will be ignored ==> Monitoring evaluation "7c444157" Evaluation triggered by job "example" ==> Monitoring evaluation "7c444157" Evaluation within deployment: "9d826f13" Allocation "aa5c3cad" created: node "9272088e", group "cache" Evaluation status changed: "pending" -> "complete" ==> Evaluation "7c444157" finished with status "complete" # then you can examine the Alloc AllocatedResources to validate whether the task is allowed to exceed memory: $ nomad alloc status -json aa5c3cad | jq '.AllocatedResources.Tasks["redis"].Memory' { "MemoryMB": 256, "MemoryMaxMB": 0 } ```	2021-04-29 22:09:56 -04:00
Andrii Chubatiuk	6f7171d50f	add support for host network interpolation	2021-04-13 09:53:05 -04:00
Nick Ethier	94c7aec159	scheduler: implement scheduling of reserved cores	2021-03-19 00:29:07 -04:00
Drew Bailey	7ce0b5017c	Events/msgtype cleanup (#9117) * use msgtype in upsert node adds message type to signature for upsert node, update tests, remove placeholder method * UpsertAllocs msg type test setup * use upsertallocs with msg type in signature update test usage of delete node delete placeholder msgtype method * add msgtype to upsert evals signature, update test call sites with test setup msg type handle snapshot upsert eval outside of FSM and ignore eval event remove placeholder upsertevalsmsgtype handle job plan rpc and prevent event creation for plan msgtype cleanup upsertnodeevents updatenodedrain msgtype msg type 0 is a node registration event, so set the default to the ignore type * fix named import * fix signature ordering on upsertnode to match	2020-10-19 09:30:15 -04:00
Nick Ethier	ad8ced3873	multi-interface network support	2020-06-19 09:42:10 -04:00
Mahmood Ali	60aa516db8	missed fixing one invocation	2020-05-01 13:38:46 -04:00
Mahmood Ali	5078e0cfed	tests and some clean up	2020-05-01 13:13:30 -04:00
Michael Schurter	2781717ad6	core: fix node reservation scoring The BinPackIter accounted for node reservations twice when scoring nodes which could bias scores toward nodes with reservations. Pseudo-code for previous algorithm: ``` proposed = reservedResources + sum(allocsResources) available = nodeResources - reservedResources score = 1 - (proposed / available) ``` The node's reserved resources are added to the total resources used by allocations, and then the node's reserved resources are later subtracted from the node's overall resources. The new algorithm is: ``` proposed = sum(allocResources) available = nodeResources - reservedResources score = 1 - (proposed / available) ``` The node's reserved resources are no longer added to the total resources used by allocations. My guess as to how this bug happened is that the resource utilization variable (`util`) is calculated and returned by the `AllocsFit` function which needs to take reserved resources into account as a basic feasibility check. To avoid re-calculating alloc resource usage (because there may be a large number of allocs), we reused `util` in the `ScoreFit` function. `ScoreFit` properly accounts for reserved resources by subtracting them from the node's overall resources. However since `util` _also_ took reserved resources into account the score would be incorrect. Prior to the fix the added test output: ``` Node: reserved Score: 1.0000 Node: reserved2 Score: 1.0000 Node: no-reserved Score: 0.9741 ``` The scores being 1.0 for both nodes with reserved resources is a good hint something is wrong as they should receive different scores. Upon further inspection the double accounting of reserved resources caused their scores to be >1.0 and clamped. After the fix the added test outputs: ``` Node: no-reserved Score: 0.9741 Node: reserved Score: 0.9480 Node: reserved2 Score: 0.8717 ```	2020-04-15 15:13:30 -07:00
Michael Schurter	f12bfdb193	scheduler: update tests with modern error helper	2019-12-02 20:25:52 -08:00
Preetha Appan	5a1dd79179	Code review feedback	2019-07-31 01:04:08 -04:00
Preetha Appan	b561816343	Scheduler changes to support network at task group level Also includes unit tests for binpacker and preemption. The tests verify that network resources specified at the task group level are properly accounted for	2019-07-31 01:04:08 -04:00
Alex Dadgar	bc42873e07	Change types of weights on spread/affinity	2019-01-30 12:20:38 -08:00
Alex Dadgar	8264f50c52	convert driver to device for device constraint/attributes	2019-01-23 10:58:45 -08:00
Preetha Appan	66670c0f02	Fixes device scheduling unit tests Also changes the logic for score when there is more than one task requesting a device. Since inter task affinities are already normalized, we take the average of the scores across tasks.	2018-11-08 10:31:19 -06:00
Alex Dadgar	ad4c26a1e3	review comments	2018-11-07 11:31:52 -08:00
Alex Dadgar	7ab3ce4bde	affinities	2018-11-07 10:32:03 -08:00
Alex Dadgar	77ad27de60	assign devices	2018-11-07 10:32:03 -08:00
Alex Dadgar	e30b20e65e	renames	2018-10-04 14:57:25 -07:00
Alex Dadgar	0f2f4797cb	fixing tests	2018-10-04 14:26:19 -07:00
Preetha Appan	5cd8d1fe82	Some minor changes from code review	2018-09-04 16:10:11 -05:00
Preetha Appan	a236342caa	Address some review feedback	2018-09-04 16:10:11 -05:00
Preetha Appan	00924555a8	Implement affinity support in generic scheduler	2018-09-04 16:10:11 -05:00
Preetha Appan	c6c0741bd8	Add helper methods, use require and other code review feedback	2018-01-31 09:56:53 -06:00
Preetha Appan	5ecb7895bb	Reschedule previous allocs and track their reschedule attempts	2018-01-31 09:56:53 -06:00
Michael Schurter	04b8f8e7fc	Remove `structs` import from `api` Goes a step further and removes structs import from api's tests as well by moving GenerateUUID to its own package.	2017-09-29 10:36:08 -07:00
Alex Dadgar	a9e3a41407	Enable more linters	2017-09-26 15:26:33 -07:00
Alex Dadgar	ac1539d5d9	Sync namespace changes	2017-09-07 17:04:21 -07:00
Alex Dadgar	741a71e0b3	Fix tests	2017-05-01 13:54:26 -07:00
Diptanu Choudhury	396e45629b	Renaming LocalDisk to EphemeralDisk (#1710) Renaming LocalDisk to EphemeralDisk	2016-09-14 15:43:42 -07:00
Diptanu Choudhury	7da66e169c	Making the scheduler use LocalDisk instead of Resources.DiskMB	2016-08-25 12:27:42 -05:00
Diptanu Choudhury	230a59ca16	Fixed some more tests	2016-07-25 17:26:38 -07:00
Alex Dadgar	bffcf5bd9f	ProposedAllocs dedups in-place updated allocations	2016-03-21 18:09:32 -07:00
Alex Dadgar	bc13dcaf48	merge	2015-12-16 15:01:15 -08:00
Alex Dadgar	17bc13bfe5	Add garbage collection to jobs	2015-12-16 15:00:45 -08:00
Armon Dadgar	924bf123a1	scheduler: binpacker makes network offers	2015-09-13 14:31:32 -07:00
Armon Dadgar	8a02dbc481	Use a single implementation of GenerateUUID	2015-09-07 15:23:03 -07:00
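
The double accounting described in commit 2781717ad6 can be illustrated with a minimal Go sketch. It follows the simplified pseudo-code from that commit message in a single resource dimension and is not Nomad's actual `ScoreFit` implementation; the function names `scoreBeforeFix` and `scoreAfterFix`, the helper `sum`, and the sample numbers are hypothetical.

```go
package main

import "fmt"

// sum adds up the resources used by each allocation.
func sum(allocs []float64) float64 {
	total := 0.0
	for _, a := range allocs {
		total += a
	}
	return total
}

// scoreBeforeFix mirrors the pre-fix pseudo-code: the reservation is added
// to the proposed usage AND subtracted from the node's capacity, so it is
// counted twice. (Hypothetical name, single resource dimension.)
func scoreBeforeFix(node, reserved float64, allocs []float64) float64 {
	proposed := reserved + sum(allocs)
	available := node - reserved
	return 1 - proposed/available
}

// scoreAfterFix mirrors the post-fix pseudo-code: the reservation only
// reduces the node's available capacity. (Hypothetical name.)
func scoreAfterFix(node, reserved float64, allocs []float64) float64 {
	proposed := sum(allocs)
	available := node - reserved
	return 1 - proposed/available
}

func main() {
	// Assumed sample values: a 1000-unit node with 200 units reserved
	// and two allocations using 300 and 100 units.
	allocs := []float64{300, 100}
	fmt.Printf("before fix: %.2f\n", scoreBeforeFix(1000, 200, allocs))
	fmt.Printf("after fix:  %.2f\n", scoreAfterFix(1000, 200, allocs))
}
```

With these sample values the pre-fix formula yields 0.25 and the post-fix formula yields 0.50, which shows how counting the reservation twice shifts the result for the same node and allocations.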

61 Commits