nomad

mirror of https://github.com/kemko/nomad.git synced 2026-01-05 01:45:44 +03:00

Author	SHA1	Message	Date
Jorge Marey	25426f0777	fingerprint: add config option to disable dmidecode (#25108)	2025-02-13 11:20:48 -05:00
Piotr Kazmierczak	0bc9796d3b	client: log an error message if total detected cpu is zero (#23827)	2024-08-15 18:31:27 +02:00
Tim Gross	7d73065066	numa: fix scheduler panic due to topology serialization bug (#23284) The NUMA topology struct field `NodeIDs` is an `idset.Set`, which has no public members. As a result, this field is never serialized via msgpack and persisted in state. When `numa.affinity = "prefer"`, the scheduler dereferences this nil field and panics the scheduler worker. Ideally we would fix this by adding a msgpack serialization extension, but because the field already exists and is just always empty, this breaks RPC wire compatibility across upgrades. Instead, create a new field that's populated at the same time we populate the more useful `idset.Set`, and repopulate the set on demand. Fixes: https://hashicorp.atlassian.net/browse/NET-9924	2024-06-11 08:55:00 -04:00
Seth Hoenig	83720740f5	core: plumbing to support numa aware scheduling (#18681) * core: plumbing to support numa aware scheduling * core: apply node resources compatibility upon fsm restore Handle the case where an upgraded server dequeues an evaluation before a client triggers a new fingerprint, which would be needed to cause the compatibility fix to run. By running the compat fix on restore the server will immediately have the compatible pseudo topology to use. * lint: learn how to spell pseudo	2023-10-19 15:09:30 -05:00
Seth Hoenig	591394fb62	drivers: plumb hardware topology via grpc into drivers (#18504) * drivers: plumb hardware topology via grpc into drivers This PR swaps out the temporary use of detecting system hardware manually in each driver for using the Client's detected topology by plumbing the data over gRPC. This ensures that Client configuration is taken into account consistently in all references to system topology. * cr: use enum instead of bool for core grade * cr: fix test slit tables to be possible	2023-09-18 08:58:07 -05:00
Seth Hoenig	2e1974a574	client: refactor cpuset partitioning (#18371) * client: refactor cpuset partitioning This PR updates the way Nomad client manages the split between tasks that make use of resources.cpus vs. resources.cores. Previously, each task was explicitly assigned which CPU cores it was able to run on. Every time a task was started or destroyed, all other tasks' cpusets would need to be updated. This was inefficient and would crush the Linux kernel when a client would try to run ~400 or so tasks. Now, we make use of cgroup hierarchy and cpuset inheritance to efficiently manage cpusets. * cr: tweaks for feedback	2023-09-12 09:11:11 -05:00
hashicorp-copywrite[bot]	2d35e32ec9	Update copyright file headers to BUSL-1.1	2023-08-10 17:27:15 -05:00
Seth Hoenig	a4cc76bd3e	numa: enable numa topology detection (#18146) * client: refactor cgroups management in client * client: fingerprint numa topology * client: plumb numa and cgroups changes to drivers * client: cleanup task resource accounting * client: numa client and config plumbing * lib: add a stack implementation * tools: remove ec2info tool * plugins: fixup testing for cgroups / numa changes * build: update makefile and package tests and cl	2023-08-10 17:05:30 -05:00
Patric Stout	e190eae395	Use config "cpu_total_compute" (if set) for all CPU statistics (#17628) Before this commit, it was only used for fingerprinting, but not for CPU stats on nodes or tasks. This meant that if the auto-detection failed, setting cpu_total_compute didn't resolve the issue. This issue was most noticeable on ARM64, as auto-detection always failed there.	2023-07-19 13:30:47 -05:00
hashicorp-copywrite[bot]	f005448366	[COMPLIANCE] Add Copyright and License Headers	2023-04-10 15:36:59 +00:00
Seth Hoenig	fd900d0723	client/fingerprint: correctly fingerprint E/P cores of Apple Silicon chips (#16672) * client/fingerprint: correctly fingerprint E/P cores of Apple Silicon chips This PR adds detection of asymmetric core types (Power & Efficiency) (P/E) when running on M1/M2 Apple Silicon CPUs. This functionality is provided by shoenig/go-m1cpu which makes use of the Apple IOKit framework to read undocumented registers containing CPU performance data. Currently working on getting that functionality merged upstream into gopsutil, but gopsutil would still not support detecting P vs E cores like this PR does. Also refactors the CPUFingerprinter code to handle the mixed core types, now setting power vs efficiency cpu attributes. For now the scheduler is still unaware of mixed core types. On Apple platforms tasks cannot reserve cores anyway so it doesn't matter, but at least now the total CPU shares available will be correct. Future work should include adding support for detecting P/E cores on the latest and upcoming Intel chips, where computation of total cpu shares is currently incorrect. For that, we should also include updating the scheduler to be core-type aware, so that tasks of resources.cores on Linux platforms can be assigned the correct number of CPU shares for the core type(s) they have been assigned. Node attributes before: cpu.arch = arm64, cpu.modelname = Apple M2 Pro, cpu.numcores = 12, cpu.reservablecores = 0, cpu.totalcompute = 1000. Node attributes after: cpu.arch = arm64, cpu.frequency.efficiency = 2424, cpu.frequency.power = 3504, cpu.modelname = Apple M2 Pro, cpu.numcores.efficiency = 4, cpu.numcores.power = 8, cpu.reservablecores = 0, cpu.totalcompute = 37728. * fingerprint/cpu: follow up cr items	2023-03-28 08:27:58 -05:00
Michael Schurter	2e059c624f	fingerprint: add node attr for reservable cores (#14694) * fingerprint: add node attr for reservable cores Add an attribute for the number of reservable CPU cores as they may differ from the existing `cpu.numcores` due to client configuration or OS support. Hopefully clarifies some confusion in #14676 * add changelog * num_reservable_cores -> reservablecores	2022-09-26 13:03:03 -07:00
Nick Ethier	f897ac79e8	client/ar: thread through cpuset manager	2021-04-13 13:28:36 -04:00
Nick Ethier	03d6eb8205	client: only fingerprint reservable cores via cgroups, allowing manual override for other platforms	2021-04-13 13:28:15 -04:00
Nick Ethier	b8397a712d	fingerprint: implement client fingerprinting of reservable cores. On Linux systems this is derived from the configured cpuset cgroup parent (defaults to /nomad). For non-Linux systems and Linux systems where cgroups are not enabled, the client defaults to using all cores.	2021-04-13 13:28:15 -04:00
Joel May	2e17610406	Allow client.cpu_total_compute to override attr.cpu.totalcompute	2021-01-07 15:31:11 -05:00
Seth Hoenig	da1235f35b	client/fingerprint/cpu: use fallback total compute value if cpu not detected Previously, Nomad would fail to start up if the CPU fingerprinter could not detect the cpu total compute (i.e. cores * mhz). This is common on some EC2 instance types (graviton class), where the env_aws fingerprinter will override the detected CPU performance with a more accurate value anyway. Instead of crashing on startup, have Nomad use a low default for available cpu performance of 1000 ticks (e.g. 1 core * 1 GHz). This enables Nomad to get past the useless cpu fingerprinting on those EC2 instances. The crashing error message is now a log statement suggesting the setting of cpu_total_compute in client config. Fixes #7989	2020-12-09 10:35:58 -06:00
Danielle Tomlinson	da48a7eab3	client: Move fingerprint structs to pkg This removes a cyclical dependency when importing client/structs from dependencies of the plugin_loader (specifically, drivers), which arises because client/config also depends on the plugin_loader. It also better reflects the ownership of fingerprint structs, as they are fairly internal to the fingerprint manager.	2018-12-01 17:10:39 +01:00
Alex Dadgar	9a2c2a4f68	client uses passed logger and fix fingerprinters	2018-10-16 16:53:30 -07:00
Alex Dadgar	5e67b37aad	use int64	2018-10-16 15:34:32 -07:00
Preetha Appan	3ca71ae935	Change CPU/Disk/MemoryMB to int everywhere in new resource structs	2018-10-16 16:21:42 -05:00
Alex Dadgar	e30b20e65e	renames	2018-10-04 14:57:25 -07:00
Alex Dadgar	b310a54aa6	Node resources on client	2018-09-29 17:23:41 -07:00
Chelsea Holland Komlo	ba2ebbc7f9	code review fixup	2018-01-31 18:34:03 -05:00
Chelsea Holland Komlo	a9447addd3	add applicable boolean to fingerprint response public fields and remove getter functions	2018-01-31 13:21:45 -05:00
Chelsea Holland Komlo	f5fc20a564	create safe getters and setters for fingerprint response	2018-01-26 11:22:05 -05:00
Chelsea Holland Komlo	5e8151d700	refactor Fingerprint to request/response construct	2018-01-24 11:54:02 -05:00
Michael Schurter	ca38020521	0 compute == error	2017-07-03 14:51:02 -07:00
Michael Schurter	c10f530964	Fix cpu_total_compute override	2017-07-03 14:51:02 -07:00
Alex Dadgar	bfebe1afdc	rename cpu_total_compute and docs	2017-03-14 14:15:49 -07:00
Alex Dadgar	36dc330737	Various fixes This PR: * Uses Go 1.8 executable lookup * Stores any err message from stats init method * Allows overriding of Cpu Compute for hosts where it can't be detected	2017-03-14 12:56:31 -07:00
Kenjiro Nakayama	5e4dbd0ff3	tiny: Fix duplicated error message in CPU fingerprint	2016-08-07 12:49:40 +09:00
Alex Dadgar	f2e28735a5	Treat float as int	2016-06-22 15:09:39 -07:00
Alex Dadgar	d87d988491	Floor CPU MHz and total compute and mark hostname as unique	2016-06-22 15:01:36 -07:00
Sean Chittenden	e26606acfd	Memoize the CPU stats. Error if CPU fingerprinting fails.	2016-06-17 12:13:53 -07:00
Sean Chittenden	e42f7d5c23	Record and use only the first Mhz from the CPU fingerprinter. Assume all cores are the same speed.	2016-06-17 11:06:57 -07:00
Sean Chittenden	b0490efb38	In the debug log, split the unit from the measurement. awk(1) friendly is UNIX(tm) friendly.	2016-06-16 23:07:13 -07:00
Sean Chittenden	e0b4f7a080	Warn when we're unable to fingerprint the CPU Mhz	2016-06-16 23:07:13 -07:00
Sean Chittenden	a6dc002415	Explicitly call `cpu.Counts()` to determine the CPU core count. Much safer than counting the number of InfoStat structs returned.	2016-06-16 23:07:13 -07:00
Diptanu Choudhury	445b181fec	Updated gopsutil	2016-05-28 19:42:34 -07:00
Sean Chittenden	a91f41ce14	Establish a floor of one core for the number of cores. In most cases the upstream library [shirou/gopsutil](https://github.com/shirou/gopsutil) needs to be fixed.	2016-05-09 12:22:40 -07:00
Sean Chittenden	fd9bcabaa8	Emit various debugging information with the results of the fingerprinter	2016-05-09 12:21:51 -07:00
Alex Dadgar	5b067a3e4f	Merge fix	2015-11-05 13:46:02 -08:00
Armon Dadgar	b81105bd09	Change CPU from float64 to int	2015-09-23 11:14:32 -07:00
Chris Bednarski	ca7798268e	Get average frequency of all CPUs so we can do average frequency * cores for total compute	2015-08-27 13:35:54 -07:00
Clint	d05d878dfc	Merge pull request #6 from hashicorp/cpu-resources populate CPU in Node Resources	2015-08-27 15:26:00 -05:00
Chris Bednarski	9a12a00966	Merge pull request #4 from hashicorp/f-storage-fingerprint Add storage fingerprinter	2015-08-27 12:43:18 -07:00
Chris Bednarski	6804ec7450	Changed logs to errors; added data to node.Resources.DiskMB	2015-08-27 12:23:17 -07:00
Clint Shryock	050ee19547	populate CPU in Node Resources	2015-08-27 14:15:56 -05:00
Clint Shryock	4e5dcf5c43	Add cpu.frequency, cpu.totalcompute	2015-08-27 09:19:53 -05:00

51 Commits