Commit Graph

11 Commits

Author SHA1 Message Date
Seth Hoenig
6e4d57b330 numalib: provide a fallback for topology scanning on linux (#19457)
* numalib: provide a fallback for topology scanning on linux

* numalib: better package var names

* cl: add cl

* lint: fix my sloppy code

* cl: fixup wording
2023-12-13 13:06:30 -06:00
Piotr Kazmierczak
b6dd376100 numa: account for incorrect core number on topology.insert (#19383)
Unsupported environments like containers or guests OSs inside LXD can
incorrectly number of available cores thus leading to numalib having trouble
detecting cores and panicking. This code adds tests for linux sysfs detection
methods and fixes the panic.
2023-12-13 17:40:26 +01:00
Seth Hoenig
1604dba508 client: fingerprint cpu on raspberry pi (#18982)
This PR tweaks the linux cpu fingerprinter to handle the case where no
NUMA node data is found under /sys/devices/system/, in which case we
need to assume just one node, one socket.
2023-11-02 15:52:37 -05:00
Seth Hoenig
5b56a5c5d1 client: fix cpu core/freq calculation on intel macs (#18934) 2023-11-01 07:16:26 -05:00
Seth Hoenig
0020139440 core: port common code changes from ENT for numa scheduling (#18818)
Some additional changes were made in the ENT PR to the common code in
support of numa scheduling; this PR copies those changes back to CE.
2023-10-20 13:19:02 -05:00
Seth Hoenig
83720740f5 core: plumbing to support numa aware scheduling (#18681)
* core: plumbing to support numa aware scheduling

* core: apply node resources compatibility upon fsm rstore

Handle the case where an upgraded server dequeus an evaluation before
a client triggers a new fingerprint - which would be needed to cause
the compatibility fix to run. By running the compat fix on restore the
server will immediately have the compatible pseudo topology to use.

* lint: learn how to spell pseudo
2023-10-19 15:09:30 -05:00
Seth Hoenig
591394fb62 drivers: plumb hardware topology via grpc into drivers (#18504)
* drivers: plumb hardware topology via grpc into drivers

This PR swaps out the temporary use of detecting system hardware manually
in each driver for using the Client's detected topology by plumbing the
data over gRPC. This ensures that Client configuration is taken to account
consistently in all references to system topology.

* cr: use enum instead of bool for core grade

* cr: fix test slit tables to be possible
2023-09-18 08:58:07 -05:00
Seth Hoenig
2e1974a574 client: refactor cpuset partitioning (#18371)
* client: refactor cpuset partitioning

This PR updates the way Nomad client manages the split between tasks
that make use of resources.cpus vs. resources.cores.

Previously, each task was explicitly assigned which CPU cores they were
able to run on. Every time a task was started or destroyed, all other
tasks' cpusets would need to be updated. This was inefficient and would
crush the Linux kernel when a client would try to run ~400 or so tasks.

Now, we make use of cgroup heirarchy and cpuset inheritence to efficiently
manage cpusets.

* cr: tweaks for feedback
2023-09-12 09:11:11 -05:00
Seth Hoenig
8833452d44 followup to numa/cgroups refactor (#18214)
* lang: note that Stack is not concurrency-safe

* client: use more descriptive name for wrangler hook in logs

* numalib: use correct name for receiver parameter
2023-08-15 14:12:17 -05:00
hashicorp-copywrite[bot]
2d35e32ec9 Update copyright file headers to BUSL-1.1 2023-08-10 17:27:15 -05:00
Seth Hoenig
a4cc76bd3e numa: enable numa topology detection (#18146)
* client: refactor cgroups management in client

* client: fingerprint numa topology

* client: plumb numa and cgroups changes to drivers

* client: cleanup task resource accounting

* client: numa client and config plumbing

* lib: add a stack implementation

* tools: remove ec2info tool

* plugins: fixup testing for cgroups / numa changes

* build: update makefile and package tests and cl
2023-08-10 17:05:30 -05:00