nomad

mirror of https://github.com/kemko/nomad.git synced 2026-01-04 09:25:46 +03:00

Author	SHA1	Message	Date
Seth Hoenig	591394fb62	drivers: plumb hardware topology via grpc into drivers (#18504) * drivers: plumb hardware topology via grpc into drivers This PR swaps out the temporary use of detecting system hardware manually in each driver for using the Client's detected topology by plumbing the data over gRPC. This ensures that Client configuration is taken into account consistently in all references to system topology. * cr: use enum instead of bool for core grade * cr: fix test slit tables to be possible	2023-09-18 08:58:07 -05:00
Tim Gross	f00bff09f1	fix multiple overflow errors in exponential backoff (#18200) We use capped exponential backoff in several places in the code when handling failures. The code we've copy-and-pasted all over has a check to see if the backoff is greater than the limit, but this check happens after the bitshift and we always increment the number of attempts. This causes an overflow with a fairly small number of failures (ex. at one place I tested it occurs after only 24 iterations), resulting in a negative backoff which then never recovers. The backoff becomes a tight loop consuming resources and/or DoS'ing a Nomad RPC handler or an external API such as Vault. Note this doesn't occur in places where we cap the number of iterations so the loop breaks (usually to return an error), so long as the number of iterations is reasonable. Introduce a helper with a check on the cap before the bitshift to avoid overflow in all places this can occur. Fixes: #18199 Co-authored-by: stswidwinski <stan.swidwinski@gmail.com>	2023-08-15 14:38:18 -04:00
hashicorp-copywrite[bot]	2d35e32ec9	Update copyright file headers to BUSL-1.1	2023-08-10 17:27:15 -05:00
Seth Hoenig	a4cc76bd3e	numa: enable numa topology detection (#18146) * client: refactor cgroups management in client * client: fingerprint numa topology * client: plumb numa and cgroups changes to drivers * client: cleanup task resource accounting * client: numa client and config plumbing * lib: add a stack implementation * tools: remove ec2info tool * plugins: fixup testing for cgroups / numa changes * build: update makefile and package tests and cl	2023-08-10 17:05:30 -05:00
Seth Hoenig	ec4fa55bbf	drivers/docker: refactor use of clients in docker driver (#17731) * drivers/docker: refactor use of clients in docker driver This PR refactors how we manage the two underlying clients used by the docker driver for communicating with the docker daemon. We keep two clients - one with a hard-coded timeout that applies to all operations no matter what, intended for use with short lived / async calls to docker. The other has no timeout and is the responsibility of the caller to set a context that will ensure the call eventually terminates. The use of these two clients has been confusing and mistakes were made in a number of places where calls were making use of the wrong client. This PR makes it so that a user must explicitly call a function to get the client that makes sense for that use case. Fixes #17023 * cr: followup items	2023-06-26 15:21:42 -05:00
hashicorp-copywrite[bot]	f005448366	[COMPLIANCE] Add Copyright and License Headers	2023-04-10 15:36:59 +00:00
Seth Hoenig	c1e033c8c6	cleanup: prevent leaks from time.After This PR replaces use of time.After with a safe helper function that creates a time.Timer to use instead. The new function returns both a time.Timer and a Stop function that the caller must handle. Unlike time.NewTimer, the helper function does not panic if the duration set is <= 0.	2022-02-02 14:32:26 -06:00
Kris Hicks	7747124ef0	Apply some suggested fixes from staticcheck (#9598)	2020-12-10 07:29:18 -08:00
Michael Schurter	855a08810c	docker: improve stats names and comments	2019-04-02 09:18:38 -07:00
Mahmood Ali	82d3c43e31	Update drivers/docker/stats.go comment Co-Authored-By: schmichael <michael.schurter@gmail.com>	2019-04-02 09:09:17 -07:00
Michael Schurter	a69664e5ee	docker: fix send after close panic in stats destCh was being written to by one goroutine and closed by another goroutine. This panic occurred in Travis: ``` === FAIL: drivers/docker TestDockerCoordinator_ConcurrentPulls (117.66s) === PAUSE TestDockerCoordinator_ConcurrentPulls === CONT TestDockerCoordinator_ConcurrentPulls panic: send on closed channel goroutine 5358 [running]: github.com/hashicorp/nomad/drivers/docker.dockerStatsCollector(0xc0003a4a20, 0xc0003a49c0, 0x3b9aca00) /home/travis/gopath/src/github.com/hashicorp/nomad/drivers/docker/stats.go:108 +0x167 created by github.com/hashicorp/nomad/drivers/docker.TestDriver_DockerStatsCollector /home/travis/gopath/src/github.com/hashicorp/nomad/drivers/docker/stats_test.go:33 +0x1ab ``` The 2 ways to fix this kind of error are to either (1) add extra coordination around multiple goroutines writing to a chan or (2) make it so only one goroutine writes to a chan. I implemented (2) first as it's simpler, but @notnoop pointed out since the same destCh is reused in the stats loop there's now a double close panic possible! So this implements (1) by adding a *usageSender struct for handling concurrent senders and closing.	2019-04-02 08:28:08 -07:00
Danielle Tomlinson	6624d3667b	docker: Support stats on Windows	2019-02-22 14:19:58 +01:00
Mahmood Ali	b5c20aa50b	Track Basic Memory Usage as reported by cgroups Track current memory usage, `memory.usage_in_bytes`, in addition to `memory.max_memory_usage_in_bytes` and friends. This number is closer to what Docker reports. Related to https://github.com/hashicorp/nomad/issues/5165.	2019-01-14 18:47:52 -05:00
Nick Ethier	f6af1d4d04	docker: add test for stats collection	2019-01-12 12:18:22 -05:00
Nick Ethier	fbf9a4c772	executor: implement streaming stats API plugins/driver: update driver interface to support streaming stats client/tr: use streaming stats api TODO: * how to handle errors and closed channel during stats streaming * prevent tight loop if Stats(ctx) returns an error drivers: update drivers TaskStats RPC to handle streaming results executor: better error handling in stats rpc docker: better control and error handling of stats rpc driver: allow stats to return a recoverable error	2019-01-12 12:18:22 -05:00
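
Commit a69664e5ee above describes guarding a shared channel so that concurrent senders and a closer cannot trigger "send on closed channel" or a double close. The following is a minimal Go sketch of that general idea, not the code from the commit: the type `safeSender`, its methods, and the choice to drop values when the receiver is not ready are all assumptions made for illustration, and the real `usageSender` in drivers/docker/stats.go may differ.

```go
package main

import (
	"fmt"
	"sync"
)

// safeSender is a hypothetical stand-in for the usageSender mentioned in the
// commit message. A mutex serializes send and close so a send can never race
// with a close, and close is idempotent.
type safeSender struct {
	mu     sync.Mutex
	closed bool
	ch     chan<- int
}

func newSafeSender(ch chan<- int) *safeSender {
	return &safeSender{ch: ch}
}

// send attempts a non-blocking send and reports whether the value was
// delivered. It returns false if the sender is closed or the channel is full.
// Dropping instead of blocking is an assumption of this sketch; it avoids
// holding the lock while waiting on a slow receiver.
func (s *safeSender) send(v int) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.closed {
		return false
	}
	select {
	case s.ch <- v:
		return true
	default:
		return false
	}
}

// close closes the underlying channel at most once.
func (s *safeSender) close() {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.closed {
		return
	}
	s.closed = true
	close(s.ch)
}

func main() {
	ch := make(chan int, 1)
	s := newSafeSender(ch)

	fmt.Println(s.send(1)) // delivered into the buffer
	s.close()
	s.close()              // second close is a no-op, not a panic
	fmt.Println(s.send(2)) // rejected, not a panic
}
```

Holding one lock across both the closed check and the channel operation is what removes the race; checking a flag without the lock would still leave a window between the check and the send.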

15 Commits