Commit Graph

15 Commits

Author SHA1 Message Date
Tim Gross
374e987b9b metrics: emit cache and rss stats on cgroup v2 (#25751)
In cgroups v2, a different map of memory stats is available from the kernel than
in v1. The Docker API reflects this change. But there are equivalent values in
the map for RSS (anonymously mapped memory) and cache (filesystem cache and
tmpfs), which the Docker driver is not currently emitting.

Fallback to these alternate values when the cgroups v1 values are not
available. Include the anonymous mapping in the "measured" allocation stats as
"RSS" so that they both show up in allocation metrics. We can do this on both
the `docker` driver and the Linux executor for `exec` and `java` drivers.

Fixes: https://github.com/hashicorp/nomad/issues/19185
Ref: https://hashicorp.atlassian.net/browse/NMD-437
Ref: https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#memory-interface-files
Ref: https://www.kernel.org/doc/Documentation/cgroup-v1/memory.txt
2025-04-24 12:48:18 -04:00
Piotr Kazmierczak
3ad0df71a8 docker: correct stat response for rss, cache and swap memory in cgroups v1 (#25741)
#25138 refactoring accidentally removed
some of the memory stats that weren't available as concrete types in
containerapi.
2025-04-24 15:17:56 +02:00
James Rasell
0726e4cc3e driver/docker: Fix container CPU stats collection (#24768)
The recent change to collection via a "one-shot" Docker API call
did not update the stream boolean argument. This results in the
PreCPUStats values being zero and therefore breaking the CPU
calculations which rely on this data. The base fix is to update
the passed boolean parameter to match the desired non-streaming
behaviour. The non-streaming API call correctly returns the
PreCPUStats data which can be seen in the added unit test.

The most recent change also modified the behaviour of the
collectStats go routine, so that any error encountered results in
the routine exiting. In the event this was a transient error, the
container will continue to run, however, no stats will be collected
until the task is stopped and replaced. This PR reverts the
behaviour, so that an error encountered during a stats collection
run results in the error being logged but the collection process
continuing with a backoff used.
2025-01-07 07:42:31 +00:00
Piotr Kazmierczak
981ca36049 docker: use official client instead of fsouza/go-dockerclient (#23966)
This PR replaces fsouza/go-dockerclient 3rd party docker client library with
docker's official SDK.

---------

Co-authored-by: Tim Gross <tgross@hashicorp.com>
Co-authored-by: Seth Hoenig <shoenig@duck.com>
2024-09-26 18:41:44 +02:00
Seth Hoenig
591394fb62 drivers: plumb hardware topology via grpc into drivers (#18504)
* drivers: plumb hardware topology via grpc into drivers

This PR swaps out the temporary use of detecting system hardware manually
in each driver for using the Client's detected topology by plumbing the
data over gRPC. This ensures that Client configuration is taken to account
consistently in all references to system topology.

* cr: use enum instead of bool for core grade

* cr: fix test slit tables to be possible
2023-09-18 08:58:07 -05:00
hashicorp-copywrite[bot]
2d35e32ec9 Update copyright file headers to BUSL-1.1 2023-08-10 17:27:15 -05:00
Seth Hoenig
a4cc76bd3e numa: enable numa topology detection (#18146)
* client: refactor cgroups management in client

* client: fingerprint numa topology

* client: plumb numa and cgroups changes to drivers

* client: cleanup task resource accounting

* client: numa client and config plumbing

* lib: add a stack implementation

* tools: remove ec2info tool

* plugins: fixup testing for cgroups / numa changes

* build: update makefile and package tests and cl
2023-08-10 17:05:30 -05:00
hashicorp-copywrite[bot]
f005448366 [COMPLIANCE] Add Copyright and License Headers 2023-04-10 15:36:59 +00:00
Seth Hoenig
b242957990 ci: swap ci parallelization for unconstrained gomaxprocs 2022-03-15 12:58:52 -05:00
Alessandro De Blasis
759397533a metrics: added mapped_file metric (#11500)
Signed-off-by: Alessandro De Blasis <alex@deblasis.net>
Co-authored-by: Nate <37554478+servusdei2018@users.noreply.github.com>
2022-01-10 15:35:19 -05:00
Michael Schurter
855a08810c docker: improve stats names and comments 2019-04-02 09:18:38 -07:00
Michael Schurter
a69664e5ee docker: fix send after close panic in stats
destCh was being written to by one goroutine and closed by another
goroutine. This panic occurred in Travis:

```
=== FAIL: drivers/docker TestDockerCoordinator_ConcurrentPulls (117.66s)
=== PAUSE TestDockerCoordinator_ConcurrentPulls
=== CONT  TestDockerCoordinator_ConcurrentPulls

panic: send on closed channel

goroutine 5358 [running]:
github.com/hashicorp/nomad/drivers/docker.dockerStatsCollector(0xc0003a4a20, 0xc0003a49c0, 0x3b9aca00)
	/home/travis/gopath/src/github.com/hashicorp/nomad/drivers/docker/stats.go:108 +0x167

created by
github.com/hashicorp/nomad/drivers/docker.TestDriver_DockerStatsCollector
	/home/travis/gopath/src/github.com/hashicorp/nomad/drivers/docker/stats_test.go:33 +0x1ab
```

The 2 ways to fix this kind of error are to either (1) add extra
coordination around multiple goroutines writing to a chan or (2) make it
so only one goroutines writes to a chan.

I implemented (2) first as it's simpler, but @notnoop pointed out since
the same destCh in reused in the stats loop there's now a double close
panic possible!

So this implements (1) by adding a *usageSender struct for handling
concurrent senders and closing.
2019-04-02 08:28:08 -07:00
Danielle Tomlinson
6624d3667b docker: Support stats on Windows 2019-02-22 14:19:58 +01:00
Mahmood Ali
b5c20aa50b Track Basic Memory Usage as reported by cgroups
Track current memory usage, `memory.usage_in_bytes`, in addition to
`memory.max_memory_usage_in_bytes` and friends.  This number is closer
what Docker reports.

Related to https://github.com/hashicorp/nomad/issues/5165 .
2019-01-14 18:47:52 -05:00
Nick Ethier
f6af1d4d04 docker: add test for stats collection 2019-01-12 12:18:22 -05:00