Commit Graph

55 Commits

Author SHA1 Message Date
Tim Gross
14568b3e00 deps: bump gopsutil to v3.21.2 2021-03-30 16:02:51 -04:00
Mahmood Ali
4e4c3873cb Update gopsutil code
The latest gopsutil includes two backward-incompatible changes:

First, it removed the unused Stolen field in
cae8efcffa (diff-d9747e2da342bdb995f6389533ad1a3d).

Second, it updated the Windows CPU stats calculation to be in line with
other platforms, where it returns absolute stats rather than
percentages.  See https://github.com/shirou/gopsutil/pull/611.
2020-03-15 09:37:05 +01:00
Yoan Blanc
f80cbe86a1 gopsutils: v2.20.2
Signed-off-by: Yoan Blanc <yoan@dosimple.ch>
2020-03-15 09:36:59 +01:00
Danielle Lancashire
5b183e5306 client: Return empty values when host stats fail
Currently, there is an issue when running on Windows whereby under some
circumstances the Windows stats APIs will begin to return errors (such
as internal timeouts) when a client is under high load, and potentially
other forms of resource contention / system states (and other unknown
cases).

When an error occurs during this collection, we then short circuit
further metrics emission from the client until the next interval.

This can be problematic if it happens for a sustained number of
intervals, as our metrics aggregator will begin to age out older
metrics, and we will eventually stop emitting various types of metrics
including `nomad.client.unallocated.*` metrics.

However, when metrics collection fails on Linux, gopsutil will in many cases
(e.g. cpu.Times) silently return 0 values, rather than an error.

Here, we switch to returning empty metrics in these failures, and
logging the error at the source. This brings the behaviour into line
with Linux/Unix platforms, and although making aggregation a little
sadder on intermittent failures, will result in more desirable overall
behaviour of keeping metrics available for further investigation if
things look unusual.
2019-09-19 01:22:07 +02:00
Mahmood Ali
ba3fe15f7e Add Client Device Stats structs in api package 2018-11-14 14:41:19 -05:00
Mahmood Ali
5af9296bb4 Expose Device Stats in /client/stats API endpoint 2018-11-14 14:41:19 -05:00
Alex Dadgar
f91b269b2a fix test compiling 2018-10-16 16:56:55 -07:00
Michael Schurter
9da25adc54 client: hclog-ify most of the client
Leaving fingerprinters in case that interface changes with plugins.
2018-10-16 16:53:30 -07:00
Alex Dadgar
98c7abe541 Tests only use testlog package logger 2018-06-13 15:40:56 -07:00
Josh Soref
94e9e17d05 spelling: represents 2018-03-11 18:42:29 +00:00
Josh Soref
0230661b30 spelling: purposes 2018-03-11 18:39:35 +00:00
Alex Dadgar
e2d1ce8ff2 Fix manager tests and make testagent recover from port conflicts 2018-02-15 13:59:01 -08:00
Alex Dadgar
ddee97ca29 Stats Endpoint 2018-02-15 13:59:00 -08:00
Michael Schurter
9c1e595e2e Fix GC'd alloc tracking
The Client.allocs map now contains all AllocRunners again, not just
un-GC'd AllocRunners. Client.allocs is only pruned when the server GCs
allocs.

Also stops logging "marked for GC" twice.
2017-11-01 15:16:38 -05:00
Alex Dadgar
a9e3a41407 Enable more linters 2017-09-26 15:26:33 -07:00
Alex Dadgar
f23ac5f083 Non-locked accessors to common Node fields
This PR removes locking around commonly accessed node attributes that do
not need to be locked. The locking could cause nodes to TTL as the
heartbeat code path was acquiring a lock that could be held for an
excessively long time. An example of this is when Vault is inaccessible,
since the fingerprint is run with a lock held but the Vault
fingerprinter makes the API calls with a large timeout.

Fixes https://github.com/hashicorp/nomad/issues/2689
2017-09-14 14:08:26 -07:00
Alex Dadgar
db261cd0c7 Fix invalid CPU stats on Windows
This PR fixes an issue introduced in Nomad 0.6.0 due to
https://github.com/shirou/gopsutil/issues/420. The issue arose because
the Windows stats from gopsutil report CPU usage in
percentages where we expected ticks.
2017-09-10 15:30:48 -07:00
James Nugent
3a5082022d client: Guard against "NaN" values from floats
This commit protects against finding `0.NaN` tokens in JSON streams
because of infinity representation on serialization.
2017-09-08 16:21:07 -05:00
Michael Schurter
7a84cbe02a Squelch logspam when unable to get disk usage stats
To reproduce logspam:

```
$ docker plugin install --grant-all-permissions vieux/sshfs
$ nomad agent -dev
...
2017/08/25 17:09:03.282868 [WARN] client: error fetching host disk usage stats for /var/lib/docker/plugins/a8b4a69b07e5180f828d19e1e9e102ccc0e26f9c9939eaef85357260c30b20a7/rootfs/mnt/volumes: permission denied
... repeats every collection period ...
```
2017-08-28 12:04:32 -07:00
Alex Dadgar
47d48bdaee Fix nil dereference 2017-01-10 14:14:58 -08:00
Diptanu Choudhury
7ebe4a6972 Added comments 2016-12-20 10:49:48 -08:00
Diptanu Choudhury
61e534d684 Making the gc allocator understand real disk usage 2016-12-16 18:34:59 -08:00
Diptanu Choudhury
41d7ebc5c5 Refactored hoststats collector 2016-12-14 15:07:42 -08:00
Christoffer Kylvåg
c3df9dd73f #1680: Continue after not being able to stat a mountpoint 2016-12-13 12:28:57 +01:00
Kenjiro Nakayama
50ca5c7f71 Update after the review 2016-08-11 10:53:33 +09:00
Kenjiro Nakayama
fbb2d5cd5d Return error when client failed to collect host stats 2016-08-11 09:38:28 +09:00
Alex Dadgar
0b3a39b47f guard against NaN 2016-06-20 10:29:46 -07:00
Alex Dadgar
591104f848 Merge pull request #1260 from hashicorp/f-alloc-stats-struct
Allocation resources returned in a struct
2016-06-12 11:18:57 -07:00
Alex Dadgar
020f8b05d3 only support latest and remove ring buffer 2016-06-12 09:32:38 -07:00
Diptanu Choudhury
53a57cae79 Fix the calculation of total ticks for docker and exec 2016-06-12 18:08:35 +02:00
Diptanu Choudhury
658362d248 Removing un-used code 2016-06-12 01:23:49 +02:00
Diptanu Choudhury
73f5c29e38 Fixed the calculation of the host node ticks 2016-06-12 01:14:51 +02:00
Diptanu Choudhury
d1fdd27f86 Moving the clkspeed code to helper 2016-06-11 17:31:49 +02:00
Diptanu Choudhury
304403a2f8 Extracted a method for getting clock speed 2016-06-11 02:07:28 +02:00
Diptanu Choudhury
358cdf8f63 Calculating total ticks consumed in the nomad client 2016-06-10 23:14:33 +02:00
Diptanu Choudhury
c13e750a02 Calculating the cpu ticks in nomad client 2016-06-10 22:22:32 +02:00
Alex Dadgar
693c8f9e42 Alloc-status only shows measured statistics and fixes to CPU calculations 2016-06-10 10:38:29 -07:00
Diptanu Choudhury
15e79c3783 Changing the api of the stats endpoints 2016-05-28 19:59:20 -07:00
Diptanu Choudhury
0b868e07cb Initializing the ring buffer with no cells 2016-05-28 19:59:20 -07:00
Diptanu Choudhury
71d3361f79 creating the host cpu percent calculator lazily 2016-05-28 19:59:20 -07:00
Diptanu Choudhury
73f05942f2 Refactored the api for NewHostStatsCollector 2016-05-28 19:59:20 -07:00
Diptanu Choudhury
584c1e34fb Incorporated review comments for executor 2016-05-28 19:59:20 -07:00
Diptanu Choudhury
4491c2b0e8 Added disk usage to node status 2016-05-28 19:59:20 -07:00
Diptanu Choudhury
1183037d56 Added uptime to node stats 2016-05-28 19:59:20 -07:00
Diptanu Choudhury
aee9db02d5 Showing host resource usage stats 2016-05-28 19:59:20 -07:00
Diptanu Choudhury
458b7014e4 Added a test for calculating cpu stats 2016-05-28 19:59:20 -07:00
Diptanu Choudhury
3dc28bd871 Stopping stats collection of tasks which have been destroyed 2016-05-28 19:59:20 -07:00
Diptanu Choudhury
df68129e5a Added some docs 2016-05-28 19:59:03 -07:00
Diptanu Choudhury
b7158be541 Added locks to RingBuf 2016-05-28 19:59:03 -07:00
Diptanu Choudhury
98068678f1 Implemented nomad cpu percentage calculator 2016-05-28 19:59:03 -07:00