nomad

mirror of https://github.com/kemko/nomad.git synced 2026-01-04 09:25:46 +03:00

Author	SHA1	Message	Date
Michael Schurter	c52741ae1b	docs: clarify total_escaped is just an optimization (#13460)	2022-06-22 11:39:56 -07:00
Michael Schurter	19bac3caa8	docs: add plan for node rejected details and more (#12564) - Moved federation docs to the bottom since everyone is potentially affected by the other sections on the page, but only users of federation are affected by it. - Added section on the plan for node rejected bug since it is fairly easy to diagnose and removing affected nodes is a fairly reliable workaround. - Mention 5s cliff for wait_for_index. - Remove the lie that we do not have job status metrics! How old was that?! - Reinforce the importance of monitoring basic system resources	2022-04-14 16:09:33 -07:00
Jasmine Dahilig	ccaaadf493	docs: add token_last_renewal and token_next_renewal to server metrics and key metrics #12435 (#12505)	2022-04-07 15:12:41 -07:00
Derek Strickland	5b5c853597	disconnected clients: Observability plumbing (#12141) * Add disconnects/reconnect to log output and emit reschedule metrics * TaskGroupSummary: Add Unknown, update StateStore logic, add to metrics	2022-04-05 17:12:23 -04:00
Seth Hoenig	16efcf4e71	core: switch to go.etcd.io/bbolt This PR swaps the underlying BoltDB implementation from boltdb/bolt to go.etcd.io/bbolt. In addition, the Server has a new configuration option for disabling NoFreelistSync on the underlying database. Freelist option: https://github.com/etcd-io/bbolt/blob/master/db.go#L81 Consul equivalent PR: https://github.com/hashicorp/consul/pull/11720	2022-02-23 14:26:41 -06:00
Luiz Aoqui	a0c0b808af	docs: add `nomad.plan.node_rejected` metric (#11860)	2022-01-18 13:47:20 -05:00
Tim Gross	7fad4b9169	docs: new scheduler metrics (#11790) * Fixed name of `nomad.scheduler.allocs.reschedule` metric * Added new metrics to metrics reference documentation * Expanded definitions of "waiting" metrics * Changelog entry for #10236 and #10237	2022-01-07 09:51:15 -05:00
Tim Gross	95fa1b30f4	docs: improve docs for troubleshooting and monitoring scheduler (#11623) This changeset adds more specific recommendations as to what metrics to monitor, and what resources should be examined during incident response. It also renames the "Telemetry" section to "Monitoring Nomad" to surface the material better and distinguish it from the "Metric Reference". Co-authored-by: Charlie Voiselle <464492+angrycub@users.noreply.github.com>	2021-12-07 15:52:13 -05:00
James Rasell	d2132b96b4	docs: add license expiry metric to metrics website doc.	2021-12-07 10:31:51 +00:00
kfenech1	6bbcb180f2	docs: `nomad.client.unallocated.memory` is in Megabytes not bytes (#11468)	2021-11-08 11:05:11 -05:00
Michael Schurter	594ceb7022	docs: improve wait_for_index metrics description (#10717) Old description of `{plan,worker}.wait_for_index` described the metric in terms of waiting for a snapshot which has two problems: 1. "Snapshot" is an overloaded term in Nomad and operators can't be expected to know which use we're referring to here. 2. The most important thing about the metric is what we're waiting on before taking a snapshot: the raft index of the object to be processed (plan or eval). The new description tries to cram all of that context into the tiny space provided. See #5791 for details about the `wait_for_index` mechanism in general.	2021-06-09 08:53:06 -04:00
Luiz Aoqui	c7114921fa	Add metrics for blocked eval resources (#10454) * add metrics for blocked eval resources * docs: add new blocked_evals metrics * fix to call `pruneStats` instead of `stats.prune` directly	2021-04-29 15:03:45 -04:00
Bryce Kalow	ee79587a67	feat(website): migrates to new nav data format (#10264)	2021-03-31 08:43:17 -05:00
Tim Gross	3b42d75225	docs: add metrics from raft leadership transitions	2021-01-27 11:50:11 -05:00
Jeff Escalante	0eae603a86	implement mdx remote	2021-01-05 19:02:39 -05:00

15 Commits