client: emit optional telemetry from prerun and prestart hooks. (#24556)

The Nomad client can now optionally emit telemetry data from the
prerun and prestart hooks. This allows operators to monitor and
alert on hook failures and on the time each hook takes to complete.

The new datapoints are:
  - nomad.client.alloc_hook.prerun.success (counter)
  - nomad.client.alloc_hook.prerun.failed (counter)
  - nomad.client.alloc_hook.prerun.elapsed (sample)

  - nomad.client.task_hook.prestart.success (counter)
  - nomad.client.task_hook.prestart.failed (counter)
  - nomad.client.task_hook.prestart.elapsed (sample)

The hook execution time is also useful to Nomad engineering: it
will help optimize code where possible and show how job
specifications affect hook performance.

Currently only the PreRun and PreStart hooks have telemetry
enabled, which limits the number of new metrics being produced.
James Rasell
2024-12-12 14:43:14 +00:00
committed by GitHub
parent 86bc7ed224
commit 7d48aa2667
23 changed files with 515 additions and 57 deletions

@@ -369,6 +369,7 @@
"syslog_facility": "LOCAL1",
"telemetry": [
{
"disable_allocation_hook_metrics": true,
"in_memory_collection_interval": "1m",
"in_memory_retention_period": "24h",
"collection_interval": "3s",