From ba596ea1e308ce29c1209ba979792d278a758fa1 Mon Sep 17 00:00:00 2001
From: Tim Gross
Date: Fri, 18 Dec 2020 16:14:44 -0500
Subject: [PATCH] docs: update metrics tables

---
 website/pages/docs/operations/metrics.mdx | 1341 +++++----------------
 1 file changed, 324 insertions(+), 1017 deletions(-)

diff --git a/website/pages/docs/operations/metrics.mdx b/website/pages/docs/operations/metrics.mdx
index ef7b422c1..b53f8beb3 100644
--- a/website/pages/docs/operations/metrics.mdx
+++ b/website/pages/docs/operations/metrics.mdx
@@ -12,9 +12,7 @@ different libraries and subsystems. These metrics are aggregated on a ten
 second interval and are retained for one minute.

 This data can be accessed via an HTTP endpoint or via sending a signal to the
-Nomad process.
-
-As of Nomad version 0.7, this data is available via HTTP at `/metrics`. See
+Nomad process. This data is available via HTTP at `/metrics`. See
 [Metrics](/api-docs/metrics) for more information.

 To view this data via sending a signal to the Nomad process: on Unix,
@@ -70,197 +68,55 @@ Below is sample output of a telemetry dump:
 [2015-09-17 16:59:40 -0700 PDT][S] 'nomad.memberlist.gossip': Count: 12 Min: 0.009 Mean: 0.017 Max: 0.025 Stddev: 0.005 Sum: 0.204
 ```

+### Metric Types
+
+| Type | Description | Quantiles |
+|------|-------------|-----------|
+| Gauge | Gauge types report an absolute number at the end of the aggregation interval | false |
+| Counter | Counts are incremented and flushed at the end of the aggregation interval and then are reset to zero | true |
+| Timer | Timers measure the time to complete a task and will include quantiles, means, standard deviation, etc per interval. | true |
+
+### Tagged Metrics
+
+Nomad emits metrics in a tagged format. Each metric can support more than one
+tag, meaning that it is possible to do a match over metrics for datapoints
+such as a particular datacenter, and return all metrics with this tag. Nomad
+supports labels for namespaces as well.
+
+
 ## Key Metrics

-When telemetry is being streamed to statsite or statsd, `interval` is defined to
-be their flush interval. Otherwise, the interval can be assumed to be 10 seconds
-when retrieving metrics using the above described signals.
+The metrics in the table below are the most important metrics for monitoring
+the overall health of a Nomad cluster.
+
+When telemetry is being streamed to statsite or statsd, `interval` in the
+table below is defined to be their flush interval. Otherwise, the interval can
+be assumed to be 10 seconds when retrieving metrics using the above described
+signals.
+
+| Metric | Description | Unit | Type |
+|--------|-------------|------|------|
+| `nomad.runtime.alloc_bytes` | Memory utilization | # of bytes | Gauge |
+| `nomad.runtime.heap_objects` | Number of objects on the heap. General memory pressure indicator | # of heap objects | Gauge |
+| `nomad.runtime.num_goroutines` | Number of goroutines and general load pressure indicator | # of goroutines | Gauge |
+| `nomad.nomad.broker.total_blocked` | Evaluations that are blocked until an existing evaluation for the same job completes | # of evaluations | Gauge |
+| `nomad.nomad.broker.total_ready` | Number of evaluations ready to be processed | # of evaluations | Gauge |
+| `nomad.nomad.broker.total_unacked` | Evaluations dispatched for processing but incomplete | # of evaluations | Gauge |
+| `nomad.nomad.heartbeat.active` | Number of active heartbeat timers. Each timer represents a Nomad Client connection | # of heartbeat timers | Gauge |
+| `nomad.nomad.heartbeat.invalidate` | The length of time it takes to invalidate a Nomad Client due to failed heartbeats | ms / Heartbeat Invalidation | Timer |
+| `nomad.nomad.plan.evaluate` | Time to validate a scheduler Plan. Higher values cause lower scheduling throughput. Similar to `nomad.nomad.plan.submit` but does not include RPC time or time in the Plan Queue | ms / Plan Evaluation | Timer |
+| `nomad.nomad.plan.queue_depth` | Number of scheduler Plans waiting to be evaluated | # of plans | Gauge |
+| `nomad.nomad.plan.submit` | Time to submit a scheduler Plan. Higher values cause lower scheduling throughput | ms / Plan Submit | Timer |
+| `nomad.nomad.rpc.query` | Number of RPC queries | RPC Queries / `interval` | Counter |
+| `nomad.nomad.rpc.request_error` | Number of RPC requests being handled that result in an error | RPC Errors / `interval` | Counter |
+| `nomad.nomad.rpc.request` | Number of RPC requests being handled | RPC Requests / `interval` | Counter |
+| `nomad.nomad.worker.invoke_scheduler.<type>` | Time to run the scheduler of the given type | ms / Scheduler Run | Timer |
+| `nomad.nomad.worker.wait_for_index` | Time waiting for Raft log replication from leader. High delays result in lower scheduling throughput | ms / Raft Index Wait | Timer |
+| `nomad.raft.apply` | Number of Raft transactions | Raft transactions / `interval` | Counter |
+| `nomad.raft.leader.lastContact` | Time since last contact to leader. General indicator of Raft latency | ms / Leader Contact | Timer |
+| `nomad.raft.replication.appendEntries` | Raft transaction commit time | ms / Raft Log Append | Timer |
+
-| Metric | Description | Unit | Type |
-|--------|-------------|------|------|
-| `nomad.runtime.num_goroutines` | Number of goroutines and general load pressure indicator | # of goroutines | Gauge |
-| `nomad.runtime.alloc_bytes` | Memory utilization | # of bytes | Gauge |
-| `nomad.runtime.heap_objects` | Number of objects on the heap. General memory pressure indicator | # of heap objects | Gauge |
-| `nomad.raft.apply` | Number of Raft transactions | Raft transactions / `interval` | Counter |
-| `nomad.raft.replication.appendEntries` | Raft transaction commit time | ms / Raft Log Append | Timer |
-| `nomad.raft.leader.lastContact` | Time since last contact to leader. General indicator of Raft latency | ms / Leader Contact | Timer |
-| `nomad.broker.total_ready` | Number of evaluations ready to be processed | # of evaluations | Gauge |
-| `nomad.broker.total_unacked` | Evaluations dispatched for processing but incomplete | # of evaluations | Gauge |
-| `nomad.broker.total_blocked` | Evaluations that are blocked until an existing evaluation for the same job completes | # of evaluations | Gauge |
-| `nomad.plan.queue_depth` | Number of scheduler Plans waiting to be evaluated | # of plans | Gauge |
-| `nomad.plan.submit` | Time to submit a scheduler Plan. Higher values cause lower scheduling throughput | ms / Plan Submit | Timer |
-| `nomad.plan.evaluate` | Time to validate a scheduler Plan. Higher values cause lower scheduling throughput. Similar to `nomad.plan.submit` but does not include RPC time or time in the Plan Queue | ms / Plan Evaluation | Timer |
-| `nomad.worker.invoke_scheduler.<type>` | Time to run the scheduler of the given type | ms / Scheduler Run | Timer |
-| `nomad.worker.wait_for_index` | Time waiting for Raft log replication from leader. High delays result in lower scheduling throughput | ms / Raft Index Wait | Timer |
-| `nomad.heartbeat.active` | Number of active heartbeat timers. Each timer represents a Nomad Client connection | # of heartbeat timers | Gauge |
-| `nomad.heartbeat.invalidate` | The length of time it takes to invalidate a Nomad Client due to failed heartbeats | ms / Heartbeat Invalidation | Timer |
-| `nomad.rpc.query` | Number of RPC queries | RPC Queries / `interval` | Counter |
-| `nomad.rpc.request` | Number of RPC requests being handled | RPC Requests / `interval` | Counter |
-| `nomad.rpc.request_error` | Number of RPC requests being handled that result in an error | RPC Errors / `interval` | Counter |
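
As a rough illustration of how the tagged gauges listed above might be consumed, the sketch below filters a metrics payload by metric name and by tag. The payload layout (a `Gauges` list whose entries carry `Name`, `Value`, and `Labels`) is an assumption about the JSON form of the metrics data, and the sample values, the host and node names, and the `select_gauges` helper are all hypothetical.

```python
from typing import Any, Dict, List, Optional

# Hypothetical sample payload. The "Gauges" / "Name" / "Value" / "Labels"
# layout is an assumption, not a documented contract.
SAMPLE_PAYLOAD: Dict[str, Any] = {
    "Gauges": [
        {"Name": "nomad.nomad.broker.total_blocked", "Value": 3,
         "Labels": {"host": "server-1"}},
        {"Name": "nomad.nomad.plan.queue_depth", "Value": 0,
         "Labels": {"host": "server-1"}},
        {"Name": "nomad.client.uptime", "Value": 86400,
         "Labels": {"datacenter": "dc1", "node_id": "n1"}},
        {"Name": "nomad.client.uptime", "Value": 120,
         "Labels": {"datacenter": "dc2", "node_id": "n2"}},
    ]
}


def select_gauges(
    payload: Dict[str, Any],
    name: str,
    labels: Optional[Dict[str, str]] = None,
) -> List[Dict[str, Any]]:
    """Return gauges named `name` whose labels include every pair in `labels`."""
    wanted = labels or {}
    matches = []
    for gauge in payload.get("Gauges", []):
        if gauge.get("Name") != name:
            continue
        gauge_labels = gauge.get("Labels") or {}
        # A gauge matches only if every requested tag is present and equal.
        if all(gauge_labels.get(key) == value for key, value in wanted.items()):
            matches.append(gauge)
    return matches


if __name__ == "__main__":
    # Match on a single tag, as described under "Tagged Metrics".
    for gauge in select_gauges(SAMPLE_PAYLOAD, "nomad.client.uptime", {"datacenter": "dc1"}):
        print(gauge["Name"], gauge["Labels"]["node_id"], gauge["Value"])
```

Passing no `labels` returns every datapoint for the metric, which is the untagged view; adding tags narrows the match without changing the metric name.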

 ## Client Metrics

@@ -284,852 +140,303 @@ and `periodic_id` are emitted, containing the ID of the specific invocation of t
 parameterized or periodic job respectively. For example, a dispatch job with the
 id `myjob/dispatch-1312323423423`, will have the following labels.

-| Label | Value |
-|-------|-------|
-| job | myjob/dispatch-1312323423423 |
-| parent_id | myjob |
-| dispatch_id | 1312323423423 |
+| Label | Value |
+|-------|-------|
+| job | `myjob/dispatch-1312323423423` |
+| parent_id | `myjob` |
+| dispatch_id | `1312323423423` |

-## Host Metrics (post Nomad version 0.7)

-Starting in version 0.7, Nomad will emit [tagged metrics][tagged-metrics], in the below format:
+## Host Metrics

-| Metric | Description | Unit | Type | Labels |
-|--------|-------------|------|------|--------|
-| `nomad.client.allocated.cpu` | Total amount of CPU shares the scheduler has allocated to tasks | MHz | Gauge | node_id, datacenter |
-| `nomad.client.unallocated.cpu` | Total amount of CPU shares free for the scheduler to allocate to tasks | MHz | Gauge | node_id, datacenter |
-| `nomad.client.allocated.memory` | Total amount of memory the scheduler has allocated to tasks | Megabytes | Gauge | node_id, datacenter |
-| `nomad.client.unallocated.memory` | Total amount of memory free for the scheduler to allocate to tasks | Megabytes | Gauge | node_id, datacenter |
-| `nomad.client.allocated.disk` | Total amount of disk space the scheduler has allocated to tasks | Megabytes | Gauge | node_id, datacenter |
-| `nomad.client.unallocated.disk` | Total amount of disk space free for the scheduler to allocate to tasks | Megabytes | Gauge | node_id, datacenter |
-| `nomad.client.allocated.network` | Total amount of bandwidth the scheduler has allocated to tasks on the given device | Megabits | Gauge | node_id, datacenter, device |
-| `nomad.client.unallocated.network` | Total amount of bandwidth free for the scheduler to allocate to tasks on the given device | Megabits | Gauge | node_id, datacenter, device |
-| `nomad.client.host.memory.total` | Total amount of physical memory on the node | Bytes | Gauge | node_id, datacenter |
-| `nomad.client.host.memory.available` | Total amount of memory available to processes which includes free and cached memory | Bytes | Gauge | node_id, datacenter |
-| `nomad.client.host.memory.used` | Amount of memory used by processes | Bytes | Gauge | node_id, datacenter |
-| `nomad.client.host.memory.free` | Amount of memory which is free | Bytes | Gauge | node_id, datacenter |
-| `nomad.client.uptime` | Uptime of the host running the Nomad client | Seconds | Gauge | node_id, datacenter |
-| `nomad.client.host.cpu.total` | Total CPU utilization | Percentage | Gauge | node_id, datacenter, cpu |
-| `nomad.client.host.cpu.user` | CPU utilization in the user space | Percentage | Gauge | node_id, datacenter, cpu |
-| `nomad.client.host.cpu.system` | CPU utilization in the system space | Percentage | Gauge | node_id, datacenter, cpu |
-| `nomad.client.host.cpu.idle` | Idle time spent by the CPU | Percentage | Gauge | node_id, datacenter, cpu |
-| `nomad.client.host.disk.size` | Total size of the device | Bytes | Gauge | node_id, datacenter, disk |
-| `nomad.client.host.disk.used` | Amount of space which has been used | Bytes | Gauge | node_id, datacenter, disk |
-| `nomad.client.host.disk.available` | Amount of space which is available | Bytes | Gauge | node_id, datacenter, disk |
-| `nomad.client.host.disk.used_percent` | Percentage of disk space used | Percentage | Gauge | node_id, datacenter, disk |
-| `nomad.client.host.disk.inodes_percent` | Disk space consumed by the inodes | Percent | Gauge | node_id, datacenter, disk |
-| `nomad.client.allocs.start` | Number of allocations starting | Integer | Counter | node_id, job, task_group |
-| `nomad.client.allocs.running` | Number of allocations starting to run | Integer | Counter | node_id, job, task_group |
-| `nomad.client.allocs.failed` | Number of allocations failing | Integer | Counter | node_id, job, task_group |
-| `nomad.client.allocs.restart` | Number of allocations restarting | Integer | Counter | node_id, job, task_group |
-| `nomad.client.allocs.complete` | Number of allocations completing | Integer | Counter | node_id, job, task_group |
-| `nomad.client.allocs.destroy` | Number of allocations being destroyed | Integer | Counter | node_id, job, task_group |
+Nomad will emit [tagged metrics][tagged-metrics], in the below format:

-Nomad 0.9 adds an additional `node_class` label from the client's
-`NodeClass` attribute. This label is set to the string "none" if empty.
-
-## Host Metrics (deprecated post Nomad 0.7)
-
-The below are metrics emitted by Nomad in versions prior to 0.7. These metrics
-can be emitted in the below format post-0.7 (as well as the new format,
-detailed above) but any new metrics will only be available in the new format.
-
-| Metric | Description | Unit | Type |
-|--------|-------------|------|------|
-| `nomad.client.allocated.cpu.<HostID>` | Total amount of CPU shares the scheduler has allocated to tasks | MHz | Gauge |
-| `nomad.client.unallocated.cpu.<HostID>` | Total amount of CPU shares free for the scheduler to allocate to tasks | MHz | Gauge |
-| `nomad.client.allocated.memory.<HostID>` | Total amount of memory the scheduler has allocated to tasks | Megabytes | Gauge |
-| `nomad.client.unallocated.memory.<HostID>` | Total amount of memory free for the scheduler to allocate to tasks | Megabytes | Gauge |
-| `nomad.client.allocated.disk.<HostID>` | Total amount of disk space the scheduler has allocated to tasks | Megabytes | Gauge |
-| `nomad.client.unallocated.disk.<HostID>` | Total amount of disk space free for the scheduler to allocate to tasks | Megabytes | Gauge |
-| `nomad.client.allocated.network.<Device-Name>.<HostID>` | Total amount of bandwidth the scheduler has allocated to tasks on the given device | Megabits | Gauge |
-| `nomad.client.unallocated.network.<Device-Name>.<HostID>` | Total amount of bandwidth free for the scheduler to allocate to tasks on the given device | Megabits | Gauge |
-| `nomad.client.host.memory.<HostID>.total` | Total amount of physical memory on the node | Bytes | Gauge |
-| `nomad.client.host.memory.<HostID>.available` | Total amount of memory available to processes which includes free and cached memory | Bytes | Gauge |
-| `nomad.client.host.memory.<HostID>.used` | Amount of memory used by processes | Bytes | Gauge |
-| `nomad.client.host.memory.<HostID>.free` | Amount of memory which is free | Bytes | Gauge |
-| `nomad.client.uptime.<HostID>` | Uptime of the host running the Nomad client | Seconds | Gauge |
-| `nomad.client.host.cpu.<HostID>.<CPU-Core>.total` | Total CPU utilization | Percentage | Gauge |
-| `nomad.client.host.cpu.<HostID>.<CPU-Core>.user` | CPU utilization in the user space | Percentage | Gauge |
-| `nomad.client.host.cpu.<HostID>.<CPU-Core>.system` | CPU utilization in the system space | Percentage | Gauge |
-| `nomad.client.host.cpu.<HostID>.<CPU-Core>.idle` | Idle time spent by the CPU | Percentage | Gauge |
-| `nomad.client.host.disk.<HostID>.<Device-Name>.size` | Total size of the device | Bytes | Gauge |
-| `nomad.client.host.disk.<HostID>.<Device-Name>.used` | Amount of space which has been used | Bytes | Gauge |
-| `nomad.client.host.disk.<HostID>.<Device-Name>.available` | Amount of space which is available | Bytes | Gauge |
-| `nomad.client.host.disk.<HostID>.<Device-Name>.used_percent` | Percentage of disk space used | Percentage | Gauge |
-| `nomad.client.host.disk.<HostID>.<Device-Name>.inodes_percent` | Disk space consumed by the inodes | Percent | Gauge |
+| Metric | Description | Unit | Type | Labels |
+|--------|-------------|------|------|--------|
+| `nomad.client.allocated.cpu` | Total amount of CPU shares the scheduler has allocated to tasks | MHz | Gauge | datacenter, host, node_class, node_id, node_scheduling_eligibility, node_status |
+| `nomad.client.allocated.memory` | Total amount of memory the scheduler has allocated to tasks | Megabytes | Gauge | datacenter, host, node_class, node_id, node_scheduling_eligibility, node_status |
+| `nomad.client.allocated.disk` | Total amount of disk space the scheduler has allocated to tasks | Megabytes | Gauge | datacenter, host, node_class, node_id, node_scheduling_eligibility, node_status |
+| `nomad.client.allocations.blocked` | Number of allocations blocked | Integer | Gauge | datacenter, host, node_class, node_id, node_scheduling_eligibility, node_status |
+| `nomad.client.allocations.migrating` | Number of allocations migrating | Integer | Gauge | datacenter, host, node_class, node_id, node_scheduling_eligibility, node_status |
+| `nomad.client.allocations.pending` | Number of allocations pending | Integer | Gauge | datacenter, host, node_class, node_id, node_scheduling_eligibility, node_status |
+| `nomad.client.allocations.running` | Number of allocations running | Integer | Gauge | datacenter, host, node_class, node_id, node_scheduling_eligibility, node_status |
+| `nomad.client.allocations.start` | Number of allocations starting | Integer | Gauge | datacenter, host, node_class, node_id, node_scheduling_eligibility, node_status |
+| `nomad.client.allocations.terminal` | Number of allocations terminal | Integer | Gauge | datacenter, host, node_class, node_id, node_scheduling_eligibility, node_status |
+| `nomad.client.allocs.oom_killed` | Number of allocations OOM killed | Integer | Gauge | datacenter, host, node_class, node_id, node_scheduling_eligibility, node_status |
+| `nomad.client.host.cpu.idle` | CPU utilization in idle state | Percentage | Gauge | cpu, datacenter, host, node_class, node_id, node_scheduling_eligibility, node_status |
+| `nomad.client.host.cpu.system` | CPU utilization in system space | Percentage | Gauge | cpu, datacenter, host, node_class, node_id, node_scheduling_eligibility, node_status |
+| `nomad.client.host.cpu.total` | Total CPU utilization | Percentage | Gauge | cpu, datacenter, host, node_class, node_id, node_scheduling_eligibility, node_status |
+| `nomad.client.host.cpu.user` | CPU utilization in user space | Percentage | Gauge | cpu, datacenter, host, node_class, node_id, node_scheduling_eligibility, node_status |
+| `nomad.client.host.disk.available` | Amount of space which is available | Bytes | Gauge | datacenter, disk, host, node_class, node_id, node_scheduling_eligibility, node_status |
+| `nomad.client.host.disk.inodes_percent` | Disk space consumed by the inodes | Percentage | Gauge | datacenter, disk, host, node_class, node_id, node_scheduling_eligibility, node_status |
+| `nomad.client.host.disk.size` | Total size of the device | Bytes | Gauge | datacenter, disk, host, node_class, node_id, node_scheduling_eligibility, node_status |
+| `nomad.client.host.disk.used_percent` | Percentage of disk space used | Percentage | Gauge | datacenter, disk, host, node_class, node_id, node_scheduling_eligibility, node_status |
+| `nomad.client.host.disk.used` | Amount of space which has been used | Bytes | Gauge | datacenter, disk, host, node_class, node_id, node_scheduling_eligibility, node_status |
+| `nomad.client.host.memory.available` | Total amount of memory available to processes which includes free and cached memory | Bytes | Gauge | datacenter, host, node_class, node_id, node_scheduling_eligibility, node_status |
+| `nomad.client.host.memory.free` | Amount of memory which is free | Bytes | Gauge | datacenter, host, node_class, node_id, node_scheduling_eligibility, node_status |
+| `nomad.client.host.memory.total` | Total amount of physical memory on the node | Bytes | Gauge | datacenter, host, node_class, node_id, node_scheduling_eligibility, node_status |
+| `nomad.client.host.memory.used` | Amount of memory used by processes | Bytes | Gauge | datacenter, host, node_class, node_id, node_scheduling_eligibility, node_status |
+| `nomad.client.unallocated.cpu` | Total amount of CPU shares free for the scheduler to allocate to tasks | MHz | Gauge | datacenter, host, node_class, node_id, node_scheduling_eligibility, node_status |
+| `nomad.client.unallocated.disk` | Total amount of disk space free for the scheduler to allocate to tasks | Megabytes | Gauge | datacenter, host, node_class, node_id, node_scheduling_eligibility, node_status |
+| `nomad.client.unallocated.memory` | Total amount of memory free for the scheduler to allocate to tasks | Megabytes | Gauge | datacenter, host, node_class, node_id, node_scheduling_eligibility, node_status |
+| `nomad.client.uptime` | Uptime of the host running the Nomad client | Seconds | Gauge | datacenter, host, node_class, node_id, node_scheduling_eligibility, node_status |

 ## Allocation Metrics

-| Metric | Description | Unit | Type | Labels |
-|--------|-------------|------|------|--------|
-| `nomad.client.allocs.memory.allocated` | Amount of memory allocated by the task | Bytes | Gauge | alloc_id, host, job, namespace, task, task_group |
-| `nomad.client.allocs.memory.rss` | Amount of RSS memory consumed by the task | Bytes | Gauge | alloc_id, host, job, namespace, task, task_group |
-| `nomad.client.allocs.memory.cache` | Amount of memory cached by the task | Bytes | Gauge | alloc_id, host, job, namespace, task, task_group |
-| `nomad.client.allocs.memory.swap` | Amount of memory swapped by the task | Bytes | Gauge | alloc_id, host, job, namespace, task, task_group |
-| `nomad.client.allocs.memory.usage` | Total amount of memory used by the task | Bytes | Gauge | alloc_id, host, job, namespace, task, task_group |
-| `nomad.client.allocs.memory.max_usage` | Maximum amount of memory ever used by the task | Bytes | Gauge | alloc_id, host, job, namespace, task, task_group |
-| `nomad.client.allocs.memory.kernel_usage` | Amount of memory used by the kernel for this task | Bytes | Gauge | alloc_id, host, job, namespace, task, task_group |
-| `nomad.client.allocs.memory.kernel_max_usage` | Maximum amount of memory ever used by the kernel for this task | Bytes | Gauge | alloc_id, host, job, namespace, task, task_group |
-| `nomad.client.allocs.cpu.allocated` | Total CPU resources allocated by the task across all cores | MHz | Gauge | alloc_id, host, job, namespace, task, task_group |
-| `nomad.client.allocs.cpu.total_percent` | Total CPU resources consumed by the task across all cores | Percentage | Gauge | alloc_id, task, namespace, host, job, task_group |
-| `nomad.client.allocs.cpu.system` | Total CPU resources consumed by the task in the system space | Percentage | Gauge | alloc_id, task, namespace, host, job, task_group |
-| `nomad.client.allocs.cpu.user` | Total CPU resources consumed by the task in the user space | Percentage | Gauge | alloc_id, task, namespace, host, job, task_group |
-| `nomad.client.allocs.cpu.throttled_time` | Total time that the task was throttled | Nanoseconds | Gauge | alloc_id, task, namespace, host, job, task_group |
-| `nomad.client.allocs.cpu.throttled_periods` | Total number of CPU periods that the task was throttled | Nanoseconds | Gauge | alloc_id, task, namespace, host, job, task_group |
-| `nomad.client.allocs.cpu.total_ticks` | CPU ticks consumed by the process in the last collection interval | Integer | Gauge | alloc_id, task, namespace, host, job, task_group |
+The following metrics are emitted for each allocation if allocation metrics
+are enabled. Note that allocation metrics available may be dependent on the
+task driver; not all task drivers can provide all metrics.
+
+| Metric | Description | Unit | Type | Labels |
+|--------|-------------|------|------|--------|
+| `nomad.client.allocs.cpu.allocated` | Total CPU resources allocated by the task across all cores | MHz | Gauge | alloc_id, host, job, namespace, task, task_group |
+| `nomad.client.allocs.cpu.system` | Total CPU resources consumed by the task in system space | Percentage | Gauge | alloc_id, host, job, namespace, task, task_group |
+| `nomad.client.allocs.cpu.throttled_periods` | Total number of CPU periods that the task was throttled | Nanoseconds | Gauge | alloc_id, host, job, namespace, task, task_group |
+| `nomad.client.allocs.cpu.throttled_time` | Total time that the task was throttled | Nanoseconds | Gauge | alloc_id, host, job, namespace, task, task_group |
+| `nomad.client.allocs.cpu.total_percent` | Total CPU resources consumed by the task across all cores | Percentage | Gauge | alloc_id, host, job, namespace, task, task_group |
+| `nomad.client.allocs.cpu.total_ticks` | CPU ticks consumed by the process in the last collection interval | Integer | Gauge | alloc_id, host, job, namespace, task, task_group |
+| `nomad.client.allocs.cpu.user` | Total CPU resources consumed by the task in the user space | Percentage | Gauge | alloc_id, host, job, namespace, task, task_group |
+| `nomad.client.allocs.memory.allocated` | Amount of memory allocated by the task | Bytes | Gauge | alloc_id, host, job, namespace, task, task_group |
+| `nomad.client.allocs.memory.cache` | Amount of memory cached by the task | Bytes | Gauge | alloc_id, host, job, namespace, task, task_group |
+| `nomad.client.allocs.memory.kernel_max_usage` | Maximum amount of memory ever used by the kernel for this task | Bytes | Gauge | alloc_id, host, job, namespace, task, task_group |
+| `nomad.client.allocs.memory.kernel_usage` | Amount of memory used by the kernel for this task | Bytes | Gauge | alloc_id, host, job, namespace, task, task_group |
+| `nomad.client.allocs.memory.max_usage` | Maximum amount of memory ever used by the task | Bytes | Gauge | alloc_id, host, job, namespace, task, task_group |
+| `nomad.client.allocs.memory.rss` | Amount of RSS memory consumed by the task | Bytes | Gauge | alloc_id, host, job, namespace, task, task_group |
+| `nomad.client.allocs.memory.swap` | Amount of memory swapped by the task | Bytes | Gauge | alloc_id, host, job, namespace, task, task_group |
+| `nomad.client.allocs.memory.usage` | Total amount of memory used by the task | Bytes | Gauge | alloc_id, host, job, namespace, task, task_group |

 ## Job Summary Metrics

 Job summary metrics are emitted by the Nomad leader server.

-| Metric | Description | Unit | Type | Labels |
-|--------|-------------|------|------|--------|
-| `nomad.job_summary.queued` | Number of queued allocations for a job | Integer | Gauge | job, task_group |
-| `nomad.job_summary.complete` | Number of complete allocations for a job | Integer | Gauge | job, task_group |
-| `nomad.job_summary.failed` | Number of failed allocations for a job | Integer | Gauge | job, task_group |
-| `nomad.job_summary.running` | Number of running allocations for a job | Integer | Gauge | job, task_group |
-| `nomad.job_summary.starting` | Number of starting allocations for a job | Integer | Gauge | job, task_group |
-| `nomad.job_summary.lost` | Number of lost allocations for a job | Integer | Gauge | job, task_group |
+| Metric | Description | Unit | Type | Labels |
+|--------|-------------|------|------|--------|
+| `nomad.nomad.job_summary.complete` | Number of complete allocations for a job | Integer | Gauge | host, job, namespace, task_group |
+| `nomad.nomad.job_summary.failed` | Number of failed allocations for a job | Integer | Gauge | host, job, namespace, task_group |
+| `nomad.nomad.job_summary.lost` | Number of lost allocations for a job | Integer | Gauge | host, job, namespace, task_group |
+| `nomad.nomad.job_summary.queued` | Number of queued allocations for a job | Integer | Gauge | host, job, namespace, task_group |
+| `nomad.nomad.job_summary.running` | Number of running allocations for a job | Integer | Gauge | host, job, namespace, task_group |
+| `nomad.nomad.job_summary.starting` | Number of starting allocations for a job | Integer | Gauge | host, job, namespace, task_group |

 ## Job Status Metrics

 Job status metrics are emitted by the Nomad leader server.

-| Metric | Description | Unit | Type |
-|--------|-------------|------|------|
-| `nomad.job_status.pending` | Number jobs pending | Integer | Gauge |
-| `nomad.job_status.running` | Number jobs running | Integer | Gauge |
-| `nomad.job_status.dead` | Number of dead jobs | Integer | Gauge |
+| Metric | Description | Unit | Type | Labels |
+|--------|-------------|------|------|--------|
+| `nomad.nomad.job_status.dead` | Number of dead jobs | Integer | Gauge | host |
+| `nomad.nomad.job_status.pending` | Number of pending jobs | Integer | Gauge | host |
+| `nomad.nomad.job_status.running` | Number of running jobs | Integer | Gauge | host |

-## Metric Types
+## Server Metrics

-| Type | Description | Quantiles |
-|------|-------------|-----------|
-| Gauge | Gauge types report an absolute number at the end of the aggregation interval | false |
-| Counter | Counts are incremented and flushed at the end of the aggregation interval and then are reset to zero | true |
-| Timer | Timers measure the time to complete a task and will include quantiles, means, standard deviation, etc per interval. | true |
+The following table includes metrics for overall cluster health in addition to
+those listed in [Key Metrics](#key-metrics) above.

-## Tagged Metrics
+| Metric | Description | Unit | Type | Labels |
+|--------|-------------|------|------|--------|
+| `nomad.memberlist.gossip` | Time elapsed to broadcast gossip messages | Nanoseconds | Summary | host |
+| `nomad.nomad.acl.bootstrap` | Time elapsed for `ACL.Bootstrap` RPC call | Nanoseconds | Summary | host |
+| `nomad.nomad.acl.delete_policies` | Time elapsed for `ACL.DeletePolicies` RPC call | Nanoseconds | Summary | host |
+| `nomad.nomad.acl.delete_tokens` | Time elapsed for `ACL.DeleteTokens` RPC call | Nanoseconds | Summary | host |
+| `nomad.nomad.acl.get_policies` | Time elapsed for `ACL.GetPolicies` RPC call | Nanoseconds | Summary | host |
+| `nomad.nomad.acl.get_policy` | Time elapsed for `ACL.GetPolicy` RPC call | Nanoseconds | Summary | host |
+| `nomad.nomad.acl.get_token` | Time elapsed for `ACL.GetToken` RPC call | Nanoseconds | Summary | host |
+| `nomad.nomad.acl.get_tokens` | Time elapsed for `ACL.GetTokens` RPC call | Nanoseconds | Summary | host |
+| `nomad.nomad.acl.list_policies` | Time elapsed for `ACL.ListPolicies` RPC call | Nanoseconds | Summary | host |
+| `nomad.nomad.acl.list_tokens` | Time elapsed for `ACL.ListTokens` RPC call | Nanoseconds | Summary | host |
+| `nomad.nomad.acl.resolve_token` | Time elapsed for `ACL.ResolveToken` RPC call | Nanoseconds | Summary | host |
+| `nomad.nomad.acl.upsert_policies` | Time elapsed for `ACL.UpsertPolicies` RPC call | Nanoseconds | Summary | host |
+| `nomad.nomad.acl.upsert_tokens` | Time elapsed for `ACL.UpsertTokens` RPC call | Nanoseconds | Summary | host |
+| `nomad.nomad.alloc.exec` | Time elapsed to establish alloc exec | Nanoseconds | Summary | host |
+| `nomad.nomad.alloc.get_alloc` | Time elapsed for `Alloc.GetAlloc` RPC call | Nanoseconds | Summary | host |
+| `nomad.nomad.alloc.get_allocs` | Time elapsed for `Alloc.GetAllocs` RPC call | Nanoseconds | Summary | host |
+| `nomad.nomad.alloc.list` | Time elapsed for `Alloc.List` RPC call | Nanoseconds | Summary | host |
+| `nomad.nomad.alloc.stop` | Time elapsed for `Alloc.Stop` RPC call | Nanoseconds | Summary | host |
+| `nomad.nomad.alloc.update_desired_transition` | Time elapsed for `Alloc.UpdateDesiredTransition` RPC call | Nanoseconds | Summary | host |
+| `nomad.nomad.blocked_evals.total_blocked` | Count of evals in the blocked state | Integer | Gauge | host |
+| `nomad.nomad.blocked_evals.total_escaped` | Count of evals that have escaped computed node classes | Integer | Gauge | host |
+| `nomad.nomad.blocked_evals.total_quota_limit` | Count of blocked evals due to quota limits | Integer | Gauge | host |
+| `nomad.nomad.broker.batch_ready` | Count of batch evals ready to be scheduled | Integer | Gauge | host |
+| `nomad.nomad.broker.batch_unacked` | Count of unacknowledged batch evals | Integer | Gauge | host |
+| `nomad.nomad.broker.service_ready` | Count of service evals ready to be scheduled | Integer | Gauge | host |
+| `nomad.nomad.broker.service_unacked` | Count of unacknowledged service evals | Integer | Gauge | host |
+| `nomad.nomad.broker.system_ready` | Count of system evals ready to be scheduled | Integer | Gauge | host |
+| `nomad.nomad.broker.system_unacked` | Count of unacknowledged system evals | Integer | Gauge | host |
+| `nomad.nomad.broker.total_ready` | Count of evals in the ready state | Integer | Gauge | host |
+| `nomad.nomad.broker.total_waiting` | Count of evals in the waiting state | Integer | Gauge | host |
+| `nomad.nomad.client.batch_deregister` | Time elapsed for `Node.BatchDeregister` RPC call | Nanoseconds | Summary | host |
+| `nomad.nomad.client.deregister` | Time elapsed for `Node.Deregister` RPC call | Nanoseconds | Summary | host |
+| `nomad.nomad.client.derive_si_token` | Time elapsed for `Node.DeriveSIToken` RPC call | Nanoseconds | Summary | host |
+| `nomad.nomad.client.derive_vault_token` | Time elapsed for `Node.DeriveVaultToken` RPC call | Nanoseconds | Summary | host |
+| `nomad.nomad.client.emit_events` | Time elapsed for `Node.EmitEvents` RPC call | Nanoseconds | Summary | host |
+| `nomad.nomad.client.evaluate` | Time elapsed for `Node.Evaluate` RPC call | Nanoseconds | Summary | host |
+| `nomad.nomad.client.get_allocs` | Time elapsed for `Node.GetAllocs` RPC call | Nanoseconds | Summary | host |
+| `nomad.nomad.client.get_client_allocs` | Time elapsed for `Node.GetClientAllocs` RPC call | Nanoseconds | Summary | host |
+| `nomad.nomad.client.get_node` | Time elapsed for `Node.GetNode` RPC call | Nanoseconds | Summary | host |
+| `nomad.nomad.client.list` | Time elapsed for `Node.List` RPC call | Nanoseconds | Summary | host |
+| `nomad.nomad.client.register` | Time elapsed for `Node.Register` RPC call | Nanoseconds | Summary | host |
+| `nomad.nomad.client.stats` | Time elapsed for `Client.Stats` RPC call | Nanoseconds | Summary | host |
+| `nomad.nomad.client.update_alloc` | Time elapsed for `Node.UpdateAlloc` RPC call | Nanoseconds | Summary | host |
+| `nomad.nomad.client.update_drain` | Time elapsed for `Node.UpdateDrain` RPC call | Nanoseconds | Summary | host |
+| `nomad.nomad.client.update_eligibility` | Time elapsed for `Node.UpdateEligibility` RPC call | Nanoseconds | Summary | host |
+| `nomad.nomad.client.update_status` | Time elapsed for `Node.UpdateStatus` RPC call | Nanoseconds | Summary | host |
+| `nomad.nomad.client_allocations.garbage_collect_all` | Time elapsed for `ClientAllocations.GarbageCollectAll` RPC call | Nanoseconds | Summary | host |
+| `nomad.nomad.client_allocations.garbage_collect` | Time elapsed for `ClientAllocations.GarbageCollect` RPC call | Nanoseconds | Summary | host |
+| `nomad.nomad.client_allocations.restart` | Time elapsed for `ClientAllocations.Restart` RPC call | Nanoseconds | Summary | host |
+| `nomad.nomad.client_allocations.signal` | Time elapsed for `ClientAllocations.Signal` RPC call | Nanoseconds | Summary | host |
+| `nomad.nomad.client_allocations.stats` | Time elapsed for `ClientAllocations.Stats` RPC call | Nanoseconds | Summary | host |
+| `nomad.nomad.client_csi_controller.attach_volume` | Time elapsed for `Controller.AttachVolume` RPC call | Nanoseconds | Summary | host |
+| `nomad.nomad.client_csi_controller.detach_volume` | Time elapsed for `Controller.DetachVolume` RPC call | Nanoseconds | Summary | host |
+| `nomad.nomad.client_csi_controller.validate_volume` | Time elapsed for `Controller.ValidateVolume` RPC call | Nanoseconds | Summary | host |
+| `nomad.nomad.client_csi_node.detach_volume` | Time elapsed for `Node.DetachVolume` RPC call | Nanoseconds | Summary | host |
+| `nomad.nomad.deployment.allocations` | Time elapsed for `Deployment.Allocations` RPC call | Nanoseconds | Summary | host |
+| `nomad.nomad.deployment.cancel` | Time elapsed for `Deployment.Cancel` RPC call | Nanoseconds | Summary | host |
+| `nomad.nomad.deployment.fail` | Time elapsed for `Deployment.Fail` RPC call | Nanoseconds | Summary | host |
+| `nomad.nomad.deployment.get_deployment` | Time elapsed for `Deployment.GetDeployment` RPC call | Nanoseconds | Summary | host |
+| `nomad.nomad.deployment.list` | Time elapsed for `Deployment.List` RPC call | Nanoseconds | Summary | host |
+| `nomad.nomad.deployment.pause` | Time elapsed for `Deployment.Pause` RPC call | Nanoseconds | Summary | host |
+| `nomad.nomad.deployment.promote` | Time elapsed for `Deployment.Promote` RPC call | Nanoseconds | Summary | host |
+| `nomad.nomad.deployment.reap` | Time elapsed for `Deployment.Reap` RPC call | Nanoseconds | Summary | host |
+| `nomad.nomad.deployment.run` | Time elapsed for `Deployment.Run` RPC call | Nanoseconds | Summary | host |
+| `nomad.nomad.deployment.set_alloc_health` | Time elapsed for `Deployment.SetAllocHealth` RPC call | Nanoseconds |
Summary | Host | +| `nomad.nomad.deployment.unblock` | Time elapsed for `Deployment.Unblock` RPC call | Nanoseconds | Summary | Host | +| `nomad.nomad.eval.ack` | Time elapsed for `Eval.Ack` RPC call | Nanoseconds | Summary | Host | +| `nomad.nomad.eval.allocations` | Time elapsed for `Eval.Allocations` RPC call | Nanoseconds | Summary | Host | +| `nomad.nomad.eval.create` | Time elapsed for `Eval.Create` RPC call | Nanoseconds | Summary | Host | +| `nomad.nomad.eval.dequeue` | Time elapsed for `Eval.Dequeue` RPC call | Nanoseconds | Summary | Host | +| `nomad.nomad.eval.get_eval` | Time elapsed for `Eval.GetEval` RPC call | Nanoseconds | Summary | Host | +| `nomad.nomad.eval.list` | Time elapsed for `Eval.List` RPC call | Nanoseconds | Summary | Host | +| `nomad.nomad.eval.nack` | Time elapsed for `Eval.Nack` RPC call | Nanoseconds | Summary | Host | +| `nomad.nomad.eval.reap` | Time elapsed for `Eval.Reap` RPC call | Nanoseconds | Summary | Host | +| `nomad.nomad.eval.reblock` | Time elapsed for `Eval.Reblock` RPC call | Nanoseconds | Summary | Host | +| `nomad.nomad.eval.update` | Time elapsed for `Eval.Update` RPC call | Nanoseconds | Summary | Host | +| `nomad.nomad.file_system.list` | Time elapsed for `FileSystem.List` RPC call | Nanoseconds | Summary | Host | +| `nomad.nomad.file_system.logs` | Time elapsed to establish `FileSystem.Logs` RPC | Nanoseconds | Summary | Host | +| `nomad.nomad.file_system.stat` | Time elapsed for `FileSystem.Stat` RPC call | Nanoseconds | Summary | Host | +| `nomad.nomad.file_system.stream` | Time elapsed to establish `FileSystem.Stream` RPC | Nanoseconds | Summary | Host | +| `nomad.nomad.fsm.alloc_client_update` | Time elapsed to apply `AllocClientUpdate` raft entry | Nanoseconds | Summary | Host | +| `nomad.nomad.fsm.alloc_update_desired_transition` | Time elapsed to apply `AllocUpdateDesiredTransition` raft entry | Nanoseconds | Summary | Host | +| `nomad.nomad.fsm.alloc_update` | Time elapsed to apply `AllocUpdate` raft 
entry | Nanoseconds | Summary | Host | +| `nomad.nomad.fsm.apply_acl_policy_delete` | Time elapsed to apply `ApplyACLPolicyDelete` raft entry | Nanoseconds | Summary | Host | +| `nomad.nomad.fsm.apply_acl_policy_upsert` | Time elapsed to apply `ApplyACLPolicyUpsert` raft entry | Nanoseconds | Summary | Host | +| `nomad.nomad.fsm.apply_acl_token_bootstrap` | Time elapsed to apply `ApplyACLTokenBootstrap` raft entry | Nanoseconds | Summary | Host | +| `nomad.nomad.fsm.apply_acl_token_delete` | Time elapsed to apply `ApplyACLTokenDelete` raft entry | Nanoseconds | Summary | Host | +| `nomad.nomad.fsm.apply_acl_token_upsert` | Time elapsed to apply `ApplyACLTokenUpsert` raft entry | Nanoseconds | Summary | Host | +| `nomad.nomad.fsm.apply_csi_plugin_delete` | Time elapsed to apply `ApplyCSIPluginDelete` raft entry | Nanoseconds | Summary | Host | +| `nomad.nomad.fsm.apply_csi_volume_batch_claim` | Time elapsed to apply `ApplyCSIVolumeBatchClaim` raft entry | Nanoseconds | Summary | Host | +| `nomad.nomad.fsm.apply_csi_volume_claim` | Time elapsed to apply `ApplyCSIVolumeClaim` raft entry | Nanoseconds | Summary | Host | +| `nomad.nomad.fsm.apply_csi_volume_deregister` | Time elapsed to apply `ApplyCSIVolumeDeregister` raft entry | Nanoseconds | Summary | Host | +| `nomad.nomad.fsm.apply_csi_volume_register` | Time elapsed to apply `ApplyCSIVolumeRegister` raft entry | Nanoseconds | Summary | Host | +| `nomad.nomad.fsm.apply_deployment_alloc_health` | Time elapsed to apply `ApplyDeploymentAllocHealth` raft entry | Nanoseconds | Summary | Host | +| `nomad.nomad.fsm.apply_deployment_delete` | Time elapsed to apply `ApplyDeploymentDelete` raft entry | Nanoseconds | Summary | Host | +| `nomad.nomad.fsm.apply_deployment_promotion` | Time elapsed to apply `ApplyDeploymentPromotion` raft entry | Nanoseconds | Summary | Host | +| `nomad.nomad.fsm.apply_deployment_status_update` | Time elapsed to apply `ApplyDeploymentStatusUpdate` raft entry | Nanoseconds | Summary | Host | +| 
`nomad.nomad.fsm.apply_job_stability` | Time elapsed to apply `ApplyJobStability` raft entry | Nanoseconds | Summary | Host | +| `nomad.nomad.fsm.apply_namespace_delete` | Time elapsed to apply `ApplyNamespaceDelete` raft entry | Nanoseconds | Summary | Host | +| `nomad.nomad.fsm.apply_namespace_upsert` | Time elapsed to apply `ApplyNamespaceUpsert` raft entry | Nanoseconds | Summary | Host | +| `nomad.nomad.fsm.apply_plan_results` | Time elapsed to apply `ApplyPlanResults` raft entry | Nanoseconds | Summary | Host | +| `nomad.nomad.fsm.apply_scheduler_config` | Time elapsed to apply `ApplySchedulerConfig` raft entry | Nanoseconds | Summary | Host | +| `nomad.nomad.fsm.autopilot` | Time elapsed to apply `Autopilot` raft entry | Nanoseconds | Summary | Host | +| `nomad.nomad.fsm.batch_deregister_job` | Time elapsed to apply `BatchDeregisterJob` raft entry | Nanoseconds | Summary | Host | +| `nomad.nomad.fsm.batch_deregister_node` | Time elapsed to apply `BatchDeregisterNode` raft entry | Nanoseconds | Summary | Host | +| `nomad.nomad.fsm.batch_node_drain_update` | Time elapsed to apply `BatchNodeDrainUpdate` raft entry | Nanoseconds | Summary | Host | +| `nomad.nomad.fsm.cluster_meta` | Time elapsed to apply `ClusterMeta` raft entry | Nanoseconds | Summary | Host | +| `nomad.nomad.fsm.delete_eval` | Time elapsed to apply `DeleteEval` raft entry | Nanoseconds | Summary | Host | +| `nomad.nomad.fsm.deregister_job` | Time elapsed to apply `DeregisterJob` raft entry | Nanoseconds | Summary | Host | +| `nomad.nomad.fsm.deregister_node` | Time elapsed to apply `DeregisterNode` raft entry | Nanoseconds | Summary | Host | +| `nomad.nomad.fsm.deregister_si_accessor` | Time elapsed to apply `DeregisterSITokenAccessor` raft entry | Nanoseconds | Summary | Host | +| `nomad.nomad.fsm.deregister_vault_accessor` | Time elapsed to apply `DeregisterVaultAccessor` raft entry | Nanoseconds | Summary | Host | +| `nomad.nomad.fsm.node_drain_update` | Time elapsed to apply 
`NodeDrainUpdate` raft entry | Nanoseconds | Summary | Host | +| `nomad.nomad.fsm.node_eligibility_update` | Time elapsed to apply `NodeEligibilityUpdate` raft entry | Nanoseconds | Summary | Host | +| `nomad.nomad.fsm.node_status_update` | Time elapsed to apply `NodeStatusUpdate` raft entry | Nanoseconds | Summary | Host | +| `nomad.nomad.fsm.persist` | Time elapsed to apply `Persist` raft entry | Nanoseconds | Summary | Host | +| `nomad.nomad.fsm.register_job` | Time elapsed to apply `RegisterJob` raft entry | Nanoseconds | Summary | Host | +| `nomad.nomad.fsm.register_node` | Time elapsed to apply `RegisterNode` raft entry | Nanoseconds | Summary | Host | +| `nomad.nomad.fsm.update_eval` | Time elapsed to apply `UpdateEval` raft entry | Nanoseconds | Summary | Host | +| `nomad.nomad.fsm.upsert_node_events` | Time elapsed to apply `UpsertNodeEvents` raft entry | Nanoseconds | Summary | Host | +| `nomad.nomad.fsm.upsert_scaling_event` | Time elapsed to apply `UpsertScalingEvent` raft entry | Nanoseconds | Summary | Host | +| `nomad.nomad.fsm.upsert_si_accessor` | Time elapsed to apply `UpsertSITokenAccessors` raft entry | Nanoseconds | Summary | Host | +| `nomad.nomad.fsm.upsert_vault_accessor` | Time elapsed to apply `UpsertVaultAccessor` raft entry | Nanoseconds | Summary | Host | +| `nomad.nomad.job.allocations` | Time elapsed for `Job.Allocations` RPC call | Nanoseconds | Summary | Host | +| `nomad.nomad.job.batch_deregister` | Time elapsed for `Job.BatchDeregister` RPC call | Nanoseconds | Summary | Host | +| `nomad.nomad.job.deployments` | Time elapsed for `Job.Deployments` RPC call | Nanoseconds | Summary | Host | +| `nomad.nomad.job.deregister` | Time elapsed for `Job.Deregister` RPC call | Nanoseconds | Summary | Host | +| `nomad.nomad.job.dispatch` | Time elapsed for `Job.Dispatch` RPC call | Nanoseconds | Summary | Host | +| `nomad.nomad.job.evaluate` | Time elapsed for `Job.Evaluate` RPC call | Nanoseconds | Summary | Host | +| 
`nomad.nomad.job.evaluations` | Time elapsed for `Job.Evaluations` RPC call | Nanoseconds | Summary | Host | +| `nomad.nomad.job.get_job_versions` | Time elapsed for `Job.GetJobVersions` RPC call | Nanoseconds | Summary | Host | +| `nomad.nomad.job.get_job` | Time elapsed for `Job.GetJob` RPC call | Nanoseconds | Summary | Host | +| `nomad.nomad.job.latest_deployment` | Time elapsed for `Job.LatestDeployment` RPC call | Nanoseconds | Summary | Host | +| `nomad.nomad.job.list` | Time elapsed for `Job.List` RPC call | Nanoseconds | Summary | Host | +| `nomad.nomad.job.plan` | Time elapsed for `Job.Plan` RPC call | Nanoseconds | Summary | Host | +| `nomad.nomad.job.register` | Time elapsed for `Job.Register` RPC call | Nanoseconds | Summary | Host | +| `nomad.nomad.job.revert` | Time elapsed for `Job.Revert` RPC call | Nanoseconds | Summary | Host | +| `nomad.nomad.job.scale_status` | Time elapsed for `Job.ScaleStatus` RPC call | Nanoseconds | Summary | Host | +| `nomad.nomad.job.scale` | Time elapsed for `Job.Scale` RPC call | Nanoseconds | Summary | Host | +| `nomad.nomad.job.stable` | Time elapsed for `Job.Stable` RPC call | Nanoseconds | Summary | Host | +| `nomad.nomad.job.validate` | Time elapsed for `Job.Validate` RPC call | Nanoseconds | Summary | Host | +| `nomad.nomad.job_summary.get_job_summary` | Time elapsed for `Job.Summary` RPC call | Nanoseconds | Summary | Host | +| `nomad.nomad.leader.barrier` | Time elapsed to establish a raft barrier during leader transition | Nanoseconds | Summary | host | +| `nomad.nomad.leader.reconcileMember` | Time elapsed to reconcile a serf peer with state store | Nanoseconds | Summary | host | +| `nomad.nomad.leader.reconcile` | Time elapsed to reconcile all serf peers with state store | Nanoseconds | Summary | host | +| `nomad.nomad.namespace.delete_namespaces` | Time elapsed for `Namespace.DeleteNamespaces` | Nanoseconds | Summary | Host | +| `nomad.nomad.namespace.get_namespace` | Time elapsed for 
`Namespace.GetNamespace` | Nanoseconds | Summary | Host | +| `nomad.nomad.namespace.get_namespaces` | Time elapsed for `Namespace.GetNamespaces` | Nanoseconds | Summary | Host | +| `nomad.nomad.namespace.list_namespace` | Time elapsed for `Namespace.ListNamespaces` | Nanoseconds | Summary | Host | +| `nomad.nomad.namespace.upsert_namespaces` | Time elapsed for `Namespace.UpsertNamespaces` | Nanoseconds | Summary | Host | +| `nomad.nomad.periodic.force` | Time elapsed for `Periodic.Force` RPC call | Nanoseconds | Summary | Host | +| `nomad.nomad.plan.apply` | Time elapsed to apply a plan | Nanoseconds | Summary | host | +| `nomad.nomad.plan.evaluate` | Time elapsed to evaluate a plan | Nanoseconds | Summary | host | +| `nomad.nomad.plan.queue_depth` | Count of evals in the plan queue | Integer | Gauge | host | +| `nomad.nomad.plan.submit` | Time elapsed for `Plan.Submit` RPC call | Nanoseconds | Summary | host | +| `nomad.nomad.plan.wait_for_index` | Time elapsed for the planner to obtain a snapshot | Nanoseconds | Summary | host | +| `nomad.nomad.plugin.delete` | Time elapsed for `CSIPlugin.Delete` RPC call | Nanoseconds | Summary | Host | +| `nomad.nomad.plugin.get` | Time elapsed for `CSIPlugin.Get` RPC call | Nanoseconds | Summary | Host | +| `nomad.nomad.plugin.list` | Time elapsed for `CSIPlugin.List` RPC call | Nanoseconds | Summary | Host | +| `nomad.nomad.scaling.get_policy` | Time elapsed for `Scaling.GetPolicy` RPC call | Nanoseconds | Summary | Host | +| `nomad.nomad.scaling.list_policies` | Time elapsed for `Scaling.ListPolicies` RPC call | Nanoseconds | Summary | Host | +| `nomad.nomad.search.prefix_search` | Time elapsed for `Search.PrefixSearch` RPC call | Nanoseconds | Summary | Host | +| `nomad.nomad.vault.create_token` | Time elapsed to create Vault token | Nanoseconds | Gauge | host | +| `nomad.nomad.vault.distributed_tokens_revoked` | Count of revoked tokens | Integer | Gauge | host | +| `nomad.nomad.vault.lookup_token` | Time elapsed to lookup 
Vault token | Nanoseconds | Gauge | host | +| `nomad.nomad.vault.renew_failed` | Count of failed attempts to renew Vault token | Integer | Gauge | host | +| `nomad.nomad.vault.renew` | Time elapsed to renew Vault token | Nanoseconds | Gauge | host | +| `nomad.nomad.vault.revoke_tokens` | Time elapsed to revoke Vault tokens | Nanoseconds | Gauge | host | +| `nomad.nomad.vault.token_ttl` | Time to live for Vault token | Integer | Gauge | host | +| `nomad.nomad.vault.undistributed_tokens_abandoned` | Count of abandoned tokens | Integer | Gauge | host | +| `nomad.nomad.volume.claim` | Time elapsed for `CSIVolume.Claim` RPC call | Nanoseconds | Summary | Host | +| `nomad.nomad.volume.deregister` | Time elapsed for `CSIVolume.Deregister` RPC call | Nanoseconds | Summary | Host | +| `nomad.nomad.volume.get` | Time elapsed for `CSIVolume.Get` RPC call | Nanoseconds | Summary | Host | +| `nomad.nomad.volume.list` | Time elapsed for `CSIVolume.List` RPC call | Nanoseconds | Summary | Host | +| `nomad.nomad.volume.register` | Time elapsed for `CSIVolume.Register` RPC call | Nanoseconds | Summary | Host | +| `nomad.nomad.volume.unpublish` | Time elapsed for `CSIVolume.Unpublish` RPC call | Nanoseconds | Summary | Host | +| `nomad.nomad.worker.create_eval` | Time elapsed for worker to create an eval | Nanoseconds | Summary | host | +| `nomad.nomad.worker.dequeue_eval` | Time elapsed for worker to dequeue an eval | Nanoseconds | Summary | host | +| `nomad.nomad.worker.invoke_scheduler_service` | Time elapsed for worker to invoke the scheduler | Nanoseconds | Summary | host | +| `nomad.nomad.worker.send_ack` | Time elapsed for worker to send acknowledgement | Nanoseconds | Summary | host | +| `nomad.nomad.worker.submit_plan` | Time elapsed for worker to submit plan | Nanoseconds | Summary | host | +| `nomad.nomad.worker.update_eval` | Time elapsed for worker to submit updated eval | Nanoseconds | Summary | host | +| `nomad.nomad.worker.wait_for_index` | Time elapsed for worker 
get snapshot | Nanoseconds | Summary | host | +| `nomad.raft.appliedIndex` | Current index applied to FSM | Integer | Gauge | host | +| `nomad.raft.barrier` | Count of blocking raft API calls | Integer | Counter | host | +| `nomad.raft.commitNumLogs` | Count of logs enqueued | Integer | Gauge | host | +| `nomad.raft.commitTime` | Time elapsed to commit writes | Nanoseconds | Summary | host | +| `nomad.raft.fsm.apply` | Time elapsed to apply write to FSM | Nanoseconds | Summary | host | +| `nomad.raft.fsm.enqueue` | Time elapsed to enqueue write to FSM | Nanoseconds | Summary | host | +| `nomad.raft.lastIndex` | Most recent index seen | Integer | Gauge | host | +| `nomad.raft.leader.dispatchLog` | Time elapsed to write log, mark in flight, and start replication | Nanoseconds | Summary | host | +| `nomad.raft.leader.dispatchNumLogs` | Count of logs dispatched | Integer | Gauge | host | +| `nomad.raft.replication.appendEntries` | Raft transaction commit time | ms / Raft Log Append | Timer | | +| `nomad.runtime.free_count` | Count of objects freed from heap by go runtime GC | Integer | Gauge | host | +| `nomad.runtime.gc_pause_ns` | Go runtime GC pause times | Nanoseconds | Summary | host | +| `nomad.runtime.sys_bytes` | Go runtime GC metadata size | # of bytes | Gauge | host | +| `nomad.runtime.total_gc_pause_ns` | Total elapsed go runtime GC pause times | Nanoseconds | Gauge | host | +| `nomad.runtime.total_gc_runs` | Count of go runtime GC runs | Integer | Gauge | host | +| `nomad.serf.queue.Event` | Count of memberlist events received | Integer | Summary | host | +| `nomad.serf.queue.Intent` | Count of memberlist changes | Integer | Summary | host | +| `nomad.serf.queue.Query` | Count of memberlist queries | Integer | Summary | host | +| `nomad.state.snapshotIndex` | Current snapshot index | Integer | Gauge | host | -As of version 0.7, Nomad will start emitting metrics in a tagged format. 
Each -metric can support more than one tag, meaning that it is possible to do a -match over metrics for datapoints such as a particular datacenter, and return -all metrics with this tag. Nomad supports labels for namespaces as well. [tagged-metrics]: /docs/telemetry/metrics#tagged-metrics