---
layout: docs
page_title: telemetry Block - Agent Configuration
description: |-
  The "telemetry" block configures Nomad's publication of metrics and telemetry
  to third-party systems.
---

# `telemetry` Block

<Placement groups={['telemetry']} />

The `telemetry` block configures Nomad's publication of metrics and telemetry
to third-party systems.

```hcl
telemetry {
  publish_allocation_metrics = true
  publish_node_metrics       = true
}
```

This section of the documentation only covers the configuration options for
the `telemetry` block. To understand the architecture and the metrics
themselves, see the [Telemetry guide](/nomad/docs/operations/monitoring-nomad).

## `telemetry` Parameters

Due to the number of configurable parameters to the `telemetry` block,
parameters on this page are grouped by the telemetry provider.

### Common

The following options are available on all telemetry configurations.

- `in_memory_collection_interval` `(duration: 10s)` - Configures the in-memory
  sink collection interval. This sink is always configured and backs the JSON
  metrics API endpoint. This option is particularly useful for debugging or
  development purposes, where aggressive collection is required.

- `in_memory_retention_period` `(duration: 1m)` - Configures the in-memory sink
  retention period. This sink is always configured and backs the JSON metrics
  API endpoint. This option is particularly useful for debugging or development
  purposes.

- `disable_hostname` `(bool: false)` - Specifies whether to stop prefixing
  gauge values with the local hostname. By default, gauge values are prefixed
  with the local hostname.

- `collection_interval` `(duration: 1s)` - Specifies the time interval at which
  the Nomad agent collects telemetry data. For metrics tools that scrape metrics
  (for example, Prometheus), you should ensure this value is less than or equal
  to the value of any scrape interval.

- `use_node_name` `(bool: false)` - Specifies if gauge values should be
  prefixed with the name of the node instead of the hostname. If set, it
  overrides the [disable_hostname](#disable_hostname) value.

- `publish_allocation_metrics` `(bool: false)` - Specifies if Nomad should
  publish runtime metrics of allocations.

- `include_alloc_metadata_in_metrics` `(bool: false)` - Controls whether
  allocation metadata is included in metric labels. Enabling this option may
  result in high-cardinality labels, so you should also configure
  [allowed_metadata_keys_in_metrics](#allowed_metadata_keys_in_metrics).

- `allowed_metadata_keys_in_metrics` `(list: [])` - Filters the metadata keys
  that are included when publishing metrics. By default, no keys are filtered
  out, so all metadata is included.

- `publish_node_metrics` `(bool: false)` - Specifies if Nomad should publish
  runtime metrics of nodes.

- `filter_default` `(bool: true)` - This controls whether to allow metrics that
  have not been specified by the filter. Defaults to true, which will allow all
  metrics when no filters are provided. When set to false with no filters, no
  metrics will be sent.

- `prefix_filter` `(list: [])` - This is a list of filter rules to apply for
  allowing/blocking metrics by prefix. A leading "<b>+</b>" will enable any
  metrics with the given prefix, and a leading "<b>-</b>" will block them. If
  there is overlap between two rules, the more specific rule will take
  precedence. Blocking will take priority if the same prefix is listed multiple
  times.

```python
['-nomad.raft', '+nomad.raft.apply', '-nomad.memberlist']
```

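As a rough sketch, the same rules could be written in an agent configuration
file as shown here. Setting `filter_default` explicitly is only for
illustration, since `true` is already the default.

```hcl
telemetry {
  # Allow any metric that no rule matches (this is the default).
  filter_default = true

  # Block metrics under nomad.raft and nomad.memberlist, but keep
  # nomad.raft.apply, because the more specific rule takes precedence.
  prefix_filter = [
    "-nomad.raft",
    "+nomad.raft.apply",
    "-nomad.memberlist"
  ]
}
```

Setting `filter_default = false` instead would turn the list into an
allow-list: only metrics matched by a `+` rule would be sent.
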
- `disable_dispatched_job_summary_metrics` `(bool: false)` - Specifies if Nomad
  should ignore jobs dispatched from a parameterized job when publishing job
  summary statistics. Because tracking summary statistics has a small memory
  overhead per job, disabling them can save memory when dispatching high
  volumes of jobs.

- `disable_quota_utilization_metrics` `(bool: false)` - Specifies if Nomad
  should stop publishing metrics about quota utilization (a Nomad Enterprise
  feature). Because each quota utilization check requires a relatively
  expensive query against Nomad's state store, users with many namespaces and
  many quotas may want to disable these metrics.

- `disable_allocation_hook_metrics` `(bool: false)` - Specifies if the Nomad
  client should stop publishing metrics related to allocation and task hooks.
  These metrics currently cover the allocation prerun and task prestart hooks.

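The common options can be combined in a single block. The following is a
minimal sketch, not a recommended baseline: the metadata keys `team` and
`service_tier` are hypothetical, and the other values are chosen only to
illustrate each option.

```hcl
telemetry {
  # Collect every second; keep this at or below any scrape interval.
  collection_interval = "1s"

  # Prefix gauge values with the node name instead of the hostname.
  use_node_name = true

  publish_allocation_metrics = true
  publish_node_metrics       = true

  # Include allocation metadata in metric labels, but only the listed keys,
  # to keep label cardinality under control. The key names are hypothetical.
  include_alloc_metadata_in_metrics = true
  allowed_metadata_keys_in_metrics  = ["team", "service_tier"]

  # Ignore dispatched jobs in job summary statistics to save memory.
  disable_dispatched_job_summary_metrics = true
}
```
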
### `statsite`

These `telemetry` parameters apply to
[statsite](https://github.com/armon/statsite).

- `statsite_address` `(string: "")` - Specifies the address of a statsite server
  to forward metrics data to.

```hcl
telemetry {
  statsite_address = "statsite.company.local:8125"
}
```

### `statsd`

These `telemetry` parameters apply to
[statsd](https://github.com/etsy/statsd).

- `statsd_address` `(string: "")` - Specifies the address of a statsd server to
  forward metrics to.

```hcl
telemetry {
  statsd_address = "statsd.company.local:8125"
}
```

### `datadog`

These `telemetry` parameters apply to
[DataDog statsd](https://github.com/DataDog/datadog-agent).

- `datadog_address` `(string: "")` - Specifies the address of a DataDog statsd
  server to forward metrics to.

- `datadog_tags` `(list: [])` - Specifies a list of global tags that will be
  added to all telemetry packets sent to DogStatsD. It is a list of strings,
  where each string looks like "my_tag_name:my_tag_value".

```hcl
telemetry {
  datadog_address = "dogstatsd.company.local:8125"
  datadog_tags    = ["my_tag_name:my_tag_value"]
}
```

### `prometheus`

These `telemetry` parameters apply to [Prometheus](https://prometheus.io).

- `prometheus_metrics` `(bool: false)` - Specifies whether the agent should
  make Prometheus formatted metrics available at `/v1/metrics?format=prometheus`.

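A minimal sketch of a block that exposes metrics for Prometheus to scrape
might look like the following. The decision to publish allocation and node
metrics and to drop the hostname prefix is an illustrative assumption, not a
requirement of `prometheus_metrics`.

```hcl
telemetry {
  # Keep this at or below the scrape interval configured in Prometheus.
  collection_interval = "1s"

  # Do not prefix gauge values with the local hostname.
  disable_hostname = true

  # Serve Prometheus-formatted metrics at /v1/metrics?format=prometheus.
  prometheus_metrics = true

  publish_allocation_metrics = true
  publish_node_metrics       = true
}
```
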
### `circonus`

These `telemetry` parameters apply to
[Circonus](http://circonus.com/).

- `circonus_api_token` `(string: "")` - Specifies a valid Circonus API Token
  used to create and manage checks. If provided, metric management is enabled.

- `circonus_api_app` `(string: "nomad")` - Specifies a valid app name associated
  with the API token.

- `circonus_api_url` `(string: "https://api.circonus.com/v2")` - Specifies the
  base URL to use for contacting the Circonus API.

- `circonus_submission_interval` `(string: "10s")` - Specifies the interval at
  which metrics are submitted to Circonus.

- `circonus_submission_url` `(string: "")` - Specifies the
  `check.config.submission_url` field of a Check API object from a previously
  created HTTPTRAP check.

- `circonus_check_id` `(string: "")` - Specifies the Check ID (**not check
  bundle**) from a previously created HTTPTRAP check. This is the numeric
  portion of the `check._cid` field in the Check API object.

- `circonus_check_force_metric_activation` `(bool: false)` - Specifies whether
  to force activation of metrics that already exist but are not currently
  active. If check management is enabled, the default behavior is to add new
  metrics as they are encountered. If the metric already exists in the check,
  it will not be activated. This setting overrides that behavior.

- `circonus_check_instance_id` `(string: "<hostname>:<application>")` - Serves
  to uniquely identify the metrics coming from this _instance_. It can be used
  to maintain metric continuity with transient or ephemeral instances as they
  move around within an infrastructure. By default, this is set to
  hostname:application name (e.g. "host123:nomad").

- `circonus_check_search_tag` `(string: "<service>:<application>")` - Specifies
  a special tag which, when coupled with the instance id, helps to narrow down
  the search results when neither a Submission URL nor Check ID is provided. By
  default, this is set to service:app (e.g. "service:nomad").

- `circonus_check_display_name` `(string: "")` - Specifies a name to give a
  check when it is created. This name is displayed in the Circonus UI Checks
  list.

- `circonus_check_tags` `(string: "")` - Specifies a comma-separated list of
  additional tags to add to a check when it is created.

- `circonus_broker_id` `(string: "")` - Specifies the ID of a specific Circonus
  Broker to use when creating a new check. This is the numeric portion of the
  `broker._cid` field in a Broker API object. If metric management is enabled
  and neither a Submission URL nor Check ID is provided, an attempt will be
  made to search for an existing check using Instance ID and Search Tag. If one
  is not found, a new HTTPTRAP check will be created. By default, a random
  Enterprise Broker is selected, or the default Circonus Public Broker is used.

- `circonus_broker_select_tag` `(string: "")` - Specifies a special tag which
  will be used to select a Circonus Broker when a Broker ID is not provided. The
  best use of this is as a hint for which broker should be used based on
  _where_ this particular instance is running (e.g. a specific geographic
  location or datacenter, dc:sfo).

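To illustrate how the Circonus parameters fit together, here is a hedged
sketch in which Nomad manages the check itself. The token is a placeholder,
the check tags are hypothetical, and the remaining values reuse the examples
and defaults given in the parameter descriptions.

```hcl
telemetry {
  # Placeholder value; supplying a token enables metric management.
  circonus_api_token = "REPLACE_WITH_API_TOKEN"
  circonus_api_app   = "nomad"

  circonus_submission_interval = "10s"

  # With no Submission URL or Check ID set, an existing check is searched
  # for using the instance ID and search tag before a new one is created.
  circonus_check_instance_id = "host123:nomad"
  circonus_check_search_tag  = "service:nomad"

  # Hypothetical tags, written as a single comma-separated string.
  circonus_check_tags = "env:prod,team:platform"

  # Hint for broker selection when no Broker ID is given.
  circonus_broker_select_tag = "dc:sfo"
}
```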