Commit Graph

99 Commits

Author SHA1 Message Date
Derek Strickland
696deb9600 Add Nomad RetryConfig to agent template config (#13907)
* add Nomad RetryConfig to agent template config
2022-08-03 16:56:30 -04:00
Luiz Aoqui
d456cc1e7f Track plan rejection history and automatically mark clients as ineligible (#13421)
Plan rejections occur when the scheduler work and the leader plan
applier disagree on the feasibility of a plan. This may happen for valid
reasons: since Nomad does parallel scheduling, it is expected that
different workers will have a different state when computing placements.

As the final plan reaches the leader plan applier, it may no longer be
valid due to a concurrent scheduling taking up intended resources. In
these situations the plan applier will notify the worker that the plan
was rejected and that they should refresh their state before trying
again.

In some rare and unexpected circumstances it has been observed that
workers will repeatedly submit the same plan, even if they are always
rejected.

While the root cause is still unknown this mitigation has been put in
place. The plan applier will now track the history of plan rejections
per client and include in the plan result a list of node IDs that should
be set as ineligible if the number of rejections in a given time window
crosses a certain threshold. The window size and threshold value can be
adjusted in the server configuration.

To avoid marking several nodes as ineligible at one, the operation is rate
limited to 5 nodes every 30min, with an initial burst of 10 operations.
2022-07-12 18:40:20 -04:00
Derek Strickland
43edd0e709 Expose Consul template configuration parameters (#11606)
This PR exposes the following existing`consul-template` configuration options to Nomad jobspec authors in the `{job.group.task.template}` stanza.

- `wait`

It also exposes the following`consul-template` configuration to Nomad operators in the `{client.template}` stanza.

- `max_stale`
- `block_query_wait`
- `consul_retry`
- `vault_retry` 
- `wait` 

Finally, it adds the following new Nomad-specific configuration to the `{client.template}` stanza that allows Operators to set bounds on what `jobspec` authors configure.

- `wait_bounds`

Co-authored-by: Tim Gross <tgross@hashicorp.com>
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
2022-01-10 10:19:07 -05:00
Matt Mukerjee
0881b94201 Add FailoverHeartbeatTTL to config (#11127)
FailoverHeartbeatTTL is the amount of time to wait after a server leader failure
before considering reallocating client tasks. This TTL should be fairly long as
the new server leader needs to rebuild the entire heartbeat map for the
cluster. In deployments with a small number of machines, the default TTL (5m)
may be unnecessary long. Let's allow operators to configure this value in their
config files.
2021-10-06 18:48:12 -04:00
Nick Ethier
ad8ced3873 multi-interface network support 2020-06-19 09:42:10 -04:00
Drew Bailey
b0fc071026 fix compilation with correct func 2020-03-23 14:32:11 -04:00
Tim Gross
d23eaed85b Merge pull request #7012 from hashicorp/f-csi-volumes
Container Storage Interface Support
2020-03-23 14:19:46 -04:00
Lang Martin
aea212d34d csi: CLI for volume status, registration/deregistration and plugin status (#7193)
* command/csi: csi, csi_plugin, csi_volume

* helper/funcs: move ExtraKeys from parse_config to UnusedKeys

* command/agent/config_parse: use helper.UnusedKeys

* api/csi: annotate CSIVolumes with hcl fields

* command/csi_plugin: add Synopsis

* command/csi_volume_register: use hcl.Decode style parsing

* command/csi_volume_list

* command/csi_volume_status: list format, cleanup

* command/csi_plugin_list

* command/csi_plugin_status

* command/csi_volume_deregister

* command/csi_volume: add Synopsis

* api/contexts/contexts: add csi search contexts to the constants

* command/commands: register csi commands

* api/csi: fix struct tag for linter

* command/csi_plugin_list: unused struct vars

* command/csi_plugin_status: unused struct vars

* command/csi_volume_list: unused struct vars

* api/csi: add allocs to CSIPlugin

* command/csi_plugin_status: format the allocs

* api/allocations: copy Allocation.Stub in from structs

* nomad/client_rpc: add some error context with Errorf

* api/csi: collapse read & write alloc maps to a stub list

* command/csi_volume_status: cleanup allocation display

* command/csi_volume_list: use Schedulable instead of Healthy

* command/csi_volume_status: use Schedulable instead of Healthy

* command/csi_volume_list: sprintf string

* command/csi: delete csi.go, csi_plugin.go

* command/plugin: refactor csi components to sub-command plugin status

* command/plugin: remove csi

* command/plugin_status: remove csi

* command/volume: remove csi

* command/volume_status: split out csi specific

* helper/funcs: add RemoveEqualFold

* command/agent/config_parse: use helper.RemoveEqualFold

* api/csi: do ,unusedKeys right

* command/volume: refactor csi components to `nomad volume`

* command/volume_register: split out csi specific

* command/commands: use the new top level commands

* command/volume_deregister: hardwired type csi for now

* command/volume_status: csiFormatVolumes rescued from volume_list

* command/plugin_status: avoid a panic on no args

* command/volume_status: avoid a panic on no args

* command/plugin_status: predictVolumeType

* command/volume_status: predictVolumeType

* nomad/csi_endpoint_test: move CreateTestPlugin to testing

* command/plugin_status_test: use CreateTestCSIPlugin

* nomad/structs/structs: add CSIPlugins and CSIVolumes search consts

* nomad/state/state_store: add CSIPlugins and CSIVolumesByIDPrefix

* nomad/search_endpoint: add CSIPlugins and CSIVolumes

* command/plugin_status: move the header to the csi specific

* command/volume_status: move the header to the csi specific

* nomad/state/state_store: CSIPluginByID prefix

* command/status: rename the search context to just Plugins/Volumes

* command/plugin,volume_status: test return ids now

* command/status: rename the search context to just Plugins/Volumes

* command/plugin_status: support -json and -t

* command/volume_status: support -json and -t

* command/plugin_status_csi: comments

* command/*_status: clean up text

* api/csi: fix stale comments

* command/volume: make deregister sound less fearsome

* command/plugin_status: set the id length

* command/plugin_status_csi: more compact plugin health

* command/volume: better error message, comment
2020-03-23 13:58:30 -04:00
Drew Bailey
ae5777c4ea Audit config, seams for enterprise audit features
allow oss to parse sink duration

clean up audit sink parsing

ent eventer config reload

fix typo

SetEnabled to eventer interface

client acl test

rm dead code

fix failing test
2020-03-23 13:47:42 -04:00
Danielle Lancashire
6527c38f99 clientconfig: Fix parsing multiple host volumes 2019-08-21 22:19:58 +02:00
Danielle Lancashire
86b4296f9d client: Add parsing and registration of HostVolume configuration 2019-08-12 15:39:08 +02:00
Lang Martin
cedd2ba17f config_parse get rid of ParseConfigDefault 2019-06-11 22:00:23 -04:00
Lang Martin
5a3a47c7a4 config_parse split out defaults from ParseConfig 2019-06-11 15:42:27 -04:00
Lang Martin
d1dbbe868e config_parse leave the *HCL strings in place after converting times 2019-04-30 10:30:53 -04:00
Lang Martin
41bfd27df1 config_parse remove unused multi-stage parsing via mapstructure 2019-04-30 10:29:14 -04:00
Lang Martin
bac0d5f0ed config_parse add new ParseConfigFileDirectHCL
- parse by using hcl.Decode directly
- handle time.Duration strings in a second pass
- report unexpected keys in a third pass
2019-04-30 10:29:14 -04:00
Chris Baker
5db81957ff wip: added config parsing support, CLI flag, still need more testing, VAULT_ var, documentation 2019-04-10 10:34:10 -05:00
Preetha Appan
ff1bac9f48 Address review comments 2019-03-29 08:57:49 -05:00
Preetha Appan
7490af2194 fix linting 2019-03-28 18:01:40 -05:00
Preetha Appan
1f88ccec55 Fix json parsing bug with plugins that don't provide args
This fixes a bug with JSON agent configuration parsing where the AST
for the plugin stanza had unnecessary flattening originating from hcl parsing
library. The workaround fixes the AST by popping off the flattened element and wrapping
it in a list. The workaround comes from similar code in terraform.

There were no existing test cases for json parsing so I added a few.
2019-03-28 16:33:30 -05:00
Alex Dadgar
f171a723cb Enable json logs 2019-01-11 11:36:37 -08:00
Alex Dadgar
0953d913ed Deprecate IOPS
IOPS have been modelled as a resource since Nomad 0.1 but has never
actually been detected and there is no plan in the short term to add
detection. This is because IOPS is a bit simplistic of a unit to define
the performance requirements from the underlying storage system. In its
current state it adds unnecessary confusion and can be removed without
impacting any users. This PR leaves IOPS defined at the jobspec parsing
level and in the api/ resources since these are the two public uses of
the field. These should be considered deprecated and only exist to allow
users to stop using them during the Nomad 0.9.x release. In the future,
there should be no expectation that the field will exist.
2018-12-06 15:09:26 -08:00
Nick Ethier
af3f535f0a agent: suppose filter_default telemetry option 2018-11-19 23:21:48 -05:00
Nick Ethier
4182e3e141 nomad: add flag to disable publishing of job_summary metrics for dispatched jobs 2018-11-19 23:21:19 -05:00
Alex Dadgar
49c2d4f775 Scheduler uses allocated resources 2018-10-02 17:08:25 -07:00
Alex Dadgar
9c6de4bb20 plugin dir parsing 2018-08-30 13:43:09 -07:00
Alex Dadgar
94a21a44e0 Plugin config parsing 2018-08-29 17:06:01 -07:00
Chelsea Holland Komlo
bfaf4dcb2b change function signature to take entire tls config object 2018-08-10 12:37:21 -04:00
Chelsea Holland Komlo
02b89ae0f4 update server_join naming and improve logging 2018-05-31 10:50:03 -07:00
Chelsea Holland Komlo
1a854c444e add server join info to server and client 2018-05-31 10:50:03 -07:00
Chelsea Holland Komlo
25896ddf3c add support for tls PreferServerCipherSuites
add further tests for tls configuration
2018-05-25 13:20:00 -04:00
Chelsea Holland Komlo
509180ee00 add support for configurable TLS minimum version 2018-05-09 18:07:12 -04:00
Chelsea Holland Komlo
0f46208cc1 allow configurable cipher suites
disallow 3DES and RC4 ciphers

add documentation for tls_cipher_suites
2018-05-09 17:15:31 -04:00
Mildred Ki'Lya
d31105c69e Allow to specify total memory on agent configuration
Allow to set the total memory of an agent in its configuration file. This
can be used in case the automatic detection doesn't work or in specific
environments when memory overcommit (using swap for example) can be
desirable.
2018-03-27 15:46:18 -05:00
James Rasell
dda9207b06 Update Consul check params from using health-check to check. 2018-03-20 16:03:58 +01:00
James Rasell
2439310951 Allow Nomads Consul health checks to be configurable.
This change allows the client HTTP and the server HTTP, Serf and
RPC health check names within Consul to be configurable with the
defaults as previous. The configuration can be done via either a
config file or using CLI flags.

Closes #3988
2018-03-19 19:37:56 +01:00
Mahmood Ali
abfae77545 Add tags option to datadog telemetry
Expose an global tags option in telemetry config for dogstatsd, for
purposes of distinguishing between multiple nomad cluster metrics.
2018-02-06 12:08:37 -05:00
Kyle Havlovitz
2c873adba4 Refactor redundancy_zone/upgrade_version out of client meta 2018-01-29 20:03:38 -08:00
Kyle Havlovitz
c2d0c11f9e Add autopilot functionality based on Consul's autopilot 2017-12-18 14:29:41 -08:00
Chelsea Komlo
fa9fd4422c Nomad agent reload TLS configuration on SIGHUP (#3479)
* Allow server TLS configuration to be reloaded via SIGHUP

* dynamic tls reloading for nomad agents

* code cleanup and refactoring

* ensure keyloader is initialized, add comments

* allow downgrading from TLS

* initalize keyloader if necessary

* integration test for tls reload

* fix up test to assert success on reloaded TLS configuration

* failure in loading a new TLS config should remain at current

Reload only the config if agent is already using TLS

* reload agent configuration before specific server/client

lock keyloader before loading/caching a new certificate

* introduce a get-or-set method for keyloader

* fixups from code review

* fix up linting errors

* fixups from code review

* add lock for config updates; improve copy of tls config

* GetCertificate only reloads certificates dynamically for the server

* config updates/copies should be on agent

* improve http integration test

* simplify agent reloading storing a local copy of config

* reuse the same keyloader when reloading

* Test that server and client get reloaded but keep keyloader

* Keyloader exposes GetClientCertificate as well for outgoing connections

* Fix spelling

* correct changelog style
2017-11-14 17:53:23 -08:00
Alex Dadgar
53dbc4f127 remove atlas 2017-11-02 11:27:21 -07:00
Chelsea Holland Komlo
5e85b5a090 add rpc_upgrade_mode as config option for tls upgrades 2017-11-01 15:19:52 -05:00
Alex Dadgar
f6fbb36054 sync 2017-10-13 14:36:02 -07:00
Alex Dadgar
ddc2efa4ac sync 2017-09-19 10:08:23 -05:00
Chelsea Holland Komlo
59a891cb27 enabling prometheus metrics should be a config option 2017-09-13 19:21:21 +00:00
Chelsea Holland Komlo
1df5310c6e tagged metrics config options should be on telemetry config
better api example, add telemetry documentation
2017-09-06 15:25:36 +00:00
Chelsea Holland Komlo
a1e81abf40 parse config for metrics fields 2017-09-05 14:13:34 +00:00
Armon Dadgar
44fe0afc9f Passthrough replication token for token/policy replication 2017-09-04 13:05:53 -07:00
Armon Dadgar
9fdea05804 agent: Adding ACL block configuration 2017-09-04 13:04:45 -07:00
Alex Dadgar
bb45b95bc4 Allow tuning of heartbeat ttls
This PR allows tuning of heartbeat TTLs. An example of very aggressive
settings is as follows:

```
server {
  heartbeat_grace = "1s"
  min_heartbeat_ttl = "1s"
  max_heartbeats_per_second = 200.0
}
```
2017-07-19 09:38:35 -07:00