Commit Graph

870 Commits

Author SHA1 Message Date
Mahmood Ali
93bbe181e0 address review feedback 2020-10-22 11:49:37 -04:00
Mahmood Ali
4d76e8bd79 Don't parse the server-set fields of the job struct 2020-10-22 08:18:57 -04:00
Mahmood Ali
9fe7423403 api: update /render api to parse hclv2 2020-10-21 15:46:57 -04:00
Mahmood Ali
922b671719 api: parse service gateway name
Adding gateway name eases HCLv2 parsing. This field is only used for parsing the
job and is ignored for any other pruposes
2020-10-21 14:05:46 -04:00
Mahmood Ali
589a9e995d Tag Job spec with HCLv2 tags 2020-10-21 14:05:46 -04:00
Michael Schurter
116b2b8b35 Merge pull request #9055 from hashicorp/f-9017-resources
api: add field filters to /v1/{allocations,nodes}
2020-10-14 14:49:39 -07:00
Dave May
71a022ad8c Metrics gotemplate support, debug bundle features (#9067)
* add goroutine text profiles to nomad operator debug

* add server-id=all to nomad operator debug

* fix bug from changing metrics from string to []byte

* Add function to return MetricsSummary struct, metrics gotemplate support

* fix bug resolving 'server-id=all' when no servers are available

* add url to operator_debug tests

* removed test section which is used for future operator_debug.go changes

* separate metrics from operator, use only structs from go-metrics

* ensure parent directories are created as needed

* add suggested comments for text debug pprof

* move check down to where it is used

* add WaitForFiles helper function to wait for multiple files to exist

* compact metrics check

Co-authored-by: Drew Bailey <2614075+drewbailey@users.noreply.github.com>

* fix github's silly apply suggestion

Co-authored-by: Drew Bailey <2614075+drewbailey@users.noreply.github.com>
2020-10-14 15:16:10 -04:00
Drew Bailey
3c15f41411 filter on additional filter keys, remove switch statement duplication
properly wire up durable event count

move newline responsibility

moves newline creation from NDJson to the http handler, json stream only encodes and sends now

ignore snapshot restore if broker is disabled

enable dev mode to access event steam without acl

use mapping instead of switch

use pointers for config sizes, remove unused ttl, simplify closed conn logic
2020-10-14 14:14:33 -04:00
Michael Schurter
a55f46e9ba api: add field filters to /v1/{allocations,nodes}
Fixes #9017

The ?resources=true query parameter includes resources in the object
stub listings. Specifically:

- For `/v1/nodes?resources=true` both the `NodeResources` and
  `ReservedResources` field are included.
- For `/v1/allocations?resources=true` the `AllocatedResources` field is
  included.

The ?task_states=false query parameter removes TaskStates from
/v1/allocations responses. (By default TaskStates are included.)
2020-10-14 10:35:22 -07:00
Drew Bailey
39ef3263ca Add EvictCallbackFn to handle removing entries from go-memdb when they
are removed from the event buffer.

Wire up event buffer size config, use pointers for structs.Events
instead of copying.
2020-10-14 12:44:42 -04:00
Drew Bailey
1288b18b27 rehydrate event publisher on snapshot restore
address pr feedback
2020-10-14 12:44:41 -04:00
Drew Bailey
8a57ee85f0 api comments 2020-10-14 12:44:38 -04:00
Drew Bailey
4f97bf8ef7 Events/eval alloc events (#9012)
* generic eval update event

first pass at alloc client update events

* api/event client
2020-10-14 12:44:37 -04:00
Dave May
abfcb10626 Merge pull request #9034 from hashicorp/dmay-debug-metrics
Add metrics command / output to debug bundle
2020-10-06 11:47:09 -04:00
davemay99
ff1578f0f3 added comment to operator metrics function 2020-10-06 11:22:10 -04:00
davemay99
ec09c593b5 metrics return bytes instead of string for more flexibility 2020-10-06 10:49:15 -04:00
davemay99
bf8bdc94f8 Add metrics command / output to debug bundle 2020-10-05 22:30:01 -04:00
Chris Baker
5062e74b3e updated api tests wrt backwards compat on null chars in IDs 2020-10-05 18:01:50 +00:00
Michael Schurter
902b0b5673 jobspec: lower min cpu resources from 10->1
Since CPU resources are usually a soft limit it is desirable to allow
setting it as low as possible to allow tasks to run only in "idle" time.

Setting it to 0 is still not allowed to avoid potential unintentional
side effects with allowing a zero value. While there may not be any side
effects this commit attempts to minimize risk by avoiding the issue.

This does *not* change the defaults.
2020-09-30 12:15:13 -07:00
Luiz Aoqui
5a48a8d725 add scaling policy type 2020-09-29 17:57:46 -04:00
Mahmood Ali
679fee900c api: target servers for allocation requests (#8897)
Allocation requests should target servers, which then can forward the
request to the appropriate clients.

Contacting clients directly is fragile and prune to failures: e.g.
clients maybe firewalled and not accessible from the API client, or have
some internal certificates not trusted by the API client.

FWIW, in contexts where we anticipate lots of traffic (e.g. logs, or
exec), the api package attempts contacting the client directly but then
fallsback to using the server. This approach seems excessive in these
simple GET/PUT requests.

Fixes #8894
2020-09-16 09:34:17 -04:00
Benjamin Buzbee
648140a727 Add API support for cancelation contexts passed via QueryOptions and WriteOptions (#8836)
Copy Consul API's format: QueryOptions.WithContext(context) will now return
a new QueryOption whose HTTP requests will be canceled with the context
provided (and similar for WriteOptions)
2020-09-09 16:22:07 -04:00
Jasmine Dahilig
8faece3bd7 Merge pull request #8390 from hashicorp/lifecycle-poststart-hook
task lifecycle poststart hook
2020-08-31 13:53:24 -07:00
Tim Gross
5a01ab312d MRD: move 'job stop -global' handling into RPC (#8776)
The initial implementation of global job stop for MRD looped over all the
regions in the CLI for expedience. This changeset includes the OSS parts of
moving this into the RPC layer so that API consumers don't have to implement
this logic themselves.
2020-08-28 14:28:13 -04:00
Lang Martin
dd7016b847 csi: plugins track jobs in addition to allocations, and use job information to set expected counts (#8699)
* nomad/structs/csi: add explicit job support
* nomad/state/state_store: capture job updates directly
* api/nodes: CSIInfo needs the AllocID
* command/agent/csi_endpoint: AllocID was missing
Co-authored-by: Tim Gross <tgross@hashicorp.com>
2020-08-27 17:20:00 -04:00
Tim Gross
63db66b1c4 tests: allow API test server to respect NOMAD_TEST_LOG_LEVEL (#8760) 2020-08-27 14:36:17 -04:00
Seth Hoenig
36a743f19d consul/connect: remove envoy dns option from gateway proxy config 2020-08-24 09:11:55 -05:00
Seth Hoenig
9ffdeed904 consul/connect: add initial support for ingress gateways
This PR adds initial support for running Consul Connect Ingress Gateways (CIGs) in Nomad. These gateways are declared as part of a task group level service definition within the connect stanza.

```hcl
service {
  connect {
    gateway {
      proxy {
        // envoy proxy configuration
      }
      ingress {
        // ingress-gateway configuration entry
      }
    }
  }
}
```

A gateway can be run in `bridge` or `host` networking mode, with the caveat that host networking necessitates manually specifying the Envoy admin listener (which cannot be disabled) via the service port value.

Currently Envoy is the only supported gateway implementation in Consul, and Nomad only supports running Envoy as a gateway using the docker driver.

Aims to address #8294 and tangentially #8647
2020-08-21 16:21:54 -05:00
James Rasell
9a536d26b1 Merge pull request #8636 from hashicorp/f-gh-8142
api: add node purge SDK function.
2020-08-17 09:45:54 +02:00
James Rasell
38271df0f3 api: add node purge SDK function. 2020-08-14 08:40:03 +01:00
Lang Martin
8a095fca90 CSI: volume and plugin allocations in the API (#8590)
* command/agent/csi_endpoint: explicitly convert to API structs, and convert allocs for single object get endpoints
2020-08-11 12:24:41 -04:00
Tim Gross
fbefdb98c3 csi: nomad volume detach command (#8584)
The soundness guarantees of the CSI specification leave a little to be desired
in our ability to provide a 100% reliable automated solution for managing
volumes. This changeset provides a new command to bridge this gap by providing
the operator the ability to intervene.

The command doesn't take an allocation ID so that the operator doesn't have to
keep track of alloc IDs that may have been GC'd. Handle this case in the
unpublish RPC by sending the client RPC for all the terminal/nil allocs on the
selected node.
2020-08-11 10:18:54 -04:00
Seth Hoenig
e664f9b69a consul: able to set pass/fail thresholds on consul service checks
This change adds the ability to set the fields `success_before_passing` and
`failures_before_critical` on Consul service check definitions. This is a
feature added to Consul v1.7.0 and later.
  https://www.consul.io/docs/agent/checks#success-failures-before-passing-critical

Nomad doesn't do much besides pass the fields through to Consul.

Fixes #6913
2020-08-10 14:08:09 -05:00
Drew Bailey
940817aeb5 Merge pull request #8453 from hashicorp/oss-multi-vault-ns
oss compoments for multi-vault namespaces
2020-07-27 08:45:22 -04:00
Drew Bailey
19810365f6 oss compoments for multi-vault namespaces
adds in oss components to support enterprise multi-vault namespace feature

upgrade specific doc on vault multi-namespaces

vault docs

update test to reflect new error
2020-07-24 10:14:59 -04:00
James Rasell
f34530b997 api: add namespace to scaling status GET response object. 2020-07-24 11:19:25 +02:00
Mahmood Ali
c5b2895b0b Fix pro tags 2020-07-17 11:02:00 -04:00
Jasmine Dahilig
e1edb29675 add poststart hook to task hook coordinator & structs 2020-07-08 11:01:35 -07:00
Tim Gross
8f98ff2da6 fix volume deregister -force params in API (#8380)
The CSI `volume deregister -force` flag was using the documented parameter
`force` everywhere except in the API, where it was incorrectly passing `purge`
as the query parameter.
2020-07-08 10:08:22 -04:00
Chris Baker
15a66e60a8 fixed api tests for changes 2020-07-04 19:23:58 +00:00
Chris Baker
7f8176a188 changes to make sure that Max is present and valid, to improve error messages
* made api.Scaling.Max a pointer, so we can detect (and complain) when it is neglected
* added checks to HCL parsing that it is present
* when Scaling.Max is absent/invalid, don't return extraneous error messages during validation
* tweak to multiregion handling to ensure that the count is valid on the interpolated regional jobs

resolves #8355
2020-07-04 19:05:50 +00:00
Lang Martin
bde973e366 api: nomad debug new /agent/host (#8325)
* command/agent/host: collect host data, multi platform

* nomad/structs/structs: new HostDataRequest/Response

* client/agent_endpoint: add RPC endpoint

* command/agent/agent_endpoint: add Host

* api/agent: add the Host endpoint

* nomad/client_agent_endpoint: add Agent Host with forwarding

* nomad/client_agent_endpoint: use findClientConn

This changes forwardMonitorClient and forwardProfileClient to use
findClientConn, which was cribbed from the common parts of those
funcs.

* command/debug: call agent hosts

* command/agent/host: eliminate calling external programs
2020-07-02 09:51:25 -04:00
Tim Gross
95799663b8 csi: add -force flag to volume deregister (#8295)
The `nomad volume deregister` command currently returns an error if the volume
has any claims, but in cases where the claims can't be dropped because of
plugin errors, providing a `-force` flag gives the operator an escape hatch.

If the volume has no allocations or if they are all terminal, this flag
deletes the volume from the state store, immediately and implicitly dropping
all claims without further CSI RPCs. Note that this will not also
unmount/detach the volume, which we'll make the responsibility of a separate
`nomad volume detach` command.
2020-07-01 12:17:51 -04:00
Nick Ethier
7c4bff9549 command: correctly show host IP in ports output /w multi-host networks (#8289) 2020-06-25 15:16:01 -04:00
Seth Hoenig
520d35e085 consul/connect: split connect native flag and task in service 2020-06-23 10:22:22 -05:00
Seth Hoenig
7e8d5c2392 consul/connect: add support for running connect native tasks
This PR adds the capability of running Connect Native Tasks on Nomad,
particularly when TLS and ACLs are enabled on Consul.

The `connect` stanza now includes a `native` parameter, which can be
set to the name of task that backs the Connect Native Consul service.

There is a new Client configuration parameter for the `consul` stanza
called `share_ssl`. Like `allow_unauthenticated` the default value is
true, but recommended to be disabled in production environments. When
enabled, the Nomad Client's Consul TLS information is shared with
Connect Native tasks through the normal Consul environment variables.
This does NOT include auth or token information.

If Consul ACLs are enabled, Service Identity Tokens are automatically
and injected into the Connect Native task through the CONSUL_HTTP_TOKEN
environment variable.

Any of the automatically set environment variables can be overridden by
the Connect Native task using the `env` stanza.

Fixes #6083
2020-06-22 14:07:44 -05:00
Michael Schurter
6886edd1fc Merge pull request #8208 from hashicorp/f-multi-network
multi-interface network support
2020-06-19 15:46:48 -07:00
Nick Ethier
18ed6a7a85 test: fix up testing around host networks 2020-06-19 13:53:31 -04:00
Drew Bailey
ef7b7b2a9b allow raw body instead of JSON encoded string (#8211) 2020-06-19 10:57:09 -04:00
Nick Ethier
ad8ced3873 multi-interface network support 2020-06-19 09:42:10 -04:00