Commit Graph

991 Commits

Author SHA1 Message Date
Seth Hoenig
5ef844c827 deps: update api go version and dependencies
This PR sets the minimum Go version for the `api` submodule to Go 1.17.

It also upgrades
 - gorilla/websocket 1.4.1 -> 1.4.2
 - mitchelh/mapstructure 1.4.2 -> 1.4.3
 - stretchr/testify 1.5.1 -> 1.7.0

Closes #11518 #11602 #11528
2022-01-24 12:23:26 -06:00
Seth Hoenig
e2ab16847d deps: adjust to gzip handler zero length response body
After swapping gzip handler to use the gorilla library, we
must account for a quirk in how zero/minimal length response
bodies are delivered.

The previous gzip handler was configured to compress all responses
regardless of size - even if the data was zero length or below the
network MTU. This behavior changed in [v1.1.0](c551b6c3b4 (diff-de723e6602cc2f16f7a9d85fd89d69954edc12a49134dab8901b10ee06d1879d))
which is why we could not upgrade.

The Nomad HTTP Client mutates the http.Response.Body object, making
a strong assumption that if the Content-Encoding header is set to "gzip",
the response will be readable via gzip decoder. This is no longer true
for the nytimes gzip handler, and is also not true for the gorilla gzip
handler.

It seems in practice this only makes a difference on the /v1/operator/license
endpoint which returns an empty response in OSS Nomad.

The fix here is to simply not wrap the response body reader if we
encounter an io.EOF while creating the gzip reader - indicating there
is no data to decode.
2022-01-19 11:52:19 -06:00
Seth Hoenig
4b20581dc5 cleanup: stop referencing depreceted HeaderMap field
Remove reference to the deprecated ResponseRecorder.HeaderMap field,
instead calling .Response.Header() to get the same data.

closes #10520
2022-01-12 10:32:54 -06:00
Derek Strickland
43edd0e709 Expose Consul template configuration parameters (#11606)
This PR exposes the following existing`consul-template` configuration options to Nomad jobspec authors in the `{job.group.task.template}` stanza.

- `wait`

It also exposes the following`consul-template` configuration to Nomad operators in the `{client.template}` stanza.

- `max_stale`
- `block_query_wait`
- `consul_retry`
- `vault_retry` 
- `wait` 

Finally, it adds the following new Nomad-specific configuration to the `{client.template}` stanza that allows Operators to set bounds on what `jobspec` authors configure.

- `wait_bounds`

Co-authored-by: Tim Gross <tgross@hashicorp.com>
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
2022-01-10 10:19:07 -05:00
Charlie Voiselle
6e61606eba Make number of scheduler workers reloadable (#11593)
## Development Environment Changes
* Added stringer to build deps

## New HTTP APIs
* Added scheduler worker config API
* Added scheduler worker info API

## New Internals
* (Scheduler)Worker API refactor—Start(), Stop(), Pause(), Resume()
* Update shutdown to use context
* Add mutex for contended server data
    - `workerLock` for the `workers` slice
    - `workerConfigLock` for the `Server.Config.NumSchedulers` and
      `Server.Config.EnabledSchedulers` values

## Other
* Adding docs for scheduler worker api
* Add changelog message

Co-authored-by: Derek Strickland <1111455+DerekStrickland@users.noreply.github.com>
2022-01-06 11:56:13 -05:00
Tim Gross
072d3b6b74 cli: ensure -stale flag is respected by nomad operator debug (#11678)
When a cluster doesn't have a leader, the `nomad operator debug`
command can safely use stale queries to gracefully degrade the
consistency of almost all its queries. The query parameter for these
API calls was not being set by the command.

Some `api` package queries do not include `QueryOptions` because
they target a specific agent, but they can potentially be forwarded to
other agents. If there is no leader, these forwarded queries will
fail. Provide methods to call these APIs with `QueryOptions`.
2021-12-15 10:44:03 -05:00
Luiz Aoqui
86e2dd718c api: return error when LicenseGet status is not 200 (#11644) 2021-12-14 19:47:09 -05:00
Tim Gross
35c22bcb6c provide -no-shutdown-delay flag for job/alloc stop (#11596)
Some operators use very long group/task `shutdown_delay` settings to
safely drain network connections to their workloads after service
deregistration. But during incident response, they may want to cause
that drain to be skipped so they can quickly shed load.

Provide a `-no-shutdown-delay` flag on the `nomad alloc stop` and
`nomad job stop` commands that bypasses the delay. This sets a new
desired transition state on the affected allocations that the
allocation/task runner will identify during pre-kill on the client.

Note (as documented here) that using this flag will almost always
result in failed inbound network connections for workloads as the
tasks will exit before clients receive updated service discovery
information and won't be gracefully drained.
2021-12-13 14:54:53 -05:00
Tim Gross
972708aaed evaluations list pagination and filtering (#11648)
API queries can request pagination using the `NextToken` and `PerPage`
fields of `QueryOptions`, when supported by the underlying API.

Add a `NextToken` field to the `structs.QueryMeta` so that we have a
common field across RPCs to tell the caller where to resume paging
from on their next API call. Include this field on the `api.QueryMeta`
as well so that it's available for future versions of List HTTP APIs
that wrap the response with `QueryMeta` rather than returning a simple
list of structs. In the meantime callers can get the `X-Nomad-NextToken`.

Add pagination to the `Eval.List` RPC by checking for pagination token
and page size in `QueryOptions`. This will allow resuming from the
last ID seen so long as the query parameters and the state store
itself are unchanged between requests.

Add filtering by job ID or evaluation status over the results we get
out of the state store.

Parse the query parameters of the `Eval.List` API into the arguments
expected for filtering in the RPC call.
2021-12-10 13:43:03 -05:00
Tim Gross
2c3db7ee1d scheduler: config option to reject job registration (#11610)
During incident response, operators may find that automated processes
elsewhere in the organization can be generating new workloads on Nomad
clusters that are unable to handle the workload. This changeset adds a
field to the `SchedulerConfiguration` API that causes all job
registration calls to be rejected unless the request has a management
ACL token.
2021-12-06 15:20:34 -05:00
James Rasell
80dcae7216 core: allow setting and propagation of eval priority on job de/registration (#11532)
This change modifies the Nomad job register and deregister RPCs to
accept an updated option set which includes eval priority. This
param is optional and override the use of the job priority to set
the eval priority.

In order to ensure all evaluations as a result of the request use
the same eval priority, the priority is shared to the
allocReconciler and deploymentWatcher. This creates a new
distinction between eval priority and job priority.

The Nomad agent HTTP API has been modified to allow setting the
eval priority on job update and delete. To keep consistency with
the current v1 API, job update accepts this as a payload param;
job delete accepts this as a query param.

Any user supplied value is validated within the agent HTTP handler
removing the need to pass invalid requests to the server.

The register and deregister opts functions now all for setting
the eval priority on requests.

The change includes a small change to the DeregisterOpts function
which handles nil opts. This brings the function inline with the
RegisterOpts.
2021-11-23 09:23:31 +01:00
dependabot[bot]
9088bc3df0 build(deps): bump github.com/hashicorp/cronexpr from 1.1.0 to 1.1.1 in /api (#11132)
* build(deps): bump github.com/hashicorp/cronexpr in /api

Bumps [github.com/hashicorp/cronexpr](https://github.com/hashicorp/cronexpr) from 1.1.0 to 1.1.1.
- [Release notes](https://github.com/hashicorp/cronexpr/releases)
- [Commits](https://github.com/hashicorp/cronexpr/compare/v1.1.0...v1.1.1)

---
updated-dependencies:
- dependency-name: github.com/hashicorp/cronexpr
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* go mod tidy

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Tim Gross <tim@0x74696d.com>
2021-11-17 11:46:48 -05:00
dependabot[bot]
3e2bbc1db4 build(deps): bump github.com/kr/pretty from 0.1.0 to 0.3.0 in /api (#11135)
* build(deps): bump github.com/kr/pretty from 0.1.0 to 0.3.0 in /api

Bumps [github.com/kr/pretty](https://github.com/kr/pretty) from 0.1.0 to 0.3.0.
- [Release notes](https://github.com/kr/pretty/releases)
- [Commits](https://github.com/kr/pretty/compare/v0.1.0...v0.3.0)

---
updated-dependencies:
- dependency-name: github.com/kr/pretty
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* update in core as well and tidy

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Tim Gross <tim@0x74696d.com>
2021-11-17 10:41:21 -05:00
dependabot[bot]
a5e9fe6ffb build(deps): bump github.com/mitchellh/mapstructure in /api (#11188)
Bumps [github.com/mitchellh/mapstructure](https://github.com/mitchellh/mapstructure) from 1.3.3 to 1.4.2.
- [Release notes](https://github.com/mitchellh/mapstructure/releases)
- [Changelog](https://github.com/mitchellh/mapstructure/blob/master/CHANGELOG.md)
- [Commits](https://github.com/mitchellh/mapstructure/compare/v1.3.3...v1.4.2)

---
updated-dependencies:
- dependency-name: github.com/mitchellh/mapstructure
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-11-17 10:04:12 -05:00
dependabot[bot]
903c3907a3 build(deps): bump github.com/hashicorp/go-cleanhttp in /api (#11133)
Bumps [github.com/hashicorp/go-cleanhttp](https://github.com/hashicorp/go-cleanhttp) from 0.5.1 to 0.5.2.
- [Release notes](https://github.com/hashicorp/go-cleanhttp/releases)
- [Commits](https://github.com/hashicorp/go-cleanhttp/compare/v0.5.1...v0.5.2)

---
updated-dependencies:
- dependency-name: github.com/hashicorp/go-cleanhttp
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-11-17 08:42:34 -05:00
dependabot[bot]
abfdb6a0b5 build(deps): bump github.com/mitchellh/go-testing-interface in /api (#11136)
Bumps [github.com/mitchellh/go-testing-interface](https://github.com/mitchellh/go-testing-interface) from 1.0.0 to 1.14.1.
- [Release notes](https://github.com/mitchellh/go-testing-interface/releases)
- [Commits](https://github.com/mitchellh/go-testing-interface/compare/v1.0.0...v1.14.1)

---
updated-dependencies:
- dependency-name: github.com/mitchellh/go-testing-interface
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-11-17 08:38:35 -05:00
Alessandro De Blasis
9a5248b932 cli: show host_network in nomad status (#11432)
Enhance the CLI in order to return the host network in two flavors 
(default, verbose) of the `node status` command.

Fixes: #11223.
Signed-off-by: Alessandro De Blasis <alex@deblasis.net>
2021-11-05 09:02:46 -04:00
Luiz Aoqui
89447fb42e Revert "Return SchedulerConfig instead of SchedulerConfigResponse struct (#10799)" (#11433) 2021-11-02 17:42:52 -04:00
Luiz Aoqui
8c799b3980 add dispatch idempotency token support in the CLI (#10930) 2021-10-22 12:39:05 -04:00
Charlie Voiselle
8ba714e211 Return SchedulerConfig instead of SchedulerConfigResponse struct (#10799) 2021-10-13 21:23:13 -04:00
Michael Schurter
6a0dede9b6 Merge pull request #11167 from a-zagaevskiy/master
Support configurable dynamic port range
2021-10-13 16:47:38 -07:00
Mahmood Ali
6c414cd5f9 gofmt all the files
mostly to handle build directives in 1.17.
2021-10-01 10:14:28 -04:00
Michael Schurter
aff4b47f3c api: add Node.{Min,Max}DynamicPort 2021-09-30 17:05:10 -07:00
James Rasell
e34fa583f9 allow configuration of Docker hostnames in bridge mode (#11173)
Add a new hostname string parameter to the network block which
allows operators to specify the hostname of the network namespace.
Changing this causes a destructive update to the allocation and it
is omitted if empty from API responses. This parameter also supports
interpolation.

In order to have a hostname passed as a configuration param when
creating an allocation network, the CreateNetwork func of the
DriverNetworkManager interface needs to be updated. In order to
minimize the disruption of future changes, rather than add another
string func arg, the function now accepts a request struct along with
the allocID param. The struct has the hostname as a field.

The in-tree implementations of DriverNetworkManager.CreateNetwork
have been modified to account for the function signature change.
In updating for the change, the enhancement of adding hostnames to
network namespaces has also been added to the Docker driver, whilst
the default Linux manager does not current implement it.
2021-09-16 08:13:09 +02:00
Mahmood Ali
fdb8684004 Merge pull request #9160 from hashicorp/f-sysbatch
core: implement system batch scheduler
2021-08-16 09:30:24 -04:00
Michael Schurter
734f1d7eb6 Merge pull request #10848 from ggriffiths/listsnapshot_secrets
CSI Listsnapshot secrets support
2021-08-10 15:59:33 -07:00
Seth Hoenig
61ee443ee6 core: implement system batch scheduler
This PR implements a new "System Batch" scheduler type. Jobs can
make use of this new scheduler by setting their type to 'sysbatch'.

Like the name implies, sysbatch can be thought of as a hybrid between
system and batch jobs - it is for running short lived jobs intended to
run on every compatible node in the cluster.

As with batch jobs, sysbatch jobs can also be periodic and/or parameterized
dispatch jobs. A sysbatch job is considered complete when it has been run
on all compatible nodes until reaching a terminal state (success or failed
on retries).

Feasibility and preemption are governed the same as with system jobs. In
this PR, the update stanza is not yet supported. The update stanza is sill
limited in functionality for the underlying system scheduler, and is
not useful yet for sysbatch jobs. Further work in #4740 will improve
support for the update stanza and deployments.

Closes #2527
2021-08-03 10:30:47 -04:00
Grant Griffiths
cba476eae6 CSI ListSnapshots secrets implementation
Signed-off-by: Grant Griffiths <ggriffiths@purestorage.com>
2021-07-28 11:30:29 -07:00
Mahmood Ali
d3835a1656 api: revert to defaulting to http/1 (#10958)
* api: revert to defaulting to http/1

PR #10778 incidentally changed the api http client to connect with
HTTP/2 first. However, the websocket libraries used in `alloc exec`
features don't handle http/2 well, and don't downgrade to http/1
gracefully.

Given that the switch is incidental, and not requested by users.
Furthermore, api consumers can opt-in to forcing http/2 by setting
custom http clients.

Fixes #10922
2021-07-28 11:21:53 -04:00
Mahmood Ali
0f0bcdb6ed Merge pull request #10806 from hashicorp/munda/idempotent-job-dispatch
Enforce idempotency of dispatched jobs using token on dispatch request
2021-07-08 10:23:31 -04:00
Alex Munda
2bd2f586f4 Set/parse idempotency_token query param 2021-07-07 16:26:55 -05:00
James Rasell
aa8940fb42 Merge pull request #10861 from hashicorp/f-gh-10860
api: Added `NewSystemJob` job creation helper function.
2021-07-07 16:17:15 +02:00
James Rasell
8fc2d79245 api: Added NewSystemJob job creation helper function. 2021-07-07 11:03:20 +02:00
Alex Munda
63d8e92e90 Move idempotency token to write options. Remove DispatchIdempotent 2021-06-30 15:10:48 -05:00
Holt Wilkins
898c6c59a6 Enable parsing of terminating gateways 2021-06-30 05:34:16 +00:00
Alex Munda
f607c35421 Add idempotency token to dispatch request instead of special meta key 2021-06-29 15:59:23 -05:00
Seth Hoenig
67d801b821 consul/connect: fix tests for mesh gateway mode 2021-06-04 09:31:38 -05:00
Seth Hoenig
1ad0212a34 consul/connect: use range on upstream canonicalize
Co-authored-by: Tim Gross <tgross@hashicorp.com>
2021-06-04 08:55:05 -05:00
Seth Hoenig
37b49ba573 consul/connect: fix upstream mesh gateway default mode setting
This PR fixes the API to _not_ set the default mesh gateway mode. Before,
the mode would be set to "none" in Canonicalize, which is incorrect. We
should pass through the empty string so that folks can make use of Consul
service-defaults Config entries to configure the default mode.
2021-06-04 08:53:12 -05:00
Seth Hoenig
312161c5fc consul/connect: add support for connect mesh gateways
This PR implements first-class support for Nomad running Consul
Connect Mesh Gateways. Mesh gateways enable services in the Connect
mesh to make cross-DC connections via gateways, where each datacenter
may not have full node interconnectivity.

Consul docs with more information:
https://www.consul.io/docs/connect/gateways/mesh-gateway

The following group level service block can be used to establish
a Connect mesh gateway.

service {
  connect {
    gateway {
      mesh {
        // no configuration
      }
    }
  }
}

Services can make use of a mesh gateway by configuring so in their
upstream blocks, e.g.

service {
  connect {
    sidecar_service {
      proxy {
        upstreams {
          destination_name = "<service>"
          local_bind_port  = <port>
          datacenter       = "<datacenter>"
          mesh_gateway {
            mode = "<mode>"
          }
        }
      }
    }
  }
}

Typical use of a mesh gateway is to create a bridge between datacenters.
A mesh gateway should then be configured with a service port that is
mapped from a host_network configured on a WAN interface in Nomad agent
config, e.g.

client {
  host_network "public" {
    interface = "eth1"
  }
}

Create a port mapping in the group.network block for use by the mesh
gateway service from the public host_network, e.g.

network {
  mode = "bridge"
  port "mesh_wan" {
    host_network = "public"
  }
}

Use this port label for the service.port of the mesh gateway, e.g.

service {
  name = "mesh-gateway"
  port = "mesh_wan"
  connect {
    gateway {
      mesh {}
    }
  }
}

Currently Envoy is the only supported gateway implementation in Consul.
By default Nomad client will run the latest official Envoy docker image
supported by the local Consul agent. The Envoy task can be customized
by setting `meta.connect.gateway_image` in agent config or by setting
the `connect.sidecar_task` block.

Gateways require Consul 1.8.0+, enforced by the Nomad scheduler.

Closes #9446
2021-06-04 08:24:49 -05:00
Mahmood Ali
b294f6353a add a note about node connection failure and fallback 2021-05-25 14:24:24 -04:00
Mahmood Ali
a15a61759e exec: api: handle closing errors differently
refactor the api handling of `nomad exec`, and ensure that we process
all received events before handling websocket closing.

The exit code should be the last message received, and we ought to
ignore any websocket close error we receive afterwards.

Previously, we used two channels: one for websocket frames and another
for handling errors. This raised the possibility that we processed the
error before processing the frames, resulting into an "unexpected EOF"
error.
2021-05-25 11:19:42 -04:00
Chris Baker
140e7b3aaa Node Drain Metadata (#10250) 2021-05-07 13:58:40 -04:00
Mahmood Ali
d8e40600f6 Support disabling TCP checks for connect sidecar services 2021-05-07 12:10:26 -04:00
Mahmood Ali
9d86a5277d api: actually set MemoryOversubscriptionEnabled (#10493) 2021-05-02 22:53:53 -04:00
Michael Schurter
2e6eb84a57 Merge pull request #10248 from hashicorp/f-remotetask-2021
core: propagate remote task handles
2021-04-30 08:57:26 -07:00
Michael Schurter
d3d6c60e63 clarify docs from pr comments 2021-04-30 08:31:31 -07:00
Luiz Aoqui
c7114921fa Add metrics for blocked eval resources (#10454)
* add metrics for blocked eval resources

* docs: add new blocked_evals metrics

* fix to call `pruneStats` instead of `stats.prune` directly
2021-04-29 15:03:45 -04:00
Michael Schurter
d50fb2a00e core: propagate remote task handles
Add a new driver capability: RemoteTasks.

When a task is run by a driver with RemoteTasks set, its TaskHandle will
be propagated to the server in its allocation's TaskState. If the task
is replaced due to a down node or draining, its TaskHandle will be
propagated to its replacement allocation.

This allows tasks to be scheduled in remote systems whose lifecycles are
disconnected from the Nomad node's lifecycle.

See https://github.com/hashicorp/nomad-driver-ecs for an example ECS
remote task driver.
2021-04-27 15:07:03 -07:00
Seth Hoenig
549cd4f21f api: include ent fuzzy struct types in oss
Small change to pull in ent struct types in a switch
statement used by ent. They are benign in oss, this
is just to make sure OSS->ENT merges don't create a
diff.
2021-04-20 11:19:38 -06:00