The Nomad clients store their Nomad identity in memory and within
their state store. While active, it is not possible to dump the
state to view the stored identity token, so having a way to view
the current claims while running aids debugging and operations.
This change adds a client identity workflow, allowing operators
to view the current claims of the nodes identity. It does not
return any of the signing key material.
* Add MonitorExport command and handlers
* Implement autocomplete
* Require nomad in serviceName
* Fix race in StreamReader.Read
* Add and use framer.Flush() to coordinate function exit
* Add LogFile to client/Server config and read NomadLogPath in rpcHandler instead of HTTPServer
* Parameterize StreamFixed stream size
The node introduction workflow will utilise JWT's that can be used
as authentication tokens on initial client registration. This
change implements the basic builder for this JWT claim type and
the RPC and HTTP handler functionality that will expose this to
the operator.
The Nomad client will have its identity renewed according to the
TTL which defaults to 24h. In certain situations such as root
keyring rotation, operators may want to force clients to renew
their identities before the TTL threshold is met. This change
introduces a client HTTP and RPC endpoint which will instruct the
node to request a new identity at its next heartbeat. This can be
used via the API or a new command.
While this is a manual intervention step on top of the any keyring
rotation, it dramatically reduces the initial feature complexity
as it provides an asynchronous and efficient method of renewal that
utilises existing functionality.
Nomad Enterprise users operating in air-gapped or otherwise secured environments
don't want to send license reporting metrics directly from their
servers. Implement manual/offline reporting by periodically recording usage
metrics snapshots in the state store, and providing an API and CLI by which
cluster administrators can download the snapshot for review and out-of-band
transmission to HashiCorp.
This is the CE portion of the work required for implemention in the Enterprise
product. Nomad CE does not perform utilization reporting.
Ref: https://github.com/hashicorp/nomad-enterprise/pull/2673
Ref: https://hashicorp.atlassian.net/browse/NMD-68
Ref: https://go.hashi.co/rfc/nmd-210
* Upgrade to using hashicorp/go-metrics@v0.5.4
This also requires bumping the dependencies for:
* memberlist
* serf
* raft
* raft-boltdb
* (and indirectly hashicorp/mdns due to the memberlist or serf update)
Unlike some other HashiCorp products, Nomads root module is currently expected to be consumed by others. This means that it needs to be treated more like our libraries and upgrade to hashicorp/go-metrics by utilizing its compat packages. This allows those importing the root module to control the metrics module used via build tags.
The List Volumes API was originally written for CSI but assumed we'd have future
volume types, dispatched on a query parameter. Dynamic host volumes uses this,
but the resulting code has host volumes concerns comingled in the CSI volumes
endpoint. Refactor this so that we have a top-level `GET /v1/volumes` route that's
shared between CSI and DHV, and have it dispatch to the appropriate handler in
the type-specific endpoints.
Ref: https://github.com/hashicorp/nomad/pull/24479
This changeset implements the HTTP API endpoints for Dynamic Host Volumes.
The `GET /v1/volumes` endpoint is shared between CSI and DHV with a query
parameter for the type. In the interest of getting some working handlers
available for use in development (and minimizing the size of the diff to
review), this changeset doesn't do any sort of refactoring of how the existing
List Volumes CSI endpoint works. That will come in a later PR, as will the
corresponding `api` package updates we need to support the CLI.
Ref: https://hashicorp.atlassian.net/browse/NET-11549
introduce a new API /v1/jobs/statuses, primarily for use in the UI,
which collates info about jobs, their allocations, and latest deployment.
currently the UI gets *all* of /v1/jobs and sorts and paginates them client-side
in the browser, and its "summary" column is based on historical summary data
(which can be visually misleading, and sometimes scary when a job has failed
at some point in the not-yet-garbage-collected past).
this does pagination and filtering and such, and returns jobs sorted by ModifyIndex,
so latest-changed jobs still come first. it pulls allocs and latest deployment
straight out of current state for more a more robust, holistic view of the job status.
it is less efficient per-job, due to the extra state lookups, but should be more efficient
per-page (excepting perhaps for job(s) with very-many allocs).
if a POST body is sent like `{"jobs": [{"namespace": "cool-ns", "id": "cool-job"}]}`,
then the response will be limited to that subset of jobs. the main goal here is to
prevent "jostling" the user in the UI when jobs come into and out of existence.
and if a blocking query is started with `?index=N`, then the query should only
unblock if jobs "on page" change, rather than any change to any of the state
tables being queried ("jobs", "allocs", and "deployment"), to save unnecessary
HTTP round trips.
Replaces #18812
Upgraded with:
```
find . -name '*.go' -exec sed -i s/"github.com\/hashicorp\/go-msgpack\/codec"/"github.com\/hashicorp\/go-msgpack\/v2\/codec/" '{}' ';'
find . -name '*.go' -exec sed -i s/"github.com\/hashicorp\/net-rpc-msgpackrpc"/"github.com\/hashicorp\/net-rpc-msgpackrpc\/v2/" '{}' ';'
go get
go get -v -u github.com/hashicorp/raft-boltdb/v2
go get -v github.com/hashicorp/serf@5d32001edfaa18d1c010af65db707cdb38141e80
```
see https://github.com/hashicorp/go-msgpack/releases/tag/v2.1.0
for details
The new `nomad setup vault -check` commmand can be used to retrieve
information about the changes required before a cluster is migrated from
the deprecated legacy authentication flow with Vault to use only
workload identities.
Added the [OIDC Discovery](https://openid.net/specs/openid-connect-discovery-1_0.html) `/.well-known/openid-configuration` endpoint to Nomad, but it is only enabled if the `server.oidc_issuer` parameter is set. Documented the parameter, but without a tutorial trying to actually _use_ this will be very hard.
I intentionally did *not* use https://github.com/hashicorp/cap for the OIDC configuration struct because it's built to be a *compliant* OIDC provider. Nomad is *not* trying to be compliant initially because compliance to the spec does not guarantee it will actually satisfy the requirements of third parties. I want to avoid the problem where in an attempt to be standards compliant we ship configuration parameters that lock us in to a certain behavior that we end up regretting. I want to add parameters and behaviors as there's a demonstrable need.
Users always have the escape hatch of providing their own OIDC configuration endpoint. Nomad just needs to know the Issuer so that the JWTs match the OIDC configuration. There's no reason the actual OIDC configuration JSON couldn't live in S3 and get served directly from there. Unlike JWKS the OIDC configuration should be static, or at least change very rarely.
This PR is just the endpoint extracted from #18535. The `RS256` algorithm still needs to be added in hopes of supporting third parties such as [AWS IAM OIDC Provider](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_providers_create_oidc.html).
Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
Add JWKS endpoint to HTTP API for exposing the root public signing keys used for signing workload identity JWTs.
Part 1 of N components as part of making workload identities consumable by third party services such as Consul and Vault. Identity attenuation (audience) and expiration (+renewal) are necessary to securely use workload identities with 3rd parties, so this merge does not yet document this endpoint.
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
* Upgrade from hashicorp/go-msgpack v1.1.5 to v2.1.0
Fixes#16808
* Update hashicorp/net-rpc-msgpackrpc to v2 to match go-msgpack
* deps: use go-msgpack v2.0.0
go-msgpack v2.1.0 includes some code changes that we will need to
investigate furthere to assess its impact on Nomad, so keeping this
dependency on v2.0.0 for now since it's no-op.
---------
Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
Our auth token parsing code trims space around the `Authorization` header but
not around `X-Nomad-Token`. When using the UI, it's easy to accidentally
introduce a leading or trailing space, which results in spurious authentication
errors. Trim the space at the HTTP server.
* taskapi: return Forbidden on bad credentials
Prior to this change a "Server error" would be returned when ACLs are
enabled which did not match when ACLs are disabled.
* e2e: love love love datacenter wildcard default
* e2e: skip windows nodes on linux only test
The Logfs are a bit weird because they're most useful when converted to
Printfs to make debugging the test much faster, but that makes CI noisy.
In a perfect world Go would expose how many tests are being run and we
could stream output live if there's only 1. For now I left these helpful
lines in as basically glorified comments.
Fixes#14617
Dynamic Node Metadata allows Nomad users, and their jobs, to update Node metadata through an API. Currently Node metadata is only reloaded when a Client agent is restarted.
Includes new UI for editing metadata as well.
---------
Co-authored-by: Phil Renaud <phil.renaud@hashicorp.com>
This change introduces the Task API: a portable way for tasks to access Nomad's HTTP API. This particular implementation uses a Unix Domain Socket and, unlike the agent's HTTP API, always requires authentication even if ACLs are disabled.
This PR contains the core feature and tests but followup work is required for the following TODO items:
- Docs - might do in a followup since dynamic node metadata / task api / workload id all need to interlink
- Unit tests for auth middleware
- Caching for auth middleware
- Rate limiting on negative lookups for auth middleware
---------
Co-authored-by: Seth Hoenig <shoenig@duck.com>
In #15605 we fixed the bug where the presense of "stale" query parameter
was mean to imply stale, even if the value of the parameter was "false"
or malformed. In parsing, we missed the case where the slice of values
would be nil which lead to a failing test case that was missed because
CI didn't run against the original PR.
This change add the RPC ACL binding rule handlers. These handlers
are responsible for the creation, updating, reading, and deletion
of binding rules.
The write handlers are feature gated so that they can only be used
when all federated servers are running the required version.
The HTTP API handlers and API SDK have also been added where
required. This allows the endpoints to be called from the API by users
and clients.
Add a new `Eval.Count` RPC and associated HTTP API endpoints. This API is
designed to support interactive use in the `nomad eval delete` command to get a
count of evals expected to be deleted before doing so.
The state store operations to do this sort of thing are somewhat expensive, but
it's cheaper than serializing a big list of evals to JSON. Note that although it
seems like this could be done as an extra parameter and response field on
`Eval.List`, having it as its own endpoint avoids having to change the response
body shape and lets us avoid handling the legacy filter params supported by
`Eval.List`.
Return 429 response on HTTP max connection limit. Instead of silently closing
the connection, return a `429 Too Many Requests` HTTP response with a helpful
error message to aid debugging when the connection limit is unintentionally
reached.
Set a 10-millisecond write timeout and rate limiter for connection-limit 429
response to prevent writing the HTTP response from consuming too many server
resources.
Add `nomad.agent.http.exceeded metric` counting the number of HTTP connections
exceeding concurrency limit.