83 Commits

Author SHA1 Message Date
tehut
d709accaf5 Add nomad monitor export command (#26178)
* Add MonitorExport command and handlers
* Implement autocomplete
* Require nomad in serviceName
* Fix race in StreamReader.Read
* Add and use framer.Flush() to coordinate function exit
* Add LogFile to client/Server config and read NomadLogPath in rpcHandler instead of HTTPServer
* Parameterize StreamFixed stream size
2025-08-01 10:26:59 -07:00
James Rasell
7268053174 vault: Remove legacy token based authentication workflow. (#25155)
The legacy workflow for Vault whereby servers were configured
using a token to provide authentication to the Vault API has now
been removed. This change also removes the workflow where servers
were responsible for deriving Vault tokens for Nomad clients.

The deprecated Vault config options used byi the Nomad agent have
all been removed except for "token" which is still in use by the
Vault Transit keyring implementation.

Job specification authors can no longer use the "vault.policies"
parameter and should instead use "vault.role" when not using the
default workload identity.

---------

Co-authored-by: Tim Gross <tgross@hashicorp.com>
Co-authored-by: Aimee Ukasick <aimee.ukasick@hashicorp.com>
2025-02-28 07:40:02 +00:00
Seth Hoenig
db0642099e build: update golangci-lint to 1.60.1 (#23807)
* build: update golangci-lint to 1.60.1

* ci: update golangci-lint to v1.60.1

Helps with go1.23 compatability. Introduces some breaking changes / newly
enforced linter patterns so those are fixed as well.
2024-08-14 10:09:31 -05:00
Michael Schurter
23e4b7c9d2 Upgrade go-msgpack to v2 (#20173)
Replaces #18812

Upgraded with:
```
find . -name '*.go' -exec sed -i s/"github.com\/hashicorp\/go-msgpack\/codec"/"github.com\/hashicorp\/go-msgpack\/v2\/codec/" '{}' ';'
find . -name '*.go' -exec sed -i s/"github.com\/hashicorp\/net-rpc-msgpackrpc"/"github.com\/hashicorp\/net-rpc-msgpackrpc\/v2/" '{}' ';'
go get
go get -v -u github.com/hashicorp/raft-boltdb/v2
go get -v github.com/hashicorp/serf@5d32001edfaa18d1c010af65db707cdb38141e80
```

see https://github.com/hashicorp/go-msgpack/releases/tag/v2.1.0
for details
2024-03-21 11:44:23 -07:00
Tim Gross
9d075c44b2 config: remove old Vault/Consul config blocks from parser (#18997)
Remove the now-unused original configuration blocks for Consul and Vault from
the agent configuration parsing. When the agent needs to refer to a Consul or
Vault block it will always be for a specific cluster for the task/service (or
the default cluster for the agent's own use).

This is third of three changesets for this work.

Fixes: https://github.com/hashicorp/nomad/issues/18947
Ref: https://github.com/hashicorp/nomad/pull/18991
Ref: https://github.com/hashicorp/nomad/pull/18994
2023-11-08 09:30:08 -05:00
James Rasell
ca9e08e6b5 monitor: add log include location option on monitor CLI and API (#18795) 2023-10-20 07:55:22 +01:00
Tim Gross
cbd7248248 auth: use ACLsDisabledACL when ACLs are disabled (#18754)
The RPC handlers expect to see `nil` ACL objects whenever ACLs are disabled. By
using `nil` as a sentinel value, we have the risk of nil pointer exceptions and
improper handling of `nil` when returned from our various auth methods that can
lead to privilege escalation bugs. This is the final patch in a series to
eliminate the use of `nil` ACLs as a sentinel value for when ACLs are disabled.

This patch adds a new virtual ACL policy field for when ACLs are disabled and
updates our authentication logic to use it. Included:

* Extends auth package tests to demonstrate that nil ACLs are treated as failed
  auth and disabled ACLs succeed auth.
* Adds a new `AllowDebug` ACL check for the weird special casing we have for
  pprof debugging when ACLs are disabled.
* Removes the remaining unexported methods (and repeated tests) from the
  `nomad/acl.go` file.
* Update the semgrep rules to detect improper nil ACL checking and remove the
  old invalid ACL checks.
* Update the contributing guide for RPC authentication.

Ref: https://github.com/hashicorp/nomad-enterprise/pull/1218
Ref: https://github.com/hashicorp/nomad/pull/18703
Ref: https://github.com/hashicorp/nomad/pull/18715
Ref: https://github.com/hashicorp/nomad/pull/16799
Ref: https://github.com/hashicorp/nomad/pull/18730
Ref: https://github.com/hashicorp/nomad/pull/18744
2023-10-16 09:30:24 -04:00
Gerard Nguyen
1339599185 cli: Add prune flag for nomad server force-leave command (#18463)
This feature will help operator to remove a failed/left node from Serf layer immediately
without waiting for 24 hours for the node to be reaped

* Update CLI with prune flag
* Update API /v1/agent/force-leave with prune query string parameter
* Update CLI and API doc
* Add unit test
2023-09-15 08:45:11 -04:00
Tim Gross
a8bad048b6 config: parsing support for multiple Consul clusters in agent config (#18255)
Add the plumbing we need to accept multiple Consul clusters in Nomad agent
configuration, to support upcoming Nomad Enterprise features. The `consul` blocks
are differentiated by a new `name` field, and if the `name` is omitted it
becomes the "default" Consul configuration. All blocks with the same name are
merged together, as with the existing behavior.

As with the `vault` block, we're still using HCL1 for parsing configuration and
the `Decode` method doesn't parse multiple blocks differentiated only by a field
name without a label. So we've had to add an extra parsing pass, similar to what
we've done for HCL1 jobspecs. This also revealed a subtle bug in the `vault`
block handling of extra keys when there are multiple `vault` blocks, which I've
fixed here.

For now, all existing consumers will use the "default" Consul configuration, so
there's no user-facing behavior change in this changeset other than the contents
of the agent self API.

Ref: https://github.com/hashicorp/team-nomad/issues/404
2023-08-18 15:25:16 -04:00
Tim Gross
74b796e6d0 config: parsing support for multiple Vault clusters in agent config (#18224)
Add the plumbing we need to accept multiple Vault clusters in Nomad agent
configuration, to support upcoming Nomad Enterprise features. The `vault` blocks
are differentiated by a new `name` field, and if the `name` is omitted it
becomes the "default" Vault configuration. All blocks with the same name are
merged together, as with the existing behavior.

Unfortunately we're still using HCL1 for parsing configuration and the `Decode`
method doesn't parse multiple blocks differentiated only by a field name without
a label. So we've had to add an extra parsing pass, similar to what we've done
for HCL1 jobspecs.

For now, all existing consumers will use the "default" Vault configuration, so
there's no user-facing behavior change in this changeset other than the contents
of the agent self API.

Ref: https://github.com/hashicorp/team-nomad/issues/404
2023-08-17 14:10:32 -04:00
hashicorp-copywrite[bot]
a9d61ea3fd Update copyright file headers to BUSL-1.1 2023-08-10 17:27:29 -05:00
Ville Vesilehto
2c463bb038 chore(lint): use Go stdlib variables for HTTP methods and status codes (#17968) 2023-07-26 15:28:09 +01:00
Luiz Aoqui
ee5a08dbb2 Revert "hashicorp/go-msgpack v2 (#16810)" (#17047)
This reverts commit 8a98520d56.
2023-05-01 17:18:34 -04:00
Ian Fijolek
8a98520d56 hashicorp/go-msgpack v2 (#16810)
* Upgrade from hashicorp/go-msgpack v1.1.5 to v2.1.0

Fixes #16808

* Update hashicorp/net-rpc-msgpackrpc to v2 to match go-msgpack

* deps: use go-msgpack v2.0.0

go-msgpack v2.1.0 includes some code changes that we will need to
investigate furthere to assess its impact on Nomad, so keeping this
dependency on v2.0.0 for now since it's no-op.

---------

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
2023-04-17 17:02:05 -04:00
hashicorp-copywrite[bot]
f005448366 [COMPLIANCE] Add Copyright and License Headers 2023-04-10 15:36:59 +00:00
Michael Schurter
9bab96ebd3 Task API via Unix Domain Socket (#15864)
This change introduces the Task API: a portable way for tasks to access Nomad's HTTP API. This particular implementation uses a Unix Domain Socket and, unlike the agent's HTTP API, always requires authentication even if ACLs are disabled.

This PR contains the core feature and tests but followup work is required for the following TODO items:

- Docs - might do in a followup since dynamic node metadata / task api / workload id all need to interlink
- Unit tests for auth middleware
- Caching for auth middleware
- Rate limiting on negative lookups for auth middleware

---------

Co-authored-by: Seth Hoenig <shoenig@duck.com>
2023-02-06 11:31:22 -08:00
James Rasell
c495cd99bf api: ensure all request body decode error return a 400 status code. (#15252) 2022-11-18 17:04:33 +01:00
Michael Schurter
01648e615a client: fix data races in config handling (#14139)
Before this change, Client had 2 copies of the config object: config and configCopy. There was no guidance around which to use where (other than configCopy's comment to pass it to alloc runners), both are shared among goroutines and mutated in data racy ways. At least at one point I think the idea was to have `config` be mutable and then grab a lock to overwrite `configCopy`'s pointer atomically. This would have allowed alloc runners to read their config copies in data race safe ways, but this isn't how the current implementation worked.

This change takes the following approach to safely handling configs in the client:

1. `Client.config` is the only copy of the config and all access must go through the `Client.configLock` mutex
2. Since the mutex *only protects the config pointer itself and not fields inside the Config struct:* all config mutation must be done on a *copy* of the config, and then Client's config pointer is overwritten while the mutex is acquired. Alloc runners and other goroutines with the old config pointer will not see config updates.
3. Deep copying is implemented on the Config struct to satisfy the previous approach. The TLS Keyloader is an exception because it has its own internal locking to support mutating in place. An unfortunate complication but one I couldn't find a way to untangle in a timely fashion.
4. To facilitate deep copying I made an *internally backward incompatible API change:* our `helper/funcs` used to turn containers (slices and maps) with 0 elements into nils. This probably saves a few memory allocations but makes it very easy to cause panics. Since my new config handling approach uses more copying, it became very difficult to ensure all code that used containers on configs could handle nils properly. Since this code has caused panics in the past, I fixed it: nil containers are copied as nil, but 0-element containers properly return a new 0-element container. No more "downgrading to nil!"
2022-08-18 16:32:04 -07:00
Luiz Aoqui
1aa3b56108 api: prevent excessice CPU load on job parse
Add new namespace ACL requirement for the /v1/jobs/parse endpoint and
return early if HCLv2 parsing fails.

The endpoint now requires the new `parse-job` ACL capability or
`submit-job`.
2022-02-09 19:51:47 -05:00
Charlie Voiselle
6e61606eba Make number of scheduler workers reloadable (#11593)
## Development Environment Changes
* Added stringer to build deps

## New HTTP APIs
* Added scheduler worker config API
* Added scheduler worker info API

## New Internals
* (Scheduler)Worker API refactor—Start(), Stop(), Pause(), Resume()
* Update shutdown to use context
* Add mutex for contended server data
    - `workerLock` for the `workers` slice
    - `workerConfigLock` for the `Server.Config.NumSchedulers` and
      `Server.Config.EnabledSchedulers` values

## Other
* Adding docs for scheduler worker api
* Add changelog message

Co-authored-by: Derek Strickland <1111455+DerekStrickland@users.noreply.github.com>
2022-01-06 11:56:13 -05:00
Tim Gross
072d3b6b74 cli: ensure -stale flag is respected by nomad operator debug (#11678)
When a cluster doesn't have a leader, the `nomad operator debug`
command can safely use stale queries to gracefully degrade the
consistency of almost all its queries. The query parameter for these
API calls was not being set by the command.

Some `api` package queries do not include `QueryOptions` because
they target a specific agent, but they can potentially be forwarded to
other agents. If there is no leader, these forwarded queries will
fail. Provide methods to call these APIs with `QueryOptions`.
2021-12-15 10:44:03 -05:00
Dave May
d757507341 fix AgentHostRequest panic found in GH-9546 (#9554)
* debug: refactor nodeclass test
* debug: add case to track down SIGSEGV on client to server Agent.Host RPC
* verify server to avoid panic on AgentHostRequest RPC call, fixes GH-9546
* simplify Agent.Host RPC lookup logic
2020-12-07 17:34:40 -05:00
Lang Martin
b5ef217c90 nomad debug renamed to nomad operator debug (#8602)
* renamed: command/debug.go -> command/operator_debug.go
* website: rename debug -> operator debug
* website/pages/api-docs/agent: name in api docs
2020-08-11 15:39:44 -04:00
Lang Martin
bde973e366 api: nomad debug new /agent/host (#8325)
* command/agent/host: collect host data, multi platform

* nomad/structs/structs: new HostDataRequest/Response

* client/agent_endpoint: add RPC endpoint

* command/agent/agent_endpoint: add Host

* api/agent: add the Host endpoint

* nomad/client_agent_endpoint: add Agent Host with forwarding

* nomad/client_agent_endpoint: use findClientConn

This changes forwardMonitorClient and forwardProfileClient to use
findClientConn, which was cribbed from the common parts of those
funcs.

* command/debug: call agent hosts

* command/agent/host: eliminate calling external programs
2020-07-02 09:51:25 -04:00
Yoan Blanc
c3928fe360 fixup! vendor: explicit use of hashicorp/go-msgpack
Signed-off-by: Yoan Blanc <yoan@dosimple.ch>
2020-03-31 09:48:07 -04:00
Yoan Blanc
887f23a351 vendor: explicit use of hashicorp/go-msgpack
Signed-off-by: Yoan Blanc <yoan@dosimple.ch>
2020-03-31 09:45:21 -04:00
Michael Schurter
b2ff9bcb1f agent: prevent XSS by controlling Content-Type 2020-03-25 09:45:43 -04:00
Drew Bailey
4e9dc03d7d agent Profile req nil check s.agent.Server()
clean up logic and tests
2020-02-03 13:20:05 -05:00
Drew Bailey
39a6c6374a Fix panic when monitoring a local client node
Fixes a panic when accessing a.agent.Server() when agent is a client
instead. This pr removes a redundant ACL check since ACLs are validated
at the RPC layer. It also nil checks the agent server and uses Client()
when appropriate.
2020-02-03 13:20:04 -05:00
Drew Bailey
a58b8a5e9c refactor api profile methods
comment why we ignore errors parsing params
2020-01-09 15:15:12 -05:00
Drew Bailey
549045fcbb Rename profile package to pprof
Address pr feedback, rename profile package to pprof to more accurately
describe its purpose. Adds gc param for heap lookup profiles.
2020-01-09 15:15:10 -05:00
Drew Bailey
1776458956 address pr feedback 2020-01-09 15:15:09 -05:00
Drew Bailey
a3f73b3e06 leave acl checking to rpc endpoints
fix test expectation

test wrapNonJSON
2020-01-09 15:15:08 -05:00
Drew Bailey
db382d3195 provide helpful error, cleanup logic 2020-01-09 15:15:08 -05:00
Drew Bailey
11563dca1c prevent doubly wrapping with rpc error 2020-01-09 15:15:07 -05:00
Drew Bailey
328075591f region forwarding; prevent recursive forwards for impossible requests
prevent region forwarding loop, backfill tests

fix failing test
2020-01-09 15:15:06 -05:00
Drew Bailey
390e22e421 move shared structs out of client and into nomad 2020-01-09 15:15:05 -05:00
Drew Bailey
57dc0c6a46 test pprof headers and profile methods
tidy up, add comments

clean up seconds param assignment
2020-01-09 15:15:04 -05:00
Drew Bailey
c28e5ad036 warn when enabled debug is on when registering
m -> a receiver name

return codederrors, fix query
2020-01-09 15:15:04 -05:00
Drew Bailey
d077cfe763 acl and debug test table
rename implementation method
2020-01-09 15:15:03 -05:00
Drew Bailey
fb1b4cdc26 Server request forwarding for Agent.Profile
Return rpc errors for profile requests, set up remote forwarding to
target leader or server id for profile requests.

server forwarding, endpoint tests
2020-01-09 15:15:03 -05:00
Drew Bailey
3575f1777f test for known pprof endpoints 2020-01-09 15:15:02 -05:00
Drew Bailey
240c0ee0ec agent pprof endpoints
wip, agent endpoint and client endpoint for pprof profiles

agent endpoint test
2020-01-09 15:15:02 -05:00
Drew Bailey
e46c41553d serverID to target remote leader or server
handle the case where we request a server-id which is this current server

update docs, error on node and server id params

more accurate names for tests

use shared no leader err, formatting

rm bad comment

remove redundant variable
2019-11-14 10:07:35 -05:00
Drew Bailey
fb49f3c35b add server-id to monitor specific server 2019-11-14 09:53:41 -05:00
Drew Bailey
158517972b wireup plain=true|false query param 2019-11-05 11:44:28 -05:00
Drew Bailey
873969cf41 return 400 if invalid log_json param is given
Addresses feedback around monitor implementation

subselect on stopCh to prevent blocking forever.

Set up a separate goroutine to check every 3 seconds for dropped
messages.

rename returned ch to avoid confusion
2019-11-05 09:51:53 -05:00
Drew Bailey
6bf8617d02 rename function, initialize log level better
underscores instead of dashes for query params
2019-11-05 09:51:53 -05:00
Drew Bailey
92d6a30f86 agent:read acl policy for monitor 2019-11-05 09:51:52 -05:00
Drew Bailey
c8d60dd6f9 only look up rpchandler for node if we have nodeid
fix some comments and nomad monitor -h output
2019-11-05 09:51:51 -05:00