39 Commits

Author SHA1 Message Date
Piotr Kazmierczak
0906f788f0 keyring: warn if removing a key that was used for encrypting variables (#24766)
Adds an additional check in the Keyring.Delete RPC to make sure we're not
trying to delete a key that's been used to encrypt a variable. It also adds a
-force flag for the CLI/API to sidestep that check.
2025-01-07 10:15:02 +01:00
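
The check described in 0906f788f0 above can be sketched roughly as follows. This is a minimal illustration, not the actual `Keyring.Delete` RPC: the `keyStore` type, its fields, `deleteKey`, and `ErrKeyInUse` are all hypothetical names, and returning an error when `force` is unset is an assumption about how the check surfaces.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrKeyInUse is a hypothetical error for this sketch; it is not Nomad's
// actual error value.
var ErrKeyInUse = errors.New("key was used to encrypt variables")

// keyStore is a hypothetical stand-in for the server's key state.
type keyStore struct {
	keys       map[string]bool // key ID -> key exists
	usedByVars map[string]int  // key ID -> number of variables encrypted with it
}

// deleteKey refuses to remove a key that encrypted at least one variable
// unless force is set, mirroring the intent of the -force flag.
func (s *keyStore) deleteKey(keyID string, force bool) error {
	if !s.keys[keyID] {
		return fmt.Errorf("key %q not found", keyID)
	}
	if n := s.usedByVars[keyID]; n > 0 && !force {
		return fmt.Errorf("%w: key %q, %d variable(s)", ErrKeyInUse, keyID, n)
	}
	delete(s.keys, keyID)
	return nil
}

func main() {
	s := &keyStore{
		keys:       map[string]bool{"k1": true, "k2": true},
		usedByVars: map[string]int{"k1": 3},
	}
	fmt.Println(s.deleteKey("k1", false)) // refused: key is in use
	fmt.Println(s.deleteKey("k1", true))  // forced delete succeeds
	fmt.Println(s.deleteKey("k2", false)) // unused key deletes without force
}
```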
Tim Gross
a7f2cb879e command line tools for redacting keyring from snapshots (#24023)
In #23977 we moved the keyring into Raft, which can expose key material in Raft
snapshots when using the less-secure AEAD keyring instead of KMS. This changeset
adds tools for redacting this material from snapshots:

* The `operator snapshot state` command gains the ability to display key
  metadata (only), which respects the `-filter` option.
* The `operator snapshot save` command gains a `-redact` option that removes key
  material from the snapshot after it's downloaded.
* A new `operator snapshot redact` command allows removing key material from an
  existing snapshot.
2024-09-20 15:30:14 -04:00
Aimee Ukasick
3d06eef65d Docs: CE-705 Highlight that users must back up keyring separately 2024-08-26 11:25:26 -05:00
Sujata Roy
36522ec632 Merge pull request #23850 from hashicorp/Nomad-NET-9394
command/debug: capture more logs by default
2024-08-22 10:43:28 -07:00
Michael Schurter
8b0a88e2f7 docs: update defaults for operator debug 2024-08-22 09:17:03 -07:00
Tim Gross
2f4353412d keyring: support prepublishing keys (#23577)
When a root key is rotated, the servers immediately start signing Workload
Identities with the new active key. But workloads may be using those WI tokens
to sign into external services, which may not have had time to fetch the new
public key and which might try to fetch new keys as needed.

Add support for prepublishing keys. Prepublished keys will be visible in the
JWKS endpoint but will not be used for signing or encryption until their
`PublishTime`. Update the periodic key rotation to prepublish keys at half the
`root_key_rotation_threshold` window, and promote prepublished keys to active
after the `PublishTime`.

This changeset also fixes three bugs in periodic root key rotation and garbage
collection, which can't be safely fixed without implementing
prepublishing:

* Periodic root key rotation would never happen because the default
  `root_key_rotation_threshold` of 720h exceeds the 72h maximum window of the FSM
  time table. We now compare the `CreateTime` against the wall clock time instead
  of the time table. (We expect to remove the time table in future work, ref
  https://github.com/hashicorp/nomad/issues/16359)
* Root key garbage collection could GC keys that were used to sign
  identities. We now wait until `root_key_rotation_threshold` +
  `root_key_gc_threshold` before GC'ing a key.
* When rekeying a root key, the core job did not mark the key as inactive after
  the rekey was complete.
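
The timing rules in this message can be sketched as below. This is a rough illustration, not the server code: the function names, the `epoch` demo value, and the one-hour GC threshold are hypothetical, and measuring both windows from the key's `CreateTime` is an assumption. Only the 720h rotation default and the "half the window" and "rotation threshold + GC threshold" rules come from the message.

```go
package main

import (
	"fmt"
	"time"
)

const day = 24 * time.Hour

// epoch is an arbitrary CreateTime used only for the demo below.
var epoch = time.Date(2024, time.July, 1, 0, 0, 0, 0, time.UTC)

// prepublishAt returns the point halfway through the rotation window, which
// is when the next key would be prepublished.
func prepublishAt(createTime time.Time, rotationThreshold time.Duration) time.Time {
	return createTime.Add(rotationThreshold / 2)
}

// rotationDue compares CreateTime against the wall clock rather than a
// bounded time table, so thresholds longer than 72h still trigger.
func rotationDue(createTime, now time.Time, rotationThreshold time.Duration) bool {
	return !now.Before(createTime.Add(rotationThreshold))
}

// gcEligible reports whether a key is past rotation threshold + GC threshold.
// Counting from CreateTime is an assumption of this sketch.
func gcEligible(createTime, now time.Time, rotationThreshold, gcThreshold time.Duration) bool {
	return now.After(createTime.Add(rotationThreshold + gcThreshold))
}

func main() {
	rotation := 30 * day // 720h, the default mentioned in the message
	gc := 1 * time.Hour  // hypothetical value for illustration
	now := epoch.Add(20 * day)

	fmt.Println("prepublish at:", prepublishAt(epoch, rotation).Format(time.RFC3339))
	fmt.Println("rotation due:", rotationDue(epoch, now, rotation))
	fmt.Println("GC eligible:", gcEligible(epoch, now, rotation, gc))
}
```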

Ref: https://hashicorp.atlassian.net/browse/NET-10398
Ref: https://hashicorp.atlassian.net/browse/NET-10280
Fixes: https://github.com/hashicorp/nomad/issues/19669
Fixes: https://github.com/hashicorp/nomad/issues/23528
Fixes: https://github.com/hashicorp/nomad/issues/19368
2024-07-19 13:29:41 -04:00
Adrian Todorov
3f2729f7f5 remove mentions of old versions of Nomad in various docs (#23567) 2024-07-12 11:01:32 -04:00
James Rasell
1c976d126e docs: update snapshot inspect CLI detail to mirror recent changes. (#23276) 2024-06-10 14:30:13 +01:00
Tim Gross
02d98b9357 operator debug: fix pprof interval handling (#20206)
The `nomad operator debug` command saves a CPU profile for each interval, and
names these files based on the interval.

The same function takes a goroutine profile, heap profile, etc. but is missing
the logic to interpolate the file name with the interval. This results in the
operator debug command making potentially many expensive profile requests, and
then overwriting the data. Update the command to save every profile it scrapes,
and number them similarly to the existing CPU profile.

Additionally, the command flags for `-pprof-interval` and `-pprof-duration` were
validated backwards, which meant that we always coerced the `-pprof-interval` to
be the same as the `-pprof-duration`, which always resulted in a single profile
being taken at the start of the bundle. Correct the check as well as change the
defaults to be more sensible.
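
To illustrate the overwrite problem, here is a minimal sketch of naming every profile by its interval. The `profileFileName` helper and the `name_NNNN.prof` format are hypothetical and are not the file names the command actually writes.

```go
package main

import "fmt"

// profileFileName builds a per-interval file name so that each scrape is kept
// instead of overwriting the previous one. The format is hypothetical.
func profileFileName(profile string, interval int) string {
	return fmt.Sprintf("%s_%04d.prof", profile, interval)
}

func main() {
	// Without the interval in the name, every pass would write the same
	// "goroutine" or "heap" file and only the last scrape would survive.
	for _, p := range []string{"profile", "goroutine", "heap"} {
		for interval := 0; interval < 2; interval++ {
			fmt.Println(profileFileName(p, interval))
		}
	}
}
```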

Fixes: https://github.com/hashicorp/nomad/issues/20151
2024-03-25 09:01:06 -04:00
Tim Gross
c4253470a0 autopilot: add operator autopilot health command (#20156)
Add a command line operation that reports Enterprise autopilot data from the
`/operator/autopilot/health` API. I've pulled this feature out of
@lindleywhite's PR in the Enterprise repo.

Ref: https://github.com/hashicorp/nomad-enterprise/pull/1394

Co-authored-by: Lindley <lindley@hashicorp.com>
2024-03-18 14:46:18 -04:00
Tim Gross
e551814df5 docs: add warnings about backing up keyring to snapshot commands (#19400)
The `operator snapshot` commands and agent don't back up Nomad's key
material. Add some warnings about this to places where users might be looking
for information on cluster recovery.

Fixes: https://github.com/hashicorp/nomad/issues/19389
2023-12-08 16:05:05 -05:00
James Rasell
ca9e08e6b5 monitor: add log include location option on monitor CLI and API (#18795) 2023-10-20 07:55:22 +01:00
Charlie Voiselle
8a93ff3d2d [server] Directed leadership transfer CLI and API (#17383)
* Add directed leadership transfer func
* Add leadership transfer RPC endpoint
* Add ACL tests for leadership-transfer endpoint
* Add HTTP API route and implementation
* Add to Go API client
* Implement CLI command
* Add documentation
* Add changelog

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2023-10-04 12:20:27 -04:00
Tim Gross
cef87f054a docs: add notes about keyring to snapshot restore (#16663)
When cluster administrators restore from a Raft snapshot, they also need to ensure the
keyring is in place. For on-prem users doing in-place upgrades this is less of a
concern but for typical cloud workflows where the whole host is replaced, it's
an important warning (at least until #14852 has been implemented).
2023-03-28 08:31:01 -04:00
Tim Gross
6145cdcd11 cli: remove deprecated keyring and keygen commands (#16068)
These commands were marked as deprecated in 1.4.0 with intent to remove in
1.5.0. Remove them and clean up the docs.
2023-02-07 09:49:52 -05:00
Bryce Kalow
84ed398e8d docs: fix outstanding content conformance errors (#16040) 2023-02-02 15:40:07 -06:00
Ashlee M Boyer
3444ece549 docs: Migrate link formats (#15779)
* Adding check-legacy-links-format workflow

* Adding test-link-rewrites workflow

* chore: updates link checker workflow hash

* Migrating links to new format

Co-authored-by: Kendall Strautman <kendallstrautman@gmail.com>
2023-01-25 09:31:14 -08:00
Ashlee M Boyer
7ff3177569 Fixing yaml syntax in frontmatter (#15781) 2023-01-13 14:06:46 -05:00
Dao Thanh Tung
30b235345d cli: Add a nomad operator client state command (#15469)
Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg>
2023-01-11 10:03:31 -05:00
Kevin Wang
57dc7c2ab1 fix: website broken links (#14904)
* fix: website broken links

* fix up keyring-rotate link

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2022-10-17 11:32:10 -04:00
Tim Gross
0c82b1dec9 remove root keyring install API (#14514)
* keyring rotate API should require put/post method
* remove keyring install API
2022-09-09 08:50:35 -04:00
Tim Gross
e1e5bb1dce docs: rename Secure Variables to Variables (#14352) 2022-08-29 11:37:08 -04:00
Tim Gross
587360543b docs: keyring commands (#13690)
Document the secure variables keyring commands, document the aliased
gossip keyring commands, and note that the old gossip keyring commands
are deprecated.
2022-07-20 14:14:10 -04:00
Tim Gross
b209fc47da docs: move operator subcommands under their own trees (#13677)
The sidebar navigation tree for the `operator` sub-sub commands is
getting cluttered and we have a new set of commands coming to support
secure variables keyring as well. Move these all under their own
subtrees.
2022-07-11 14:00:24 -04:00
James Rasell
24220d0a02 core: allow pausing and un-pausing of leader broker routine (#13045)
* core: allow pause/un-pause of eval broker on region leader.

* agent: add ability to pause eval broker via scheduler config.

* cli: add operator scheduler commands to interact with config.

* api: add ability to pause eval broker via scheduler config

* e2e: add operator scheduler test for eval broker pause.

* docs: include new operator scheduler CLI and pause eval API info.
2022-07-06 16:13:48 +02:00
Tim Gross
ab6f13db1d Fix flaky operator debug test (#12501)
We introduced a `pprof-interval` argument to `operator debug` in #11938, and
unfortunately this has resulted in a lot of test flakes. The actual command in
use is mostly fine (although I've fixed some quirks here), so what's really
happened is that the change has revealed some existing issues in the tests.
Summary of changes:

* Make first pprof collection synchronous to preserve the existing
  behavior for the common case where the pprof interval matches the
  duration.

* Clamp `operator debug` pprof timing to that of the command. The
  `pprof-duration` should be no more than `duration` and the
  `pprof-interval` should be no more than `pprof-duration`. Clamp the
  values rather than throwing errors, which could change the commands
  that existing users might already have in debugging scripts.

* Testing: remove test parallelism

  The `operator debug` tests that stand up servers can't be run in
  parallel, because we don't have a way of canceling the API calls for
  pprof. The agent will still be running the last pprof when we exit,
  and that breaks the next test that talks to that same agent.
  (Because you can only run one pprof at a time on any process!)

  We could split off each subtest into its own server, but this test
  suite is already very slow. In future work we should fix this "for
  real" by making the API call cancelable.


* Testing: assert against unexpected errors in `operator debug` tests.

  If we assert there are no unexpected error outputs, it's easier for
  the developer to debug when something is going wrong with the tests
  because the error output will be presented as a failing test, rather
  than just a failing exit code check. Or worse, no failing exit code
  check!

  This also forces us to be explicit about which tests will return 0
  exit codes but still emit (presumably ignorable) error outputs.
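
The clamping rule in the summary above (`pprof-duration` no more than `duration`, `pprof-interval` no more than `pprof-duration`) can be sketched as follows. The `clampPprof` function is a hypothetical name for illustration, not the command's actual implementation.

```go
package main

import (
	"fmt"
	"time"
)

// clampPprof limits pprofDuration to duration and pprofInterval to the
// (already clamped) pprofDuration. Values are clamped rather than rejected so
// that existing scripts keep working.
func clampPprof(duration, pprofDuration, pprofInterval time.Duration) (time.Duration, time.Duration) {
	if pprofDuration > duration {
		pprofDuration = duration
	}
	if pprofInterval > pprofDuration {
		pprofInterval = pprofDuration
	}
	return pprofDuration, pprofInterval
}

func main() {
	d, i := clampPprof(2*time.Minute, 5*time.Minute, 10*time.Minute)
	fmt.Println(d, i) // 2m0s 2m0s
}
```

Clamping in this order matters: the interval is compared against the duration after it has been limited, so both values end up within the overall run time.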

Additional minor bug fixes (mostly in tests) and test refactorings:

* Fix text alignment on pprof Duration in `operator debug` output

* Remove "done" channel from `operator debug` event stream test. The
  goroutine we're blocking for here already tells us it's done by
  sending a value, so block on that instead of an extraneous channel

* Event stream test timer should start at current time, not zero

* Remove noise from `operator debug` test log output. The `t.Logf`
  calls already are picked out from the rest of the test output by
  being prefixed with the filename.

* Remove explicit pprof args so we use the defaults clamped from
  duration/interval
2022-04-07 15:00:07 -04:00
Danish Prakash
ff6ae5fad2 command/operator_debug: add pprof interval (#11938) 2022-04-04 15:24:12 -04:00
Michael Schurter
3020b4e851 docs: add op api examples 2022-03-01 17:15:26 -08:00
Michael Schurter
a1000ee5b8 docs: add op api examples 2022-03-01 17:12:58 -08:00
Michael Schurter
ed95316bdf docs: add op api options 2022-03-01 16:43:53 -08:00
Michael Schurter
6bc962fe03 rename nomad curl to nomad operator api 2022-02-24 15:52:54 -08:00
Dave May
8d28bfe415 cli: Add event stream capture to nomad operator debug (#11865) 2022-01-17 21:35:51 -05:00
Tim Gross
03ea7d1c17 cli: unhide advanced operator raft debugging commands (#11682)
The `nomad operator raft` and `nomad operator snapshot state`
subcommands for inspecting on-disk raft state were hidden and
undocumented. Expose and document these so that advanced operators
have support for these tools.
2021-12-16 10:32:11 -05:00
Dave May
f46b97b2df debug: update default node-id and docs (#11398)
* debug: default node-id to all
* debug: align cli help and website documentation
2021-10-27 13:43:56 -04:00
Buck Doyle
9dcd53685a docs: Fix missing link to operator debug (#10523) 2021-05-06 11:29:41 -05:00
Bryce Kalow
ee79587a67 feat(website): migrates to new nav data format (#10264) 2021-03-31 08:43:17 -05:00
Dave May
9138d374d6 docs: add missing dashes to operator debug Usage (#10192) 2021-03-17 15:13:04 -04:00
Shantanu Gadgil
2a50f73ed2 The encryption key uses 32 bytes now
The encryption key uses 32 bytes now, not 16 bytes
2021-02-11 08:34:39 -05:00
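
For reference, a 32-byte key as mentioned in 2a50f73ed2 can be generated along these lines. This is a minimal sketch: the `newEncryptionKey` helper is hypothetical, and base64 encoding of the random bytes is an assumption about the expected key format.

```go
package main

import (
	"crypto/rand"
	"encoding/base64"
	"fmt"
)

// keySize is the key length in bytes stated in the commit message.
const keySize = 32

// newEncryptionKey returns keySize random bytes, base64-encoded. The encoding
// is an assumption of this sketch.
func newEncryptionKey() (string, error) {
	key := make([]byte, keySize)
	if _, err := rand.Read(key); err != nil {
		return "", err
	}
	return base64.StdEncoding.EncodeToString(key), nil
}

func main() {
	k, err := newEncryptionKey()
	if err != nil {
		panic(err)
	}
	fmt.Println(k)
}
```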
Jeff Escalante
0eae603a86 implement mdx remote 2021-01-05 19:02:39 -05:00