nomad

mirror of https://github.com/kemko/nomad.git synced 2026-01-01 16:05:42 +03:00

Author	SHA1	Message	Date
Tim Gross	55fe05d353	heartbeat: use leader's ACL token when failing heartbeat (#24241 ) In #23838 we updated the `Node.Update` RPC handler we use for heartbeats to be more strict about requiring node secrets. But when a node goes down, it's the leader that sends the request to mark the node down via `Node.Update` (to itself), and this request was missing the leader ACL needed to authenticate to itself. Add the leader ACL to the request and update the RPC handler test for disconnected-clients to use ACLs, which would have detected this bug. Also added a note to the `Authenticate` comment about how that authentication path requires the leader ACL. Fixes: https://github.com/hashicorp/nomad/issues/24231 Ref: https://hashicorp.atlassian.net/browse/NET-11384	2024-10-17 13:48:20 -04:00
Michael Smithhisler	25b2bd8467	test: add missing checks for vault binary in unit tests (#23986 )	2024-09-18 17:12:29 -04:00
Tim Gross	bc50eebebd	workload identity: add support for extra claims config for Vault (#23675 ) Although we encourage users to use Vault roles, sometimes they're going to want to assign policies based on entity and pre-create entities and aliases based on claims. This allows them to use a single default role (or at least a small number of them) that has a templated policy, but have an escape hatch from that. When defining Vault entities the `user_claim` must be unique. When writing Vault binding rules for use with Nomad workload identities the binding rule won't be able to create a 1:1 mapping because the selector language allows accessing only a single field. The `nomad_job_id` claim isn't sufficient to uniquely identify a job because of namespaces. It's possible to create a JWT auth role with `bound_claims` to avoid this becoming a security problem, but this doesn't allow for correct accounting of user claims. Add support for an `extra_claims` block on the server's `default_identity` blocks for Vault. This allows a cluster administrator to add a custom claim on all allocations. The values for these claims are interpolatable with a limited subset of fields, similar to how we interpolate the task environment. Fixes: https://github.com/hashicorp/nomad/issues/23510 Ref: https://hashicorp.atlassian.net/browse/NET-10372 Ref: https://hashicorp.atlassian.net/browse/NET-10387	2024-08-05 15:01:54 -04:00
Tim Gross	65ae61249c	CSI: include volume namespace in staging path (#20532 ) CSI volumes are namespaced. But the client does not include the namespace in the staging mount path. This causes CSI volumes with the same volume ID but different namespace to collide if they happen to be placed on the same host. The per-allocation paths don't need to be namespaced, because an allocation can only mount volumes from its job's own namespace. Rework the CSI hook tests to have more fine-grained control over the mock on-disk state. Add tests covering upgrades from staging paths missing namespaces. Fixes: https://github.com/hashicorp/nomad/issues/18741	2024-05-13 11:24:09 -04:00
Seth Hoenig	4d83733909	tests: swap testify for test in more places (#20028 ) * tests: swap testify for test in plugins/csi/client_test.go * tests: swap testify for test in testutil/ * tests: swap testify for test in host_test.go * tests: swap testify for test in plugin_test.go * tests: swap testify for test in utils_test.go * tests: swap testify for test in scheduler/ * tests: swap testify for test in parse_test.go * tests: swap testify for test in attribute_test.go * tests: swap testify for test in plugins/drivers/ * tests: swap testify for test in command/ * tests: fixup some test usages * go: run go mod tidy * windows: cpuset test only on linux	2024-02-29 12:11:35 -06:00
Luiz Aoqui	4a8b01430b	scheduler: retain eval metrics on port collision (#19933 ) When an allocation can't be placed because of a port collision, the resulting blocked eval is expected to have a metric reporting the port that caused the conflict, but this metric was not being emitted when preemption was enabled.	2024-02-09 18:18:48 -05:00
Tim Gross	9d075c44b2	config: remove old Vault/Consul config blocks from parser (#18997 ) Remove the now-unused original configuration blocks for Consul and Vault from the agent configuration parsing. When the agent needs to refer to a Consul or Vault block it will always be for a specific cluster for the task/service (or the default cluster for the agent's own use). This is the third of three changesets for this work. Fixes: https://github.com/hashicorp/nomad/issues/18947 Ref: https://github.com/hashicorp/nomad/pull/18991 Ref: https://github.com/hashicorp/nomad/pull/18994	2023-11-08 09:30:08 -05:00
Michael Schurter	66fbc0f67e	identity: default to RS256 for new workload ids (#18882 ) OIDC mandates the support of the RS256 signing algorithm so in order to maximize workload identity's usefulness this change switches from using the EdDSA signing algorithm to RS256. Old keys will continue to use EdDSA but new keys will use RS256. The EdDSA generation code was left in place because it's fast and cheap and I'm not going to lie I hope we get to use it again. Test Updates Most of our Variables and Keyring tests had a subtle assumption in them that the keyring would be initialized by the time the test server had elected a leader. ed25519 key generation is so fast that the fact that it was happening asynchronously with server startup didn't seem to cause problems. Sadly rsa key generation is so slow that basically all of these tests failed. I added a new `testutil.WaitForKeyring` helper to replace `testutil.WaitForLeader` in cases where the keyring must be initialized before the test may continue. However this is mostly used in the `nomad/` package. In the `api` and `command/agent` packages I decided to switch their helpers to wait for keyring initialization by default. This will slow down tests a bit, but allow those packages to not be as concerned with subtle server readiness details. On my machine rsa key generation takes 63ms, so hopefully the difference isn't significant on CI runners. TODO - Docs and changelog entries. - Upgrades - right now upgrades won't get RS256 keys until their root key rotates either manually or after ~30 days. - Observability - I'm not sure there's a way for operators to see if they're using EdDSA or RS256 unless they inspect a key. The JWKS endpoint can be inspected to see if EdDSA will be used for new identities, but it doesn't technically define which key is active. If upgrades can be fixed to automatically rotate keys, we probably don't need to worry about this. 
Requiem for ed25519 When workload identities were first implemented we did not immediately consider OIDC compliance. Consul, Vault, and many other third parties support JWT auth methods without full OIDC compliance. For the machine<-->machine use cases workload identity is intended to fulfill, OIDC seemed like a bigger risk than asset. EdDSA/ed25519 is the signing algorithm we chose for workload identity JWTs because of all these lovely properties: 1. Deterministic keys that can be derived from our preexisting root keys. This was perhaps the biggest factor since we already had a root encryption key around from which we could derive a signing key. 2. Wonderfully compact: 64 byte private key, 32 byte public key, 64 byte signatures. Just glorious. 3. No parameters. No choices of encodings. It's all well-defined by [RFC 8032](https://datatracker.ietf.org/doc/html/rfc8032). 4. Fastest performing signing algorithm! We don't even care that much about the performance of our chosen algorithm, but what a free bonus! 5. Arguably one of the most secure signing algorithms widely available. Not just from a cryptanalysis perspective, but from an API and usage perspective too. Life was good with ed25519, but sadly it could not last. [IDPs](https://en.wikipedia.org/wiki/Identity_provider), such as AWS's IAM OIDC Provider, love OIDC. They have OIDC implemented for humans, so why not reuse that OIDC support for machines as well? Since OIDC mandates RS256, many implementations don't bother implementing other signing algorithms (or at least not advertising their support). 
A quick survey of OIDC Discovery endpoints revealed only 2 out of 10 OIDC providers advertised support for anything other than RS256: - [PayPal](https://www.paypalobjects.com/.well-known/openid-configuration) supports HS256 - [Yahoo](https://api.login.yahoo.com/.well-known/openid-configuration) supports ES256 RS256 only: - [GitHub](https://token.actions.githubusercontent.com/.well-known/openid-configuration) - [GitLab](https://gitlab.com/.well-known/openid-configuration) - [Google](https://accounts.google.com/.well-known/openid-configuration) - [Intuit](https://developer.api.intuit.com/.well-known/openid_configuration) - [Microsoft](https://login.microsoftonline.com/fabrikamb2c.onmicrosoft.com/v2.0/.well-known/openid-configuration) - [SalesForce](https://login.salesforce.com/.well-known/openid-configuration) - [SimpleLogin (acquired by ProtonMail)](https://app.simplelogin.io/.well-known/openid-configuration/) - [TFC](https://app.terraform.io/.well-known/openid-configuration)	2023-10-31 11:25:20 -07:00
Tim Gross	139a96ad12	e2e: fix bind name to allow Connect reachability (#18878 ) The `BindName` for JWT authentication should always bind to the `nomad_service` field in the JWT and not include the namespace, as the `nomad_service` is what's actually registered in Consul. * Fix the binding rule for the `consulcompat` test * Add a reachability assertion so that we don't miss regressions. * Ensure we have a clean shutdown so that we don't leak state (containers and iptables) between tests.	2023-10-27 10:15:17 -04:00
Tim Gross	6c2d5a0fbb	E2E: Consul compatibility matrix tests (#18799 ) Set up a new test suite that exercises Nomad's compatibility with Consul. This suite installs all currently supported versions of Consul, spins up a Consul agent with appropriate configuration, and a Nomad agent running in dev mode. Then it runs a Connect job against each pair.	2023-10-24 16:03:53 -04:00
Luiz Aoqui	70b1862026	test: add E2E `vaultcompat` test for JWT auth flow (#18822 ) Test the JWT auth flow using real Nomad and Vault agents.	2023-10-23 20:00:55 -04:00
Tim Gross	f5c5035fde	testutil: add ACL bootstrapping to test server configuration (#18811 ) Some of our `api` package tests have ACLs enabled, but none of those tests also run clients and the "wait for the clients to be live" code reads from the Node API. The caller can't bootstrap ACLs until `NewTestServer` returns, and this makes for a circular dependency. Allow developers to provide a bootstrap token to the test server config, and if it's available, have the server bootstrap the ACL system with it before checking for live clients.	2023-10-19 16:50:38 -04:00
Luiz Aoqui	349c032369	vault: update task runner vault hook to support workload identity (#18534 )	2023-10-16 19:37:57 -04:00
Luiz Aoqui	868aba57bb	vault: update identity name to start with `vault_` (#18591 ) * vault: update identity name to start with `vault_` In the original proposal, workload identities used to derive Vault tokens were expected to be called just `vault`. But in order to support multiple Vault clusters it is necessary to associate identities with specific Vault cluster configuration. This commit implements a new proposal to have Vault identities named as `vault_<cluster>`.	2023-09-27 15:53:28 -03:00
Daniel Bennett	7bd5c6e84e	test: Refactor mock CSI manager (#18554 ) and MockCSIManager to support the call counting that csi_hook_test expects instead of implementing csimanager interfaces in two separate places: * client/allocrunner/csi_hook_test * client/csi_endpoint_test they can both use the same mocks defined in client/pluginmanager/csimanager/ alongside the actual implementations of them. also refactor TestCSINode_DetachVolume to use it like Node_ExpandVolume so we can also test the happy path there	2023-09-21 16:03:53 -05:00
Michael Schurter	ef24e40b39	identity: support jwt expiration and rotation (#18262 ) Implements expirations and renewals for alternate workload identity tokens.	2023-09-08 14:50:34 -07:00
Tim Gross	b51b2a2705	fingerprint: add support for fingerprinting multiple Vault clusters (#18253 ) Add fingerprinting we'll need to accept multiple Vault clusters in upcoming Nomad Enterprise features. The fingerprinter will create a map of Vault clients by cluster name. In Nomad CE, all but the default cluster will be ignored and there will be no visible behavior change.	2023-08-18 15:33:22 -04:00
Seth Hoenig	6fca4fa715	test-e2e: no need to run vaultcompat tests as root (#18215 ) `6747ef8803` fixes the Nomad client to support using the raw_exec driver while running as a non-root user. Remove the use of sudo in the test-e2e workflow for running integration (vaultcompat) tests.	2023-08-15 16:00:54 -05:00
hashicorp-copywrite[bot]	a9d61ea3fd	Update copyright file headers to BUSL-1.1	2023-08-10 17:27:29 -05:00
Seth Hoenig	37dd4c4a69	e2e: modernize vaultcompat testing (#18179 ) * e2e: modernize vaultcompat testing * e2e: cr fixes for vaultcompat	2023-08-09 09:24:51 -05:00
Ville Vesilehto	2c463bb038	chore(lint): use Go stdlib variables for HTTP methods and status codes (#17968 )	2023-07-26 15:28:09 +01:00
Daniel Bennett	e0dd940439	tests: enable newer windows (#17401 ) * "allow" (don't try to drop) linux capabilities in the docker test driver harness (see #15181) * refactor to allow different busybox images since windows containers need to be the same version as the underlying OS, and we're moving from 2016 to 2019 * one docker test was flaky from apparently being a bit slower on windows, so add Wait()	2023-06-02 11:38:38 -05:00
Daniel Bennett	c2dc1c58dd	full task cleanup when alloc prerun hook fails (#17104 ) to avoid leaking task resources (e.g. containers, iptables) if allocRunner prerun fails during restore on client restart. now if prerun fails, TaskRunner.MarkFailedKill() will only emit an event, mark the task as failed, and cancel the tr's killCtx, so then ar.runTasks() -> tr.Run() can take care of the actual cleanup. removed from (formerly) tr.MarkFailedDead(), now handled by tr.Run(): * set task state as dead * save task runner local state * task stop hooks also done in tr.Run() now that it's not skipped: * handleKill() to kill tasks while respecting their shutdown delay, and retrying as needed * also includes task preKill hooks * clearDriverHandle() to destroy the task and associated resources * task exited hooks	2023-05-08 13:17:10 -05:00
hashicorp-copywrite[bot]	f005448366	[COMPLIANCE] Add Copyright and License Headers	2023-04-10 15:36:59 +00:00
Michael Schurter	fb085186b7	client/metadata: fix crasher caused by AllowStale = false (#16549 ) Fixes #16517 Given a 3 Server cluster with at least 1 Client connected to Follower 1: If a NodeMeta.{Apply,Read} for the Client request is received by Follower 1 with `AllowStale = false` the Follower will forward the request to the Leader. The Leader, not being connected to the target Client, will forward the RPC to Follower 1. Follower 1, seeing AllowStale=false, will forward the request to the Leader. The Leader, not being connected to... well, hopefully you get the picture: an infinite loop occurs.	2023-03-20 16:32:32 -07:00
Seth Hoenig	1cfa95ee54	tls enforcement flaky tests (#16543 ) * tests: add WaitForLeaders helpers using must/wait timings * tests: start servers for mtls tests together Fixes #16253 (hopefully)	2023-03-17 14:11:13 -05:00
Lance Haig	962b65f5bc	Update ioutil library references to os and io respectively for e2e helper nomad (#16332 ) No user facing changes so I assume no change log is required	2023-03-08 09:39:03 -06:00
Michael Schurter	542b23e999	Accept Workload Identities for Client RPCs (#16254 ) This change resolves policies for workload identities when calling Client RPCs. Previously only ACL tokens could be used for Client RPCs. Since the same cache is used for both bearer tokens (ACL and Workload ID), the token cache size was doubled. --------- Co-authored-by: James Rasell <jrasell@users.noreply.github.com>	2023-02-27 10:17:47 -08:00
Luiz Aoqui	2659757194	core: enforce strict steps for clients reconnect (#15808 ) When a Nomad client that is running an allocation with `max_client_disconnect` set misses a heartbeat the Nomad server will update its status to `disconnected`. Upon reconnecting, the client will make three main RPC calls: - `Node.UpdateStatus` is used to set the client status to `ready`. - `Node.UpdateAlloc` is used to update the client-side information about allocations, such as their `ClientStatus`, task states etc. - `Node.Register` is used to upsert the entire node information, including its status. These calls are made concurrently and are also running in parallel with the scheduler. Depending on the order they run the scheduler may end up with incomplete data when reconciling allocations. For example, a client disconnects and its replacement allocation cannot be placed anywhere else, so there's a pending eval waiting for resources. When this client comes back the order of events may be: 1. Client calls `Node.UpdateStatus` and is now `ready`. 2. Scheduler reconciles allocations and places the replacement alloc to the client. The client is now assigned two allocations: the original alloc that is still `unknown` and the replacement that is `pending`. 3. Client calls `Node.UpdateAlloc` and updates the original alloc to `running`. 4. Scheduler notices too many allocs and stops the replacement. This creates unnecessary placements or, in a different order of events, may leave the job without any allocations running until the whole state is updated and reconciled. To avoid problems like this clients must update _all_ of its relevant information before they can be considered `ready` and available for scheduling. To achieve this goal the RPC endpoints mentioned above have been modified to enforce strict steps for nodes reconnecting: - `Node.Register` does not set the client status anymore. 
- `Node.UpdateStatus` sets the reconnecting client to the `initializing` status until it successfully calls `Node.UpdateAlloc`. These changes are done server-side to avoid the need for additional coordination between clients and servers. Clients are kept oblivious of these changes and will keep making these calls as they normally would. The verification of whether allocations have been updated is done by storing and comparing the Raft index of the last time the client missed a heartbeat and the last time it updated its allocations.	2023-01-25 15:53:59 -05:00
Seth Hoenig	f05aa6d5ec	vault: configure user agent on Nomad vault clients (#15745 ) * vault: configure user agent on Nomad vault clients This PR attempts to set the User-Agent header on each Vault API client created by Nomad. Still need to figure a way to set User-Agent on the Vault client created internally by consul-template. * vault: fixup find-and-replace gone awry	2023-01-10 10:39:45 -06:00
Seth Hoenig	dab4d7ed7a	ci: swap freeport for portal in packages (#15661 )	2023-01-03 11:25:20 -06:00
Lance Haig	8667dc2607	Add command "nomad tls" (#14296 )	2022-11-22 14:12:07 -05:00
Michael Schurter	617f223242	Fixing flaky TestOverlap test (#14780 ) * test: ensure feasible node selected in overlap test * test: warn when getting close to retry limit	2022-10-03 14:35:02 -07:00
Tim Gross	786dc5ff94	fingerprint: don't clear Consul/Vault attributes on failure (#14673 ) Clients periodically fingerprint Vault and Consul to ensure the server has updated attributes in the client's fingerprint. If the client can't reach Vault/Consul, the fingerprinter clears the attributes and requires a node update. Although this seems like correct behavior so that we can detect intentional removal of Vault/Consul access, it has two serious failure modes: (1) If a local Consul agent is restarted to pick up configuration changes and the client happens to fingerprint at that moment, the client will update its fingerprint and result in evaluations for all its jobs and all the system jobs in the cluster. (2) If a client loses Vault connectivity, the same thing happens. But the consequences are much worse in the Vault case because Vault is not run as a local agent, so Vault connectivity failures are highly correlated across the entire cluster. A 15 second Vault outage will cause a new `node-update` evaluation for every system job on the cluster times the number of nodes, plus one `node-update` evaluation for every non-system job on each node. On large clusters of 1000s of nodes, we've seen this create a large backlog of evaluations. This changeset updates the fingerprinting behavior to keep the last fingerprint if Consul or Vault queries fail. This prevents a storm of evaluations at the cost of requiring a client restart if Consul or Vault is intentionally removed from the client.	2022-09-23 14:45:12 -04:00
Mahmood Ali	757c3c94f2	scheduler: stopped-yet-running allocs are still running (#10446 ) * scheduler: stopped-yet-running allocs are still running * scheduler: test new stopped-but-running logic * test: assert nonoverlapping alloc behavior Also add a simpler Wait test helper to improve line numbers and save few lines of code. * docs: tried my best to describe #10446 it's not concise... feedback welcome * scheduler: fix test that allowed overlapping allocs * devices: only free devices when ClientStatus is terminal * test: output nicer failure message if err==nil Co-authored-by: Mahmood Ali <mahmood@hashicorp.com> Co-authored-by: Michael Schurter <mschurter@hashicorp.com>	2022-09-13 12:52:47 -07:00
Seth Hoenig	0a5992bd20	cli: correctly use and validate job with vault token set This PR fixes `job validate` to respect '-vault-token', '$VAULT_TOKEN', '-vault-namespace' if set.	2022-05-19 12:13:34 -05:00
Eng Zer Jun	fca4ee8e05	test: use `T.TempDir` to create temporary test directory (#12853 ) * test: use `T.TempDir` to create temporary test directory This commit replaces `ioutil.TempDir` with `t.TempDir` in tests. The directory created by `t.TempDir` is automatically removed when the test and all its subtests complete. Prior to this commit, temporary directory created using `ioutil.TempDir` needs to be removed manually by calling `os.RemoveAll`, which is omitted in some tests. The error handling boilerplate e.g. defer func() { if err := os.RemoveAll(dir); err != nil { t.Fatal(err) } } is also tedious, but `t.TempDir` handles this for us nicely. Reference: https://pkg.go.dev/testing#T.TempDir Signed-off-by: Eng Zer Jun <engzerjun@gmail.com> * test: fix TestLogmon_Start_restart on Windows Signed-off-by: Eng Zer Jun <engzerjun@gmail.com> * test: fix failing TestConsul_Integration t.TempDir fails to perform the cleanup properly because the folder is still in use testing.go:967: TempDir RemoveAll cleanup: unlinkat /tmp/TestConsul_Integration2837567823/002/191a6f1a-5371-cf7c-da38-220fe85d10e5/web/secrets: device or resource busy Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>	2022-05-12 11:42:40 -04:00
Seth Hoenig	b242957990	ci: swap ci parallelization for unconstrained gomaxprocs	2022-03-15 12:58:52 -05:00
Seth Hoenig	8492c6576e	build: upgrade and speedup circleci configuration This PR upgrades our CI images and fixes some affected tests. - upgrade go-machine-image to premade latest ubuntu LTS (ubuntu-2004:202111-02) - eliminate go-machine-recent-image (no longer necessary) - manage GOPATH in GNUMakefile (see https://discuss.circleci.com/t/gopath-is-set-to-multiple-directories/7174) - fix tcp dial error check (message seems to be OS specific) - spot check values measured instead of specifically 'RSS' (rss no longer reported in cgroups v2) - use safe MkdirTemp for generating tmpfiles NOT applied: (too flaky) - eliminate setting GOMAXPROCS=1 (build tools were also affected by this setting) - upgrade resource type for all images to large (2C -> 4C)	2022-01-24 08:28:14 -06:00
Dave May	6ede4b9285	cli: refactor operator debug capture (#11466 ) * debug: refactor Consul API collection * debug: refactor Vault API collection * debug: cleanup test timing * debug: extend test to multiregion * debug: save cmdline flags in bundle * debug: add cli version to output * Add changelog entry	2021-11-05 19:43:10 -04:00
Michael Schurter	eeb1da8a2e	test: update tests to properly use AllocDir Also use t.TempDir when possible.	2021-10-19 10:49:07 -07:00
Dave May	1d30caafad	cli: rename paths in debug bundle for clarity (#11307 ) * Rename folders to reflect purpose * Improve captured files test coverage * Rename CSI plugins output file * Add changelog entry * fix test and make changelog message more explicit Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>	2021-10-13 18:00:55 -04:00
Dave May	1bd132f09d	debug: Improve namespace and region support (#11269 ) * Include region and namespace in CLI output * Add region and prefix matching for server members * Add namespace and region API outputs to cluster metadata folder * Add region awareness to WaitForClient helper function * Add helper functions for SliceStringHasPrefix and StringHasPrefixInSlice * Refactor test client agent generation * Add tests for region * Add changelog	2021-10-12 16:58:41 -04:00
Dave May	b430bafe90	Add remaining pprof profiles to nomad operator debug (#10748 ) * Add remaining pprof profiles to debug dump * Refactor pprof profile capture * Add WaitForFilesUntil and WaitForResultUntil utility functions * Add CHANGELOG entry	2021-06-21 14:22:49 -04:00
Mahmood Ali	122a4cb844	tests: use standard library testing.TB Glint pulled in an updated version of mitchellh/go-testing-interface which broke some existing tests because the update added a Parallel() method to testing.T. This switches to the standard library testing.TB which doesn't have a Parallel() method.	2021-06-09 16:18:45 -07:00
Charlie Voiselle	d914990e5f	Fixup uses of `sanity` (#10187 ) * Fixup uses of `sanity` * Remove unnecessary comments. These checks are better explained by earlier comments about the context of the test. Per @tgross, moved the tests together to better reinforce the overall shared context. * Update nomad/fsm_test.go	2021-03-16 18:05:08 -04:00
Dennis Schön	582d3b7092	use os.ErrDeadlineExceeded in tests	2020-12-07 10:40:28 -05:00
Dave May	205b0e7cae	nomad operator debug - add client node filtering arguments (#9331 ) * operator debug - add client node filtering arguments * add WaitForClient helper function * use RPC in WaitForClient to avoid unnecessary imports * guard against nil values * move initialization up and shorten test duration * cleanup nodeLookupFailCount logic * only display max node notice if we actually tried to capture nodes	2020-11-12 11:25:28 -05:00
Dave May	71a022ad8c	Metrics gotemplate support, debug bundle features (#9067 ) * add goroutine text profiles to nomad operator debug * add server-id=all to nomad operator debug * fix bug from changing metrics from string to []byte * Add function to return MetricsSummary struct, metrics gotemplate support * fix bug resolving 'server-id=all' when no servers are available * add url to operator_debug tests * removed test section which is used for future operator_debug.go changes * separate metrics from operator, use only structs from go-metrics * ensure parent directories are created as needed * add suggested comments for text debug pprof * move check down to where it is used * add WaitForFiles helper function to wait for multiple files to exist * compact metrics check Co-authored-by: Drew Bailey <2614075+drewbailey@users.noreply.github.com> * fix github's silly apply suggestion Co-authored-by: Drew Bailey <2614075+drewbailey@users.noreply.github.com>	2020-10-14 15:16:10 -04:00
Mahmood Ali	bcc4ec910d	gracefully shutdown test server	2020-05-27 08:59:06 -04:00
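
The staging path collision described in commit 65ae61249c (CSI: include volume namespace in staging path) can be illustrated with a minimal sketch. The directory layout, the root path, and the function names `stagingPathOld` and `stagingPathNew` are assumptions made for illustration; they are not taken from Nomad's code.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// stagingPathOld mimics a layout that ignores the namespace (hypothetical):
// two volumes with the same ID in different namespaces map to the same path.
func stagingPathOld(root, namespace, volID string) string {
	return filepath.Join(root, "staging", volID)
}

// stagingPathNew adds the namespace as a path element (hypothetical layout),
// so identical volume IDs in different namespaces no longer collide.
func stagingPathNew(root, namespace, volID string) string {
	return filepath.Join(root, "staging", namespace, volID)
}

func main() {
	root := "/var/nomad/csi" // assumed root directory, for illustration only

	// Same volume ID, different namespaces, same host.
	fmt.Println(stagingPathOld(root, "prod", "db") == stagingPathOld(root, "dev", "db")) // true: collision
	fmt.Println(stagingPathNew(root, "prod", "db") == stagingPathNew(root, "dev", "db")) // false: distinct
}
```

As the commit message notes, per-allocation paths need no such change, because an allocation can only mount volumes from its job's own namespace.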


124 Commits