nomad

mirror of https://github.com/kemko/nomad.git synced 2026-01-04 09:25:46 +03:00

Author	SHA1	Message	Date
Piotr Kazmierczak	bb3d2227a2	e2e: add a test for checking default WI Consul workflow for services and tasks (#19500 )	2024-01-02 16:02:32 +01:00
Daniel Bennett	eb23add189	e2e: sleep in docker job (#19434 )	2023-12-11 15:38:14 -06:00
Tom Davies	c983a8f0ad	Fixes Consul token checking when policies exist within namespaces (#18516 ) * e2e/connect: adds test for namespace policies * consul: use token namespace when fetching policies * changelog * fixup! e2e/connect: adds test for namespace policies	2023-12-11 10:07:32 -06:00
Seth Hoenig	f3cbe2e29a	e2e: sleep a bit in short lived docker jobs (#19384 )	2023-12-08 10:44:43 -06:00
Daniel Bennett	e9ff6d74d3	e2e: unflake oversubscription.testExec (#19373 ) poll with must.Wait() instead of hard-coded sleep waiting for poststart task to run, and wait for longer	2023-12-08 10:20:18 -06:00
Daniel Bennett	7baf3c012c	e2e: even more time for exec+java tests (#19347 )	2023-12-07 10:23:39 -06:00
Seth Hoenig	8cde7a4f70	e2e: turn off extreme verbose metrics test logging (#19330 )	2023-12-06 16:08:49 -06:00
Tim Gross	340c9ebd47	E2E: extend timeout on CSI snapshot test (#19338 ) The EBS snapshot operation can take a long time to complete. Recent runs have shown we sometimes get up to the 10s timeout on the context we're giving the CLI command. Extend this so that we're not getting spurious timeouts. Fixes: https://github.com/hashicorp/nomad/issues/19118	2023-12-06 16:34:54 -05:00
Daniel Bennett	36f69a8e88	e2e: more occasionally slow exec tasks (#19337 )	2023-12-06 15:22:15 -06:00
Daniel Bennett	9fe1f0aadc	e2e: fix ConsulNamespaces tests (#19325 ) * cleanup consul tokens by accessor id rather than secret id, which has been failing for some time with: > 404 (Cannot find token to delete) * expect subset of consul namespaces the consul test cluster may have namespaces from other unrelated tests	2023-12-06 12:21:27 -06:00
Seth Hoenig	87e7bf4ab2	e2e: skip connect test that does a restart of nomad agent (#19316 )	2023-12-05 09:15:09 -06:00
Seth Hoenig	35ccb7ecdb	e2e: use correct url to download zip file from go-getter repository (#19315 )	2023-12-05 09:11:08 -06:00
Seth Hoenig	cc65f39c82	e2e/v3: dump eval if detected as cancelled (#19310 )	2023-12-05 09:07:12 -06:00
Daniel Bennett	c7d01705f5	e2e: push nomad token to servers (#19312 ) so humans with root shell access can use it to debug not ideal security, but this is a short-lived test cluster	2023-12-05 08:54:57 -06:00
Seth Hoenig	6779d7c7b4	e2e: add a ShowState() option to cluster3.Establish options (#19303 ) This will dump much of the interesting parts of cluster state, including available nodes and their status, existing allocations and their status, and existing evaluations and their status.	2023-12-04 12:37:21 -06:00
Daniel Bennett	d34788896f	e2e: jobs3-submitted jobs automatically cleanup (#19284 ) so that cleanup occurs even if the job fails to run (unless configured not to)	2023-12-01 15:57:23 -06:00
Daniel Bennett	bfb2263f30	e2e: give isolation test jobs more time to start (#19276 )	2023-12-01 14:03:40 -06:00
Seth Hoenig	5b3416bd97	e2e: set e2e/v3 debug logging on metrics test (#19263 )	2023-12-01 10:03:55 -06:00
Tim Gross	05fe2ad191	E2E: fix assertion in CT native service lookup test (#19249 ) When porting the `ConsulTemplate` test, I made a last-minute refactor to the assertions for waiting on files, and accidentally inverted the test assertion in the process. Also, when running `jobs3.Submit` you need to include the `Namespace` option so that the cleanup function that gets returned deletes the job from the correct namespace. This was causing the namespace cleanup to fail because the job deletion had failed.	2023-12-01 08:54:55 -05:00
Daniel Bennett	639c3f53c9	e2e: give node drain KillTimeout test more time (#19226 ) and error more verbosely if it fails also, add extra information to a failed evaluation for more error visibility in other tests --------- Co-authored-by: Juanadelacuesta <juanita.delacuestamorales@hashicorp.com>	2023-11-30 10:37:20 -06:00
Luiz Aoqui	969cdb0f46	test: add consul namespace rules to consulcompat (#19227 ) When configuring Consul for multi-namespace support, the JWT auth method needs to specify namespace rules. This attribute is set to `nil` in CE but is used in Nomad ENT.	2023-11-30 10:13:08 -05:00
Tim Gross	4e7ad58d2d	E2E: modernize `ConsulTemplate` test and fix some assertions (#19126 ) The `TestTemplateUpdateTriggers` is flaky because of what turned out to be incompatibility between the Consul agent on the E2E cluster and the HCP Consul server we were running but hadn't upgraded in a while. Upgrading the HCP Consul server seems to have fixed the tests, but while I'm in here I've updated this test suite: * Port all the consul template test suite off of the old framework, and upgrade to using e2e "v3" where feasible. * Clean up some of the assertions in the update triggers test to make the purpose of the test more clear. * Remove unnecessary default fields from the job specs. Closes: #19075	2023-11-29 12:16:41 -05:00
Daniel Bennett	f7adcefbb3	e2e: refactor vault secrets test (#19152 ) fixes VaultSecrets test - it was failing due to a regex mismatch (`^job` stopped matching when copywrite headers got prepended to the jobspec). but RegisterFromJobspec (which had the bug) was only used in the one spot, so instead this refactors the whole test to the v3 format with testing.T and some additional fun stuff that we can take advantage of with it. some improvements: * use a namespace * use and extend existing test helpers * add more test helpers	2023-11-28 10:00:27 -06:00
Piotr Kazmierczak	248b2ba5cd	WI: use single auth method for Consul by default (#19169 ) This simplifies the default setup of Nomad workloads WI-based authentication for Consul by using a single auth method with 2 binding rules. Users can still specify separate auth methods for services and tasks.	2023-11-28 12:22:27 +01:00
Daniel Bennett	eb56fce393	e2e: fix ui tests (#19138 )	2023-11-27 12:26:19 -06:00
Tim Gross	ff928a8045	E2E: remove assertion from ACL role test (#19121 ) The ACL role test asserts that the role has various permissions by listing jobs in namespaces. It never creates jobs, because we can make all the assertions we need by checking the error. But the test included an assertion that the namespace was empty. Usually this will be the case, but if the previous test case has not completed its GC (which is sync), then it's possible a stopped job will be in the namespace. Because this assertion is irrelevant for this test, remove it.	2023-11-17 14:35:20 -05:00
Tim Gross	a3f8a52fd4	E2E: fix assertion in dynamic node metadata test (#19120 ) In #18664 we changed how null values worked with dynamic node metadata so that they were no longer returned if there wasn't also a static value for that key. The test assertion in E2E was not updated to match the new behavior. Fixes: #19112	2023-11-17 14:35:07 -05:00
Tim Gross	c144af2823	E2E: fix expected error message from token expiration test (#19119 ) In Nomad 1.5 we started masking the specific error returned from the authentication method and returned the "permission denied" error instead. Update the E2E test that covers token expiration to ensure we're asserting the correct error here. Fixes: https://github.com/hashicorp/nomad/issues/16803	2023-11-17 14:00:47 -05:00
Daniel Bennett	4ec9343447	e2e: use tf variable defaults (#19108 )	2023-11-16 14:50:11 -06:00
Tim Gross	da61d278d0	e2e: fix and modernize rescheduling test (#19105 ) The E2E test suite for rescheduling had a few bugs: * Using the command line to stop a job with a failing deployment returns a non-zero exit code, which would cause an otherwise passing test to fail. * Two of the input jobs were actually invalid but were only correctly detected as such because of #17342 This changeset also updates the whole test suite to move it off the v1 "framework". A few test assertions are also de-flaked. Fixes: #19076	2023-11-16 15:39:18 -05:00
Tim Gross	8fac70c92c	E2E: refactor `vaultcompat` to allow for ENT tests (#19081 ) We want to run the Vault compatibility E2E test with Vault Enterprise binaries and use Vault namespaces. Refactor the `vaultcompat` test so as to parameterize most of the test setup logic with the namespace, and add the appropriate build tag for the CE version of the test.	2023-11-14 09:54:47 -05:00
Tim Gross	1c9c75cc83	E2E: refactor `consulcompat` to allow for ENT tests (#19068 ) We want to run the Consul compatibility E2E test with Consul Enterprise binaries and use Consul namespaces. Refactor the `consulcompat` test so as to parameterize most of the test setup logic with the namespace, and add the appropriate build tag for the CE version of the test. Ref: https://github.com/hashicorp/nomad-enterprise/pull/1305	2023-11-10 15:05:51 -05:00
Seth Hoenig	5987ba434f	e2ev3: wait for logs to become ready (#19067 ) Just because an alloc is running does not mean nomad is ready to serve task logs. In a test case where you immediately read logs after starting a task, it could be that nomad responds with "no logs found" when you try to read logs, in which case you just need to wait longer. Do so in the v3 TaskLogs helper function.	2023-11-10 12:43:16 -06:00
Seth Hoenig	c17333d74a	e2e refactor oversubscription (#19060 ) * e2e: remove old oversubscription test * e2e: fixup and cleanup oversubscription test suite Fix and cleanup this old oversubscription test. * use t.Cleanup instead of defer in tests	2023-11-10 09:25:32 -06:00
Tim Gross	4e38b41d9d	E2E: add template block to `consulcompat` test (#19055 ) The Consul compatibility test focuses on Connect, but it'd be a good idea to ensure we can successfully get template data out of Consul as well. Also tightens up the test's Consul ACL policy for the Nomad agent.	2023-11-10 09:25:37 -05:00
Seth Hoenig	1f957947b4	e2e: refactor nomadexec test suite (#19054 )	2023-11-10 07:09:24 -06:00
Seth Hoenig	2f8d94ae3e	e2e: more cpu and memory for java tasks and some scripts (#19057 )	2023-11-10 07:08:14 -06:00
Seth Hoenig	f211a0ab7c	e2e: update terraform lock file for 1.6.3 (#19049 ) Using the latest version of terraform, the lock file is not the same as when it was generated. Seems like the http module is not needed? versioned? present? anymore.	2023-11-09 10:49:49 -06:00
Seth Hoenig	402540f7fb	e2e: bump packer build instances because faster (#19046 )	2023-11-09 09:33:30 -06:00
Seth Hoenig	a28e5b6965	e2e: refactor metrics test to use NSD and WI (#19022 ) * e2e: remove old metrics suite * e2e: install stress on e2e jammy image * e2e: overhaul metrics test to use nomad service discovery, workload identity * e2e: format metrics hcl files and copywrite * e2e: undo tf lock file * e2e: undo reg auth file perms * e2e: format cpustress.hcl	2023-11-09 08:21:16 -06:00
Tim Gross	9d075c44b2	config: remove old Vault/Consul config blocks from parser (#18997 ) Remove the now-unused original configuration blocks for Consul and Vault from the agent configuration parsing. When the agent needs to refer to a Consul or Vault block it will always be for a specific cluster for the task/service (or the default cluster for the agent's own use). This is third of three changesets for this work. Fixes: https://github.com/hashicorp/nomad/issues/18947 Ref: https://github.com/hashicorp/nomad/pull/18991 Ref: https://github.com/hashicorp/nomad/pull/18994	2023-11-08 09:30:08 -05:00
Seth Hoenig	63da22063b	e2e: update pledge driver to 0.3.0 (#19020 )	2023-11-08 06:58:59 -06:00
Seth Hoenig	a2f7ab2645	e2e disable windows (#19012 ) * e2e: disable windows client * e2e: disable windows artifact test	2023-11-07 09:34:18 -06:00
Luiz Aoqui	bfb2dcd172	Vault small fixes (#18942 ) * vault: remove `token_ttl` from `vaultcompat` setup Since Nomad uses periodic tokens, the right value to set in the role is `token_period`, not `token_ttl`. * vault: set 1.11.0 as min version for JWT auth In order to use workload identities JWT auth with Vault it's required to have a Vault cluster running v1.11.0+, which is the version where `user_claim_json_pointer` was introduced.	2023-11-01 08:23:19 -04:00
Michael Schurter	66fbc0f67e	identity: default to RS256 for new workload ids (#18882 ) OIDC mandates the support of the RS256 signing algorithm so in order to maximize workload identity's usefulness this change switches from using the EdDSA signing algorithm to RS256. Old keys will continue to use EdDSA but new keys will use RS256. The EdDSA generation code was left in place because it's fast and cheap and I'm not going to lie I hope we get to use it again. Test Updates Most of our Variables and Keyring tests had a subtle assumption in them that the keyring would be initialized by the time the test server had elected a leader. ed25519 key generation is so fast that the fact that it was happening asynchronously with server startup didn't seem to cause problems. Sadly rsa key generation is so slow that basically all of these tests failed. I added a new `testutil.WaitForKeyring` helper to replace `testutil.WaitForLeader` in cases where the keyring must be initialized before the test may continue. However this is mostly used in the `nomad/` package. In the `api` and `command/agent` packages I decided to switch their helpers to wait for keyring initialization by default. This will slow down tests a bit, but allow those packages to not be as concerned with subtle server readiness details. On my machine rsa key generation takes 63ms, so hopefully the difference isn't significant on CI runners. TODO - Docs and changelog entries. - Upgrades - right now upgrades won't get RS256 keys until their root key rotates either manually or after ~30 days. - Observability - I'm not sure there's a way for operators to see if they're using EdDSA or RS256 unless they inspect a key. The JWKS endpoint can be inspected to see if EdDSA will be used for new identities, but it doesn't technically define which key is active. If upgrades can be fixed to automatically rotate keys, we probably don't need to worry about this. 
Requiem for ed25519 When workload identities were first implemented we did not immediately consider OIDC compliance. Consul, Vault, and many other third parties support JWT auth methods without full OIDC compliance. For the machine<-->machine use cases workload identity is intended to fulfill, OIDC seemed like a bigger risk than asset. EdDSA/ed25519 is the signing algorithm we chose for workload identity JWTs because of all these lovely properties: 1. Deterministic keys that can be derived from our preexisting root keys. This was perhaps the biggest factor since we already had a root encryption key around from which we could derive a signing key. 2. Wonderfully compact: 64 byte private key, 32 byte public key, 64 byte signatures. Just glorious. 3. No parameters. No choices of encodings. It's all well-defined by [RFC 8032](https://datatracker.ietf.org/doc/html/rfc8032). 4. Fastest performing signing algorithm! We don't even care that much about the performance of our chosen algorithm, but what a free bonus! 5. Arguably one of the most secure signing algorithms widely available. Not just from a cryptanalysis perspective, but from an API and usage perspective too. Life was good with ed25519, but sadly it could not last. [IDPs](https://en.wikipedia.org/wiki/Identity_provider), such as AWS's IAM OIDC Provider, love OIDC. They have OIDC implemented for humans, so why not reuse that OIDC support for machines as well? Since OIDC mandates RS256, many implementations don't bother implementing other signing algorithms (or at least not advertising their support). 
A quick survey of OIDC Discovery endpoints revealed only 2 out of 10 OIDC providers advertised support for anything other than RS256: - [PayPal](https://www.paypalobjects.com/.well-known/openid-configuration) supports HS256 - [Yahoo](https://api.login.yahoo.com/.well-known/openid-configuration) supports ES256 RS256 only: - [GitHub](https://token.actions.githubusercontent.com/.well-known/openid-configuration) - [GitLab](https://gitlab.com/.well-known/openid-configuration) - [Google](https://accounts.google.com/.well-known/openid-configuration) - [Intuit](https://developer.api.intuit.com/.well-known/openid_configuration) - [Microsoft](https://login.microsoftonline.com/fabrikamb2c.onmicrosoft.com/v2.0/.well-known/openid-configuration) - [SalesForce](https://login.salesforce.com/.well-known/openid-configuration) - [SimpleLogin (acquired by ProtonMail)](https://app.simplelogin.io/.well-known/openid-configuration/) - [TFC](https://app.terraform.io/.well-known/openid-configuration)	2023-10-31 11:25:20 -07:00
Tim Gross	139a96ad12	e2e: fix bind name to allow Connect reachability (#18878 ) The `BindName` for JWT authentication should always bind to the `nomad_service` field in the JWT and not include the namespace, as the `nomad_service` is what's actually registered in Consul. * Fix the binding rule for the `consulcompat` test * Add a reachability assertion so that we don't miss regressions. * Ensure we have a clean shutdown so that we don't leak state (containers and iptables) between tests.	2023-10-27 10:15:17 -04:00
Piotr Kazmierczak	7f62dec473	consul WI: rename default auth method for services (#18867 ) It should be called nomad-services instead of nomad-workloads.	2023-10-26 09:43:33 +02:00
Tim Gross	6c2d5a0fbb	E2E: Consul compatibility matrix tests (#18799 ) Set up a new test suite that exercises Nomad's compatibility with Consul. This suite installs all currently supported versions of Consul, spins up a Consul agent with appropriate configuration, and a Nomad agent running in dev mode. Then it runs a Connect job against each pair.	2023-10-24 16:03:53 -04:00
Kerim Satirli	5e1bbf90fc	docs: update all URLs to `developer.hashicorp.com` (#16247 )	2023-10-24 11:00:11 -04:00
Luiz Aoqui	70b1862026	test: add E2E `vaultcompat` test for JWT auth flow (#18822 ) Test the JWT auth flow using real Nomad and Vault agents.	2023-10-23 20:00:55 -04:00


689 Commits