Commit Graph

697 Commits

Author SHA1 Message Date
Piotr Kazmierczak
858a805d7d e2e: add a note about provisioning the infrastructure on macOS/Apple Silicon (#19727) 2024-01-12 14:09:50 +01:00
Seth Hoenig
a58f0eca8e e2e: move rawexec oversub tests into oversubscription e2e test suite (#19717)
* e2e: move rawexec oversub tests into oversubscription e2e test suite

This PR moves two tests for raw_exec and memory oversubscription into
the oversubscription test suite, which has the necessary plumbing to
activate and restore the oversubscription configuration of the scheduler
during the test.

* cr: rename files for better readability
2024-01-11 14:27:05 -06:00
Piotr Kazmierczak
930339a0fa e2e: remove broken Consul WI test (#19697) 2024-01-10 21:31:18 +01:00
Seth Hoenig
cb7d078c1d drivers/raw_exec: enable configuring raw_exec task to have no memory limit (#19670)
* drivers/raw_exec: enable configuring raw_exec task to have no memory limit

This PR makes it possible to configure a raw_exec task to not have an
upper memory limit, which is how the driver would behave pre-1.7.

This is done by setting memory_max = -1. The cluster (or node pool) must
have memory oversubscription enabled.

* cl: add cl
2024-01-09 14:57:13 -06:00
Seth Hoenig
ccfb13a72d e2e: add test for raw_exec memory_max configuration (#19596)
* e2e: add test for raw_exec memory_max configuration

* docs: note raw_exec supports memory_max in resources documentation
2024-01-04 08:25:56 -06:00
Piotr Kazmierczak
aa197cf824 e2e: pass Nomad address to Consul WI test (#19603) 2024-01-04 08:52:39 +01:00
Piotr Kazmierczak
a87aa71f55 e2e: fix typo in Consul e2e (#19589) 2024-01-03 09:34:38 +01:00
Matt Robenolt
656bb5cafa drivers/executor: set oom_score_adj for raw_exec (#19515)
* drivers/executor: set oom_score_adj for raw_exec

This might not be wholly true since I don't know all configurations of
Nomad, but in our use cases, we run some of our tasks as `raw_exec` for
reasons.

We observed that our tasks were running with `oom_score_adj = -1000`,
which prevents them from being OOM'd. This value is being inherited from
the nomad agent parent process, as configured by systemd.

Similar to #10698, we also were shocked to have this value inherited
down to every child process and believe that we should also set this
value to 0 explicitly.

I have no idea if there are other paths that might leverage this or
other ways that `raw_exec` can manifest, but this is how I was able to
observe and fix in one of our configurations.

We have been running in production our tasks wrapped in a script that
does: `echo 0 > /proc/self/oom_score_adj` to avoid this issue.

* drivers/executor: minor cleanup of setting oom adjustment

* e2e: add test for raw_exec oom adjust score

* e2e: set oom score adjust to -999

* cl: add cl

---------

Co-authored-by: Seth Hoenig <shoenig@duck.com>
2024-01-02 13:35:09 -06:00
Piotr Kazmierczak
bb3d2227a2 e2e: add a test for checking default WI Consul workflow for services and tasks (#19500) 2024-01-02 16:02:32 +01:00
Daniel Bennett
eb23add189 e2e: sleep in docker job (#19434) 2023-12-11 15:38:14 -06:00
Tom Davies
c983a8f0ad Fixes Consul token checking when policies exist within namespaces (#18516)
* e2e/connect: adds test for namespace policies

* consul: use token namespace when fetching policies

* changelog

* fixup! e2e/connect: adds test for namespace policies
2023-12-11 10:07:32 -06:00
Seth Hoenig
f3cbe2e29a e2e: sleep a bit in short lived docker jobs (#19384) 2023-12-08 10:44:43 -06:00
Daniel Bennett
e9ff6d74d3 e2e: unflake oversubscription.testExec (#19373)
poll with must.Wait() instead of hard-coded sleep
waiting for poststart task to run, and wait for longer
2023-12-08 10:20:18 -06:00
Daniel Bennett
7baf3c012c e2e: even more time for exec+java tests (#19347) 2023-12-07 10:23:39 -06:00
Seth Hoenig
8cde7a4f70 e2e: turn of extreme verbose metrics test logging (#19330) 2023-12-06 16:08:49 -06:00
Tim Gross
340c9ebd47 E2E: extend timeout on CSI snapshot test (#19338)
The EBS snapshot operation can take a long time to complete. Recent runs have
shown we sometimes get up to the 10s timeout on the context we're giving the CLI
command. Extend this so that we're not getting spurious timeouts.

Fixes: https://github.com/hashicorp/nomad/issues/19118
2023-12-06 16:34:54 -05:00
Daniel Bennett
36f69a8e88 e2e: more occasionally slow exec tasks (#19337) 2023-12-06 15:22:15 -06:00
Daniel Bennett
9fe1f0aadc e2e: fix ConsulNamespaces tests (#19325)
* cleanup consul tokens by accessor id
rather than secret id, which has been failing for some time with:
> 404 (Cannot find token to delete)

* expect subset of consul namespaces
the consul test cluster may have namespaces from other unrelated tests
2023-12-06 12:21:27 -06:00
Seth Hoenig
87e7bf4ab2 e2e: skip connect test that does a restart of nomad agent (#19316) 2023-12-05 09:15:09 -06:00
Seth Hoenig
35ccb7ecdb e2e: use correct url to download zip file from go-getter repository (#19315) 2023-12-05 09:11:08 -06:00
Seth Hoenig
cc65f39c82 e2e/v3: dump eval if detected as cancelled (#19310) 2023-12-05 09:07:12 -06:00
Daniel Bennett
c7d01705f5 e2e: push nomad token to servers (#19312)
so humans with root shell access can use it to debug

not ideal security, but this is a short-lived test cluster
2023-12-05 08:54:57 -06:00
Seth Hoenig
6779d7c7b4 e2e: add a ShowState() option to cluster3.Establish options (#19303)
This will dump much of the interesting parts of cluster state, including
available nodes and their status, existing allocations and their status,
and existing evaluations and their status.
2023-12-04 12:37:21 -06:00
Daniel Bennett
d34788896f e2e: jobs3-submitted jobs automatically cleanup (#19284)
so that cleanup occurs even if the job fails to run
(unless configured not to)
2023-12-01 15:57:23 -06:00
Daniel Bennett
bfb2263f30 e2e: give isolation test jobs more time to start (#19276) 2023-12-01 14:03:40 -06:00
Seth Hoenig
5b3416bd97 e2e: set e2e/v3 debug logging on metrics test (#19263) 2023-12-01 10:03:55 -06:00
Tim Gross
05fe2ad191 E2E: fix assertion in CT native service lookup test (#19249)
When porting the `ConsulTemplate` test, I made a last-minute refactor to the
assertions for waiting on files, and accidentally inverted the test assertion in
the process.

Also, when running `jobs3.Submit` you need to include the `Namespace` option so
that the cleanup function that gets return deletes the job from the correct
namespace. This was causing the namespace cleanup to fail because the job
deletion had failed.
2023-12-01 08:54:55 -05:00
Daniel Bennett
639c3f53c9 e2e: give node drain KillTimeout test more time (#19226)
and error more verbosely if it fails

also, add extra information to a failed evaluation
for more error visibility in other tests

---------

Co-authored-by: Juanadelacuesta <juanita.delacuestamorales@hashicorp.com>
2023-11-30 10:37:20 -06:00
Luiz Aoqui
969cdb0f46 test: add consul namespace rules to consulcompat (#19227)
When configuring Consul for multi-namespace support, the JWT auth method
needs to specify namespace rules. This attribute is set to `nil` in CE
but is used in Nomad ENT.
2023-11-30 10:13:08 -05:00
Tim Gross
4e7ad58d2d E2E: modernize ConsulTemplate test and fix some assertions (#19126)
The `TestTemplateUpdateTriggers` is flaky because of what turned out to be
incompatibility between the Consul agent on the E2E cluster and the HCP Consul
server we were running but hadn't upgraded in a while. Upgrading the HCP Consul
server seems to have fixed the tests, but while I'm in here I've updated this
test suite:

* Port all the consul template test suite off of the old framework, and upgrade to
  using e2e "v3" where feasible.
* Clean up some of the assertions in the update triggers test to make the
  purpose of the test more clear.
* Remove unnecessary default fields from the job specs.

Closes: #19075
2023-11-29 12:16:41 -05:00
Daniel Bennett
f7adcefbb3 e2e: refactor vault secrets test (#19152)
fixes VaultSecrets test - it was failing due to a
regex mismatch (`^job` stopped matching when
copywrite headers got prepended to the jobspec).

but RegisterFromJobspec (which had the bug)
was only used in the one spot, so instead this
refactors the whole test to the v3 format
with testing.T and some additional fun stuff
that we can take advantage of with it.

some improvements:
* use a namespace
* use and extend existing test helpers
* add more test helpers
2023-11-28 10:00:27 -06:00
Piotr Kazmierczak
248b2ba5cd WI: use single auth method for Consul by default (#19169)
This simplifies the default setup of Nomad workloads WI-based
authentication for Consul by using a single auth method with 2 binding rules.

Users can still specify separate auth methods for services and tasks.
2023-11-28 12:22:27 +01:00
Daniel Bennett
eb56fce393 e2e: fix ui tests (#19138) 2023-11-27 12:26:19 -06:00
Tim Gross
ff928a8045 E2E: remove assertion from ACL role test (#19121)
The ACL role test asserts that the role has various permissions by listing jobs
in namespaces. It never creates jobs, because we can make all the assertions we
need by checking the error. But the test included an assertion that the
namespace was empty. Usually this will be the case, but if the previous test
case has not completed its GC (which is sync), then it's possible a stopped job
will be in the namespace. Because this assertion is irrelevant for this test,
remove it.
2023-11-17 14:35:20 -05:00
Tim Gross
a3f8a52fd4 E2E: fix assertion in dynamic node metadata test (#19120)
In #18664 we change how null values worked with dynamic node metadata so that
they were no longer returned if there wasn't also a static value for that
key. The test assertion in E2E was not updated to match the new behavior.

Fixes: #19112
2023-11-17 14:35:07 -05:00
Tim Gross
c144af2823 E2E: fix expected error message from token expiration test (#19119)
In Nomad 1.5 we started masking the specific error returned from the
authentication method and returned the "permission denied" error instead. Update
the E2E test that covers token expiration to ensure we're asserting the correct
error here.

Fixes: https://github.com/hashicorp/nomad/issues/16803
2023-11-17 14:00:47 -05:00
Daniel Bennett
4ec9343447 e2e: use tf variable defaults (#19108) 2023-11-16 14:50:11 -06:00
Tim Gross
da61d278d0 e2e: fix and modernize rescheduling test (#19105)
The E2E test suite for rescheduling had a few bugs:
* Using the command line to stop a job with a failing deployment returns a non-zero exit
  code, which would cause an otherwise passing test to fail.
* Two of the input jobs were actually invalid but were only correctly detected
  as such because of #17342

This changeset also updates the whole test suite to move it off the v1
"framework". A few test assertions are also de-flaked.

Fixes: #19076
2023-11-16 15:39:18 -05:00
Tim Gross
8fac70c92c E2E: refactor vaultcompat to allow for ENT tests (#19081)
We want to run the Vault compatibility E2E test with Vault Enterprise binaries
and use Vault namespaces. Refactor the `vaultcompat` test so as to parameterize
most of the test setup logic with the namespace, and add the appropriate build
tag for the CE version of the test.
2023-11-14 09:54:47 -05:00
Tim Gross
1c9c75cc83 E2E: refactor consulcompat to allow for ENT tests (#19068)
We want to run the Consul compatibility E2E test with Consul Enterprise binaries
and use Consul namespaces. Refactor the `consulcompat` test so as to
parameterize most of the test setup logic with the namespace, and add the
appropriate build tag for the CE version of the test.

Ref: https://github.com/hashicorp/nomad-enterprise/pull/1305
2023-11-10 15:05:51 -05:00
Seth Hoenig
5987ba434f e2ev3: wait for logs to become ready (#19067)
Just because an alloc is running does not mean nomad is ready to serve
task logs. In a test case where you immediatly read logs after starting
a task, it could be that nomad responds with "no logs found" when you
try to read logs, in which case you just need to wait longer. Do so in
the v3 TaskLogs helper function.
2023-11-10 12:43:16 -06:00
Seth Hoenig
c17333d74a e2e refactor oversubscription (#19060)
* e2e: remove old oversubscription test

* e2e: fixup and cleanup oversubscription test suite

Fix and cleanup this old oversubscription test.

* use t.Cleanup instead of defer in tests
2023-11-10 09:25:32 -06:00
Tim Gross
4e38b41d9d E2E: add template block to consulcompat test (#19055)
The Consul compatibility test focuses on Connect, but it'd be a good idea to
ensure we can successfully get template data out of Consul as well.

Also tightens up the test's Consul ACL policy for the Nomad agent.
2023-11-10 09:25:37 -05:00
Seth Hoenig
1f957947b4 e2e: refactor nomadexec test suite (#19054) 2023-11-10 07:09:24 -06:00
Seth Hoenig
2f8d94ae3e e2e: more cpu and memory for java tasks and some scripts (#19057) 2023-11-10 07:08:14 -06:00
Seth Hoenig
f211a0ab7c e2e: update terrform lock file for 1.6.3 (#19049)
Using the latest version of terraform, the lock file is not the same
as when it was generated. Seems like the http module is not needed?
versioned? present? anymore.
2023-11-09 10:49:49 -06:00
Seth Hoenig
402540f7fb e2e: bump packer build instances because faster (#19046) 2023-11-09 09:33:30 -06:00
Seth Hoenig
a28e5b6965 e2e: refactor metrics test to use NSD and WI (#19022)
* e2e: remove old metrics suite

* e2e: install stress on e2e jammy image

* e2e: overhaul metrics test to use nomad service discovery, workload identity

* e2e: format metrics hcl files and copywrite

* e2e: undo tf lock file

* e2e: undo reg auth file perms

* e2e: format cpustress.hcl
2023-11-09 08:21:16 -06:00
Tim Gross
9d075c44b2 config: remove old Vault/Consul config blocks from parser (#18997)
Remove the now-unused original configuration blocks for Consul and Vault from
the agent configuration parsing. When the agent needs to refer to a Consul or
Vault block it will always be for a specific cluster for the task/service (or
the default cluster for the agent's own use).

This is third of three changesets for this work.

Fixes: https://github.com/hashicorp/nomad/issues/18947
Ref: https://github.com/hashicorp/nomad/pull/18991
Ref: https://github.com/hashicorp/nomad/pull/18994
2023-11-08 09:30:08 -05:00
Seth Hoenig
63da22063b e2e: update pledge driver to 0.3.0 (#19020) 2023-11-08 06:58:59 -06:00