Commit Graph

22943 Commits

Author SHA1 Message Date
Tim Gross
a29023ef69 E2E: fix debug logging on disconnected clients test (#12621) 2022-04-22 09:07:05 -04:00
James Rasell
2c6966c61a cli: add pagination flags to service info command. (#12730) 2022-04-22 10:32:40 +02:00
Tim Gross
cf913ba66b E2E: make UIs runnable from any working directory (#12739)
The E2E test runner is running from the root of the Nomad
repository. Make this run independent of the working directory for
convenience of developers and the test runner.
2022-04-21 17:00:01 -04:00
Michael Schurter
7af0c3c9e5 cli: add -json flag to support job commands (#12591)
* cli: add -json flag to support job commands

While the CLI has always supported running JSON jobs, its support has
been via HCLv2's JSON parsing. I have no idea what format it expects the
job to be in, but it's absolutely not in the same format as the API
expects.

So I ignored that and added a new -json flag to explicitly support *API*
style JSON jobspecs.

The jobspecs can even have the wrapping {"Job": {...}} envelope or not!

* docs: fix example for `nomad job validate`

We haven't been able to validate inside driver config stanzas ever since
the move to task driver plugins. 😭
2022-04-21 13:20:36 -07:00
Tim Gross
42bcb74a51 cli: detect directory when applying namespace spec file (#12738)
The new `namespace apply` feature that allows for passing a namespace
specification file detects the difference between an empty namespace
and a namespace specification by checking if the file exists. For most
cases, the file will have an extension like `.hcl` and so there's
little danger that a user will apply a file spec when they intended to
apply a file name.

But because directory names typically don't include an extension,
you're much more likely to collide when trying to `namespace apply` by
name only, and then you get a confusing error message of the form:

   Failed to read file: read $namespace: is a directory

Detect the case where the namespace name collides with a directory in
the current working directory, and skip trying to load the directory.
2022-04-21 14:53:45 -04:00
Phil Renaud
a977577e44 [ui, bugfix] Link fix for volumes where per_alloc=true (#12713)
* Allocation page linkfix

* fix added to task page and computed prop moved to allocation model

* Fallback query added to task group when specific volume isnt knowable

* Delog

* link text reflects alloc suffix

* Helper instead of in-template conditionals

* formatVolumeName unit test

* Removing unused helper import
2022-04-21 13:57:18 -04:00
Seth Hoenig
f1fcd50938 Merge pull request #12736 from hashicorp/build-update-go-1.17.9
build: update golang to 1.17.9
2022-04-21 12:13:07 -05:00
Seth Hoenig
7637a6c9c2 build: update golang version script to use .go-version file 2022-04-21 12:10:14 -05:00
Seth Hoenig
8206621833 Merge pull request #12737 from hashicorp/buid-update-ec2-instances
build: update ec2 instance profiles
2022-04-21 11:57:40 -05:00
Seth Hoenig
96b6a8d985 build: update ec2 instance profiles
using tools/ec2info
2022-04-21 11:47:40 -05:00
Seth Hoenig
91d91e28e4 build: update golang to 1.17.9 2022-04-21 11:43:01 -05:00
Tim Gross
55ca76e205 docker: back out cgroup v2 OOM detection (#12735)
When shutting down an allocation that ends up needing to be
force-killed, we're getting a spurious "OOM Killed (137)" message on
the task termination event. We introduced this as part of cgroups v2
support because the Docker daemon isn't detecting the container status
correctly. Although exit code 137 is the exit code we get for
OOM-killed processes, that's because OOM kill is a `SIGKILL`. So any
sigkilled process will get that exit code.
2022-04-21 12:31:34 -04:00
Tim Gross
5c17f91117 E2E: set longer timeout for CSI plugin alloc start (#12732)
The CSI plugin allocations take a while to be marked healthy,
sometimes causing E2E test flakes during the setup phase of the
tests. There's nothing CSI specific about marking plugin allocs
healthy, as the plugin supervisor hook does all the fingerprinting in
the postrun hook (the prestart hook just makes a couple of empty
directories). The timeouts we're seeing may be because of where we're
pulling the images from; most our jobs pull from a CDN-backed public
registry whereas these are pulling from ECR. Set a 1min timeout for
these to make sure we have enough time to pull the image and start the
task.
2022-04-21 11:11:43 -04:00
James Rasell
15e6e5befc api: Add support for filtering and pagination to the node list endpoint (#12727) 2022-04-21 17:04:33 +02:00
Tim Gross
1f1c970135 docs: fix broken link from template to client config (#12733) 2022-04-21 11:04:04 -04:00
Derek Strickland
5b1413a1ae reconciler: Handle canaries when client disconnects (#12539)
* plan_apply: Allow node updates in disconnected node plans
* plan: Keep the job when persisting unknown allocs
* reconciler: stop unknown allocs when stopping all
* reconcile_util: reorder filtering to handle canaries; skip rescheduling unknown
* heartbeat: Fix bug in node heartbeating
2022-04-21 10:05:58 -04:00
Tim Gross
d1aa801407 E2E: playwright configuration and smoke test (#12721)
Scripts for running playwright tests in a Docker container that has
chromium and webkit preinstalled. Includes a basic smoke test for
authentication so that we can be sure the test rig is working
end-to-end. Wiring this up in CI will be in an upcoming PR.
2022-04-21 09:13:10 -04:00
James Rasell
a911d83cf4 docs: update HCL2 dynamic example to use block with label. (#12715) 2022-04-21 10:18:04 +02:00
James Rasell
61ec5f0456 autopilot: correctly return errors within state functions. (#12714) 2022-04-21 08:54:50 +02:00
Luiz Aoqui
7f1b838abb ui: fix bug that prevented files streaming (#12719)
During the Ember dependecy upgrade work,
https://github.com/hashicorp/nomad/commit/ce8c039f4ce7359d60ede5dee36b9cef82
moved the `isSupported` method from using Ember's `reopenClass` to a
getter, but `reopenClass` creates a static method, so the getter must be
static as well.
2022-04-20 14:39:18 -04:00
Gowtham
f601cc39b1 Add Concurrent Download Support for artifacts (#11531)
* add concurrent download support - resolves #11244

* format imports

* mark `wg.Done()` via `defer`

* added tests for successful and failure cases and resolved some goleak

* docs: add changelog for #11531

* test typo fixes and improvements

Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
2022-04-20 10:15:56 -07:00
James Rasell
8eb569faf4 job_hooks: add implicit constraint when using Consul for services. (#12602) 2022-04-20 14:09:13 +02:00
James Rasell
4c55339cc6 client: add NOMAD_SHORT_ALLOC_ID allocation env var. (#12603) 2022-04-20 10:30:48 +02:00
Tim Gross
aafcf97984 E2E: provide options for reverse proxy for web UI (#12671)
Our E2E test environment is deployed with mTLS, but it's impractical
for us to use mTLS in headless browsers for automated testing (or even
in manual testing). Provide certificates for proxying the web UI via
Nginx. This proxy uses client certs for proxying to the HTTP endpoint
and a self-signed cert for the browser-facing endpoint. We can accept
certificate errors in the automated tests we'll be adding in the next
step of this work.
2022-04-19 16:55:05 -04:00
Tim Gross
e2a8d45f2d E2E: terraform provisioner upgrades (#12652)
While working on infrastructure for testing the UI in E2E, we needed
to upgrade the certificate provider. Performing a provider upgrade via
the TF `init -upgrade` brought in updates for the file and AWS
providers as well. These updates include deprecating the use of
`sensitive_content` fields, removing CA algorithm parameters that can
be inferred from keys, and removing the requirement to manually
specify AWS assume role parameters in the provider config if they're
available in the calling environment's AWS config file (as they are
via doormat or our E2E environment).
2022-04-19 14:27:14 -04:00
Seth Hoenig
19c9779d57 Merge pull request #12604 from hashicorp/b-fixup-chroot-test
ci: fixup task runner chroot test
2022-04-19 12:58:03 -05:00
Seth Hoenig
121d959745 Merge pull request #12622 from hashicorp/b-fix-docker-logger-test
ci: fix docker logger not supported test
2022-04-19 12:57:47 -05:00
Seth Hoenig
a6f345c8f5 ci: fixup task runner chroot test
This PR is 2 fixes for the flaky TestTaskRunner_TaskEnv_Chroot test.

And also the TestTaskRunner_Download_ChrootExec test.

- Use TinyChroot to stop copying gigabytes of junk, which causes GHA
to fail to create the environment in time.

- Pre-create cgroups on V2 systems. Normally the cgroup directory is
managed by the cpuset manager, but that is not active in taskrunner tests,
so create it by hand in the test framework.
2022-04-19 10:37:46 -05:00
Seth Hoenig
cbb09f31a5 ci: fix docker logger not supported test
This test checks for behavior when asking for logs of a docker task
configured with a log driver that does not support streaming logs.

Previously this was using the 'gelf' log driver, but it seems that no
longer returns an error as expected. Instead we can just use the 'none'
log driver, which has the desired effect

2022-04-19T10:23:19.129-0500 [ERROR] docklog/docker_logger.go:133: log streaming ended with terminal error: error="API error (501): configured logging driver does not support reading"
2022-04-19 10:27:01 -05:00
Luiz Aoqui
2319a081b9 changelog: fix entry for #11927 (#12577) 2022-04-19 10:46:25 -04:00
Luiz Aoqui
997252622d changelog: add entry for #11944 (#12578) 2022-04-19 10:46:11 -04:00
Seth Hoenig
3e394fce69 Merge pull request #12586 from hashicorp/f-local-si-token
connect: create SI tokens in local scope
2022-04-19 07:53:01 -05:00
Seth Hoenig
dd2724a8ab cl: add missing prefix 2022-04-19 07:48:56 -05:00
Derek Strickland
9d7ea218bb consul-template: revert function_denylist logic (#12071)
* consul-template: replace config rather than append
Co-authored-by: Seth Hoenig <seth.a.hoenig@gmail.com>
2022-04-18 13:57:56 -04:00
chavacava
334c25834a QueryOptions.SetTimeToBlock should take pointer receiver
Fixes a bug where blocking queries that are retried don't have their blocking 
timeout reset, resulting in them running longer than expected.
2022-04-18 10:41:27 -04:00
Tim Gross
5628caee53 CI: build binaries for UI branches (#12594)
Build binaries for every code change, not just backend code
changes. This means that we'll have up-to-date compiled assets for
every commit available in CircleCI artifacts.
2022-04-18 10:29:20 -04:00
Seth Hoenig
b2a2f77d40 docs: update documentation with connect acls changes
This PR updates the changelog, adds notes the 1.3 upgrade guide, and
updates the connect integration docs with documentation about the new
requirement on Consul ACL policies of Consul agent default anonymous ACL
tokens.
2022-04-18 08:22:33 -05:00
Jorge Marey
7bfb482b1e Change consul SI tokens to be local 2022-04-18 08:22:33 -05:00
Shishir
c86642bae4 Add os to NodeListStub struct. (#12497)
* Add os to NodeListStub struct.

Signed-off-by: Shishir Mahajan <smahajan@roblox.com>

* Add os as a query param to /v1/nodes.

Signed-off-by: Shishir Mahajan <smahajan@roblox.com>

* Add test: os as a query param to /v1/nodes.

Signed-off-by: Shishir Mahajan <smahajan@roblox.com>
2022-04-15 17:22:45 -07:00
Tim Gross
0c2732ddce CSI: replace structs->api with serialization extension (#12583)
The CSI HTTP API has to transform the CSI volume to redact secrets,
remove the claims fields, and to consolidate the allocation stubs into
a single slice of alloc stubs. This was done manually in #8590 but
this is a large amount of code and has proven both very bug prone
(see #8659, #8666, #8699, #8735, and #12150) and requires updating
lots of code every time we add a field to volumes or plugins.

In #10202 we introduce encoding improvements for the `Node` struct
that allow a more minimal transformation. Apply this same approach to
serializing `structs.CSIVolume` to API responses.

Also, the original reasoning behind #8590 for plugins no longer holds
because the counts are now denormalized within the state store, so we
can simply remove this transformation entirely.
2022-04-15 14:29:34 -04:00
Tim Gross
acc6baa5fd CSI: fix volume status prefix matching in CLI (#12584)
The API for `CSIVolume.List` sorts by created index and not by ID,
which breaks the logic for prefix matching in the `volume status`
output when the prefix is also an exact match. Ensure that we're
handling this case correctly.
2022-04-15 14:16:30 -04:00
Kevin Wang
dbf269f6bd chore: redirects (#12560)
Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
2022-04-15 13:13:40 -04:00
Derek Strickland
b43de99ac1 heartbeat: Handle transitioning from disconnected to down (#12559) 2022-04-15 09:47:45 -04:00
Derek Strickland
f5de802993 system_scheduler: support disconnected clients (#12555)
* structs: Add helper method for checking if alloc is configured to disconnect
* system_scheduler: Add support for disconnected clients
2022-04-15 09:31:32 -04:00
Tim Gross
fd21cebec7 CSI: handle per-alloc volumes in alloc status -verbose CLI (#12573)
The Nomad client's `csi_hook` interpolates the alloc suffix with the
volume request's name for CSI volumes with `per_alloc = true`, turning
`example` into `example[1]`. We need to do this same behavior in the
`alloc status` output so that we show the correct volume.
2022-04-15 09:26:19 -04:00
Seth Hoenig
04f6b0aa43 Merge pull request #12579 from hashicorp/ci-missing-packages-oss
ci: ensure package coverage of test-core
2022-04-15 08:11:41 -05:00
Lars Lehtonen
f1ec08cb28 command/agent: check err before close (#12574) 2022-04-15 08:54:03 -04:00
Seth Hoenig
72a4677415 ci: ensure package coverage of test-core 2022-04-14 19:04:06 -05:00
Michael Schurter
19bac3caa8 docs: add plan for node rejected details and more (#12564)
- Moved federation docs to the bottom since *everyone* is potentially
  affected by the other sections on the page, but only users of
  federation are affected by it.
- Added section on the plan for node rejected bug since it is fairly
  easy to diagnose and removing affected nodes is a fairly reliable
  workaround.
- Mention 5s cliff for wait_for_index.
- Remove the lie that we do not have job status metrics! How old was
  that?!
- Reinforce the importance of monitoring basic system resources
2022-04-14 16:09:33 -07:00
Tim Gross
4ca980311c E2E: add debugging outputs for disconnected clients test (#12572)
This test has a failure that's happening only occassionally and not
very reproducibly. Print out the allocation status on test failure so
that we can do some post-mortum debugging of the test on nightly.
2022-04-14 17:03:57 -04:00