Commit Graph

24847 Commits

Author SHA1 Message Date
Tim Gross
18327cd367 consul: handle "not found" errors from Consul when deleting tokens (#17847)
In Consul 1.15.0, the Delete Token API was changed so as to return an error when
deleting a non-existent ACL token. This means that if Nomad successfully deletes
the token but fails to persist that fact, it will get stuck trying to delete a
non-existent token forever.

Update the token deletion function to ignore "not found" errors and treat them
as successful deletions.

Fixes: #17833
2023-07-07 16:22:13 -04:00
Daniel Bennett
243429be11 ci: pull secrets from Vault in nomad-enterprise (#17841) 2023-07-07 14:27:12 -05:00
Seth Hoenig
100c460467 env/aws: updates from ec2info (#17835) 2023-07-07 10:12:05 -05:00
Daniel Bennett
03b8a9add0 ci: windows tests on public runners (#17829)
currently our self-hosted windows runners lack `docker`,
so for now just revert to public runners.
2023-07-06 17:06:55 -05:00
Yorick Gersie
709f20c04a cni: ensure to setup CNI addresses in deterministic order (#17766)
* cni: ensure to setup CNI addresses in deterministic order

  Currently, as commented in the code, the go-cni library returns an unordered map
  of interfaces. When multiple CNI interfaces are created, this causes problems with
  service registration and health checking because the first address in the map is
  used.

  The use case we have where this is an issue is that we run CNI with the macvlan
  plugin to isolate workloads, but they still need to be able to access the host on
  a static address to be able to perform local resolving and hit host services like
  the Consul agent API. To make this work there are 2 options: either add a
  macvlan interface on the host with an assigned address for each VLAN you have, or
  create an additional veth bridged interface in the container namespace.
  We chose the latter option through a custom CNI plugin but the ordering issue
  leaves us with incorrect service registration.

* Updates after feedback

 * First check the length of the CNIResult interfaces; if it's zero we don't need to proceed
   at all.
 * Use sorted interfaces list for the address fallback scenario as well.
 * Remove "found" log message logic. When an address isn't found, an error is returned stating
   that the allocation could not be configured because an address was missing from the
   CNIResult. If we still need a Warn message, we can add it to the condition that returns the
   error when no address could be found, instead of using the "found" bool logic.
2023-07-06 13:25:29 -07:00
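The ordering fix described in the commit above amounts to sorting the map keys before picking an address. A minimal sketch, assuming a simplified `netInterface` type (hypothetical, not the go-cni struct) and a simplified selection rule:

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// netInterface is a hypothetical, simplified view of one interface in a
// CNI result. It is not the go-cni type.
type netInterface struct {
	Sandbox string   // non-empty when the interface lives in the container namespace
	IPs     []string // addresses assigned to the interface
}

// firstAddress walks the interfaces in sorted name order so that the same
// input map always yields the same interface and address.
func firstAddress(ifaces map[string]netInterface) (string, string, error) {
	// Nothing to do if the result carries no interfaces at all.
	if len(ifaces) == 0 {
		return "", "", errors.New("CNI result has no interfaces")
	}

	// Go map iteration order is unspecified, so sort the keys first.
	names := make([]string, 0, len(ifaces))
	for name := range ifaces {
		names = append(names, name)
	}
	sort.Strings(names)

	for _, name := range names {
		iface := ifaces[name]
		if iface.Sandbox != "" && len(iface.IPs) > 0 {
			return name, iface.IPs[0], nil
		}
	}
	return "", "", errors.New("no interface in the CNI result has an address")
}

func main() {
	ifaces := map[string]netInterface{
		"eth1": {Sandbox: "/var/run/netns/x", IPs: []string{"10.0.0.2"}},
		"eth0": {Sandbox: "/var/run/netns/x", IPs: []string{"192.168.1.5"}},
	}
	name, ip, err := firstAddress(ifaces)
	fmt.Println(name, ip, err)
}
```

Sorting by name is one way to get a stable order; the point is only that selection no longer depends on map iteration.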
Seth Hoenig
878e6b9cf4 website: use full registry name so it works with podman again (#17809) 2023-07-06 13:22:12 -05:00
Daniel Bennett
3d87b3d91f ci: clean GOCACHE before build (#17808)
this is basically to avoid Fear/Uncertainty/Doubt

the github action actions/setup-go
(and, with a different cache key, hashicorp/setup-golang)
caches both GOMODCACHE (go source files), which is good,
and GOCACHE (build outputs), which *might* be bad,
if the cache was built on an OS with an older glibc
than we want to support. from `go help cache`:
> [...] the build cache does not detect changes to
> C libraries imported with cgo.
2023-07-06 12:47:43 -05:00
Daniel Bennett
4738d305c5 ci: dynamic runs-on values for oss/ent (#17775)
so in enterprise we can use Vault for secrets,
without merge conflicts from oss->ent.

also:
* use hashicorp/setup-golang
* setup-js for self-hosted runners
  they don't come with yarn, nor chrome,
  and might not always match node version.
2023-07-06 12:41:17 -05:00
am-ak
bb95009305 docs: fix broken link in security model docs (#17812)
correcting a broken link under "similar to consul" and correcting list formatting under "general mechanisms"
2023-07-06 10:01:36 -04:00
Patric Stout
ede662a828 metrics: add "total_ticks_count" for CPU metrics (#17579)
This counter tells you the total amount of ticks for that CPU
entry since the start of Nomad.
2023-07-05 10:28:55 -04:00
deverton-godaddy
e75ae1de96 [api] Add NetworkStatus to allocation response (#17280)
Service discovery or mesh network systems consuming the Nomad event stream or API need to know the CNI assigned IP for the allocation. This data is returned by the underlying Nomad API but isn't mapped in the response struct.
2023-07-04 19:35:38 -04:00
James Rasell
c883621a9e docs: fix up constraint jobspec HCL format. (#17795) 2023-07-04 13:33:46 +01:00
Phil Renaud
17f63cfd9a Report shows a 3rd party browser extension puts a banner at the top of page and awkwardly shifts nav; this fixes that (#17783) 2023-06-30 17:09:42 -04:00
Phil Renaud
6124c1a6fd [ui] Text wrap long lines of code and logs (#17754)
* Text and code wrapping as a localStorage var

* task-log uses wrapping and kb shortcut

* Word wrap keyboard labels

* Wrapper as a toggle not a button

* Changelog and fixed an extra space trailing log lines

* Moves toggle to inside

* Acceptance tests for ww and toggle click
2023-06-30 17:07:57 -04:00
Tim Gross
63d6af6187 docs: clarify network topology requirements for clients (#17779)
The requirements for client-to-server and client-to-client topologies are not
well-documented in the production install requirements docs. Document that
clients make connections to servers (and not the other way around), and that
clients don't need to communicate with each other (with some exceptions).

Fixes: #17631
2023-06-30 10:46:29 -04:00
James Rasell
1e0d691452 job: ensure node pool is canonicalized for state restores. (#17765) 2023-06-30 07:37:22 +01:00
Sarah Thompson
9e5fc77689 Update the revision used by the docker build action. (#17755)
Update the revision used by the docker action. This should always reflect the commit that's being built as this may differ from the default <github.sha> that the workflow was invoked at.

Goes with https://github.com/hashicorp/actions-docker-build/pull/59 - and should not be merged until this PR is merged and a new version of the action is cut.
2023-06-29 09:19:54 -04:00
Phil Renaud
fccfb1d19d [ui] HCL-in-UI: Re-arrange buttons, add save-as-file (#17752)
* Move buttons over as expected

* Let a user download file locally

* test mock fns for jobeditor

* Changelog
2023-06-28 21:57:03 -04:00
Daniel Bennett
6bd509869b e2e: use DNS instead of HTTP to get my_public_ipv4 (#17759) 2023-06-28 13:11:57 -05:00
Tim Gross
350c435799 Merge pull request #17758 from hashicorp/post-release-1.6.0-beta.1
Post release 1.6.0 beta.1
2023-06-28 12:26:00 -04:00
hc-github-team-nomad-core
0be2b421b1 Prepare for next release 2023-06-28 11:06:28 -04:00
hc-github-team-nomad-core
2baba5821a Generate files for 1.6.0-beta.1 release 2023-06-28 11:06:20 -04:00
Tim Gross
81d3575def release: submit build workflow from the file on the release's own branch 2023-06-28 11:06:13 -04:00
Tim Gross
a678b422b5 Prepare release 1.6.0-beta.1 2023-06-28 11:06:05 -04:00
Phil Renaud
43587c5852 Link to allocations.allocation by ID reference, not by model (#17753) 2023-06-28 10:00:59 -04:00
Phil Renaud
f3df01e422 [ui] Move Placement Failures notification above job status panel (#17750)
* Moves the Placement Failures box above job status, should it exist

* Move it for non-service job-types as well
2023-06-27 19:32:51 -04:00
Phil Renaud
72a9f2b551 [ui] links to allocations explicitly go through their route model hook (#17737)
* links to allocations explicitly go through their route model hook

* Acceptance test to make sure alloc clicking loads alloc endpoint obj
2023-06-27 10:01:50 -04:00
Seth Hoenig
6b06b02384 fix changelog entry typo (#17743) 2023-06-27 08:02:06 -05:00
Seth Hoenig
f06948fd64 deps: update cronexpr to capture license file in SBOM tools (#17733) 2023-06-27 07:58:20 -05:00
Juana De La Cuesta
20280d2445 Update checklist-rpc-endpoint.md (#17698)
* Update checklist-rpc-endpoint.md

* Update checklist-rpc-endpoint.md

* Update contributing/checklist-rpc-endpoint.md

Co-authored-by: Tim Gross <tgross@hashicorp.com>

---------

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2023-06-27 10:52:38 +02:00
Phil Renaud
d06dfd2abc Node Pools moved to after Type in jobs index columns (#17738) 2023-06-26 17:00:01 -04:00
Seth Hoenig
ec4fa55bbf drivers/docker: refactor use of clients in docker driver (#17731)
* drivers/docker: refactor use of clients in docker driver

This PR refactors how we manage the two underlying clients used by the
docker driver for communicating with the docker daemon. We keep two clients
- one with a hard-coded timeout that applies to all operations no matter
what, intended for use with short lived / async calls to docker. The other
has no timeout and is the responsibility of the caller to set a context
that will ensure the call eventually terminates.

The use of these two clients has been confusing and mistakes were made
in a number of places where calls were making use of the wrong client.

This PR makes it so that a user must explicitly call a function to get
the client that makes sense for that use case.

Fixes #17023

* cr: followup items
2023-06-26 15:21:42 -05:00
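The refactor described in the commit above makes callers choose a client explicitly. A rough sketch of that shape, where the `client` type, the accessor names, and the five-minute value are all hypothetical and not taken from the driver:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// client is a hypothetical stand-in for a Docker API client. A zero
// timeout means the caller must bound the call with its own context.
type client struct {
	timeout time.Duration
}

// shortCallTimeout is an illustrative value, not the driver's setting.
const shortCallTimeout = 5 * time.Minute

// clients lazily creates the two clients and hands them out through
// explicitly named accessors, so a caller has to pick one on purpose.
type clients struct {
	mu    sync.Mutex
	short *client
	long  *client
}

// timeoutClient returns the client for short-lived or async calls.
func (c *clients) timeoutClient() *client {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.short == nil {
		c.short = &client{timeout: shortCallTimeout}
	}
	return c.short
}

// infinityClient returns the client with no built-in timeout.
func (c *clients) infinityClient() *client {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.long == nil {
		c.long = &client{}
	}
	return c.long
}

func main() {
	var c clients
	fmt.Println(c.timeoutClient().timeout, c.infinityClient().timeout)
}
```

Keeping the fields unexported and exposing only the two accessors is what forces the explicit choice at each call site.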
sejalapeno
05c84d64d2 Update allocations.go (#17726)
* Update allocations.go

updated missing client status "unknown" #17688

* changelog

* Update .changelog/17726.txt

adding relevant desc.

Co-authored-by: Seth Hoenig <shoenig@duck.com>

---------

Co-authored-by: Seth Hoenig <shoenig@duck.com>
2023-06-26 13:33:29 -05:00
nicoche
a9135bc6d5 deploymentwatcher: fail early whenever possible (#17341)
Given a deployment that has a `progress_deadline`, if a task group runs
out of reschedule attempts, allow it to fail at this time instead of
waiting until the `progress_deadline` is reached.

Fixes: #17260
2023-06-26 14:01:03 -04:00
Phil Renaud
d20faf5855 [ui] alignment and spacing for job status panel (#17708)
* CSS alignment and spacing for job status panel

* Only fade the count, not the legend icon, when count is 0

* Unrounded version corners

* changelog

* css has to only remove border radius when count is present

* Seed stabilization for services test

* Try consolidating the testfixes from before

* Total test isolation and bonus logs

* Drop the isolation but keep the logs

* Remove bonus logging
2023-06-26 12:18:12 -04:00
hashicorp-copywrite[bot]
d778ecfc7d [COMPLIANCE] Add Copyright and License Headers (#17732)
Co-authored-by: hashicorp-copywrite[bot] <110428419+hashicorp-copywrite[bot]@users.noreply.github.com>
2023-06-26 11:11:17 -05:00
dependabot[bot]
f4385cc817 build(deps): bump github.com/containerd/go-cni from 1.1.7 to 1.1.9 (#17582) 2023-06-26 16:47:20 +01:00
James Rasell
09cc4b12bc test: add drain config tests. (#17724) 2023-06-26 16:23:13 +01:00
Seth Hoenig
37df529e7a e2e: refactor pids isolation tests (#17717)
This PR refactors some old PID isolation tests to make use of the e2e/v3
packages. Should be quite a bit easier to read. Adds 'alloc exec' capability
to the jobs3 package.
2023-06-26 09:51:18 -05:00
Tim Gross
78f4f76520 adjust prioritized client updates (#17541)
In #17354 we made client updates prioritized to reduce client-to-server
traffic. When the client has no previously-acknowledged update we assume that
the update is of typical priority. Although we don't know that for sure, in
practice an allocation will never become healthy quickly enough that the first
update we send is the update saying the alloc is healthy.

But that doesn't account for allocations that quickly fail in an unrecoverable
way because of allocrunner hook failures, and it'd be nice to be able to send
those failure states to the server more quickly. This changeset does so and adds
some extra comments on reasoning behind priority.
2023-06-26 09:14:24 -04:00
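The priority adjustment in the commit above might look something like the sketch below. The type names, the status strings, and in particular the rule used when a previous update exists are assumptions for illustration; only the first-update branch reflects what the commit message describes.

```go
package main

import "fmt"

// priority is a hypothetical ordering of how soon an update is sent.
type priority int

const (
	priorityNone priority = iota
	priorityTypical
	priorityUrgent
)

// allocUpdate is a hypothetical, reduced allocation update.
type allocUpdate struct {
	ClientStatus string // for example "pending", "running", or "failed"
}

// updatePriority picks a priority for next given the last update the
// server acknowledged (nil when none has been acknowledged yet).
func updatePriority(lastAcked *allocUpdate, next allocUpdate) priority {
	if lastAcked == nil {
		// A first update that already reports failure (for example from
		// an allocrunner hook) is worth sending quickly.
		if next.ClientStatus == "failed" {
			return priorityUrgent
		}
		return priorityTypical
	}
	// Assumption for this sketch: a status change is urgent and an
	// unchanged status needs no update.
	if lastAcked.ClientStatus != next.ClientStatus {
		return priorityUrgent
	}
	return priorityNone
}

func main() {
	fmt.Println(updatePriority(nil, allocUpdate{ClientStatus: "failed"}) == priorityUrgent)
	fmt.Println(updatePriority(nil, allocUpdate{ClientStatus: "pending"}) == priorityTypical)
}
```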
dependabot[bot]
1d4f8869fd build(deps): bump github.com/opencontainers/runtime-spec (#17719)
Bumps [github.com/opencontainers/runtime-spec](https://github.com/opencontainers/runtime-spec) from 1.0.3-0.20210326190908-1c3f411f0417 to 1.1.0-rc.3.
- [Release notes](https://github.com/opencontainers/runtime-spec/releases)
- [Changelog](https://github.com/opencontainers/runtime-spec/blob/main/ChangeLog)
- [Commits](https://github.com/opencontainers/runtime-spec/commits/v1.1.0-rc.3)

---
updated-dependencies:
- dependency-name: github.com/opencontainers/runtime-spec
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-06-26 08:03:50 -05:00
Piotr Kazmierczak
807907e001 chore: gofmt docker driver handle.go (#17721) 2023-06-26 10:38:23 +02:00
Johan Forssell
5b46b74b94 drivers: OOM kill logging for Docker driver (#17518)
Explicit error log of the docker ID and container image name
2023-06-26 10:13:23 +02:00
Tim Gross
555214199a cli: fix broken node pool jobs test (#17715)
In #17705 we fixed a bug in the treatment of the "all" node pool for the `node
pool jobs` command but missed a test in the CLI.
2023-06-23 14:10:45 -07:00
Tim Gross
fc611fc5f4 docs: clarify drain's -force flag behavior with system/CSI jobs (#17703)
If you use `nomad node drain -force`, the drain deadline is set to -1ns. If you
have not prevented system and CSI node plugin allocations from being drained
with `-ignore-system`, they will be immediately drained as well. This is
typically not safe for CSI node plugins.

Also fix some broken links.

Fixes: #17696
2023-06-23 16:38:11 -04:00
Luiz Aoqui
276c69bffd api: prevent panic on job plan (#17689)
Check for a nil job ID to prevent a panic when calling Jobs().Plan().
2023-06-23 16:20:52 -04:00
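The nil check in the commit above guards a pointer dereference. A minimal sketch, assuming a reduced `Job` type with a pointer ID and a hypothetical `planPath` helper (the real API client does more than build a path):

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// Job is a reduced, hypothetical version of an API job with a pointer ID.
type Job struct {
	ID *string
}

// planPath builds the plan endpoint path for a job. It validates the
// job and its ID first, because dereferencing a nil ID would panic.
func planPath(job *Job) (string, error) {
	if job == nil {
		return "", errors.New("must pass non-nil job")
	}
	if job.ID == nil {
		return "", errors.New("missing job ID")
	}
	return "/v1/job/" + url.PathEscape(*job.ID) + "/plan", nil
}

func main() {
	_, err := planPath(&Job{})
	fmt.Println(err)

	id := "example"
	path, _ := planPath(&Job{ID: &id})
	fmt.Println(path)
}
```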
Luiz Aoqui
b7c2d65a0e build: add Docker image (#17017)
Co-authored-by: Daniel Kimsey <90741+dekimsey@users.noreply.github.com>
2023-06-23 15:57:09 -04:00
Luiz Aoqui
aea6146656 np: fix list of jobs for node pool all (#17705)
Unlike nodes, jobs are allowed to be registered in the node pool `all`,
in which case all nodes are used for evaluating placements. When listing
jobs for the `all` node pool only those that are explicitly in this node
pool should be returned.
2023-06-23 15:47:53 -04:00
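The listing rule in the commit above can be shown with a plain filter. This is a sketch with a hypothetical `job` type and helper; it only illustrates that listing jobs for `all` means an exact match on the job's own node pool, not "every job".

```go
package main

import "fmt"

// nodePoolAll is the name of the built-in pool mentioned in the commit.
const nodePoolAll = "all"

// job is a hypothetical, reduced job record.
type job struct {
	ID       string
	NodePool string
}

// jobsInPool returns only the jobs explicitly registered in pool. For
// the "all" pool this means jobs whose NodePool is literally "all".
func jobsInPool(jobs []job, pool string) []job {
	var out []job
	for _, j := range jobs {
		if j.NodePool == pool {
			out = append(out, j)
		}
	}
	return out
}

func main() {
	jobs := []job{
		{ID: "web", NodePool: "default"},
		{ID: "monitor", NodePool: nodePoolAll},
	}
	fmt.Println(jobsInPool(jobs, nodePoolAll))
}
```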
Luiz Aoqui
6b70896480 changelog: add entry for node pools (#17707) 2023-06-23 15:47:35 -04:00
Tim Gross
8dca3632ff docs: split out unsupported versions in changelog (#17704)
Our changelog has become large enough that GitHub's rendering is very slow,
resulting in error pages ("angry unicorns"). Split out the older unsupported
versions of Nomad into their own file so that we only need to render the most
recent versions, while keeping the older versions relatively searchable by
having them in a single file.
2023-06-23 15:17:57 -04:00