Commit Graph

24699 Commits

Author SHA1 Message Date
Seth Hoenig
89ce092b20 docker: stop network pause container of lost alloc after node restart (#17455)
This PR fixes a bug where the docker network pause container would not be
stopped and removed in the case where a node is restarted, the alloc is
moved to another node, the node comes back up. See the issue below for
full repro conditions.

Basically in the DestroyNetwork PostRun hook we would depend on the
NetworkIsolationSpec field not being nil - which is only the case
if the Client stays alive all the way from network creation to network
teardown. If the node is rebooted we lose that state and previously
would not be able to find the pause container to remove. Now, we manually
find the pause container by scanning them and looking for the associated
allocID.

Fixes #17299
2023-06-09 08:46:29 -05:00
Phil Renaud
408ab828f7 [ui] Parallelize ember tests (#17442)
* Exam to parallelize tests

* Logging to try to solve test flakiness

* Logging in another failure

* Hardening for one test and snapshot for another

* Explicitly set the first one as the servicedAlloc instead of randomly picking

* A wild CircleCI test failure appears

* de-log
2023-06-07 17:01:35 -04:00
Seth Hoenig
225693ad28 client: fix client panic during drain cause by shutdown (#17450)
During shutdown of a client with drain_on_shutdown there is a race between
the Client ending the cgroup and the task's cpuset manager cleaning up
the cgroup. During the path traversal, skip anything we cannot read, which
avoids the nil DirEntry we try to dereference now.
2023-06-07 15:12:44 -05:00
Tim Gross
ceb3b4c0f1 build: update to go1.20.5 (#17451)
Go released a security update to fix build-time code injection and execution via
CGO. This doesn't impact already-released versions of Nomad, just the build
toolchain, so we won't be releasing a Nomad security update to go with it.
2023-06-07 11:44:59 -04:00
Tim Gross
9a6078a2ae node pools: implement support in scheduler (#17443)
Implement scheduler support for node pool:

* When a scheduler is invoked, we get a set of the ready nodes in the DCs that
  are allowed for that job. Extend the filter to include the node pool.
* Ensure that changes to a job's node pool are picked up as destructive
  allocation updates.
* Add `NodesInPool` as a metric to all reporting done by the scheduler.
* Add the node-in-pool the filter to the `Node.Register` RPC so that we don't
  generate spurious evals for nodes in the wrong pool.
2023-06-07 10:39:03 -04:00
Luiz Aoqui
354d741c95 node pool: implement nomad node pool nodes CLI (#17444) 2023-06-07 10:37:27 -04:00
Tim Gross
84e7cf39f6 node pools: implement CLI for node pool jobs command (#17432) 2023-06-06 15:02:26 -04:00
Tim Gross
385dbfb8d1 node pools: implement HTTP API to list jobs in pool (#17431)
Implements the HTTP API associated with the `NodePool.ListJobs` RPC, including
the `api` package for the public API and documentation.

Update the `NodePool.ListJobs` RPC to fix the missing handling of the special
"all" pool.
2023-06-06 11:40:13 -04:00
Luiz Aoqui
f0f4cbb848 node pools: list nodes in pool (#17413) 2023-06-06 10:43:43 -04:00
Jerome Eteve
0d41fb6747 client checks kernel module in /sys/module for WSL2 bridge networking (#17306) 2023-06-06 10:26:50 -04:00
Luiz Aoqui
637ddf516e node pools: add event stream support (#17412) 2023-06-06 10:14:47 -04:00
Dao Thanh Tung
67e39d5d24 Add check for missing path in client host_volume config (#17393) 2023-06-05 19:31:19 -04:00
Tim Gross
bb0140803e node pools: implement RPC to list jobs in a given node pool (#17396)
Implements the `NodePool.ListJobs` RPC, with pagination and filtering based on
the existing `Job.List` RPC.
2023-06-05 15:36:52 -04:00
Seth Hoenig
560315a49e test: ensure cpuset cgroup is setup before fingerprinting (#17428)
This PR fixes a racey test where we need to ensure the cpuset cgroup
is setup before trying to fingerprint it.
2023-06-05 14:15:00 -05:00
Luiz Aoqui
08ce8a3ac5 node pools: fix node upsert and state mutation tests (#17430) 2023-06-05 14:58:32 -04:00
Phil Renaud
e25c316b16 [ui] Remove Ember Assets Github Actions workflow (#17426)
* Remove Ember Assets gha workflow

* PR write added to permissions
2023-06-05 13:52:20 -04:00
hashicorp-copywrite[bot]
8597061b52 [COMPLIANCE] Add Copyright and License Headers (#17429)
Co-authored-by: hashicorp-copywrite[bot] <110428419+hashicorp-copywrite[bot]@users.noreply.github.com>
2023-06-05 13:23:59 -04:00
dependabot[bot]
1e31268b85 build(deps): bump go.etcd.io/bbolt from 1.3.6 to 1.3.7 (#16228)
* build(deps): bump go.etcd.io/bbolt from 1.3.6 to 1.3.7

Bumps [go.etcd.io/bbolt](https://github.com/etcd-io/bbolt) from 1.3.6 to 1.3.7.
- [Release notes](https://github.com/etcd-io/bbolt/releases)
- [Commits](https://github.com/etcd-io/bbolt/compare/v1.3.6...v1.3.7)

---
updated-dependencies:
- dependency-name: go.etcd.io/bbolt
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* cl: update cl for bbolt

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Seth Hoenig <shoenig@duck.com>
2023-06-05 10:19:14 -05:00
dependabot[bot]
d17f19da24 build(deps): bump github.com/dustin/go-humanize from 1.0.0 to 1.0.1 (#16227)
Bumps [github.com/dustin/go-humanize](https://github.com/dustin/go-humanize) from 1.0.0 to 1.0.1.
- [Release notes](https://github.com/dustin/go-humanize/releases)
- [Commits](https://github.com/dustin/go-humanize/compare/v1.0.0...v1.0.1)

---
updated-dependencies:
- dependency-name: github.com/dustin/go-humanize
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-06-05 10:17:04 -05:00
dependabot[bot]
992b62028d build(deps): bump github.com/hashicorp/raft from 1.3.11 to 1.5.0 (#17421)
* build(deps): bump github.com/hashicorp/raft from 1.3.11 to 1.5.0

Bumps [github.com/hashicorp/raft](https://github.com/hashicorp/raft) from 1.3.11 to 1.5.0.
- [Release notes](https://github.com/hashicorp/raft/releases)
- [Changelog](https://github.com/hashicorp/raft/blob/main/CHANGELOG.md)
- [Commits](https://github.com/hashicorp/raft/compare/v1.3.11...v1.5.0)

---
updated-dependencies:
- dependency-name: github.com/hashicorp/raft
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* cl: add cl for raft 1.5.0

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Seth Hoenig <shoenig@duck.com>
2023-06-05 09:03:02 -05:00
dependabot[bot]
0596ae4975 build(deps): bump google.golang.org/protobuf from 1.28.1 to 1.30.0 (#17420)
Bumps google.golang.org/protobuf from 1.28.1 to 1.30.0.

---
updated-dependencies:
- dependency-name: google.golang.org/protobuf
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-06-05 08:57:33 -05:00
KamilCuk
da9ec8ce1e Add group_add docker option (#17313) 2023-06-02 20:26:01 -04:00
dependabot[bot]
91a3eb7012 build(deps): bump github.com/shirou/gopsutil/v3 from 3.23.1 to 3.23.4 (#17338)
Bumps [github.com/shirou/gopsutil/v3](https://github.com/shirou/gopsutil) from 3.23.1 to 3.23.4.
- [Release notes](https://github.com/shirou/gopsutil/releases)
- [Commits](https://github.com/shirou/gopsutil/compare/v3.23.1...v3.23.4)

---
updated-dependencies:
- dependency-name: github.com/shirou/gopsutil/v3
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-06-02 19:30:59 -04:00
Luiz Aoqui
81f0b359dd node pools: register a node in a node pool (#17405) 2023-06-02 17:50:50 -04:00
Luiz Aoqui
c09ca1e765 node pools: implement CLI (#17388) 2023-06-02 15:49:57 -04:00
hc-github-team-es-release-engineering
e41b99b6d3 ci: finish migration from CCI to GHA (#17103)
namely, these workflows:
  test-e2e, test-ui, and test-windows

extra-curricularly, as part of the overall
migration effort company-wide, this also includes
some standardization such as:
 * explicit permissions:read on various workflows
 * pinned action version shas (per https://github.com/hashicorp/security-public-tsccr)
 * actionlint, which among other things runs
   shellcheck on GHA run steps

Co-authored-by: emilymianeil <eneil@hashicorp.com>
Co-authored-by: Daniel Kimsey <daniel.kimsey@hashicorp.com>
2023-06-02 14:35:55 -05:00
Daniel Bennett
e0dd940439 tests: enable newer windows (#17401)
* "allow" (don't try to drop) linux capabilities
  in the docker test driver harness (see #15181)
* refactor to allow different busybox images
  since windows containers need to be the same
  version as the underlying OS, and we're
  moving from 2016 to 2019
* one docker test was flaky from apparently
  being a bit slower on windows, so add Wait()
2023-06-02 11:38:38 -05:00
Luiz Aoqui
8c148165b1 np: fix node pool search permission check (#17400)
When checking if a token is allowed to query the search endpoints we
need to return an error if the search context includes `node_pool` and
the token doesn't have access to _any_ pool. This prevents returning an
empty list instead of a permission denied error.
2023-06-02 12:22:47 -04:00
Phil Renaud
70927edc69 UI GHA test changes re-implemented (#17399) 2023-06-02 11:59:08 -04:00
Samantha
7ef1905333 check: Add support for Consul field tls_server_name (#17334) 2023-06-02 10:19:12 -04:00
Tim Gross
16156b361f node pools: validate pool exists on job registration (#17386)
Add a new job admission hook for node pools that enforces the pool exists on
registration. Also provide the skeleton function we need for Enterprise
enforcement functions we'll implement later.
2023-06-02 09:32:07 -04:00
Luiz Aoqui
1664796f43 core: refactor task validation (#17344)
Move all validations related to task fields to Task.Validate(). Prior to
this, some task validations were being done inside TaskGroup.Validate()
because they required access to some group values.

But similarly to how TaskGroup.Validate() tasks the job as parameter,
it's fair to expect the task to receive its group.
2023-06-01 19:26:42 -04:00
Luiz Aoqui
0ede6ac4f6 core: fix kill_timeout validation when progress_deadline is 0 (#17342) 2023-06-01 19:01:32 -04:00
Luiz Aoqui
9ee68fc02c node pool: add search support (#17385) 2023-06-01 17:48:14 -04:00
Tim Gross
2d059bbf22 node pools: add node_pool field to job spec (#17379)
This changeset only adds the `node_pool` field to the jobspec, and ensures that
it gets picked up correctly as a change. Without the rest of the implementation
landed yet, the field will be ignored.
2023-06-01 16:08:55 -04:00
Luiz Aoqui
970e998b00 node pools: add CRUD API (#17384) 2023-06-01 15:55:49 -04:00
Bryce Kalow
45fcc59a82 Delete check-legacy-links-format.yml (#17380) 2023-06-01 13:41:17 -04:00
Luiz Aoqui
2ae7cd384d np: implement ACL for node pools (#17365) 2023-06-01 13:03:20 -04:00
Seth Hoenig
1eeb3298b9 repo: block pushing to release branches in git hook (#17377) 2023-06-01 09:36:20 -05:00
Daniel Kimsey
20ee6e9a23 Merge pull request #17376 from hashicorp/revert-accidental-17103-commit-on-main
Revert "fixup: address review changes"
2023-06-01 09:22:30 -05:00
Daniel Kimsey
70c5191f39 Revert "fixup: address review changes"
This reverts commit ba736e4521.

This was accidentally added by fat-fingered Admin push...
2023-06-01 08:58:07 -05:00
Daniel Kimsey
ba736e4521 fixup: address review changes 2023-06-01 08:48:42 -05:00
Tim Gross
893d4a77c8 prioritized client updates (#17354)
The allocrunner sends several updates to the server during the early lifecycle
of an allocation and its tasks. Clients batch-up allocation updates every 200ms,
but experiments like the C2M challenge has shown that even with this batching,
servers can be overwhelmed with client updates during high volume
deployments. Benchmarking done in #9451 has shown that client updates can easily
represent ~70% of all Nomad Raft traffic.

Each allocation sends many updates during its lifetime, but only those that
change the `ClientStatus` field are critical for progressing a deployment or
kicking off a reschedule to recover from failures.

Add a priority to the client allocation sync and update the `syncTicker`
receiver so that we only send an update if there's a high priority update
waiting, or on every 5th tick. This means when there are no high priority
updates, the client will send updates at most every 1s instead of
200ms. Benchmarks have shown this can reduce overall Raft traffic by 10%, as
well as reduce client-to-server RPC traffic.

This changeset also switches from a channel-based collection of updates to a
shared buffer, so as to split batching from sending and prevent backpressure
onto the allocrunner when the RPC is slow. This doesn't have a major performance
benefit in the benchmarks but makes the implementation of the prioritized update
simpler.

Fixes: #9451
2023-05-31 15:34:16 -04:00
dependabot[bot]
9362620da8 build(deps): bump github.com/elazarl/go-bindata-assetfs (#17339)
Bumps [github.com/elazarl/go-bindata-assetfs](https://github.com/elazarl/go-bindata-assetfs) from 1.0.1-0.20200509193318-234c15e7648f to 1.0.1.
- [Release notes](https://github.com/elazarl/go-bindata-assetfs/releases)
- [Changelog](https://github.com/elazarl/go-bindata-assetfs/blob/master/.goreleaser.yml)
- [Commits](https://github.com/elazarl/go-bindata-assetfs/commits/v1.0.1)

---
updated-dependencies:
- dependency-name: github.com/elazarl/go-bindata-assetfs
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-05-31 10:19:59 -04:00
Phil Renaud
473bae7f16 Text type to password type input on profile sign-in page (#17345) 2023-05-30 16:58:34 -04:00
Phil Renaud
822b3adae6 Observe newlines when displaying variables (#17343) 2023-05-30 16:58:16 -04:00
Luiz Aoqui
26d38837d3 cli: output errors when monitoring deployment (#17348) 2023-05-30 11:12:12 -04:00
Luiz Aoqui
389dff42a1 cli: fix panic on job restart (#17346)
When monitoring the replacement allocation, if the
`Allocations().Info()` request fails, the `alloc` variable is `nil`, so
it should not be read.
2023-05-30 11:08:49 -04:00
Luiz Aoqui
4068b68b29 client: fix Consul version finterprint (#17349)
Consul v1.13.8 was released with a breaking change in the /v1/agent/self
endpoint version where a line break was being returned.

This caused the Nomad finterprint to fail because `NewVersion` errors on
parse.

This commit removes any extra space from the Consul version returned by
the API.
2023-05-30 11:07:57 -04:00
Seth Hoenig
60e0404bb5 compliance: add headers with fixed copywrite tool (#17353)
Closes #17117
2023-05-30 09:20:32 -05:00