24724 Commits

Author SHA1 Message Date
dependabot[bot]
ea2cdc0f20 build(deps-dev): bump webpack from 5.69.1 to 5.86.0 in /ui (#17488)
Bumps [webpack](https://github.com/webpack/webpack) from 5.69.1 to 5.86.0.
- [Release notes](https://github.com/webpack/webpack/releases)
- [Commits](https://github.com/webpack/webpack/compare/v5.69.1...v5.86.0)

---
updated-dependencies:
- dependency-name: webpack
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-06-15 10:39:06 -04:00
Seth Hoenig
55e224f7af tests: set timeout on test-ui (#17549)
This seems to finish in about 20 minutes, or run for 6+ hours until hitting
a default timeout. Set a timeout to 30 minutes so we aren't wasting time
and runners.
2023-06-15 09:38:50 -05:00
Tim Gross
288ff2f0c4 docs: add missing client.allocs metrics (#17540)
The docs were missing counter metrics emitted by the task runner around task
state changes.
2023-06-15 09:18:11 -04:00
Luiz Aoqui
57f31eb39a rpc: fix log message in Node.UpdateStatus (#17537) 2023-06-14 16:51:46 -04:00
Tim Gross
eee2315d5d docs: clarify node pool apply/delete behavior (#17529) 2023-06-14 15:58:53 -04:00
Tim Gross
068d0ea9af node pools: add pool as label on client metrics (#17528)
This changeset adds the node pool as a label anywhere we're already emitting
labels with additional information such as node class or ID about the client.
2023-06-14 15:58:38 -04:00
Tim Gross
0ac85db680 cli: fix missing -quiet flag for var init (#17526)
The `var init` command was intended to have support for a `-quiet` flag but it
was not documented and never parsed.
2023-06-14 14:52:46 -04:00
Tim Gross
6bd1ebed29 docs: note namespace apply/delete behaviors, fix metric (#17527)
This changeset includes some documentation fixes discovered while working on
node pools. We kept them out of the node pool PRs so that they can be
backported easily:

* namespace apply/delete commands are forwarded to the authoritative region
* deleting a namespace requires there are no non-terminal jobs in any of the
  federated regions
* fixed a typo in the name of the `nomad.client.allocated.disk` metric
2023-06-14 14:52:06 -04:00
Phil Renaud
ee8cf15e73 [ui] Job status panel: tooltips on individual allocs (#17514)
* Tooltip on individual allocs in the panel

* Isolate allocation cells to their own component

* Tipsy trigger

* Aria label for failed-or-lost tooltips

* Buildfix

* Try adding percy exec back to exam run
2023-06-14 12:45:36 -04:00
Luiz Aoqui
30921a1bb4 client: fix panic on alloc stop in non-Linux environments (#17515)
Provide a no-op implementation of the drivers.DriverNetworkManager
interface to be used by systems that don't support network isolation and
prevent panics where a network manager is expected.
2023-06-14 10:22:38 -04:00
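The no-op approach described in commit 30921a1bb4 can be illustrated with a minimal sketch. The `NetworkManager` interface, its two methods, and `newNetworkManager` below are hypothetical simplifications for illustration and are not Nomad's actual `drivers` package definitions.

```go
package main

import "fmt"

// NetworkManager is a hypothetical, simplified stand-in for a driver
// network manager interface. It is NOT Nomad's real definition.
type NetworkManager interface {
	CreateNetwork(allocID string) (created bool, err error)
	DestroyNetwork(allocID string) error
}

// noopNetworkManager satisfies NetworkManager on platforms without
// network isolation. Every method succeeds and does nothing.
type noopNetworkManager struct{}

func (noopNetworkManager) CreateNetwork(string) (bool, error) { return false, nil }
func (noopNetworkManager) DestroyNetwork(string) error        { return nil }

// newNetworkManager never returns nil, so callers that expect a manager
// cannot hit a nil dereference on unsupported platforms.
func newNetworkManager(supportsIsolation bool, real NetworkManager) NetworkManager {
	if supportsIsolation && real != nil {
		return real
	}
	return noopNetworkManager{}
}

func main() {
	nm := newNetworkManager(false, nil)
	// Safe to call even though no real manager exists on this platform.
	err := nm.DestroyNetwork("alloc-1234")
	fmt.Println("destroy error:", err)
}
```

Returning a value that does nothing, rather than `nil`, moves the platform check to one place instead of requiring a nil check at every call site.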
James Rasell
b30f76e7d7 build: add agent bindata file to copywrite ignore list. (#17507) 2023-06-14 11:13:59 +01:00
Tim Gross
0aeeaf1083 node pools: implement node pool init command (#17479)
Implement a `nomad node pool init` command that generates an example spec file
in either HCL or JSON format.
2023-06-13 14:51:29 -04:00
Luiz Aoqui
5db9e64cdd node pool: node pool upsert on multiregion node register (#17503)
When registering a node with a new node pool in a non-authoritative
region we can't create the node pool because this new pool will not be
replicated to other regions.

This commit modifies the node registration logic to only allow automatic
node pool creation in the authoritative region.

In non-authoritative regions, the client is registered, but the node
pool is not created. The client is kept in the `initializing` status until
its node pool is created in the authoritative region and replicated to
the client's region.
2023-06-13 11:28:28 -04:00
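The registration rule in commit 5db9e64cdd can be sketched as a small decision function. The `decision` type and `onNodeRegister` function are hypothetical names for illustration; the real logic lives in Nomad's node registration RPC and state store, which this sketch does not reproduce.

```go
package main

import "fmt"

// decision is a hypothetical result type for this sketch.
type decision struct {
	CreatePool       bool // create the pool as part of registration
	HoldInitializing bool // keep the node in the initializing status
}

// onNodeRegister decides what to do when a node registers with a pool.
// knownPools stands in for the pools present in the local region's state.
func onNodeRegister(pool string, knownPools map[string]bool, authoritative bool) decision {
	switch {
	case knownPools[pool]:
		// Pool already exists (created locally or replicated): nothing to do.
		return decision{}
	case authoritative:
		// Only the authoritative region may create pools automatically,
		// because only its pools are replicated to other regions.
		knownPools[pool] = true
		return decision{CreatePool: true}
	default:
		// Non-authoritative region: register the node but wait for the
		// pool to arrive through replication.
		return decision{HoldInitializing: true}
	}
}

func main() {
	pools := map[string]bool{"default": true}
	fmt.Printf("%+v\n", onNodeRegister("gpu", pools, false))
	fmt.Printf("%+v\n", onNodeRegister("gpu", pools, true))
}
```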
Tim Gross
2c77bf72f1 node pools: protect against deleting occupied pools (#17457)
We don't want to delete node pools that have nodes or non-terminal jobs. Add a
check in the `DeleteNodePools` RPC to check locally and in federated regions,
similar to how we check that it's safe to delete namespaces.
2023-06-13 09:57:42 -04:00
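A rough sketch of the deletion guard from commit 2c77bf72f1 follows. The `poolUsage` type and `canDeletePool` function are hypothetical; in particular, how per-region usage is gathered (the local check and the calls to federated regions) is assumed to have happened already and is not shown.

```go
package main

import (
	"fmt"
	"sort"
)

// poolUsage is a hypothetical summary of what still uses a node pool
// in one region.
type poolUsage struct {
	Nodes           int
	NonTerminalJobs int
}

// canDeletePool returns an error naming the first region (in sorted
// order) where the pool still has nodes or non-terminal jobs.
func canDeletePool(pool string, usageByRegion map[string]poolUsage) error {
	regions := make([]string, 0, len(usageByRegion))
	for r := range usageByRegion {
		regions = append(regions, r)
	}
	sort.Strings(regions) // deterministic error messages

	for _, r := range regions {
		u := usageByRegion[r]
		if u.Nodes > 0 {
			return fmt.Errorf("node pool %q has %d node(s) in region %q", pool, u.Nodes, r)
		}
		if u.NonTerminalJobs > 0 {
			return fmt.Errorf("node pool %q has %d non-terminal job(s) in region %q", pool, u.NonTerminalJobs, r)
		}
	}
	return nil
}

func main() {
	usage := map[string]poolUsage{
		"east": {},
		"west": {NonTerminalJobs: 2},
	}
	fmt.Println(canDeletePool("gpu", usage))
}
```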
stswidwinski
887d3060c4 conf: Add preemption_config to the server extra HCL keys which should be removed (#17481)
Add preemption_config to the set of keys which should be pruned from the server
config as described in #17480.
2023-06-13 10:48:19 +02:00
Daniel Bennett
5733fa7516 ci: remove circleci (#17502)
all of our workflows are in GitHub Actions now 🎉
2023-06-12 16:28:19 -05:00
Tim Gross
5bb6e5758d node pools: replicate from authoritative region (#17456)
Upserts and deletes of node pools are forwarded to the authoritative region,
just like we do for namespaces, quotas, ACL policies, etc. Replicate node pools
from the authoritative region.
2023-06-12 13:24:24 -04:00
dependabot[bot]
ca86582f69 build(deps): bump github.com/hashicorp/go-plugin from 1.4.9 to 1.4.10 (#17486) 2023-06-12 14:22:33 +01:00
Tim Gross
95b6d7abd8 node pools: prevent panic on upsert during upgrades (#17474)
Whenever we write a Raft log entry for node pools, we need to first make sure
that all servers can safely apply the log without panicking. Gate upsert and
delete RPCs on all servers being upgraded to the minimum version.
2023-06-12 09:01:30 -04:00
Tim Gross
cff3c9b874 replication: fix potential panic during upgrades (#17476)
If the authoritative region has been upgraded to a version of Nomad that has new
replicated objects (such as ACL Auth Methods, ACL Binding Rules, etc.), the
non-authoritative regions will start replicating those objects as soon as their
leader is upgraded. If a server in the non-authoritative region is upgraded and
then becomes the leader before all the other servers in the region have been
upgraded, then it will attempt to write a Raft log entry that the followers
don't understand. The followers will then panic.

Add the same minimum version checks that we do for RPC writes to the leader's
replication loop.
2023-06-12 08:53:56 -04:00
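The minimum-version gating described in this commit and in 95b6d7abd8 can be sketched as follows. This is a simplified illustration using only the standard library: `parseVersion` and `allServersMeetMinimum` are hypothetical helpers, pre-release suffixes are not handled, and the version strings in `main` are made-up examples rather than the actual minimum Nomad enforces.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion turns "MAJOR.MINOR.PATCH" into three integers.
// Pre-release and build suffixes are rejected in this sketch.
func parseVersion(s string) ([3]int, error) {
	var v [3]int
	parts := strings.Split(s, ".")
	if len(parts) != 3 {
		return v, fmt.Errorf("invalid version %q", s)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return v, fmt.Errorf("invalid version %q: %w", s, err)
		}
		v[i] = n
	}
	return v, nil
}

// less reports whether version a is strictly older than version b.
func less(a, b [3]int) bool {
	for i := range a {
		if a[i] != b[i] {
			return a[i] < b[i]
		}
	}
	return false
}

// allServersMeetMinimum reports whether every server is at least at the
// minimum version. A write that produces a new Raft log type should only
// proceed when this returns true.
func allServersMeetMinimum(servers []string, minimum string) (bool, error) {
	minV, err := parseVersion(minimum)
	if err != nil {
		return false, err
	}
	for _, s := range servers {
		v, err := parseVersion(s)
		if err != nil {
			return false, err
		}
		if less(v, minV) {
			return false, nil
		}
	}
	return true, nil
}

func main() {
	// One server has not been upgraded yet, so the gate stays closed.
	ok, err := allServersMeetMinimum([]string{"1.6.0", "1.5.6", "1.6.0"}, "1.6.0")
	fmt.Println(ok, err)
}
```

Applying the same check in the leader's replication loop as in the RPC write path means a newly elected, already-upgraded leader skips replication of new object types until its followers can apply them.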
dependabot[bot]
b94cb322ee build(deps): bump github.com/shoenig/go-m1cpu from 0.1.5 to 0.1.6 (#17487) 2023-06-12 12:08:16 +01:00
dependabot[bot]
1f1c0a1f20 build(deps): bump github.com/fatih/color from 1.13.0 to 1.15.0 (#17485) 2023-06-12 10:44:18 +01:00
Phil Renaud
667d0026cd [ui] Don't show a service as healthy when its parent alloc is not running (#17465)
* Fix: don't show a service as healthy when its parent alloc is not running

* Test for Health Unknown
2023-06-09 15:43:11 -04:00
Piotr Kazmierczak
be8f04e89f docs: corrections and additional information for OIDC-related concepts (#17470) 2023-06-09 16:50:22 +02:00
Piotr Kazmierczak
c1a9fe93ac docs: add missing login API endpoint documentation (#17467) 2023-06-09 15:59:01 +02:00
Seth Hoenig
89ce092b20 docker: stop network pause container of lost alloc after node restart (#17455)
This PR fixes a bug where the docker network pause container would not be
stopped and removed in the case where a node is restarted, the alloc is
moved to another node, the node comes back up. See the issue below for
full repro conditions.

Basically in the DestroyNetwork PostRun hook we would depend on the
NetworkIsolationSpec field not being nil - which is only the case
if the Client stays alive all the way from network creation to network
teardown. If the node is rebooted we lose that state and previously
would not be able to find the pause container to remove. Now, we manually
find the pause container by scanning them and looking for the associated
allocID.

Fixes #17299
2023-06-09 08:46:29 -05:00
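The fallback lookup described in commit 89ce092b20 (scan the containers and match on the allocation ID) might look roughly like the sketch below. The `container` type, the `pauseNamePrefix` value, and the idea that the allocation ID is embedded in the container name are all assumptions made for illustration; the sketch operates on an in-memory list and does not call the Docker API.

```go
package main

import (
	"fmt"
	"strings"
)

// pauseNamePrefix is a hypothetical naming convention for pause
// containers in this sketch; the real prefix may differ.
const pauseNamePrefix = "pause-"

// container is a hypothetical, minimal view of a listed container.
type container struct {
	ID   string
	Name string
}

// findPauseContainer scans the listed containers for the one whose name
// matches the allocation ID. It needs no in-memory state from before a
// reboot, only the allocation ID.
func findPauseContainer(containers []container, allocID string) (string, bool) {
	want := pauseNamePrefix + allocID
	for _, c := range containers {
		// Container listings commonly report names with a leading slash.
		if strings.TrimPrefix(c.Name, "/") == want {
			return c.ID, true
		}
	}
	return "", false
}

func main() {
	listed := []container{
		{ID: "c1", Name: "/redis-task"},
		{ID: "c2", Name: "/pause-7f3a"},
	}
	id, found := findPauseContainer(listed, "7f3a")
	fmt.Println(id, found)
}
```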
Phil Renaud
408ab828f7 [ui] Parallelize ember tests (#17442)
* Exam to parallelize tests

* Logging to try to solve test flakiness

* Logging in another failure

* Hardening for one test and snapshot for another

* Explicitly set the first one as the servicedAlloc instead of randomly picking

* A wild CircleCI test failure appears

* de-log
2023-06-07 17:01:35 -04:00
Seth Hoenig
225693ad28 client: fix client panic during drain caused by shutdown (#17450)
During shutdown of a client with drain_on_shutdown there is a race between
the Client ending the cgroup and the task's cpuset manager cleaning up
the cgroup. During the path traversal, skip anything we cannot read, which
avoids the nil DirEntry we try to dereference now.
2023-06-07 15:12:44 -05:00
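The "skip anything we cannot read" rule from commit 225693ad28 corresponds to a common pattern with `filepath.WalkDir`, sketched below. The `listDirs` helper is a hypothetical example and not the cpuset manager's code; it only shows how checking the callback's `err` argument avoids touching a nil `fs.DirEntry`.

```go
package main

import (
	"fmt"
	"io/fs"
	"path/filepath"
)

// listDirs walks root and returns every directory it can read. Entries
// that disappear or become unreadable mid-walk are skipped rather than
// dereferenced.
func listDirs(root string) ([]string, error) {
	var dirs []string
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil || d == nil {
			// When err is non-nil, d may be nil (for example when the
			// root itself cannot be read). Returning nil skips this
			// entry and lets the walk continue.
			return nil
		}
		if d.IsDir() {
			dirs = append(dirs, path)
		}
		return nil
	})
	return dirs, err
}

func main() {
	dirs, err := listDirs(".")
	fmt.Println(len(dirs), err)
}
```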
Tim Gross
ceb3b4c0f1 build: update to go1.20.5 (#17451)
Go released a security update to fix build-time code injection and execution via
CGO. This doesn't impact already-released versions of Nomad, just the build
toolchain, so we won't be releasing a Nomad security update to go with it.
2023-06-07 11:44:59 -04:00
Tim Gross
9a6078a2ae node pools: implement support in scheduler (#17443)
Implement scheduler support for node pool:

* When a scheduler is invoked, we get a set of the ready nodes in the DCs that
  are allowed for that job. Extend the filter to include the node pool.
* Ensure that changes to a job's node pool are picked up as destructive
  allocation updates.
* Add `NodesInPool` as a metric to all reporting done by the scheduler.
* Add the node-in-pool filter to the `Node.Register` RPC so that we don't
  generate spurious evals for nodes in the wrong pool.
2023-06-07 10:39:03 -04:00
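The node filtering described in the first bullet of commit 9a6078a2ae can be sketched as below. The `node` type and `feasibleNodes` function are hypothetical; datacenter matching is simplified to exact string equality, and counting `nodesInPool` as ready nodes in the pool regardless of datacenter is an assumption of this sketch rather than a statement about the scheduler's metric.

```go
package main

import "fmt"

// node is a hypothetical, minimal view of a client node.
type node struct {
	ID         string
	Datacenter string
	Pool       string
	Ready      bool
}

// feasibleNodes returns the ready nodes that are both in one of the job's
// datacenters and in the job's node pool. It also returns how many ready
// nodes were in the pool at all, which is useful for reporting.
func feasibleNodes(nodes []node, datacenters []string, pool string) (feasible []node, nodesInPool int) {
	allowedDC := make(map[string]bool, len(datacenters))
	for _, dc := range datacenters {
		allowedDC[dc] = true
	}
	for _, n := range nodes {
		if !n.Ready || n.Pool != pool {
			continue
		}
		nodesInPool++
		if allowedDC[n.Datacenter] {
			feasible = append(feasible, n)
		}
	}
	return feasible, nodesInPool
}

func main() {
	nodes := []node{
		{ID: "n1", Datacenter: "dc1", Pool: "gpu", Ready: true},
		{ID: "n2", Datacenter: "dc2", Pool: "gpu", Ready: true},
		{ID: "n3", Datacenter: "dc1", Pool: "default", Ready: true},
		{ID: "n4", Datacenter: "dc1", Pool: "gpu", Ready: false},
	}
	feasible, inPool := feasibleNodes(nodes, []string{"dc1"}, "gpu")
	fmt.Println(len(feasible), inPool)
}
```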
Luiz Aoqui
354d741c95 node pool: implement nomad node pool nodes CLI (#17444) 2023-06-07 10:37:27 -04:00
Tim Gross
84e7cf39f6 node pools: implement CLI for node pool jobs command (#17432) 2023-06-06 15:02:26 -04:00
Tim Gross
385dbfb8d1 node pools: implement HTTP API to list jobs in pool (#17431)
Implements the HTTP API associated with the `NodePool.ListJobs` RPC, including
the `api` package for the public API and documentation.

Update the `NodePool.ListJobs` RPC to fix the missing handling of the special
"all" pool.
2023-06-06 11:40:13 -04:00
Luiz Aoqui
f0f4cbb848 node pools: list nodes in pool (#17413) 2023-06-06 10:43:43 -04:00
Jerome Eteve
0d41fb6747 client checks kernel module in /sys/module for WSL2 bridge networking (#17306) 2023-06-06 10:26:50 -04:00
Luiz Aoqui
637ddf516e node pools: add event stream support (#17412) 2023-06-06 10:14:47 -04:00
Dao Thanh Tung
67e39d5d24 Add check for missing path in client host_volume config (#17393) 2023-06-05 19:31:19 -04:00
Tim Gross
bb0140803e node pools: implement RPC to list jobs in a given node pool (#17396)
Implements the `NodePool.ListJobs` RPC, with pagination and filtering based on
the existing `Job.List` RPC.
2023-06-05 15:36:52 -04:00
Seth Hoenig
560315a49e test: ensure cpuset cgroup is setup before fingerprinting (#17428)
This PR fixes a racy test where we need to ensure the cpuset cgroup
is setup before trying to fingerprint it.
2023-06-05 14:15:00 -05:00
Luiz Aoqui
08ce8a3ac5 node pools: fix node upsert and state mutation tests (#17430) 2023-06-05 14:58:32 -04:00
Phil Renaud
e25c316b16 [ui] Remove Ember Assets Github Actions workflow (#17426)
* Remove Ember Assets gha workflow

* PR write added to permissions
2023-06-05 13:52:20 -04:00
hashicorp-copywrite[bot]
8597061b52 [COMPLIANCE] Add Copyright and License Headers (#17429)
Co-authored-by: hashicorp-copywrite[bot] <110428419+hashicorp-copywrite[bot]@users.noreply.github.com>
2023-06-05 13:23:59 -04:00
dependabot[bot]
1e31268b85 build(deps): bump go.etcd.io/bbolt from 1.3.6 to 1.3.7 (#16228)
* build(deps): bump go.etcd.io/bbolt from 1.3.6 to 1.3.7

Bumps [go.etcd.io/bbolt](https://github.com/etcd-io/bbolt) from 1.3.6 to 1.3.7.
- [Release notes](https://github.com/etcd-io/bbolt/releases)
- [Commits](https://github.com/etcd-io/bbolt/compare/v1.3.6...v1.3.7)

---
updated-dependencies:
- dependency-name: go.etcd.io/bbolt
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* cl: update cl for bbolt

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Seth Hoenig <shoenig@duck.com>
2023-06-05 10:19:14 -05:00
dependabot[bot]
d17f19da24 build(deps): bump github.com/dustin/go-humanize from 1.0.0 to 1.0.1 (#16227)
Bumps [github.com/dustin/go-humanize](https://github.com/dustin/go-humanize) from 1.0.0 to 1.0.1.
- [Release notes](https://github.com/dustin/go-humanize/releases)
- [Commits](https://github.com/dustin/go-humanize/compare/v1.0.0...v1.0.1)

---
updated-dependencies:
- dependency-name: github.com/dustin/go-humanize
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-06-05 10:17:04 -05:00
dependabot[bot]
992b62028d build(deps): bump github.com/hashicorp/raft from 1.3.11 to 1.5.0 (#17421)
* build(deps): bump github.com/hashicorp/raft from 1.3.11 to 1.5.0

Bumps [github.com/hashicorp/raft](https://github.com/hashicorp/raft) from 1.3.11 to 1.5.0.
- [Release notes](https://github.com/hashicorp/raft/releases)
- [Changelog](https://github.com/hashicorp/raft/blob/main/CHANGELOG.md)
- [Commits](https://github.com/hashicorp/raft/compare/v1.3.11...v1.5.0)

---
updated-dependencies:
- dependency-name: github.com/hashicorp/raft
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* cl: add cl for raft 1.5.0

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Seth Hoenig <shoenig@duck.com>
2023-06-05 09:03:02 -05:00
dependabot[bot]
0596ae4975 build(deps): bump google.golang.org/protobuf from 1.28.1 to 1.30.0 (#17420)
Bumps google.golang.org/protobuf from 1.28.1 to 1.30.0.

---
updated-dependencies:
- dependency-name: google.golang.org/protobuf
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-06-05 08:57:33 -05:00
KamilCuk
da9ec8ce1e Add group_add docker option (#17313) 2023-06-02 20:26:01 -04:00
dependabot[bot]
91a3eb7012 build(deps): bump github.com/shirou/gopsutil/v3 from 3.23.1 to 3.23.4 (#17338)
Bumps [github.com/shirou/gopsutil/v3](https://github.com/shirou/gopsutil) from 3.23.1 to 3.23.4.
- [Release notes](https://github.com/shirou/gopsutil/releases)
- [Commits](https://github.com/shirou/gopsutil/compare/v3.23.1...v3.23.4)

---
updated-dependencies:
- dependency-name: github.com/shirou/gopsutil/v3
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-06-02 19:30:59 -04:00
Luiz Aoqui
81f0b359dd node pools: register a node in a node pool (#17405) 2023-06-02 17:50:50 -04:00
Luiz Aoqui
c09ca1e765 node pools: implement CLI (#17388) 2023-06-02 15:49:57 -04:00