Commit Graph

24637 Commits

Author SHA1 Message Date
Lance Haig
7e93f150b5 cli: tls certs not created with correct SANs (#16959)
The `nomad tls cert` command did not create certificates with the correct SANs for
them to work with non default domain and region names. This changset updates the
code to support non default domains and regions in the certificates.
2023-05-22 09:31:56 -04:00
Roberto Hidalgo
a6f15ba0ac allow periodic jobs to use workload identity ACL policies (#17018)
When resolving ACL policies, we were not using the parent ID for the policy
lookup for dispatch/periodic jobs, even though the claims were signed for that
parent ID. This prevents all calls to the Task API (and other WI-authenticated
API calls) from a periodically-dispatched job failing with 403.

Fix this by using the parent job ID whenever it's available.
2023-05-22 09:19:16 -04:00
Tim Gross
87ea64c583 document which fields can be updated by volume register (#17249)
The `volume register` command can update a small subset of the volume's fields
in-place, with some restrictions depending on whether the volume is currently in
use. Document these in the `volume register` command docs and the volume
specification docs.

Fixes: #17247
2023-05-22 09:15:25 -04:00
dependabot[bot]
85376af1b7 build(deps): bump github.com/shoenig/test from 0.6.4 to 0.6.6 in /api (#17178)
* build(deps): bump github.com/shoenig/test from 0.6.4 to 0.6.5 in /api

* deps: update shoenig/test to v0.6.5

* deps: update again to v0.6.6

---------

Co-authored-by: Tim Gross <tgross@hashicorp.com>
Co-authored-by: Seth Hoenig <shoenig@duck.com>
2023-05-22 07:53:12 -05:00
Phil Renaud
3e0dc235eb Updates static JS/UI assets for upcoming 1.6 release (#17263) 2023-05-19 19:03:37 -04:00
Phil Renaud
2600ab7fce [ui, deployments] Add status panel to child jobs (#17217)
* Treated same-route as sub-route and didnt cancel watchers

* Adds panel to child jobs and sub-sorts

* removed the safety check in module-for-job tests

* [ui] Adds status panel to Sysbatch jobs (#17243)

* In working out periodic/param child jobs, realized the intersection with sysbatch is high enough that it ought to be worked on now

* Further removal of jobclientstatussummary

* Explicitly making mocked jobs in no-deployment mode

* remove last remnants of job-client-status-summary component

* Screwed up my sorting order a few commits ago; this corrects it

* noActiveDeployment gonna be the death of me
2023-05-19 15:51:35 -04:00
Tim Gross
2275a83cbf docs: describe the default Workload Identity ACL policy (#17245)
Workload Identities have an implicit default policy. This policy can't currently
be described via HCL because it includes task interpolation for Variables and
access to the Services API (which doesn't exist as its own ACL
capbility). Describe this in our WI documentation.

Fixes: #16277
2023-05-19 11:38:05 -04:00
Tim Gross
d4f9a4ae90 build: pin semgrep action (#17248)
The file path in the TSCCR repo for the `returntocorp/semgrep` action was
incorrect, so the pinning tool was not able to find the correct entry and it was
not pinned in #17238.

The repository is fixed in https://github.com/hashicorp/security-tsccr/pull/431
2023-05-19 10:27:51 -04:00
Tim Gross
c9f44250c8 build: move GitHub actions to versions allowed by prodsec (#17238)
The `backspace/ember-asset-size` action we're using is unmaintained and has a
bunch of vulns in it, so it won't pass security screening (this is a NodeJS
action so it has piles of dependencies, 99% of which won't be in use but fails
automated screening anyways). Move this to the upstream version.

The `machine-learning-apps/pr-comment` action also presents a problem for the
ProdSec security screening because it's archived and also runs an external
Docker image. Move this to a likely-ok maintained action for now, until we can
spare some time to remove this in lieu of something more reasonable that isn't a
GitHub action.
2023-05-19 09:07:02 -04:00
Phil Renaud
d9027a7238 Fixes to scheduling-filtering-in-ui (#17244) 2023-05-18 17:38:34 -04:00
Yethal
93fc71bb37 cli: show leader status in json output of server members (#17138) 2023-05-18 16:43:57 -04:00
Phil Renaud
4335f9f460 [ui] Adds a "Scheduling" filter to the job.allocations page (#17227)
* Basic filter concept

* Make sure NextAllocation gets sent up with allocation stub
2023-05-18 16:24:41 -04:00
Luiz Aoqui
e80945f37d Post 1.5.5 release (#17241)
* Generate files for 1.5.5 release

* Prepare for next release

---------

Co-authored-by: hc-github-team-nomad-core <github-team-nomad-core@hashicorp.com>
2023-05-18 14:06:56 -04:00
Jai
2917231d7d ui: add option to filter for jobs that are packs (#17226)
* refact:  update job model

* refact: update view layer

* refact: update test
2023-05-18 12:47:11 -04:00
Bram Vogelaar
89a4930b1d agent: display node id on start up for servers (#17084)
Signed-off-by: Bram Vogelaar <bram@attachmentgenie.com>
2023-05-18 11:23:12 -04:00
Tim Gross
7fc37fa2ad logs: fix logs.disabled on Windows (#17199)
On Windows the executor returns an error when trying to open the `NUL` device
when we pass it `os.DevNull` for the stdout/stderr paths. Instead of opening the
device, use the discard pipe so that we have platform-specific behavior from the
executor itself.

Fixes: #17148
2023-05-18 09:14:39 -04:00
James Rasell
13dd5263fd variable: fixup metadata copy comment and remove unrequired type. (#17234) 2023-05-18 13:49:41 +01:00
Phil Renaud
0f1fd0a104 [ui] Latest Deployment component removed (#17192)
* Latest Deployment component removed

* Integration test selector update
2023-05-17 16:59:42 -04:00
Mike Nomitch
3db97acf8a docs: add documentation on ephemeral disk and logs (#15829) 2023-05-17 16:58:11 -04:00
Roman Zipp
22f5217b85 docs: remove unneeded brackets from job specification template docs (#17219) 2023-05-17 16:45:00 -04:00
Phil Renaud
830b9bc6c4 [ui, deployments] Fix a bug where watchers on a parent (periodic) job would continue on a child route (#17214)
* Treated same-route as sub-route and didnt cancel watchers

* Changelog
2023-05-17 16:36:15 -04:00
Tim Gross
e41231b04e build: upgrade deprecated actions syntax (#17222)
Missed these in the previous pass.
2023-05-17 11:39:55 -04:00
hashicorp-tsccr[bot]
1b7668c17f build: trusted workflow pinning (#16992)
Co-authored-by: Tim Gross <tgross@hashicorp.com>
2023-05-17 10:38:10 -04:00
Tim Gross
feb34c738d scheduler: count implicit spread targets as a single target (#17195)
When calculating the score in the `SpreadIterator`, the score boost is
proportional to the difference between the current and desired count. But when
there are implicit spread targets, the current count is the sum of the possible
implicit targets, which results in incorrect scoring unless there's only one
implicit target.

This changeset updates the `propertySet` struct to accept a set of explicit
target values so it can detect when a property value falls into the implicit set
and should be combined with other implicit values.

Fixes: #11823
2023-05-17 10:25:00 -04:00
Tim Gross
bf04ea12cb build: update deprecated GitHub Actions (#17218)
Many of the GitHub Actions from the build pipeline are written in a truly
ancient version of NodeJS. Upgrade to more recent versions.

Remove RelEng from codeowners
2023-05-17 08:57:28 -04:00
Phil Renaud
955fcab2d7 [ui] Remove unnecessary subnav for parent jobs (#17190)
* Nest job subnav in a parent-check

* Move the evaluation tab test to withn an else of the children condition
2023-05-16 16:07:12 -04:00
Tim Gross
c80c8ab43d scheduler: prevent -Inf in spread scoring (#17198)
When spread targets have a percent value of zero it's possible for them to
return -Inf scoring because of a float divide by zero. This is very hard for
operators to debug because the string "-Inf" is returned in the API and that
breaks the presentation of debugging data.

Most scoring iterators are bracketed to -1/+1, but spread iterators do not so
that they can handle greatly unbalanced scoring so we can't simply return a -1
score without generating a score that might be greater than the negative scores
set by other spread targets. Instead, track the lowest-seen spread boost and use
that as the spread boost for any cases where we'd divide by zero.

Fixes: #8863
2023-05-16 16:01:32 -04:00
dependabot[bot]
eafdbd4826 build(deps-dev): bump prettier from 2.2.1 to 2.8.8 in /website (#16965)
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-05-16 12:12:53 -05:00
Seth Hoenig
3cc25949fa client: ignore restart issued to terminal allocations (#17175)
* client: ignore restart issued to terminal allocations

This PR fixes a bug where issuing a restart to a terminal allocation
would cause the allocation to run its hooks anyway. This was particularly
apparent with group_service_hook who would then register services but
then never deregister them - as the allocation would be effectively in
a "zombie" state where it is prepped to run tasks but never will.

* e2e: add e2e test for alloc restart zombies

* cl: tweak text

Co-authored-by: Tim Gross <tgross@hashicorp.com>

---------

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2023-05-16 10:19:41 -05:00
Tim Gross
bf7b82b52b drivers: make internal DisableLogCollection capability public (#17196)
The `DisableLogCollection` capability was introduced as an experimental
interface for the Docker driver in 0.10.4. The interface has been stable and
allowing third-party task drivers the same capability would be useful for those
drivers that don't need the additional overhead of logmon.

This PR only makes the capability public. It doesn't yet add it to the
configuration options for the other internal drivers.

Fixes: #14636 #15686
2023-05-16 09:16:03 -04:00
Piotr Kazmierczak
6fe808c1e8 refactor acl.UpsertTokens to avoid unnecessary RPC calls. (#17194)
New RPC endpoints introduced during OIDC and JWT auth perform unnecessary many RPC calls when they upsert generated ACL tokens, as pointed out by @tgross.

This PR moves the common logic from acl.UpsertTokens method into a helper method that contains common logic, and sidesteps authentication, metrics, etc.
2023-05-16 09:31:51 +02:00
Michael Schurter
fea29c57a4 remove unused helper/fields package (#17197)
Has been unused since we switched task drivers to plugins and used
hclschema for config in #4936 (v0.9.0-beta1)
2023-05-15 12:10:11 -07:00
dependabot[bot]
a430abedf0 build(deps-dev): bump @hashicorp/platform-content-conformance (#17030)
Bumps @hashicorp/platform-content-conformance from 0.0.10 to 0.0.11.

---
updated-dependencies:
- dependency-name: "@hashicorp/platform-content-conformance"
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-05-15 11:28:03 -04:00
Luiz Aoqui
1e679ccba2 node pool: initial base work (#17163)
Implementation of the base work for the new node pools feature. It includes a new `NodePool` struct and its corresponding state store table.

Upon start the state store is populated with two built-in node pools that cannot be modified nor deleted:

  * `all` is a node pool that always includes all nodes in the cluster.
  * `default` is the node pool where nodes that don't specify a node pool in their configuration are placed.
2023-05-15 10:49:08 -04:00
dependabot[bot]
d706b8f1c9 build(deps-dev): bump next from 12.3.1 to 13.4.2 in /website (#17177)
Bumps [next](https://github.com/vercel/next.js) from 12.3.1 to 13.4.2.
- [Release notes](https://github.com/vercel/next.js/releases)
- [Changelog](https://github.com/vercel/next.js/blob/canary/release.js)
- [Commits](https://github.com/vercel/next.js/compare/v12.3.1...v13.4.2)

---
updated-dependencies:
- dependency-name: next
  dependency-type: direct:development
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-05-15 10:46:40 -04:00
Mark Lewis
e329a45b98 Update delete.mdx (#17184)
Fix typo
2023-05-15 13:31:52 +01:00
Phil Renaud
8b7ff66a2d [ui, deployments] Deployment history on steady state (#17167)
* Deployment history on steady state

* de-hds after chat
2023-05-12 16:49:40 -04:00
Tim Gross
88323bab4a allocrunner: provide factory function so we can build mock ARs (#17161)
Tools like `nomad-nodesim` are unable to implement a minimal implementation of
an allocrunner so that we can test the client communication without having to
lug around the entire allocrunner/taskrunner code base. The allocrunner was
implemented with an interface specifically for this purpose, but there were
circular imports that made it challenging to use in practice.

Move the AllocRunner interface into an inner package and provide a factory
function type. Provide a minimal test that exercises the new function so that
consumers have some idea of what the minimum implementation required is.
2023-05-12 13:29:44 -04:00
Phil Renaud
f2e603dab8 [ui] Keyboard shortcuts to switch regions (#17169)
* Regions keynav

* Dont show if you only have a single region (global by default)
2023-05-12 11:46:00 -04:00
Jai
17633d8370 chore: add percy tests (#17157) 2023-05-12 09:57:22 -04:00
Jai
c47c2c988f 16664/upgrade (#17158)
* chore:  upgrade

Upgrade @babel/helper-string-parserprop-types

* chore: add resolution

* chore: update component API for breaking changes

* chore: update arguments

* api: forgive user for pass wrong args

* chore:  update tests

* chore: update yarn lock

* chore: upgrade to Glimmer component

* styling: add properties to component invocation

* chore: add inset styles
2023-05-12 09:54:13 -04:00
Jai
ca00af7b81 chore: write js doc (#17156)
* chore:  write jsdoc comments

* chore: update comments
2023-05-11 15:30:54 -04:00
Seth Hoenig
81eb6f157c core: eliminate second index on job_submissions table (#17146)
* core: eliminate second index on job_submissions table

This PR refactors the job_submissions state store code to eliminate the
use of a second index formerly used for purging all versions of a given
job. In practice we ended up with duplicate entries on the table. Instead,
use index prefix scanning on the primary index and tidy up any potential
for creating (or removing) duplicates.

* core: pr comments followup
2023-05-11 09:51:08 -05:00
Phil Renaud
106aef95a8 [ui, deployments] Denominator based on completed allocations for batch jobs (#17147)
* Denominator based on completed allocations for batch jobs

* Test for denominatored batch job change
2023-05-11 10:23:23 -04:00
Tim Gross
116f24d768 client: de-duplicate alloc updates and gate during restore (#17074)
When client nodes are restarted, all allocations that have been scheduled on the
node have their modify index updated, including terminal allocations. There are
several contributing factors:

* The `allocSync` method that updates the servers isn't gated on first contact
  with the servers. This means that if a server updates the desired state while
  the client is down, the `allocSync` races with the `Node.ClientGetAlloc`
  RPC. This will typically result in the client updating the server with "running"
  and then immediately thereafter "complete".

* The `allocSync` method unconditionally sends the `Node.UpdateAlloc` RPC even
  if it's possible to assert that the server has definitely seen the client
  state. The allocrunner may queue-up updates even if we gate sending them. So
  then we end up with a race between the allocrunner updating its internal state
  to overwrite the previous update and `allocSync` sending the bogus or duplicate
  update.

This changeset adds tracking of server-acknowledged state to the
allocrunner. This state gets checked in the `allocSync` before adding the update
to the batch, and updated when `Node.UpdateAlloc` returns successfully. To
implement this we need to be able to equality-check the updates against the last
acknowledged state. We also need to add the last acknowledged state to the
client state DB, otherwise we'd drop unacknowledged updates across restarts.

The client restart test has been expanded to cover a variety of allocation
states, including allocs stopped before shutdown, allocs stopped by the server
while the client is down, and allocs that have been completely GC'd on the
server while the client is down. I've also bench tested scenarios where the task
workload is killed while the client is down, resulting in a failed restore.

Fixes #16381
2023-05-11 09:05:24 -04:00
Seth Hoenig
5d6409bfab cli: upload var file(s) content on job submission (#17128)
This PR makes it so that the content of any -var-file files is uploaded
to Nomad on job run.
2023-05-11 08:04:33 -05:00
Jai
c7f68914c1 ui: add sign-in link on err page (#17140) 2023-05-11 08:24:58 -04:00
Luiz Aoqui
889a4e38e1 deps: update go-metrics to prevent panic (#17133)
nomad#15861 describes intermitent panics caused by go-metrics Prometheus
client. We have not been able to further debug this problem due to the
lack of information when the panic happens.

go-metrics#146 prevents the panic from happening and also logs
additional information that can help us understand the root cause of the
problem.

This commits pins the go-metric dependency to this branch until we can
better debug the issue.
2023-05-10 21:33:15 -04:00
Phil Renaud
c33fc33441 [ui] Batch jobs, aside from child jobs, get the new status panel (#17118)
* Batch jobs, aside from child jobs, get the new status panel

* Clean up the imported jobAllocStatuses

* Note for mirage that batch jobs now have a historical status panel

* Batch job test for complete status

* Parameterized and periodic child jobs get the panel treatment

* Undo parameterized and periodic child test situations
2023-05-10 16:59:33 -04:00
Phil Renaud
6fef94a73f Versions returned as an array rather than an object now, pending changed to unknown, and sorted (#17150) 2023-05-10 15:40:54 -04:00