Commit Graph

23564 Commits

Author SHA1 Message Date
Tim Gross
d1faead371 rename SecureVariables to Variables throughout 2022-08-26 16:06:24 -04:00
Tim Gross
91c81ba984 file rename 2022-08-26 16:06:24 -04:00
Jai
e23b47e448 service-health-bar (#14295)
* ui: add service-status-bar

* test: service-status-bar
2022-08-26 12:04:59 -04:00
Vladimir Sokolov
b665da6c5f cli: force periodic job if its id equals search prefix 2022-08-26 10:54:37 -04:00
Seth Hoenig
6d99ca1122 Merge pull request #14318 from hashicorp/cleanup-create-pointer-compare
cleanup: create pointer.Compare helper function
2022-08-26 09:15:41 -05:00
Luiz Aoqui
119bb50151 Post 1.3.4 release (#14329)
* Generate files for 1.3.4 release

* Prepare for next release

* Update CHANGELOG.md

Co-authored-by: hc-github-team-nomad-core <github-team-nomad-core@hashicorp.com>
2022-08-26 10:09:13 -04:00
dependabot[bot]
ececd19808 build(deps): bump github.com/hashicorp/go-memdb from 1.3.2 to 1.3.3 (#14206)
Bumps [github.com/hashicorp/go-memdb](https://github.com/hashicorp/go-memdb) from 1.3.2 to 1.3.3.
- [Release notes](https://github.com/hashicorp/go-memdb/releases)
- [Changelog](https://github.com/hashicorp/go-memdb/blob/main/changes.go)
- [Commits](https://github.com/hashicorp/go-memdb/compare/v1.3.2...v1.3.3)

---
updated-dependencies:
- dependency-name: github.com/hashicorp/go-memdb
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-08-26 10:07:41 -04:00
Seth Hoenig
7788f09567 cleanup: create pointer.Compare helper function
This PR creates a pointer.Compare helper for comparing equality of
two pointers. Strictly only works with primitive types we know are
safe to derefence and compare using '=='.
2022-08-26 08:55:59 -05:00
dependabot[bot]
adafab0b9a build(deps): bump github.com/hashicorp/go-hclog from 1.2.0 to 1.2.2 (#14208)
Bumps [github.com/hashicorp/go-hclog](https://github.com/hashicorp/go-hclog) from 1.2.0 to 1.2.2.
- [Release notes](https://github.com/hashicorp/go-hclog/releases)
- [Commits](https://github.com/hashicorp/go-hclog/compare/v1.2.0...v1.2.2)

---
updated-dependencies:
- dependency-name: github.com/hashicorp/go-hclog
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-08-26 09:31:54 -04:00
dependabot[bot]
5a7279292a build(deps): bump github.com/aws/aws-sdk-go from 1.42.27 to 1.44.84 (#14326)
Bumps [github.com/aws/aws-sdk-go](https://github.com/aws/aws-sdk-go) from 1.42.27 to 1.44.84.
- [Release notes](https://github.com/aws/aws-sdk-go/releases)
- [Changelog](https://github.com/aws/aws-sdk-go/blob/main/CHANGELOG.md)
- [Commits](https://github.com/aws/aws-sdk-go/compare/v1.42.27...v1.44.84)

---
updated-dependencies:
- dependency-name: github.com/aws/aws-sdk-go
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-08-26 09:13:37 -04:00
Charlie Voiselle
a36ae675ff SV API: return upserted variable to caller (#14325)
* Return created variable to caller in HTTP and Go APIs
* Update tests for returned values
2022-08-25 17:38:15 -04:00
dependabot[bot]
aa74aa0f14 build(deps): bump github.com/shirou/gopsutil/v3 from 3.21.12 to 3.22.7 (#14209)
* build(deps): bump github.com/shirou/gopsutil/v3 from 3.21.12 to 3.22.7

Bumps [github.com/shirou/gopsutil/v3](https://github.com/shirou/gopsutil) from 3.21.12 to 3.22.7.
- [Release notes](https://github.com/shirou/gopsutil/releases)
- [Commits](https://github.com/shirou/gopsutil/compare/v3.21.12...v3.22.7)

---
updated-dependencies:
- dependency-name: github.com/shirou/gopsutil/v3
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* changelog entry

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Tim Gross <tgross@hashicorp.com>
2022-08-25 14:15:41 -04:00
Seth Hoenig
27406a8e46 Merge pull request #14230 from hashicorp/b-fix-cpuset-init
client: refactor cpuset manager initialization
2022-08-25 11:19:39 -05:00
Seth Hoenig
13bd08b15b client: refactor cpuset manager initialization
This PR refactors the code path in Client startup for setting up the cpuset
cgroup manager (non-linux systems not affected).

Before, there was a logic bug where we would try to read the cpuset.cpus.effective
cgroup interface file before ensuring nomad's parent cgroup existed. Therefor that
file would not exist, and the list of useable cpus would be empty. Tasks started
thereafter would not have a value set for their cpuset.cpus.

The refactoring fixes some less than ideal coding style. Instead we now bootstrap
each cpuset manager type (v1/v2) within its own constructor. If something goes
awry during bootstrap (e.g. cgroups not enabled), the constructor returns the
noop implementation and logs a warning.

Fixes #14229
2022-08-25 11:18:43 -05:00
Seth Hoenig
7cd0c35060 Merge pull request #14301 from hashicorp/b-fix-check-status-test-racey
testing: fix flakey check status test
2022-08-25 08:30:46 -05:00
Luiz Aoqui
546bdb8b95 ui: task lifecycle restart all tasks (#14223)
Now that tasks that have finished running can be restarted, the UI needs
to use the actual task state to determine which CSS class to use when
rendering the task lifecycle chart element.
2022-08-24 18:43:44 -04:00
Luiz Aoqui
f74f50804a Task lifecycle restart (#14127)
* allocrunner: handle lifecycle when all tasks die

When all tasks die the Coordinator must transition to its terminal
state, coordinatorStatePoststop, to unblock poststop tasks. Since this
could happen at any time (for example, a prestart task dies), all states
must be able to transition to this terminal state.

* allocrunner: implement different alloc restarts

Add a new alloc restart mode where all tasks are restarted, even if they
have already exited. Also unifies the alloc restart logic to use the
implementation that restarts tasks concurrently and ignores
ErrTaskNotRunning errors since those are expected when restarting the
allocation.

* allocrunner: allow tasks to run again

Prevent the task runner Run() method from exiting to allow a dead task
to run again. When the task runner is signaled to restart, the function
will jump back to the MAIN loop and run it again.

The task runner determines if a task needs to run again based on two new
task events that were added to differentiate between a request to
restart a specific task, the tasks that are currently running, or all
tasks that have already run.

* api/cli: add support for all tasks alloc restart

Implement the new -all-tasks alloc restart CLI flag and its API
counterpar, AllTasks. The client endpoint calls the appropriate restart
method from the allocrunner depending on the restart parameters used.

* test: fix tasklifecycle Coordinator test

* allocrunner: kill taskrunners if all tasks are dead

When all non-poststop tasks are dead we need to kill the taskrunners so
we don't leak their goroutines, which are blocked in the alloc restart
loop. This also ensures the allocrunner exits on its own.

* taskrunner: fix tests that waited on WaitCh

Now that "dead" tasks may run again, the taskrunner Run() method will
not return when the task finishes running, so tests must wait for the
task state to be "dead" instead of using the WaitCh, since it won't be
closed until the taskrunner is killed.

* tests: add tests for all tasks alloc restart

* changelog: add entry for #14127

* taskrunner: fix restore logic.

The first implementation of the task runner restore process relied on
server data (`tr.Alloc().TerminalStatus()`) which may not be available
to the client at the time of restore.

It also had the incorrect code path. When restoring a dead task the
driver handle always needs to be clear cleanly using `clearDriverHandle`
otherwise, after exiting the MAIN loop, the task may be killed by
`tr.handleKill`.

The fix is to store the state of the Run() loop in the task runner local
client state: if the task runner ever exits this loop cleanly (not with
a shutdown) it will never be able to run again. So if the Run() loops
starts with this local state flag set, it must exit early.

This local state flag is also being checked on task restart requests. If
the task is "dead" and its Run() loop is not active it will never be
able to run again.

* address code review requests

* apply more code review changes

* taskrunner: add different Restart modes

Using the task event to differentiate between the allocrunner restart
methods proved to be confusing for developers to understand how it all
worked.

So instead of relying on the event type, this commit separated the logic
of restarting an taskRunner into two methods:
- `Restart` will retain the current behaviour and only will only restart
  the task if it's currently running.
- `ForceRestart` is the new method where a `dead` task is allowed to
  restart if its `Run()` method is still active. Callers will need to
  restart the allocRunner taskCoordinator to make sure it will allow the
  task to run again.

* minor fixes
2022-08-24 17:43:07 -04:00
Tim Gross
e886d5d055 vault: detect namespace change in config reload (#14298)
The `namespace` field was not included in the equality check between old and new
Vault configurations, which meant that a Vault config change that only changed
the namespace would not be detected as a change and the clients would not be
reloaded.

Also, the comparison for boolean fields such as `enabled` and
`allow_unauthenticated` was on the pointer and not the value of that pointer,
which results in spurious reloads in real config reload that is easily missed in
typical test scenarios.

Includes a minor refactor of the order of fields for `Copy` and `Merge` to match
the struct fields in hopes it makes it harder to make this mistake in the
future, as well as additional test coverage.
2022-08-24 17:03:29 -04:00
Seth Hoenig
ee501f4f7d Merge pull request #14283 from hashicorp/f-java-corretto-test-case
drivers/java: add parsing test case for corretto 17
2022-08-24 15:28:20 -05:00
Seth Hoenig
bb72d81a71 Merge pull request #14297 from hashicorp/b-logmon-fork-mystery-bin
client/logmon: acquire executable in init block
2022-08-24 15:25:09 -05:00
Seth Hoenig
3ae6db666a testing: fix flakey check status test
This PR fixes a flakey test where we did not wait on the check
status to actually become failing (go too fast and you just get
a pending check).

Instead add a helper for waiting on any check in the alloc to become
the state we are looking for.
2022-08-24 15:11:41 -05:00
Seth Hoenig
24a1c48f47 client/logmon: acquire executable in init block
This PR causes the logmon task runner to acquire the binary of the
Nomad executable in an 'init' block, so as to almost certainly get
the name while the nomad file still exists.

This is an attempt at fixing the case where a deleted Nomad file
(e.g. during upgrade) may be getting renamed with a mysterious
suffix first.

If this doesn't work, as a last resort we can literally just trim
the mystery string.

Fixes: #14079
2022-08-24 13:17:20 -05:00
Piotr Kazmierczak
34e4b080f6 template: custom change_mode scripts (#13972)
This PR adds the functionality of allowing custom scripts to be executed on template change. Resolves #2707
2022-08-24 17:43:01 +02:00
Luiz Aoqui
9e7758247a changelog: update #14212 to breaking-change (#14292) 2022-08-24 11:36:53 -04:00
Luiz Aoqui
abeeecbe71 deps: sync versions of go-discover in go.mod (#14269)
In #13491 the version of `go-discover` was updated in `go.mod` but the
comment above it mentions that it also needs to be updated in the
`replace` directive.
2022-08-24 10:32:13 -04:00
Piotr Kazmierczak
2d4acce3da docs: Update upgrade guide to reflect enterprise changes introduced in nomad-enterprise (#14212)
This PR documents a change made in the enterprise version of nomad that addresses the following issue:

When a user tries to filter audit logs, they do so with a stanza that looks like the following:

audit {
  enabled = true

  filter "remove deletes" {
    type = "HTTPEvent"
    endpoints  = ["*"]
    stages = ["OperationComplete"]
    operations = ["DELETE"]
  }
}

When specifying both an "endpoint" and a "stage", the events with both matching a "endpoint" AND a matching "stage" will be filtered.

When specifying both an "endpoint" and an "operation" the events with both matching a "endpoint" AND a matching "operation" will be filtered.

When specifying both a "stage" and an "operation" the events with a matching a "stage" OR a matching "operation" will be filtered.

The "OR" logic with stages and operations is unexpected and doesn't allow customers to get specific on which events they want to filter. For instance the following use-case is impossible to achieve: "I want to filter out all OperationReceived events that have the DELETE verb".
2022-08-24 16:31:49 +02:00
Seth Hoenig
cd4be96bfb drivers/java: add parsing test case for corretto 17 2022-08-24 09:16:38 -05:00
Seth Hoenig
2d425bf21d build: set osusergo build tag by default (#14248)
This PR activates the osuergo build tag in GNUMakefile. This forces the os/user
package to be compiled without CGO. Doing so seems to resolve a race condition
in getpwnam_r that causes alloc creation to hang or panic on `user.Lookup("nobody")`.
2022-08-24 08:11:56 -05:00
Luiz Aoqui
43fe45d972 fix minor issues found durint ENT merge (#14250) 2022-08-23 17:22:18 -04:00
Michael Schurter
a4f007a217 core: move LicenseConfig to shared file (#14247)
This moves LicenseConfig and its Copy method to a shared file so that it can be shared with enterprise code.
2022-08-23 13:44:10 -07:00
Luiz Aoqui
3953d414e6 ui: use task state to determine if task is active (#14224)
The current implementation uses the task's finishedAt field to determine
if a task is active of not, but this check is not accurate. A task in
the "pending" state will not have finishedAt value but it's also not
active.

This discrepancy results in some components, like the inline stats chart
of the task row component, to be displayed even whey they shouldn't.
2022-08-23 15:50:40 -04:00
Luiz Aoqui
dc1c50e259 ci: fix gofmt on tasklifecycle (#14232) 2022-08-23 15:47:15 -04:00
Tim Gross
e4e445ed1d docs: fix an anchor link in secure vars docs (#14231) 2022-08-23 10:46:24 -04:00
Seth Hoenig
e49bb89093 Merge pull request #14215 from hashicorp/docs-update-checks-for-nsd
docs: update check documentation with NSD specifics
2022-08-23 09:23:53 -05:00
Seth Hoenig
701c926b45 docs: fix checks doc typo
Co-authored-by: Piotr Kazmierczak <phk@mm.st>
2022-08-23 09:23:36 -05:00
Seth Hoenig
62a4c539ea Merge pull request #14221 from hashicorp/build-require-go1.19
build: go.mod should require go1.19
2022-08-23 07:53:13 -05:00
Luiz Aoqui
6070fa0c8d allocrunner: refactor task coordinator (#14009)
The current implementation for the task coordinator unblocks tasks by
performing destructive operations over its internal state (like closing
channels and deleting maps from keys).

This presents a problem in situations where we would like to revert the
state of a task, such as when restarting an allocation with tasks that
have already exited.

With this new implementation the task coordinator behaves more like a
finite state machine where task may be blocked/unblocked multiple times
by performing a state transition.

This initial part of the work only refactors the task coordinator and
is functionally equivalent to the previous implementation. Future work
will build upon this to provide bug fixes and enhancements.
2022-08-22 18:38:49 -04:00
Phil Renaud
d752a6d5cc Remove a test pause and a lint error from #14199 (#14222) 2022-08-22 16:51:49 -04:00
Tim Gross
2eaf3d7270 allow ACL policies to be associated with workload identity (#14140)
The original design for workload identities and ACLs allows for operators to
extend the automatic capabilities of a workload by using a specially-named
policy. This has shown to be potentially unsafe because of naming collisions, so
instead we'll allow operators to explicitly attach a policy to a workload
identity.

This changeset adds workload identity fields to ACL policy objects and threads
that all the way down to the command line. It also a new secondary index to the
ACL policy table on namespace and job so that claim resolution can efficiently
query for related policies.
2022-08-22 16:41:21 -04:00
Charlie Voiselle
ab67f30e44 Make var get a blocking query as expected (#14205) 2022-08-22 16:37:21 -04:00
Charlie Voiselle
36c1c686bb Add .0 to make goenv happy (#14218) 2022-08-22 16:36:02 -04:00
Seth Hoenig
422971af34 Merge pull request #14219 from hashicorp/e2e-nsd-checks
e2e: add e2e tests for nomad service disco checks
2022-08-22 15:31:35 -05:00
Seth Hoenig
f8beb53471 e2e: add e2e tests for nomad service disco checks
This PR adds 2 e2e tests for ensuring nomad service discovery checks
get created and produce status results as expected.
2022-08-22 15:31:13 -05:00
Luiz Aoqui
934bafb922 template: use pointer values for gid and uid (#14203)
When a Nomad agent starts and loads jobs that already existed in the
cluster, the default template uid and gid was being set to 0, since this
is the zero value for int. This caused these jobs to fail in
environments where it was not possible to use 0, such as in Windows
clients.

In order to differentiate between an explicit 0 and a template where
these properties were not set we need to use a pointer.
2022-08-22 16:25:49 -04:00
Michael Schurter
90c143f66b client: stats need latest allocdir (#14204)
In #14139 this code was changed to use the original copy of the config,
but Config.AllocDir is updated in the `Client.init()` method for dev
agents.

This uses the latest version of the alloc dir (which cannot change
further at runtime without a client restart which would reinitialize
the stats collector as well).
2022-08-22 09:28:53 -07:00
Seth Hoenig
c7ab7a8313 docs: update check documentation with NSD specifics
This PR updates the checks documentation to mention support for checks
when using the Nomad service provider. There are limitations of NSD
compared to Consul, and those configuration options are now noted as
being Consul-only.
2022-08-22 10:50:26 -05:00
Phil Renaud
3940864c7c [ui] Allocation route services table: show task-level services (#14199)
Adds service fragments to allocations and union taskGroup and task services
2022-08-22 11:45:12 -04:00
Seth Hoenig
c084296b7d Merge pull request #14196 from hashicorp/f-nsd-checks-in-alloc-status
cli: display nomad service check status output in CLI commands
2022-08-22 08:33:56 -05:00
James Rasell
89d2c990dd readme: remove Gitter lobby link. (#14195)
gitter is not an officially supported forum, so we should not link
to it from the readme.
2022-08-22 10:33:20 +02:00
Seth Hoenig
21a2afd464 build: go.mod should require go1.19
Since we started using atomic.Pointer, we should specify the go1.19
requirement in our go.mod files.
2022-08-21 20:41:49 -05:00