18911 Commits

Author SHA1 Message Date
Mahmood Ali
60dd7aecc9 nvidia: support disabling the nvidia plugin (#8353) 2020-07-21 10:11:16 -04:00
Mahmood Ali
ae7626362d Changelog updates and tweaks (#8479) 2020-07-21 08:05:20 -04:00
Buck Doyle
56d96940c5 Update CHANGELOG with 8460 and 8463 (#8474) 2020-07-20 16:21:35 -05:00
Buck Doyle
39d3174207 Add specificity to exec allocation URL generation (#8463)
Thanks to @notnoop for this UX improvement suggestion.
The allocation’s task group is always known, so it
might as well be preselected in the sidebar when the
exec window opens. Also, if the task group only has
one task, might as well preselect it too.
2020-07-20 16:07:39 -05:00
Buck Doyle
a4a5343efa Fix typo in exec button URL-generation (#8460)
This closes #8422, another bug facilitated by the difficulty
of automated testing when opening another window. Thanks to
@notnoop for narrowing this down.
2020-07-20 16:06:55 -05:00
Tim Gross
e855fc07bc remove stalebot (#8466)
Bring Nomad in line with other HashiCorp projects and remove stalebot. We get
little value in cleaning up issues automatically this way, it adds extra work
for maintainers when we have issues waiting on the backlog that we intend to
do, and it presents an unkind experience to issue contributors who get their
issues closed by an impersonal bot.
2020-07-20 14:50:32 -04:00
Tim Gross
d5de6c919f changelog for host_network bug (#8469) 2020-07-20 13:33:49 -04:00
Mahmood Ali
e023f9c64c Merge pull request #8467 from hashicorp/c-golang-1.14.6
Use golang 1.14.6
2020-07-20 12:37:12 -04:00
Mahmood Ali
3969d17b05 update changelog 2020-07-20 12:14:25 -04:00
dependabot[bot]
6b3ff2ee19 Bump lodash from 4.17.14 to 4.17.19 in /ui (#8449)
Bumps [lodash](https://github.com/lodash/lodash) from 4.17.14 to 4.17.19.
- [Release notes](https://github.com/lodash/lodash/releases)
- [Commits](https://github.com/lodash/lodash/compare/4.17.14...4.17.19)
2020-07-20 11:08:06 -05:00
Mahmood Ali
039cd28b92 Use golang 1.14.6
Pick up the [golang 1.14.6 bug fixes](https://github.com/golang/go/issues?q=milestone%3AGo1.14.6+label%3ACherryPickApproved), especially the one where reflect.DeepEqual returns true even if values don't match, which affects the integrity of our tests.
2020-07-20 12:04:38 -04:00
Tim Gross
c4eb7af5f8 changelog item for MRD canary bugfix (#8465) 2020-07-20 11:36:24 -04:00
Tim Gross
6ed0f4e564 scheduler: DesiredCanaries can be set on every pass safely
The reconcile loop sets `DeploymentState.DesiredCanaries` only on the first
pass through the loop and if the job is not paused/pending. In MRD,
deployments will make one pass through the loop while "pending", and were not
ever getting `DesiredCanaries` set. We can't set it in the initial
`DeploymentState` constructor because the first pass through setting up
canaries expects it's not there yet. However, this value is static for a given
version of a job because it's coming from the update stanza, so it's safe to
re-assign the value on subsequent passes.
2020-07-20 11:25:53 -04:00
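A minimal sketch of the idea in 6ed0f4e564, assuming simplified stand-ins for the real structures: `UpdateStrategy`, the pared-down `DeploymentState`, and `reconcilePass` are hypothetical names for illustration, not Nomad's actual code. It shows why re-assigning `DesiredCanaries` on every pass is safe: the value comes from the update stanza and does not change for a given job version, so the assignment is idempotent.

```go
package main

import "fmt"

// UpdateStrategy and DeploymentState are simplified, hypothetical stand-ins
// for the structures named in the commit message.
type UpdateStrategy struct {
	Canary int // number of canaries requested by the update stanza
}

type DeploymentState struct {
	DesiredCanaries int
}

// reconcilePass models one pass through the reconcile loop. It reports
// whether the pass was allowed to go on and place canaries.
func reconcilePass(dstate *DeploymentState, update UpdateStrategy, pausedOrPending bool) bool {
	// The value is static for a given job version, so assigning it on every
	// pass is idempotent, even while the deployment is paused or pending.
	dstate.DesiredCanaries = update.Canary
	return !pausedOrPending
}

func main() {
	dstate := &DeploymentState{}
	update := UpdateStrategy{Canary: 2}

	placed := reconcilePass(dstate, update, true) // first pass while "pending"
	fmt.Println(dstate.DesiredCanaries, placed)   // 2 false

	placed = reconcilePass(dstate, update, false) // later pass, deployment running
	fmt.Println(dstate.DesiredCanaries, placed)   // 2 true
}
```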
Tim Gross
5cb11b78d1 refactor: make it clear where we're accessing dstate
The field name `Deployment.TaskGroups` contains a map of `DeploymentState`,
which makes it a little harder to follow state updates when combined with
inconsistent naming conventions, particularly when we also have the state
store or actual `TaskGroup`s in scope. This changeset changes all uses to
`dstate` so as not to be confused with actual TaskGroups.
2020-07-20 11:25:53 -04:00
Lang Martin
5f7d252361 structs: Job.Validate only allows stop_after_client_disconnected on batch and service jobs (#8444)
* nomad/structs/structs: add to Job.Validate

* Update nomad/structs/structs.go

Co-authored-by: Mahmood Ali <mahmood@hashicorp.com>

* nomad/structs/structs: match error strings to the config file

* nomad/structs/structs_test: clarify the test a bit

* nomad/structs/structs_test: typo in the test error comparison

Co-authored-by: Mahmood Ali <mahmood@hashicorp.com>
2020-07-20 10:27:25 -04:00
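The rule named in the title of 5f7d252361 can be sketched roughly as follows. This is an illustration under assumptions, not the code in nomad/structs/structs.go: the function `validateStopAfter`, the job type constants, the helper `seconds`, and the error text are all hypothetical, and the setting name is spelled as it appears in the commit title.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Hypothetical job type names used only for this sketch.
const (
	jobTypeService = "service"
	jobTypeBatch   = "batch"
	jobTypeSystem  = "system"
)

// Hypothetical error text; the setting name follows the commit title.
var errStopAfterNotAllowed = errors.New(
	"stop_after_client_disconnected can only be set on batch and service jobs")

// validateStopAfter rejects the setting on any job type other than batch or
// service. A nil pointer means the setting is absent, which is always valid.
func validateStopAfter(jobType string, stopAfter *time.Duration) error {
	if stopAfter == nil {
		return nil
	}
	switch jobType {
	case jobTypeBatch, jobTypeService:
		return nil
	default:
		return errStopAfterNotAllowed
	}
}

// seconds is a small helper that returns a pointer to a duration.
func seconds(n int) *time.Duration {
	d := time.Duration(n) * time.Second
	return &d
}

func main() {
	fmt.Println(validateStopAfter(jobTypeBatch, seconds(30)))  // <nil>
	fmt.Println(validateStopAfter(jobTypeSystem, seconds(30))) // the error above
	fmt.Println(validateStopAfter(jobTypeSystem, nil))         // <nil>
}
```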
Mahmood Ali
485cdf2bd9 Merge pull request #8461 from hashicorp/backport-ent-changes-20200720
Back-port Some Enterprise changes
2020-07-20 10:25:34 -04:00
Mahmood Ali
c9e51fb255 Remove unused state.TestInitState 2020-07-20 09:55:55 -04:00
Mahmood Ali
68bd10c22d minor tweaks from Ent 2020-07-20 09:25:09 -04:00
Mahmood Ali
040cc8f5af enterprise specific state store objects 2020-07-20 09:22:26 -04:00
Michael Schurter
3400d77936 Merge pull request #8458 from hashicorp/docs-8457
docs: document enterprise upgrade bug #8457
2020-07-17 13:14:18 -07:00
Michael Schurter
6109cf3e85 docs: document enterprise upgrade bug #8457 2020-07-17 11:57:29 -07:00
Mahmood Ali
32910ae0ba Refactor setupLoggers 2020-07-17 11:05:57 -04:00
Mahmood Ali
c7828c1f7c Set AgentShutdown 2020-07-17 11:04:57 -04:00
Mahmood Ali
c5b2895b0b Fix pro tags 2020-07-17 11:02:00 -04:00
Tim Gross
3b52b39c50 mrd: reconcile should treat pending deployments as paused (#8446)
If a job update includes a task group that has no changes, those allocations
have their version bumped in-place. This ends up triggering an eval from
`deploymentwatcher` when it verifies their health. Although this eval is a
no-op, we were only treating pending deployments the same as paused when
the deployment was a new MRD. This means that any eval after the initial one
will kick off the deployment, and that caused pending deployments to "jump
the queue" and run ahead of schedule, breaking MRD invariants and resulting in
a state with all regions blocked.

This behavior can be replicated even in the case of job updates with no
in-place updates by patching `deploymentwatcher` to inject a spurious no-op
eval. This changeset fixes the behavior by treating pending deployments the
same as paused in all cases in the reconciler.
2020-07-16 13:00:08 -04:00
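The fix described in 3b52b39c50 amounts to a single predicate applied in all cases. A minimal sketch, assuming hypothetical status constants and a hypothetical helper name `treatAsPaused` (the real reconciler is more involved):

```go
package main

import "fmt"

// Hypothetical deployment status values for this sketch.
const (
	statusRunning = "running"
	statusPaused  = "paused"
	statusPending = "pending"
)

// treatAsPaused reports whether the reconciler should hold off on starting
// the deployment. Pending is handled the same as paused on every eval, not
// only on the first one, so a spurious no-op eval cannot kick it off early.
func treatAsPaused(status string) bool {
	return status == statusPaused || status == statusPending
}

func main() {
	for _, s := range []string{statusRunning, statusPaused, statusPending} {
		fmt.Printf("%s -> treat as paused: %v\n", s, treatAsPaused(s))
	}
}
```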
Charlie Voiselle
238f7dcb57 Merge pull request #8437 from angrycub/d-reschedule-in-deploys
[docs] Rescheduling does happen during deployments
2020-07-15 15:24:21 -04:00
Mahmood Ali
24a7506a36 Merge pull request #8435 from hashicorp/b-atomic-job-register
Atomic eval insertion with job (de-)registration
2020-07-15 13:48:07 -04:00
Michael Schurter
7346774771 Merge pull request #8441 from hashicorp/build-go1.14.5
build: update from Go 1.14.4 to Go 1.14.5
2020-07-15 10:34:15 -07:00
Mahmood Ali
bee8efd771 Merge pull request #8383 from hashicorp/docs-security-model-followup
Revise security model feedback
2020-07-15 13:11:39 -04:00
Michael Schurter
b7e677d315 build: update from Go 1.14.4 to Go 1.14.5
Go 1.14.4 contains two CVEs which are fixed in 1.14.5:

 - [CVE-2020-15586](https://golang.org/issue/34902)
 - [CVE-2020-14039](https://golang.org/issue/39360)

Upon consideration with HashiCorp security these CVEs are considered low
severity for Nomad and no new security fix binary will be released.
2020-07-15 09:49:06 -07:00
Mahmood Ali
71d433dfa2 Merge pull request #8436 from kneufeld/master
fixed typo in output
2020-07-15 12:18:48 -04:00
Mahmood Ali
37ad947607 comment compat concern in fsm.go 2020-07-15 11:23:49 -04:00
Mahmood Ali
921a42b487 no need to handle duplicate evals anymore 2020-07-15 11:14:49 -04:00
Mahmood Ali
a6a96c47e4 only set args.Eval after all servers upgrade
We set the Eval field on job (de-)registration only after all servers
get upgraded, to avoid dealing with duplicate evals.
2020-07-15 11:10:57 -04:00
Mahmood Ali
6a082ade33 time.Now().UTC().UnixNano() -> time.Now().UnixNano() 2020-07-15 08:49:17 -04:00
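The simplification in 6a082ade33 relies on a documented property of Go's `time` package: `Time.UnixNano` counts nanoseconds since the Unix epoch and does not depend on the location attached to the value, so the `.UTC()` call is redundant. A small check, where `unixNanoIgnoresLocation` and `sampleTime` are hypothetical helper names:

```go
package main

import (
	"fmt"
	"time"
)

// unixNanoIgnoresLocation reports whether converting t to UTC leaves its
// Unix nanosecond timestamp unchanged.
func unixNanoIgnoresLocation(t time.Time) bool {
	return t.UnixNano() == t.UTC().UnixNano()
}

// sampleTime returns a fixed instant in a non-UTC zone (four hours behind UTC).
func sampleTime() time.Time {
	loc := time.FixedZone("UTC-4", -4*60*60)
	return time.Date(2020, time.July, 15, 8, 49, 17, 0, loc)
}

func main() {
	fmt.Println(unixNanoIgnoresLocation(sampleTime())) // true
}
```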
Charlie Voiselle
2611a98f74 [docs] Reschedule does happen during deployments 2020-07-14 16:29:30 -04:00
Kurt Neufeld
789e3091a1 fixed typo in output (#1) 2020-07-14 10:33:17 -06:00
Mahmood Ali
97c69ee9a7 Atomic eval insertion with job (de-)registration
This fixes a bug where jobs may get "stuck" unprocessed, which
disproportionately affects periodic jobs around leadership transitions.
When registering a job, the job registration and the eval to process it
get applied to raft as two separate transactions; if the job
registration succeeds but eval application fails, the job may remain
unprocessed. Operators may detect such a failure when they submit a job
update and get a 500 error code, and they can retry; periodic job
failures are more likely to go unnoticed, and no further periodic
invocations will be processed until an operator forces an evaluation.

This fixes the issue by ensuring that the job registration and eval
application get persisted and processed atomically in the same raft log
entry.

Also, applies the same change to ensure atomicity in job deregistration.

Backward Compatibility

We must maintain compatibility in two scenarios: mixed clusters where a
leader can handle atomic updates but followers cannot, and a recent
cluster that processes old log entries from legacy or mixed cluster mode.

To handle these constraints, ensure that the leader continues to emit the
Evaluation log entry until all servers have upgraded; also, when
processing raft logs, the servers honor evaluations found in both spots,
the Eval in job (de-)registration and the eval update entries.

When an updated server sees mixed-mode behavior where an eval is inserted
into the raft log twice, it ignores the second instance.

I made one compromise in consistency in the mixed-mode scenario: servers
may disagree on the eval.CreateIndex value: the leader and updated
servers will report the job registration index while old servers will
report the index of the eval update log entry. This discrepancy doesn't
seem to be material - it's the eval.JobModifyIndex that matters.
2020-07-14 11:59:29 -04:00
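The mixed-mode handling described in 97c69ee9a7 can be sketched as below. This is a toy model under assumptions, not Nomad's FSM: `Eval`, `JobRegister`, `stateStore`, and the `apply*` methods are hypothetical names. It illustrates three points from the message: the job and its eval are applied from one log entry, a second copy of the same eval is ignored, and the recorded `CreateIndex` depends on which entry delivered the eval first.

```go
package main

import "fmt"

// Eval is a hypothetical, pared-down evaluation record.
type Eval struct {
	ID          string
	JobID       string
	CreateIndex uint64
}

// JobRegister is a hypothetical log entry. Eval is nil when the entry was
// written in legacy mode, where the eval arrives only as a separate entry.
type JobRegister struct {
	JobID string
	Eval  *Eval
}

type stateStore struct {
	jobs  map[string]uint64 // job ID -> index of the registration entry
	evals map[string]Eval   // eval ID -> stored eval
}

func newStateStore() *stateStore {
	return &stateStore{jobs: map[string]uint64{}, evals: map[string]Eval{}}
}

// applyJobRegister stores the job and, when present, its embedded eval as
// part of applying the same log entry.
func (s *stateStore) applyJobRegister(index uint64, req JobRegister) {
	s.jobs[req.JobID] = index
	if req.Eval != nil {
		s.upsertEval(index, *req.Eval)
	}
}

// applyEvalUpdate handles the separate eval entry still emitted in mixed mode.
func (s *stateStore) applyEvalUpdate(index uint64, ev Eval) {
	s.upsertEval(index, ev)
}

// upsertEval ignores an eval whose ID has already been stored.
func (s *stateStore) upsertEval(index uint64, ev Eval) {
	if _, seen := s.evals[ev.ID]; seen {
		return
	}
	ev.CreateIndex = index
	s.evals[ev.ID] = ev
}

func main() {
	s := newStateStore()
	ev := Eval{ID: "eval-1", JobID: "job-1"}
	s.applyJobRegister(10, JobRegister{JobID: "job-1", Eval: &ev})
	s.applyEvalUpdate(11, ev) // duplicate from the separate entry, ignored
	fmt.Println(len(s.evals), s.evals["eval-1"].CreateIndex) // 1 10
}
```

In this model, a server that only understands the separate eval entry would record index 11 instead of 10, which is the `CreateIndex` disagreement the message accepts as immaterial.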
Tim Gross
3703489702 MRD: all regions should start pending (#8433)
Deployments should wait until kicked off by `Job.Register` so that we can
assert that all regions have a scheduled deployment before starting any
region. This changeset includes the OSS fixes to support the ENT work.

`IsMultiregionStarter` has no more callers in OSS, so remove it here.
2020-07-14 10:57:37 -04:00
Tim Gross
5cfd314660 changelog for MRD datacenters validation (#8429) 2020-07-13 14:03:40 -04:00
Tim Gross
3dd29953f7 multiregion: allow empty region DCs (#8426)
It's supposed to be possible for a region not to have `datacenters` set so
that it can use the job's `datacenters` field. This requires that operators
use the same DC name across multiple regions, but that's the default client
configuration.
2020-07-13 13:34:19 -04:00
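The fallback described in 3dd29953f7 can be pictured as follows. The function name `effectiveDatacenters` and the example datacenter names are hypothetical; this is only a sketch of the rule that a region without `datacenters` uses the job's `datacenters` field.

```go
package main

import "fmt"

// effectiveDatacenters returns the region's datacenters when set, and falls
// back to the job-level datacenters when the region leaves them empty.
func effectiveDatacenters(regionDCs, jobDCs []string) []string {
	if len(regionDCs) == 0 {
		return jobDCs
	}
	return regionDCs
}

func main() {
	jobDCs := []string{"dc1"}
	fmt.Println(effectiveDatacenters(nil, jobDCs))                   // [dc1]
	fmt.Println(effectiveDatacenters([]string{"west-1a"}, jobDCs)) // [west-1a]
}
```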
Michael Schurter
f6f697a3fb Merge pull request #7042 from hashicorp/docs-healthy-deadline
docs: clarify healthy/progress_deadline relationship
2020-07-13 08:40:11 -07:00
Buck Doyle
fc1fce6934 Change edition to Octane (#8418)
This updates the Ember edition setting to Octane, which I removed from #8319
because it required the template-only Glimmer components setting to be turned
on, which this does. These changes to templates accommodate that setting.
2020-07-13 09:26:12 -05:00
Michael Lange
dc6aa33916 Merge pull request #8412 from hashicorp/b-ui/prefix-run-button
UI: Filter out new records from the job list page
2020-07-10 15:41:43 -07:00
Michael Lange
8c3d514d0c Changelog addition 2020-07-10 15:31:25 -07:00
Michael Lange
0d433b1c89 Filter out new records from the job list page
When a prefix is set and the run job button is clicked,
the new job causes an error because it has no name yet.
2020-07-10 15:29:52 -07:00
Michael Lange
99fce00e21 Merge pull request #8413 from hashicorp/b-ui/namespaces-after-token
UI: Reset the system and refetch namespaces with every token change
2020-07-10 15:29:35 -07:00
Michael Lange
da024109ee Changelog additions 2020-07-10 15:20:25 -07:00
Michael Lange
970ed734df Reset the system and refetch namespaces with every token change 2020-07-10 15:18:36 -07:00
Seth Hoenig
1307395298 Merge pull request #8419 from hashicorp/docs-cl-vault-id-checks
docs: update changelog for vault policies lookup fix
2020-07-10 13:18:11 -05:00