16684 Commits

Author SHA1 Message Date
Mahmood Ali
af2e2bc7ed cli: sequence cli.Ui operations
Fixes a bug where, if parsing a command's flags fails, the resulting
error and help usage messages get interleaved in an unexpected and
user-unfriendly way.

The reason is that the flag parsing library effectively writes to
ui.Error in a goroutine.  This is problematic: first, we lose the sequencing between help
usage and error message; second, cli.Ui methods are not safe for concurrent use.

Here, we introduce a custom error writer that buffers result and calls
ui.Error() in the write method and in the same goroutine.

For context, we need to wrap ui.Error because it's line-oriented, while
the flags library expects an io.Writer, which is byte-oriented.
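
A minimal sketch of such an adapter, not the code from this commit: the `lineWriter` type, its `emit` callback (standing in for ui.Error), and the `collect` helper are all hypothetical names chosen for illustration. It buffers incoming bytes and calls the callback once per complete line, synchronously, in the goroutine that called Write.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// lineWriter is a hypothetical adapter from a byte-oriented io.Writer to a
// line-oriented callback. emit stands in for a function such as ui.Error.
type lineWriter struct {
	buf  bytes.Buffer
	emit func(line string)
}

// Write buffers p and calls emit once per complete line, in the same
// goroutine as the caller, so output ordering is preserved.
func (w *lineWriter) Write(p []byte) (int, error) {
	w.buf.Write(p)
	for {
		i := bytes.IndexByte(w.buf.Bytes(), '\n')
		if i < 0 {
			break
		}
		line := string(w.buf.Next(i + 1))
		w.emit(line[:len(line)-1]) // drop the trailing newline
	}
	return len(p), nil
}

// Flush emits any trailing partial line that has no newline yet.
func (w *lineWriter) Flush() {
	if w.buf.Len() > 0 {
		w.emit(w.buf.String())
		w.buf.Reset()
	}
}

// collect writes the chunks through a lineWriter and returns the lines
// that were emitted, in order.
func collect(chunks ...string) []string {
	var lines []string
	w := &lineWriter{emit: func(l string) { lines = append(lines, l) }}
	var out io.Writer = w
	for _, c := range chunks {
		io.WriteString(out, c)
	}
	w.Flush()
	return lines
}

func main() {
	// Chunk boundaries do not line up with line boundaries on purpose.
	for _, l := range collect("flag provided but ", "not defined: -x\nUsage: ", "cmd [options]\n") {
		fmt.Println(l)
	}
}
```

Because the callback runs inside Write, no extra goroutine is involved, and the error line is always emitted before the usage line that follows it.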
2019-12-16 10:08:17 -05:00
Michael Schurter
7700d38457 Merge pull request #6855 from hashicorp/b-interp-connect-task
connect: canonicalize before adding sidecar
2019-12-13 09:26:44 -08:00
Mahmood Ali
20f8227c0a Merge pull request #6820 from hashicorp/f-skip-docker-logging-knob
driver: allow disabling log collection
2019-12-13 11:41:20 -05:00
Mahmood Ali
0bdf7a9e23 Merge pull request #6556 from hashicorp/c-vendor-multierror-20191025
Update go-multierror library
2019-12-13 11:32:42 -05:00
Mahmood Ali
e82dad732b address review comments 2019-12-13 11:21:00 -05:00
Mahmood Ali
013097b241 tests: fix error format assertion
multierror library changed formatting slightly.
2019-12-13 11:01:20 -05:00
Buck Doyle
fd043056e1 Update changelog with #6817 2019-12-13 09:16:31 -06:00
Mahmood Ali
7c749ff874 Update go-multierror to 72917a1
To pick up https://github.com/hashicorp/go-multierror/pull/28
2019-12-13 10:13:31 -05:00
Buck Doyle
fabcf7c1a9 Fix flapping status light test (#6852)
I unintentionally introduced a flapping test in #6817. The
draining status of the node will be randomly chosen and
that flag takes precedence over eligibility. This forces
the draining flag to be false rather than random so the
test should no longer flap.

See here for an example failure:
https://circleci.com/gh/hashicorp/nomad/26368
2019-12-13 09:02:02 -06:00
Preetha Appan
14e34dd948 update changelog 2019-12-13 08:22:14 -06:00
Mahmood Ali
93694f8dd3 Merge pull request #6839 from hashicorp/b-cgroup-cleanup
executor: stop joining executor to container cgroup
2019-12-13 09:05:09 -05:00
Michael Schurter
4f718f7ba6 docs: add #6855 to changelog
Also make Connect related fixes more consistent in the changelog. I
suspect users won't care if a Connect related fix is in the server's
admission controller or in the client's groupservice hook or somewhere
else, so I think grouping them by `consul/connect:` makes the most
sense.
2019-12-12 20:58:49 -08:00
Michael Schurter
c74de6a455 connect: canonicalize before adding sidecar
Fixes #6853

Canonicalize jobs first before adding any sidecars. This fixes a bug
where sidecar tasks were added without interpolated names and broke
validation. Sidecar tasks must be canonicalized independently.

Also adds a group network to the mock connect job because it wasn't a
valid connect job before!
2019-12-12 20:55:56 -08:00
Mahmood Ali
416b3e7483 Merge pull request #6854 from hashicorp/update-changelog
Add notarization details to changelog
2019-12-12 20:56:53 -05:00
Michele
7060880626 Add clarifying update 2019-12-12 15:28:47 -08:00
Michele
d7818c2b4e Add apple notarization note 2019-12-12 15:24:18 -08:00
Preetha
37d421e782 Merge pull request #6849 from hashicorp/b-debug-preemption
Use debug logging for scheduler internals
2019-12-12 16:15:46 -06:00
Preetha Appan
be897cadc3 More error->debug for logging in the bin packing iterator 2019-12-12 15:50:16 -06:00
ebarriosjr
ba1e66c42b driver/pot: Added extra_hosts and args commands (#6577) 2019-12-12 16:29:45 -05:00
Buck Doyle
83d92251c5 UI: Fix client sorting (#6817)
There are two changes here, and some caveats/commentary:

1. The “State” table column was actually sorting only by status. The state was not an actual property, just something calculated in each client row, as a product of status, isEligible, and isDraining. This PR adds isDraining as a component of compositeState so it can be used for sorting.

2. The Sortable mixin declares dependent keys that cause the sort to be live-updating, but only if the members of the array change, such as if a new client is added, but not if any of the sortable properties change. This PR adds a SortableFactory function that generates a mixin whose listSorted computed property includes dependent keys for the sortable properties, so the table will live-update if any of the sortable properties change, not just the array members. There’s a warning if you use SortableFactory without dependent keys and via the original Sortable interface, so we can eventually migrate away from it.
2019-12-12 13:06:54 -06:00
Michael Lange
630723cf2d Merge pull request #6808 from hashicorp/b-ui/unclosed-log-streams
UI: Unclosed log streams
2019-12-12 10:55:49 -08:00
Preetha Appan
ed1f30e799 Use debug logging for scheduler internals
We currently log an error if preemption is unable to find a suitable set of
allocations to preempt. This commit changes that to debug level since not finding
preemptable allocations is not an error condition.
2019-12-12 12:05:29 -06:00
Tim Gross
7cda140981 e2e: run client/allocs metrics tests nightly (#6842)
Refactor the metrics end-to-end tests so they can be run with our e2e
test framework. Runs fabio/prometheus and a collection of jobs that
will cause metrics to be measured. We then query Prometheus to ensure
we're publishing those allocation metrics and some metrics from the
clients as well.

Includes adding a placeholder for running the same tests on Windows.
2019-12-12 12:45:16 -05:00
Mahmood Ali
f794b49ec6 simplify cgroup path lookup 2019-12-11 12:43:25 -05:00
Seth Hoenig
57b38a0fa8 Merge pull request #6838 from hashicorp/f-parallelize-state-store-tests
tests: parallelize state store tests
2019-12-11 11:05:52 -06:00
Mahmood Ali
596d0be5d8 executor: stop joining executor to container cgroup
Stop joining libcontainer executor process into the newly created task
container cgroup, to ensure that the cgroups are fully destroyed on
shutdown, and to make it consistent with other plugin processes.

Previously, the executor process was added to the container cgroup so
that the executor's resources were aggregated along with user processes
in our metric aggregation.

However, adding the executor process to the container cgroup adds some
complications without much benefit:

First, it complicates cleanup.  We must ensure that the executor is
removed from the container cgroup on shutdown.  However, we had a bug
where we missed removing it from the systemd cgroup, because the executor
uses `containerState.CgroupPaths` on launch, which includes systemd, but
`cgroups.GetAllSubsystems` for cleanup, which doesn't.

Second, it may have adverse side effects.  When a user process is CPU
bound or uses too much memory, the executor should remain functioning
without risk of being killed (by the OOM killer) or throttled.

Third, it is inconsistent with other drivers and plugins.  Logmon and
DockerLogger processes aren't in the task cgroups.  Neither are
containerd processes, though it is equivalent to executor in
responsibility.

Fourth, in my experience, when the executor process moves cgroups while
it's running, the cgroup aggregation is odd.  The cgroup
`memory.usage_in_bytes` doesn't seem to capture the full memory usage of
the executor process and becomes a red herring when investigating memory
issues.

For all the reasons above, I opted to have executor remain in nomad
agent cgroup and we can revisit this when we have a better story for
plugin process cgroup management.
2019-12-11 11:28:09 -05:00
Mahmood Ali
2f4b9da61a drivers/exec: test all cgroups are destroyed 2019-12-11 11:12:29 -05:00
Seth Hoenig
35fdada2f9 tests: parallelize state store tests
It has been decided we're going to live in a many core world.
Let's take advantage of that and parallelize these state store
tests which all run in memory and are largely CPU bound.

An unscientific benchmark demonstrating the improvement:

[mp state (master)] $ go test
PASS
ok  	github.com/hashicorp/nomad/nomad/state	5.162s

[mp state (f-parallelize-state-store-tests)] $ go test
PASS
ok  	github.com/hashicorp/nomad/nomad/state	1.527s
2019-12-11 09:36:37 -06:00
Tim Gross
8babbf4f1b doc: spread is inherited from job to group (#6837) 2019-12-11 09:59:26 -05:00
Drew Bailey
c28722e74d Merge pull request #6834 from hashicorp/monitor-changelog
add 6828 to changelog
2019-12-11 08:17:12 -05:00
Michael Schurter
948b24acc2 Merge pull request #6833 from hashicorp/sentinel-imports-note
Make note of Sentinel standard imports
2019-12-10 13:56:01 -08:00
Chris Arcand
bdb70ef09c Make note of Sentinel standard imports
> Sentinel-embedded applications can choose to whitelist or blacklist
certain standard imports. Please reference the documentation for the
Sentinel-enabled application you're using to determine if all standard
imports are available.
2019-12-10 14:44:51 -06:00
Drew Bailey
fbf22eff9d add 6828 to changelog 2019-12-10 15:02:34 -05:00
Tim Gross
5e3efbd3ec doc: explain ALLOC_INDEX uniqueness guarantees (#6830)
The `ALLOC_INDEX` isn't guaranteed to be unique, and this has caused
some user confusion. The servers make a best-effort attempt to make
this value unique from 0 to count-1 but when you have canaries on the
task group, there are reused indexes because you have multiple job
versions running at the same time. If a user needs a unique number for
interpolating a value in their application, they can get this by
combining the job version and the alloc index.
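
As an illustration of that combination, here is a small sketch. The function name `uniqueAllocKey` and the `v<version>-<index>` format are assumptions made for this example, not something Nomad provides.

```go
package main

import "fmt"

// uniqueAllocKey is a hypothetical helper: it combines a job version with
// an alloc index so that a canary (new version) and an old allocation that
// share the same index still get distinct keys.
func uniqueAllocKey(jobVersion uint64, allocIndex int) string {
	return fmt.Sprintf("v%d-%d", jobVersion, allocIndex)
}

func main() {
	// During a canary deployment, index 0 can exist in both job versions.
	fmt.Println(uniqueAllocKey(3, 0))
	fmt.Println(uniqueAllocKey(4, 0))
}
```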

Co-Authored-By: Michael Schurter <mschurter@hashicorp.com>
2019-12-10 10:30:26 -05:00
Danielle
34a5a3a6a6 Merge pull request #6828 from hashicorp/b/nomad-monitor-panic
command: error when no node is found for `monitor`
2019-12-10 14:29:32 +01:00
Danielle Lancashire
c91f8da7f0 command: error when no node is found for monitor
Currently `nomad monitor -node-id` will panic when a node-id does not
match any nodes, as there is no empty result bounds checking. Here we
return an error to the user when no nodes are found.
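
The shape of the fix can be sketched as follows. This is not the code from the commit: the `node` type and `findNode` function are hypothetical stand-ins, and the point is only the length check before indexing into the result.

```go
package main

import (
	"fmt"
	"strings"
)

// node is a hypothetical stand-in for a node record returned by a lookup.
type node struct{ ID string }

// findNode returns the first node whose ID starts with prefix. It checks
// for an empty result before indexing, so a non-matching prefix produces
// an error instead of an index-out-of-range panic.
func findNode(nodes []node, prefix string) (node, error) {
	var matches []node
	for _, n := range nodes {
		if strings.HasPrefix(n.ID, prefix) {
			matches = append(matches, n)
		}
	}
	if len(matches) == 0 {
		return node{}, fmt.Errorf("no node found with ID or prefix %q", prefix)
	}
	return matches[0], nil
}

func main() {
	nodes := []node{{ID: "abc123"}, {ID: "def456"}}
	if _, err := findNode(nodes, "zzz"); err != nil {
		fmt.Println("error:", err)
	}
}
```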
2019-12-10 13:10:47 +01:00
Chris Dickson
bbb6b2af09 client: expose allocated CPU per task (#6784) 2019-12-09 15:40:22 -05:00
Seth Hoenig
ffba749001 Merge pull request #6800 from hashicorp/b-update-freeport
tests: swap lib/freeport for tweaked helper/freeport
2019-12-09 09:50:26 -06:00
Tim Gross
e7f9a06c9f Merge pull request #6631 from hashicorp/dependabot/npm_and_yarn/ui/lodash.mergewith-4.6.2
Bump lodash.mergewith from 4.6.1 to 4.6.2 in /ui
2019-12-09 09:47:14 -05:00
Tim Gross
3717c3cae2 Merge pull request #6629 from hashicorp/dependabot/npm_and_yarn/ui/lodash.defaultsdeep-4.6.1
Bump lodash.defaultsdeep from 4.6.0 to 4.6.1 in /ui
2019-12-09 09:47:05 -05:00
Seth Hoenig
94c60b4cfa tests: swap lib/freeport for tweaked helper/freeport
Copy the updated version of freeport (sdk/freeport), and tweak it for use
in Nomad tests. This means staying below port 10000 to avoid conflicts with
the lib/freeport that is still transitively used by the old version of
consul that we vendor. Also provide implementations to find ephemeral ports
on macOS and Windows environments.

Ports acquired through freeport are supposed to be returned to freeport,
which this change now also introduces. Many tests are modified to include
calls to a cleanup function for Server objects.
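
The take-and-return pattern described above can be sketched like this. It is an illustration only and not the helper/freeport API: `portPool`, `newPortPool`, `take`, and `available` are hypothetical names, the base port 9000 is an arbitrary value below 10000, and the sketch does not check whether a port is actually free on the host.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// portPool is a hypothetical in-memory pool of port numbers.
type portPool struct {
	mu   sync.Mutex
	free []int
}

// newPortPool creates a pool holding count consecutive ports from first.
func newPortPool(first, count int) *portPool {
	p := &portPool{}
	for i := 0; i < count; i++ {
		p.free = append(p.free, first+i)
	}
	return p
}

// take removes n ports from the pool and returns them together with a
// cleanup function that hands them back. Calling cleanup twice is safe.
func (p *portPool) take(n int) ([]int, func(), error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if n > len(p.free) {
		return nil, func() {}, errors.New("not enough free ports")
	}
	ports := append([]int(nil), p.free[:n]...)
	p.free = p.free[n:]
	var once sync.Once
	cleanup := func() {
		once.Do(func() {
			p.mu.Lock()
			defer p.mu.Unlock()
			p.free = append(p.free, ports...)
		})
	}
	return ports, cleanup, nil
}

// available reports how many ports are currently in the pool.
func (p *portPool) available() int {
	p.mu.Lock()
	defer p.mu.Unlock()
	return len(p.free)
}

func main() {
	pool := newPortPool(9000, 4)
	ports, cleanup, err := pool.take(3)
	if err != nil {
		panic(err)
	}
	defer cleanup() // a test would defer this next to its server setup
	fmt.Println(ports, pool.available())
}
```

Returning the cleanup function alongside the ports makes it hard for a test to take ports without also having the means to give them back.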

This should help quite a bit with some flakey tests, but not all of them.
Our port problems will not go away completely until we upgrade our vendor
version of consul. With Go modules, we'll probably do a 'replace' to swap
out other copies of freeport with the one now in 'nomad/helper/freeport'.
2019-12-09 08:37:32 -06:00
dependabot[bot]
551ff191ce Bump lodash.mergewith from 4.6.1 to 4.6.2 in /ui
Bumps [lodash.mergewith](https://github.com/lodash/lodash) from 4.6.1 to 4.6.2.
- [Release notes](https://github.com/lodash/lodash/releases)
- [Commits](https://github.com/lodash/lodash/commits)

Signed-off-by: dependabot[bot] <support@github.com>
2019-12-09 13:37:45 +00:00
Tim Gross
cb7ed5fee9 Merge pull request #6628 from hashicorp/dependabot/npm_and_yarn/ui/handlebars-4.1.2
Bump handlebars from 4.1.1 to 4.1.2 in /ui
2019-12-09 08:37:10 -05:00
Tim Gross
701cab81f5 Bump fstream from 1.0.11 to 1.0.12 in /ui (#6630)
Bumps [fstream](https://github.com/npm/fstream) from 1.0.11 to 1.0.12.
- [Release notes](https://github.com/npm/fstream/releases)
- [Commits](https://github.com/npm/fstream/compare/v1.0.11...v1.0.12)

Signed-off-by: dependabot[bot] <support@github.com>
2019-12-09 08:36:57 -05:00
dependabot[bot]
132b0dc513 Bump lodash.merge from 4.6.1 to 4.6.2 in /ui (#6632)
Bumps [lodash.merge](https://github.com/lodash/lodash) from 4.6.1 to 4.6.2.
- [Release notes](https://github.com/lodash/lodash/releases)
- [Commits](https://github.com/lodash/lodash/commits)

Signed-off-by: dependabot[bot] <support@github.com>
2019-12-09 08:36:44 -05:00
Mahmood Ali
943854469d driver: allow disabling log collection
Operators commonly have docker logs aggregated using various tools and
don't need nomad to manage their docker logs.  Worse, Nomad uses a
somewhat heavy docker api call to collect them and it seems to cause
problems when a client runs hundreds of log collections.

Here we add a knob to disable log aggregation completely for nomad.
When log collection is disabled, we avoid running logmon and
docker_logger for the docker tasks in this implementation.

The downside here is that once disabled, `nomad logs ...` commands and the
API no longer return logs, and operators must correlate alloc-ids with
their aggregated log info.

This is meant as a stop gap measure.  Ideally, we'd follow up with at
least two changes:

First, we should optimize behavior when we can such that operators don't
need to disable docker log collection.  Potentially by reverting to
using pre-0.9 syslog aggregation in linux environments, though with
different trade-offs.

Second, when/if logs are disabled, nomad logs endpoints should look up
the docker logs api on demand.  This ensures that the cost of log
collection is paid sparingly.
2019-12-08 14:15:03 -05:00
Mahmood Ali
16aef03331 Merge pull request #6788 from hashicorp/b-timeout-logmon-stop
logmon: add timeout to RPC operations
2019-12-06 19:12:06 -05:00
abhip
ad2af255b1 Update consensus.html.md (#6813)
The URL for the Raft algorithm PDF is no longer valid. The correct URL is https://raft.github.io/raft.pdf and the website is https://raft.github.io/
2019-12-06 06:17:30 -08:00
Seth Hoenig
3ae03b4fed Merge pull request #6814 from hashicorp/f-use-golangci-lint
swap gometalint for golangci-lint
2019-12-06 08:16:17 -06:00
dependabot[bot]
c9577724f6 Bump lodash.defaultsdeep from 4.6.0 to 4.6.1 in /ui
Bumps [lodash.defaultsdeep](https://github.com/lodash/lodash) from 4.6.0 to 4.6.1.
- [Release notes](https://github.com/lodash/lodash/releases)
- [Commits](https://github.com/lodash/lodash/compare/4.6.0...4.6.1)

Signed-off-by: dependabot[bot] <support@github.com>
2019-12-06 14:13:32 +00:00