Commit Graph

21872 Commits

Author SHA1 Message Date
Luiz Aoqui
600bf12b75 Merge missing commits from 1.2.0-beta1 release branch (#11319) 2021-10-14 16:10:05 -04:00
Luiz Aoqui
d17b6a2c2b Merge release branch (#11317) 2021-10-14 13:06:04 -04:00
Luiz Aoqui
f5d560d360 fix nomad job allocs command name (#11314) 2021-10-14 12:44:59 -04:00
Luiz Aoqui
681eeca515 docs: update Nvidia device plugin as external (#11313) 2021-10-14 12:22:31 -04:00
Dave May
bf94aad36f Remove vendor folder during make clean (#11315)
* Remove vendor folder during make clean
* Add vendor warning to make dev build command
2021-10-14 11:32:19 -04:00
Luiz Aoqui
9d2be2aee6 changlog: add entry for #10796 (#11312) 2021-10-14 09:01:43 -04:00
James Rasell
fa5addc4a1 Merge pull request #11280 from benbuzbee/log-err
Log error if there are no event handlers registered
2021-10-14 14:49:22 +02:00
Mahmood Ali
feb450a393 executor: set CpuWeight in cgroup-v2 (#11287)
Cgroup-v2 uses the `cpu.weight` property instead of cpu shares
(https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#cpu-interface-files).
It also uses a different range (i.e. `[1, 10000]`) from cpu.shares
(i.e. `[2, 262144]`) to make things more interesting.

Luckily, libcontainer provides a helper function to perform the
conversion:
[`ConvertCPUSharesToCgroupV2Value`](https://pkg.go.dev/github.com/opencontainers/runc@v1.0.2/libcontainer/cgroups#ConvertCPUSharesToCgroupV2Value).

I have confirmed that docker/libcontainer performs the conversion as
well in
https://github.com/opencontainers/runc/blob/v1.0.2/libcontainer/specconv/spec_linux.go#L536-L541
and that CpuShares is ignored by libcontainer in
https://github.com/opencontainers/runc/blob/v1.0.2/libcontainer/cgroups/fs2/cpu.go#L24-L29
2021-10-14 08:46:07 -04:00
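To make the range conversion described in feb450a393 concrete, here is a minimal sketch of a linear mapping from the `[2, 262144]` cpu.shares range onto the `[1, 10000]` `cpu.weight` range. The function name `sharesToWeight` and the clamping of out-of-range inputs are assumptions made for illustration; the commit itself relies on libcontainer's `ConvertCPUSharesToCgroupV2Value`, whose exact behavior should be checked against the runc source.

```go
package main

import "fmt"

// Ranges quoted in the commit message above.
const (
	minShares = 2
	maxShares = 262144
	minWeight = 1
	maxWeight = 10000
)

// sharesToWeight is a hypothetical helper that linearly maps a cgroup-v1
// cpu.shares value onto the cgroup-v2 cpu.weight range. A value of 0 is
// treated as "unset" and passed through unchanged. Clamping is an
// illustrative addition, not something the commit describes.
func sharesToWeight(shares uint64) uint64 {
	if shares == 0 {
		return 0
	}
	if shares < minShares {
		shares = minShares
	}
	if shares > maxShares {
		shares = maxShares
	}
	return minWeight + ((shares-minShares)*(maxWeight-minWeight))/(maxShares-minShares)
}

func main() {
	for _, s := range []uint64{2, 1024, 262144} {
		fmt.Printf("cpu.shares=%d -> cpu.weight=%d\n", s, sharesToWeight(s))
	}
}
```

Integer division keeps the result inside the target range at both endpoints, which is the property that matters when the value is written to `cpu.weight`.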
Luiz Aoqui
c0a1d3adb9 changelog: add entries for #9160 and #11078 (#11290) 2021-10-14 08:43:36 -04:00
Charlie Voiselle
8ba714e211 Return SchedulerConfig instead of SchedulerConfigResponse struct (#10799) 2021-10-13 21:23:13 -04:00
Michael Schurter
6a0dede9b6 Merge pull request #11167 from a-zagaevskiy/master
Support configurable dynamic port range
2021-10-13 16:47:38 -07:00
Michael Schurter
fc89835daf client: improve errors & tests for dynamic ports 2021-10-13 16:25:25 -07:00
Dave May
1d30caafad cli: rename paths in debug bundle for clarity (#11307)
* Rename folders to reflect purpose
* Improve captured files test coverage
* Rename CSI plugins output file
* Add changelog entry
* fix test and make changelog message more explicit

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
2021-10-13 18:00:55 -04:00
Mahmood Ali
ff1b2f7623 tests: ensure that tests restore env-var values (#11309)
Fix a test corruption issue where a test accidentally unsets
the `NOMAD_LICENSE` environment variable, which some other tests
rely on.

As a habit, tests should always restore the environment variable value
on test completion. Golang 1.17 introduced
[`t.Setenv`](https://pkg.go.dev/testing#T.Setenv) to address this issue.
However, as 1.0.x and 1.1.x branches target golang 1.15 and 1.16, I
opted to use a helper function to ease backports.
2021-10-13 17:26:56 -04:00
Dave May
6852f21ddd cli: Improved autocomplete support for job dispatch and operator debug (#11270)
* Add autocomplete to nomad job dispatch
* Add autocomplete to nomad operator debug
* Update incorrect comment
* Update test to verify autocomplete
* Add changelog
* Apply lint suggestions
* Create dynamic slices instead of specific length
* Align style across predictors
2021-10-12 20:01:54 -04:00
Jorge Marey
833247600b Add os-nova nomad autoscaler repo link (#11277) 2021-10-12 17:04:58 -04:00
Dave May
1bd132f09d debug: Improve namespace and region support (#11269)
* Include region and namespace in CLI output
* Add region and prefix matching for server members
* Add namespace and region API outputs to cluster metadata folder
* Add region awareness to WaitForClient helper function
* Add helper functions for SliceStringHasPrefix and StringHasPrefixInSlice
* Refactor test client agent generation
* Add tests for region
* Add changelog
2021-10-12 16:58:41 -04:00
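The two helper names listed in 1bd132f09d suggest prefix matching in both directions. The sketch below assumes that `SliceStringHasPrefix` reports whether any element of a slice starts with a given prefix, and that `StringHasPrefixInSlice` reports whether a string starts with any prefix from a slice. Those semantics are inferred from the names only, so the lowercase functions here are illustrative stand-ins rather than the code from the commit.

```go
package main

import (
	"fmt"
	"strings"
)

// sliceStringHasPrefix reports whether any item starts with prefix.
// Assumed semantics, inferred from the helper name in the commit.
func sliceStringHasPrefix(items []string, prefix string) bool {
	for _, item := range items {
		if strings.HasPrefix(item, prefix) {
			return true
		}
	}
	return false
}

// stringHasPrefixInSlice reports whether s starts with any of prefixes.
// Assumed semantics, inferred from the helper name in the commit.
func stringHasPrefixInSlice(s string, prefixes []string) bool {
	for _, prefix := range prefixes {
		if strings.HasPrefix(s, prefix) {
			return true
		}
	}
	return false
}

func main() {
	regions := []string{"global", "eu-west"}
	fmt.Println(sliceStringHasPrefix(regions, "eu"))
	fmt.Println(stringHasPrefixInSlice("eu-west-1", []string{"us", "eu"}))
}
```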
Florian Apolloner
75cd30c548 Fixed plan diffing to handle non-unique service names. (#10965) 2021-10-12 16:42:39 -04:00
Luiz Aoqui
d4c3989e2a Update job details box (#11288) 2021-10-12 16:36:10 -04:00
Dave May
f545ac1bc4 cli: Add nomad job allocs command (#11242) 2021-10-12 16:30:36 -04:00
Luiz Aoqui
713094ffb7 wrap log messages with hclog (#11291) 2021-10-12 14:38:44 -04:00
Ben Buzbee
337c5d765b Log error if there are no event handlers registered
We see this error all the time
```
no handler registered for event
event.Message=, event.Annotations=, event.Timestamp=0001-01-01T00:00:00Z, event.TaskName=, event.AllocID=, event.TaskID=,
```

So we're handling an event with all default fields. I noted that this can
happen if only err is set as in

```
func (d *driverPluginClient) handleTaskEvents(reqCtx context.Context, ch chan *TaskEvent, stream proto.Driver_TaskEventsClient) {
	defer close(ch)
	for {
		ev, err := stream.Recv()
		if err != nil {
			if err != io.EOF {
				ch <- &TaskEvent{
					Err: grpcutils.HandleReqCtxGrpcErr(err, reqCtx, d.doneCtx),
				}
			}
```

In this case Err fails to be serialized by the logger, see this test

```

	ev := &drivers.TaskEvent{
		Err: fmt.Errorf("errz"),
	}
	i.logger.Warn("ben test", "event", ev)
	i.logger.Warn("ben test2", "event err str", ev.Err.Error())
	i.logger.Warn("ben test3", "event err", ev.Err)
	ev.Err = nil
	i.logger.Warn("ben test4", "nil error", ev.Err)

2021-10-06T22:37:56.736Z INFO nomad.stdout {"@level":"warn","@message":"ben test","@module":"client.driver_mgr","@timestamp":"2021-10-06T22:37:56.643900Z","driver":"mock_driver","event":{"TaskID":"","TaskName":"","AllocID":"","Timestamp":"0001-01-01T00:00:00Z","Message":"","Annotations":null,"Err":{}}}
2021-10-06T22:37:56.736Z INFO nomad.stdout {"@level":"warn","@message":"ben test2","@module":"client.driver_mgr","@timestamp":"2021-10-06T22:37:56.644226Z","driver":"mock_driver","event err str":"errz"}
2021-10-06T22:37:56.736Z INFO nomad.stdout {"@level":"warn","@message":"ben test3","@module":"client.driver_mgr","@timestamp":"2021-10-06T22:37:56.644240Z","driver":"mock_driver","event err":"errz"}
2021-10-06T22:37:56.736Z INFO nomad.stdout {"@level":"warn","@message":"ben test4","@module":"client.driver_mgr","@timestamp":"2021-10-06T22:37:56.644252Z","driver":"mock_driver","nil error":null}
```

Note in the first example err is set to an empty object and the error is
lost.

What we want is the last two examples, which call out the err field
explicitly so we can see what it is in this case.
2021-10-11 19:44:52 +00:00
Bryce Kalow
721f388f43 website: upgrade deps to fix search styles (#11294) 2021-10-11 11:33:59 -05:00
Aleksandr Zagaevskiy
0620bb04a5 fixup! Support configurable dynamic port range 2021-10-11 14:13:59 +03:00
James Rasell
8378d00d66 Merge pull request #11283 from hashicorp/f-update-hclog-dep
deps: update hashicorp/go-hclog to v1.0.0
2021-10-11 08:39:41 +02:00
Jai
0564f9fa68 System Batch UI, Client Status Bar Chart and Client Tab page view (#11078) 2021-10-07 17:11:38 -04:00
Michael Lange
c50b75178f Merge pull request #11279 from hashicorp/f-ui/storybook-upgrade
UI: Storybook upgrade
2021-10-07 09:17:27 -07:00
James Rasell
dd07f07ec8 changelog: add entry for #11283 2021-10-07 08:16:05 +01:00
James Rasell
594ba94878 deps: update hashicorp/go-hclog to v1.0.0 2021-10-07 07:48:41 +01:00
Matt Mukerjee
0881b94201 Add FailoverHeartbeatTTL to config (#11127)
FailoverHeartbeatTTL is the amount of time to wait after a server leader failure
before considering reallocating client tasks. This TTL should be fairly long as
the new server leader needs to rebuild the entire heartbeat map for the
cluster. In deployments with a small number of machines, the default TTL (5m)
may be unnecessarily long. Let's allow operators to configure this value in their
config files.
2021-10-06 18:48:12 -04:00
Michael Lange
b9937dfc38 Migrate: New hierarchical separator 2021-10-06 14:05:32 -07:00
Michael Lange
90eabb6955 Migrate decorator to new file layout 2021-10-06 14:05:32 -07:00
Michael Lange
51d2873c3d Override the app rootURL for storybook
Hopefully this work gets merged into ember-cli-storybook. For the time
being, we get a fork instead.
2021-10-06 14:05:32 -07:00
Michael Lange
95d4af91f2 Storybook for ember workaround 2021-10-06 14:05:32 -07:00
Michael Lange
5385021b2c Upgrade Storybook configuration for v6 2021-10-06 14:05:32 -07:00
Amit Shuster
215bf04bc6 Lightrun Integration - External task driver (#11203) 2021-10-06 15:34:34 -04:00
Shantanu Gadgil
20b44d77bd auth_soft_fail needed for public images when agent is configured with auth (#11190) 2021-10-06 15:30:23 -04:00
Leela Venkaiah G
3eb852fcfe [demo] Kadalu CSI support for Nomad (#11207) 2021-10-06 15:29:15 -04:00
Michael Lange
f9bc9ca6f7 Upgrade storybook from 5 to 6 2021-10-06 11:06:57 -07:00
Mahmood Ali
bc2a51d43a executor: suppress spurious log messages (#11273)
Suppress stats streaming error log messages when task finishes.
Streaming errors are expected when a task finishes and they aren't
actionable to users.

Also, note that the task runner Stats hook retries collecting stats
after a delay. If the connection terminates prematurely, it will be
retried, and closing the stats stream is not very disruptive.

Ideally, the executor would terminate cleanly when the task exits, but that's a more
substantial change that may require changing the executor/drivers interface.

Fixes #10814
2021-10-06 12:42:35 -04:00
Florian Apolloner
5624603893 Fixed creation of ControllerCreateVolumeRequest. (#11238) 2021-10-06 10:17:39 -04:00
Florian Apolloner
6cb36971c6 Added support for -force-color to the CLI. (#10975) 2021-10-06 10:02:42 -04:00
Yan
c21493a560 add -show-url option for ui command (#11213) 2021-10-05 20:08:42 -04:00
Michael Schurter
25b2e54751 Merge pull request #11268 from hashicorp/docs-1.1.6-changelog
docs: add 1.1.6 and 1.0.12 to changelog
2021-10-05 16:49:17 -07:00
Michael Schurter
ddf780a5ba docs: bump version to 1.1.6 on website 2021-10-05 16:35:33 -07:00
Michael Schurter
09d617f59f docs: add 1.1.6 and 1.0.12 to changelog 2021-10-05 16:34:24 -07:00
Bryce Kalow
f082d51c15 website: upgrade dependencies (#11247) 2021-10-05 13:31:14 -05:00
Mahmood Ali
1cb2049b91 Merge pull request #11261 from hashicorp/b-logmon-leak
Fix a logmon goroutine and memory leak
2021-10-05 13:41:29 -04:00
Mahmood Ali
40c772033b add changelog 2021-10-05 13:01:19 -04:00
Mahmood Ali
6aa8913392 logmon: Fix a memory leak on task restart
Fix a logmon leak causing high goroutine and memory usage when a task
restarts.

Logmon `FileRotator` buffers the task stdout/stderr streams and
periodically flushes them to log files. Logmon creates a new
FileRotator for each stream for each task run. However, the
`flushPeriodically` goroutine is leaked when a task restarts,
holding a reference to a no-longer-needed `FileRotator` instance
along with its 64kb buffer.

The cause is that the code assumed `time.Ticker.Stop()` closes the
ticker channel, thereby terminating the goroutine, but the documentation
says otherwise:

> Stop turns off a ticker. After Stop, no more ticks will be sent. Stop does not close the channel, to prevent a concurrent goroutine reading from the channel from seeing an erroneous "tick".
https://pkg.go.dev/time#Ticker.Stop
2021-10-05 12:11:53 -04:00