nomad

mirror of https://github.com/kemko/nomad.git synced 2026-01-10 12:25:42 +03:00

Author	SHA1	Message	Date
Michael Schurter	d4553b7569	Merge pull request #6218 from hashicorp/f-consul-defaults consul: use Consul's defaults and env vars	2019-08-28 11:54:44 -07:00
Mahmood Ali	b2ef75e10d	Merge pull request #6216 from hashicorp/b-recognize-pending-allocs alloc_runner: wait when starting suspicious allocs	2019-08-28 14:46:09 -04:00
Mahmood Ali	8b05f87140	rename to hasLocalState, and ignore clientstate The ClientState being pending isn't a good criterion, as an alloc may have been updated in-place before it was completed. Also, updated the logic so we only check for task states. If an alloc has deployment state but no persisted tasks at all, restore will still fail.	2019-08-28 11:44:48 -04:00
Mahmood Ali	ddf2f6be4d	Merge pull request #6219 from hashicorp/c-circleci-upgrade-machine-img upgrade machine image for most jobs	2019-08-28 11:27:04 -04:00
Lang Martin	3f0f3a06c0	Merge pull request #6215 from hashicorp/f-upgrade-go-getter upgrade go-getter, leave compiled protobuf at version 1.2	2019-08-28 11:01:31 -04:00
Nick Ethier	99742f2665	ar: ensure network forwarding is allowed for bridged allocs (#6196) * ar: ensure network forwarding is allowed in iptables for bridged allocs * ensure filter rule exists at setup time	2019-08-28 10:51:34 -04:00
Mahmood Ali	d99da5b656	upgrade machine image for most jobs Looks like the host's unattended upgrades are interfering with chroot creation. Here, we upgrade machine image to one without unattended upgrades misconfigured, across the board except for the `test-docker` job. Docker seems to be misbehaving on that image, and we get some unexpected cgroups errors, e.g. https://circleci.com/gh/hashicorp/nomad/3854 . Sample recent failures of `test-exec`: https://circleci.com/gh/hashicorp/nomad/3633 https://circleci.com/gh/hashicorp/nomad/3696 https://circleci.com/gh/hashicorp/nomad/3714 https://circleci.com/gh/hashicorp/nomad/3764 https://circleci.com/gh/hashicorp/nomad/3770 https://circleci.com/gh/hashicorp/nomad/3834	2019-08-28 09:50:56 -04:00
Nick Ethier	f631ec6c2d	cli: display group ports and address in alloc status command output (#6189) * cli: display group ports and address in alloc status command output * add assertions for port.To = -1 case and convert assertions to testify	2019-08-27 23:59:36 -04:00
Nick Ethier	51750f5732	Add environment variables for connect upstreams (#6171) * taskenv: add connect upstream env vars + test * set taskenv upstreams instead of appending * Update client/taskenv/env.go Co-Authored-By: Michael Schurter <mschurter@hashicorp.com>	2019-08-27 23:41:38 -04:00
Michael Schurter	6a1bdf04c4	consul: use Consul's defaults and env vars Use Consul's API package defaults and env vars as Nomad's defaults.	2019-08-27 14:56:52 -07:00
Mahmood Ali	493945a8a4	Alternative approach: avoid restoring This uses an alternative approach where we avoid restoring the alloc runner in the first place, if we suspect that the alloc may have been completed already.	2019-08-27 17:30:55 -04:00
Lang Martin	23d1214947	match pinned versions for sub-modules	2019-08-27 12:58:12 -04:00
Jasmine Dahilig	80dfa33223	expose nomad namespace as environment variable in allocation #5692 (#6192)	2019-08-27 08:38:07 -07:00
Jasmine Dahilig	d29fa2b48c	remove network stanza from job init --short example jobspec (#6179)	2019-08-27 07:36:32 -07:00
Mahmood Ali	cbc521e1e7	alloc_runner: wait when starting suspicious allocs This commit aims to help users running with clients susceptible to the destroyed alloc being restarted bug upgrade to latest. Without this, such users will have their tasks run unexpectedly on upgrade and only see the bug resolved after subsequent restart. If, on restore, the client sees a pending alloc without any other persisted info, then err on the side that it's a corrupt persisted state of an alloc instead of the client happening to be killed right when alloc is assigned to client. Few reasons motivate this behavior: Statistically speaking, corruption being the cause is more likely. A long running client will have higher chance of having allocs persisted incorrectly with pending state. Being killed right when an alloc is about to start is relatively unlikely. Also, delaying starting an alloc that hasn't started (by hopefully seconds) is not as severe as launching too many allocs that may bring client down. More importantly, this helps customers upgrade their clients without risking taking their clients down and destabilizing their cluster. We don't want existing users to force triggering the bug while they upgrade and restart cluster.	2019-08-26 22:05:31 -04:00
Lang Martin	0aa79ca764	govendor fetch github.com/hashicorp/go-getter@f5101da, protobuf 1.2	2019-08-26 17:54:21 -04:00
Mahmood Ali	f61637026e	Merge pull request #6207 from hashicorp/b-gc-destroyed-allocs-rerun Don't persist allocs of destroyed alloc runners	2019-08-26 17:26:18 -04:00
Tim Gross	e2efeb4911	init: add generated assets into bindata	2019-08-26 14:24:15 -04:00
Mahmood Ali	ff3dedd534	Write to client store while holding lock Protect against a race where destroying and persist state goroutines race. The downside is that the database io operation will run while holding the lock and may run indefinitely. The risk of lock being long held is slow destruction, but slow io has bigger problems.	2019-08-26 13:45:58 -04:00
Danielle	8066f9b8f0	Merge pull request #6181 from hashicorp/dani/scheduler-vol-ro scheduler: Implicit constraint on readonly hostvol	2019-08-26 17:01:49 +02:00
Mahmood Ali	925eed89c6	Merge pull request #6205 from hashicorp/b-no-golang-29119-workaround logmon: revert workaround for Windows go1.11 bug	2019-08-26 10:52:51 -04:00
Nick Fagerlund	3d9e44d40f	Update middleman-hashicorp container (#6185)	2019-08-26 09:29:08 -05:00
Mahmood Ali	7c1fe3eae5	logmon: log stat error to help debugging	2019-08-26 10:10:20 -04:00
Mahmood Ali	eb5160427f	Merge pull request #6204 from hashicorp/c-circleci-tweaks-20190824 ci: use circleci/golang images directly	2019-08-26 10:08:14 -04:00
Mahmood Ali	a80643e46d	Don't persist allocs of destroyed alloc runners This fixes a bug where allocs that have been GCed get re-run again after client is restarted. A heavily-used client may launch thousands of allocs on startup and get killed. The bug is that an alloc runner that gets destroyed due to GC remains in client alloc runner set. Periodically, they get persisted until alloc is GCed by server. During that time, the client db will contain the alloc but not its individual tasks status nor completed state. On client restart, client assumes that alloc is pending state and re-runs it. Here, we fix it by ensuring that destroyed alloc runners don't persist any alloc to the state DB. This is a short-term fix, as we should consider revamping client state management. Storing alloc and task information non-transactionally and non-atomically, concurrently while the alloc runner is running and potentially changing state, is a recipe for bugs. Fixes https://github.com/hashicorp/nomad/issues/5984 Related to https://github.com/hashicorp/nomad/pull/5890	2019-08-25 11:21:28 -04:00
Mahmood Ali	cc3da4a441	logmon: revert workaround for Windows go1.11 bug Revert `e0126123ab` now that we are running with Golang 1.12, and https://github.com/golang/go/issues/29119 is no longer relevant.	2019-08-24 08:19:44 -04:00
Mahmood Ali	b4a80a7eea	Merge pull request #6201 from hashicorp/b-device-stats-interval initialize device manager stats interval	2019-08-24 08:16:03 -04:00
Mahmood Ali	b6bf83ad72	use circleci/golang images directly We currently use a container image for `test-devices` job only; while all other jobs use machine executor. This allows us to switch golang and protoc versions easily without manually managing Docker images (which requires building them manually on a dev machine, etc). All that while, we install dependencies on every build in all other jobs. `test-devices` now is one of the fastest jobs and isn't a constraint or a bottleneck, so increasing its overhead by a few seconds doesn't hurt the overall developer iteration. If we split tests effectively later, we can revisit.	2019-08-23 21:59:49 -04:00
Mahmood Ali	f4571cb9a9	use a new image with proper protoc dependency Fixes `test-devices` job	2019-08-23 21:33:07 -04:00
Mahmood Ali	e87d9cc8a6	Merge pull request #6146 from hashicorp/b-config-template-copy clientConfig.Copy() to copy template config too	2019-08-23 19:00:57 -04:00
Mahmood Ali	e8ebde4ca2	clientConfig.Copy() to copy template config too	2019-08-23 18:43:22 -04:00
Mahmood Ali	a72a0f8832	Merge pull request #5676 from hashicorp/f-b-upgrade-ugorji-dep-20190508 Update ugorji/go to latest	2019-08-23 18:29:49 -04:00
Lang Martin	4877face87	Merge pull request #6203 from hashicorp/b-chroot-setuid-110 exec driver setuid go-getter update	2019-08-23 16:49:41 -04:00
Lang Martin	5fc06cd65f	taskrunner getter set Umask for go-getter, setuid test	2019-08-23 15:59:03 -04:00
Lang Martin	07373be85c	govendor fetch github.com/hashicorp/go-getter@6be654f	2019-08-23 15:59:03 -04:00
Mahmood Ali	01983ae59b	initialize device manager stats interval Fixes a bug where the CPU is pegged at 100% due to collecting devices statistics. The passed stats interval was ignored, and the default zero value causes a very tight loop of stats collection. FWIW, in my testing, it took 2.5-3ms to collect nvidia GPU stats, on a `g2.2xlarge` ec2 instance. The stats interval defaults to 1 second and is user configurable. I believe this is too frequent as a default, and I may advocate for reducing it to a value closer to 5s or 10s, but keeping it as is for now. Fixes https://github.com/hashicorp/nomad/issues/6057 .	2019-08-23 14:58:34 -04:00
Mahmood Ali	0c4718378b	Merge pull request #6200 from hashicorp/r-golang-1.12.9 Update golang to 1.12.9	2019-08-23 14:37:21 -04:00
Tim Gross	c7c8b01122	agent: -dev=connect mode bind to 0.0.0.0 The dev mode flag for connect was binding to the default interface's IP, but this makes for a bad user experience for the CLI which will default to 127.0.0.1. If we bind to 0.0.0.0 instead the CLI will work without further configuration by the user.	2019-08-23 13:51:16 -04:00
Jerome Gravel-Niquet	25e38c8257	Consul service meta (#6193) * adds meta object to service in job spec, sends it to consul * adds tests for service meta * fix tests * adds docs * better hashing for service meta, use helper for copying meta when registering service * tried to be DRY, but looks like it would be more work to use the helper function	2019-08-23 12:49:02 -04:00
Mahmood Ali	3e1f584495	update circleci builds to use golang 1.12.9	2019-08-23 12:26:47 -04:00
Mahmood Ali	0ccca0ad59	use golang 1.12	2019-08-23 09:44:40 -04:00
Nick Ethier	974ff0392c	ar: fix bridge networking port mapping when port.To is unset (#6190)	2019-08-22 21:53:52 -04:00
Preetha	28b0650dc9	Bring 0.9.5 changes to changelog on master branch	2019-08-22 17:35:15 -05:00
Buck Doyle	10e675c200	Remove most Netlify configuration (#6194) This removes the in-repository Netlify configuration. There are now two sites backed by the repository, so we must use the web UI to control the build settings, as having the configuration in-repository overrides the web UI settings. The build settings for the two sites are below, as of this commit. See the extra step in nomad-ui site’s build step that copies the _redirects file to the correct destination so things are properly forwarded when you visit the deployment. nomad-ui: base directory: ui build command: ember build && mkdir -p ui-dist/ui && mv dist/* ui-dist/ui/ && cp ../.netlify/ui-redirects ui-dist/_redirects publish directory: ui/ui-dist nomad-website: base directory: website build command: bundle exec middleman build publish directory: website/build	2019-08-22 15:54:23 -05:00
Michael Schurter	72193f99be	Merge pull request #6121 from hashicorp/f-connect-bootstrap connect: task hook for bootstrapping envoy sidecar	2019-08-22 10:58:31 -07:00
Michael Schurter	43d89f864e	connect: task hook for bootstrapping envoy sidecar Fixes #6041 Unlike all other Consul operations, bootstrapping requires Consul be available. This PR tries Consul 3 times with a backoff to account for the group services being asynchronously registered with Consul.	2019-08-22 08:15:32 -07:00
Mahmood Ali	5eee9ee59f	Merge pull request #6187 from hashicorp/c-circleci-tweak-20190822 ci: Use more recent base machine executor image for test-rkt	2019-08-22 11:10:05 -04:00
Mahmood Ali	91bccfc83b	ci: Use more recent base machine executor image This fixes a frequent failure in `test-rkt` jobs where dpkg installation fails. The image used currently, circleci/classic:201808-01, has unattended upgrades enabled accidentally, which runs on every build. This means that tools get modified unexpectedly during builds, and apt-get commands may fail as the unattended upgrade is holding package database lock. This updates `test-rkt` job only because the new image breaks `test-docker` job (e.g. https://circleci.com/gh/hashicorp/nomad/2641 ), and I punted on investigating test-docker for another day.	2019-08-22 10:31:57 -04:00
Buck Doyle	2364fb2da1	UI: Add creation time to evaluations table (#6050)	2019-08-22 08:11:24 -05:00
Danielle	fbddb9281b	Merge pull request #6175 from hashicorp/dani/remove-hidden-vols remove hidden field from host volumes	2019-08-22 08:49:54 +02:00


15822 Commits