nomad

mirror of https://github.com/kemko/nomad.git synced 2026-01-07 19:05:42 +03:00

Author	SHA1	Message	Date
Michael Lange	5aa938e121	Test coverage for preemption on the allocation detail page	2019-04-22 16:40:09 -07:00
Michael Lange	c7e1598ed3	Preemption modeling as page objects	2019-04-22 16:40:08 -07:00
Michael Lange	d4ae0a2819	Integration test for the alloc row icon	2019-04-22 16:40:07 -07:00
Michael Lange	4c773a1f3c	Add preemption properties to Mirage allocation factory	2019-04-22 16:40:07 -07:00
Michael Lange	4752950cae	Show which allocations an allocation preempted on the alloc page	2019-04-22 16:40:06 -07:00
Michael Lange	400deae4ce	Show which alloc, if any, preempted an alloc on the alloc detail page	2019-04-22 16:40:05 -07:00
Michael Lange	7ae2081282	Preemptions count and filtering on client detail page Show the count in the allocations table next to the existing total alloc count badge. Clicking either will filter by all or by preemptions.	2019-04-22 16:40:04 -07:00
Michael Lange	a33b105181	Add preempted icon to alloc row	2019-04-22 16:40:04 -07:00
Michael Lange	dca386ca70	Make sure tooltips show up over the top of the side bar	2019-04-22 16:40:03 -07:00
Michael Lange	384a0e5a54	Add wasPreempted bool to allocs	2019-04-22 16:40:02 -07:00
Michael Lange	c456c5eed0	Show preemptions on the job plan phase of job submission	2019-04-22 16:40:01 -07:00
Michael Lange	cf1d4a3a1e	Data modeling for preemptions	2019-04-22 16:40:00 -07:00
Chris Baker	09c998a4a1	Merge pull request #5591 from hashicorp/cgbaker/changelog changelog: added entry for #5540 fix	2019-04-22 15:31:22 -04:00
Michael Schurter	95bc6fe301	Merge pull request #5586 from hashicorp/docs-deploy-ver docs: bump deployment guide to 0.9.0	2019-04-22 12:29:22 -07:00
Chris Baker	184e171e11	changelog: added entry for #5540 fix	2019-04-22 19:27:40 +00:00
Chris Baker	7b4ac71d2f	Merge pull request #5541 from hashicorp/b/5540-bad-client-alloc-metrics client/metrics: fixed stale metrics	2019-04-22 15:07:30 -04:00
Mahmood Ali	151e0ae772	Merge pull request #5577 from hashicorp/dani/b-logmon-unrecoverable logging: Attempt to recover logmon failures	2019-04-22 14:40:24 -04:00
Michael Schurter	0f91277d85	tweak logging level for failed log line Co-Authored-By: notnoop <mahmood@notnoop.com>	2019-04-22 14:40:17 -04:00
Chris Baker	7d8fa4c045	client/metrics: modified metrics to use (updated) client copy of allocation instead of (unupdated) server copy	2019-04-22 18:31:45 +00:00
Lang Martin	f5c621979e	tests over setwise equality of fingerprinted parts	2019-04-19 15:49:24 -04:00
Michael Schurter	a3e8f51643	docs: bump deployment guide to 0.9.0	2019-04-19 12:39:38 -07:00
Lang Martin	5c7e10e0b9	structs need to keep assert Equal interface implementation for tests	2019-04-19 15:23:49 -04:00
Lang Martin	228a7d6124	structs equals use labeled continue for clarity	2019-04-19 15:23:48 -04:00
Lang Martin	3e1c6ac890	struct equals use a working pattern for setwise comparison	2019-04-19 15:23:48 -04:00
Lang Martin	583ae3722c	client fingerprinter doesn't overwrite manual configuration Revert "Revert accidental merge of pr #5482" This reverts commit `c45652ab8c`.	2019-04-19 15:23:48 -04:00
Michael Schurter	8a0df4034d	Merge pull request #5583 from ygersie/fingerprint_nilpointer fix nil pointer in fingerprinting AWS env leading to crash	2019-04-19 08:08:59 -07:00
Mahmood Ali	54e1e0760b	Merge pull request #5437 from hashicorp/r-upstream-libcontainer-plain Use upstream libcontainer package	2019-04-19 10:15:13 -04:00
Mahmood Ali	6747195682	comment on using init() for libcontainer handling	2019-04-19 09:49:04 -04:00
Mahmood Ali	9bf54eae97	comment what refer to	2019-04-19 09:49:04 -04:00
Mahmood Ali	b6af5c9dca	Move libcontainer helper to executor package	2019-04-19 09:49:04 -04:00
Mahmood Ali	0088f40fd4	vendor upstream opencontainers/runc	2019-04-19 09:49:04 -04:00
Mahmood Ali	9050f5f611	Merge pull request #5585 from hashicorp/b-drivers-node-registration client: wait for batched driver updates before registering nodes	2019-04-19 09:47:21 -04:00
Mahmood Ali	8041b0cbe2	clarify cryptic log line	2019-04-19 09:31:43 -04:00
Mahmood Ali	9a2f46f332	client: log detected driver health state Noticed that `detected drivers` log line was misleading - when a driver doesn't fingerprint before timeout, their health status is empty string `""` which we would mark as detected. Now, we log all drivers along with their state to ease driver fingerprint debugging.	2019-04-19 09:15:25 -04:00
Mahmood Ali	9dcebcd8a3	client: avoid registering node twice right away I noticed that `watchNodeUpdates()` almost immediately after `registerAndHeartbeat()` calls `retryRegisterNode()`, well after 5 seconds. This call is unnecessary and made debugging a bit harder. So here, we ensure that we only re-register node for new node events, not for initial registration.	2019-04-19 09:12:50 -04:00
Preetha	92a4033a1a	Update CHANGELOG.md	2019-04-19 08:02:48 -05:00
Mahmood Ali	7a68d76160	client: wait for batched driver updates Here we retain 0.8.7 behavior of waiting for driver fingerprints before registering a node, with some timeout. This is needed for system jobs, as system job scheduling for a node occurs at node registration, and the race might mean that a system job may not get placed on the node because of missing drivers. The timeout isn't strictly necessary, but raising it to 1 minute as it's closer to indefinitely blocked than 1 second. We need to keep the value high enough to capture as many drivers/devices as possible, but low enough that it doesn't risk blocking too long due to a misbehaving plugin. Fixes https://github.com/hashicorp/nomad/issues/5579	2019-04-19 09:00:24 -04:00
Yorick Gersie	77a8fda87c	fix nil pointer in fingerprinting AWS env leading to crash HTTP Client returns a nil response if an error has occurred. We first need to check for an error before being able to check the HTTP response code.	2019-04-19 11:07:13 +02:00
Preetha	83a2e693b7	Merge pull request #5580 from hashicorp/f-api-preemption-info Add preemption related fields to AllocationListStub	2019-04-18 18:38:25 -07:00
Preetha Appan	ad77c18c87	Add preemption related fields to AllocationListStub	2019-04-18 10:36:44 -05:00
Danielle	11388ab992	Merge pull request #5572 from hashicorp/dani/b-docker-volumes Switch to pre-0.9 behaviour for handling volumes	2019-04-18 15:48:23 +02:00
Danielle	4789948ba8	Merge pull request #5573 from hashicorp/dani/update-vol-docs docs: Clarify docker volume behaviour	2019-04-18 14:30:16 +02:00
Danielle Lancashire	ccce364cbd	Switch to pre-0.9 behaviour for handling volumes In Nomad 0.9, we made volume driver handling the same for `""`, and `"local"` volumes. Prior to Nomad 0.9 however these had slightly different behaviour for relative paths and named volumes. Prior to 0.9 the empty string would expand relative paths within the task dir, and `"local"` volumes that are not absolute paths would be treated as docker named volumes. This commit reverts to the previous behaviour as follows: \| Nomad Version \| Driver \| Volume Spec \| Behaviour \| \|------------------------------------------------------------------------- \| all \| "" \| testing:/testing \| allocdir/testing \| \| 0.8.7 \| "local" \| testing:/testing \| "testing" as named volume \| \| 0.9.0 \| "local" \| testing:/testing \| allocdir/testing \| \| 0.9.1 \| "local" \| testing:/testing \| "testing" as named volume \|	2019-04-18 14:28:45 +02:00
Danielle Lancashire	269e2c00fb	logging: Attempt to recover logmon failures Currently, when logmon fails to reattach, we will retry reattachment to the same pid until the task restart specification is exhausted. Because we cannot clear hook state during error conditions, it is not possible for us to signal to a future restart that it _shouldn't_ attempt to reattach to the plugin. Here we revert to explicitly detecting reattachment separately from a launch of a new logmon, so we can recover from scenarios where a logmon plugin has failed. This is a net improvement over the current hard failure situation, as it means in the most common case (the pid has gone away), we can recover. Other reattachment failure modes where the plugin may still be running could potentially cause a duplicate process, or a subsequent failure to launch a new plugin. If there was a duplicate process, it could potentially cause duplicate logging. This is better than a production workload outage. If there was a subsequent failure to launch a new plugin, it would fail in the same way (retry until restarts are exhausted) as the current failure mode.	2019-04-18 13:41:56 +02:00
Chris Baker	15c64875d1	Merge pull request #5559 from ArangoGutierrez/website_docs_singularity list singularity as a community driver	2019-04-17 12:42:29 -04:00
Charlie Voiselle	4a0da839a9	fixed header level	2019-04-17 10:12:43 -04:00
Danielle Lancashire	acf8ab8665	docs: Clarify docker volume behaviour	2019-04-17 11:31:55 +02:00
Mahmood Ali	c07b72959d	Merge pull request #5568 from hashicorp/b-nomad-logger-restart Fixes #5566 . Fix a case where docker logging process may lock up nomad agent restart. Looks like we have a case where docker logger is started even though logmon isn't. In such case, the fifo writer blocks indefinitely and because the open operation happens in the main goroutine, nomad agent blocks indefinitely. This fixes the issue by having the fifo open operation happen in a goroutine instead of the main goroutine. We should follow up independently to ensure logmon <-> dockerlogger ordering and consider having task recovery happen in non-main goroutine with some sensible timeouts.	2019-04-16 19:34:37 -04:00
Eduardo Arango	9f97da0956	resolve merge conflicts Signed-off-by: Eduardo Arango <eduardo@sylabs.io>	2019-04-16 17:01:22 -05:00
Eduardo Arango	bd0d641a5e	address @cgbaker comments Signed-off-by: Eduardo Arango <eduardo@sylabs.io>	2019-04-16 16:59:59 -05:00


14885 Commits