nomad

mirror of https://github.com/kemko/nomad.git synced 2026-01-03 08:55:43 +03:00

Author	SHA1	Message	Date
Mahmood Ali	979a6a1778	implement client endpoint of nomad exec Add a client streaming RPC endpoint for processing nomad exec tasks, by invoking the relevant task handler for execution.	2019-05-09 16:49:08 -04:00
Mahmood Ali	43cd7a71a6	aux: helper method that returns token as well as ACL policy This helper returns the token as well as the ACL policy, to be used in a later commit for logging the token info associated with nomad exec invocation.	2019-04-30 10:23:56 -04:00
Lang Martin	33f550fb52	Merge pull request #5553 from hashicorp/b-fingerprinter-manual-config client fingerprinter doesn't overwrite manual configuration	2019-04-26 12:55:34 -04:00
Danielle	91fa55f4cc	Merge pull request #5515 from hashicorp/dani/f-alloc-signal allocs: Add nomad alloc signal command	2019-04-26 14:21:05 +02:00
Danielle Lancashire	7f102bcea8	alloc_signal: Add autocompletion and cmd tests	2019-04-26 12:47:53 +02:00
Mahmood Ali	a321901ad8	retry grpc unavailable errors even if not shutting down	2019-04-25 18:39:17 -04:00
Mahmood Ali	658a734912	try checking process status	2019-04-25 18:16:13 -04:00
Mahmood Ali	1f1551a4ae	add logging about attempts	2019-04-25 18:09:36 -04:00
Mahmood Ali	ba373fee2a	try sleeping for stop signal to take effect	2019-04-25 17:16:29 -04:00
Mahmood Ali	978fc65a2b	add a test that simulates logmon dying during Start() call	2019-04-25 16:41:17 -04:00
Mahmood Ali	b21849cb02	logmon: retry starting logmon if it exits Retry if we detect shutting down during Start() api call is started, locally.	2019-04-25 15:10:16 -04:00
Mahmood Ali	c23d673b7e	logmon client to handle grpc closing errors	2019-04-25 14:32:24 -04:00
Danielle Lancashire	023d0dff31	allocs: Add nomad alloc signal command This command will be used to send a signal to either a single task within an allocation, or all of the tasks if <task-name> is omitted. If the sent signal terminates the allocation, it will be treated as if the allocation has crashed, rather than as if it was operator-terminated. Signal validation is currently handled by the driver itself and nomad does not attempt to restrict or validate them.	2019-04-25 12:43:32 +02:00
Chris Baker	7b4ac71d2f	Merge pull request #5541 from hashicorp/b/5540-bad-client-alloc-metrics client/metrics: fixed stale metrics	2019-04-22 15:07:30 -04:00
Mahmood Ali	151e0ae772	Merge pull request #5577 from hashicorp/dani/b-logmon-unrecoverable logging: Attempt to recover logmon failures	2019-04-22 14:40:24 -04:00
Michael Schurter	0f91277d85	tweak logging level for failed log line Co-Authored-By: notnoop <mahmood@notnoop.com>	2019-04-22 14:40:17 -04:00
Chris Baker	7d8fa4c045	client/metrics: modified metrics to use (updated) client copy of allocation instead of (unupdated) server copy	2019-04-22 18:31:45 +00:00
Lang Martin	583ae3722c	client fingerprinter doesn't overwrite manual configuration Revert "Revert accidental merge of pr #5482" This reverts commit `c45652ab8c`.	2019-04-19 15:23:48 -04:00
Michael Schurter	8a0df4034d	Merge pull request #5583 from ygersie/fingerprint_nilpointer fix nil pointer in fingerprinting AWS env leading to crash	2019-04-19 08:08:59 -07:00
Mahmood Ali	8041b0cbe2	clarify cryptic log line	2019-04-19 09:31:43 -04:00
Mahmood Ali	9a2f46f332	client: log detected driver health state Noticed that `detected drivers` log line was misleading - when a driver doesn't fingerprint before timeout, their health status is empty string `""` which we would mark as detected. Now, we log all drivers along with their state to ease driver fingerprint debugging.	2019-04-19 09:15:25 -04:00
Mahmood Ali	9dcebcd8a3	client: avoid registering node twice right away I noticed that `watchNodeUpdates()` almost immediately after `registerAndHeartbeat()` calls `retryRegisterNode()`, well after 5 seconds. This call is unnecessary and made debugging a bit harder. So here, we ensure that we only re-register node for new node events, not for initial registration.	2019-04-19 09:12:50 -04:00
Mahmood Ali	7a68d76160	client: wait for batched driver updated Here we retain 0.8.7 behavior of waiting for driver fingerprints before registering a node, with some timeout. This is needed for system jobs, as system job scheduling for node occur at node registration, and the race might mean that a system job may not get placed on the node because of missing drivers. The timeout isn't strictly necessary, but raising it to 1 minute as it's closer to indefinitely blocked than 1 second. We need to keep the value high enough to capture as much drivers/devices, but low enough that doesn't risk blocking too long due to misbehaving plugin. Fixes https://github.com/hashicorp/nomad/issues/5579	2019-04-19 09:00:24 -04:00
Yorick Gersie	77a8fda87c	fix nil pointer in fingerprinting AWS env leading to crash HTTP Client returns a nil response if an error has occurred. We first need to check for an error before being able to check the HTTP response code.	2019-04-19 11:07:13 +02:00
Danielle Lancashire	269e2c00fb	logging: Attempt to recover logmon failures Currently, when logmon fails to reattach, we will retry reattachment to the same pid until the task restart specification is exhausted. Because we cannot clear hook state during error conditions, it is not possible for us to signal to a future restart that it _shouldn't_ attempt to reattach to the plugin. Here we revert to explicitly detecting reattachment separately from a launch of a new logmon, so we can recover from scenarios where a logmon plugin has failed. This is a net improvement over the current hard failure situation, as it means in the most common case (the pid has gone away), we can recover. Other reattachment failure modes where the plugin may still be running could potentially cause a duplicate process, or a subsequent failure to launch a new plugin. If there was a duplicate process, it could potentially cause duplicate logging. This is better than a production workload outage. If there was a subsequent failure to launch a new plugin, it would fail in the same (retry until restarts are exhausted) as the current failure mode.	2019-04-18 13:41:56 +02:00
Michael Schurter	b135d28450	vault: fix data races	2019-04-16 11:22:44 -07:00
Michael Schurter	0e6da17a8f	vault: fix renewal time Renewal time was being calculated as 10s+Intn(lease-10s), so the renewal time could be very rapid or within 1s of the deadline: [10s, lease) This commit fixes the renewal time by calculating it as: (lease/2) +/- 10s For a lease of 60s this means the renewal will occur in [20s, 40s).	2019-04-16 11:22:44 -07:00
Michael Schurter	eeb282ca2f	Merge pull request #5518 from hashicorp/f-simplify-kill client: simplify kill logic	2019-04-15 14:11:58 -07:00
Chris Baker	377c1d694b	vault namespaces: inject VAULT_NAMESPACE alongside VAULT_TOKEN + documentation	2019-04-12 15:06:34 +00:00
Lang Martin	c45652ab8c	Revert accidental merge of pr #5482 Revert "fingerprint Constraints and Affinities have Equals, as set" This reverts commit `596f16fb5f`. Revert "client tests assert the independent handling of interface and speed" This reverts commit `7857ac5993`. Revert "structs missed applying a style change from the review" This reverts commit `658916e327`. Revert "client, structs comments" This reverts commit `be2838d6ba`. Revert "client fingerprint updateNetworks preserves the network configuration" This reverts commit `fc309cb430`. Revert "client_test cleanup comments from review" This reverts commit `bc0bf4efb9`. Revert "client Networks Equals is set equality" This reverts commit `f8d432345b`. Revert "struct cleanup indentation in RequestedDevice Equals" This reverts commit `f4746411ca`. Revert "struct Equals checks for identity before value checking" This reverts commit `0767a4665e`. Revert "fix client-test, avoid hardwired platform dependecy on lo0" This reverts commit `e89dbb2ab1`. Revert "refactor error in client fingerprint to include the offending data" This reverts commit `a7fed726c6`. Revert "add client updateNodeResources to merge but preserve manual config" This reverts commit `84bd433c7e`. Revert "refactor struts.RequestedDevice to have its own Equals" This reverts commit `6897825240`. Revert "refactor structs.Resource.Networks to have its own Equals" This reverts commit `49e2e6c77b`. Revert "refactor structs.Resource.Devices to have its own Equals" This reverts commit `4ede9226bb`. Revert "add COMPAT(0.10): Remove in 0.10 notes to impl for structs.Resources" This reverts commit `49fbaace52`. Revert "add structs.Resources Equals" This reverts commit `8528a2a2a6`. Revert "test that fingerprint resources are updated, net not clobbered" This reverts commit `8ee02ddd23`.	2019-04-11 10:29:40 -04:00
Lang Martin	7857ac5993	client tests assert the independent handling of interface and speed	2019-04-11 09:56:22 -04:00
Lang Martin	be2838d6ba	client, structs comments	2019-04-11 09:56:22 -04:00
Lang Martin	fc309cb430	client fingerprint updateNetworks preserves the network configuration	2019-04-11 09:56:22 -04:00
Lang Martin	bc0bf4efb9	client_test cleanup comments from review	2019-04-11 09:56:22 -04:00
Lang Martin	e89dbb2ab1	fix client-test, avoid hardwired platform dependecy on lo0	2019-04-11 09:56:22 -04:00
Lang Martin	a7fed726c6	refactor error in client fingerprint to include the offending data	2019-04-11 09:56:22 -04:00
Lang Martin	84bd433c7e	add client updateNodeResources to merge but preserve manual config	2019-04-11 09:56:22 -04:00
Lang Martin	8ee02ddd23	test that fingerprint resources are updated, net not clobbered	2019-04-11 09:56:21 -04:00
Danielle Lancashire	419d70c5f9	allocs: Add nomad alloc restart This adds a `nomad alloc restart` command and api that allows a job operator with the alloc-lifecycle acl to perform an in-place restart of a Nomad allocation, or a given subtask.	2019-04-11 14:25:49 +02:00
Chris Baker	2022db72b6	vault client test: minor formatting vendor: using upstream circonus-gometrics	2019-04-10 10:34:10 -05:00
Chris Baker	312721427d	vault e2e: pass vault version into setup instead of having to infer it from test name	2019-04-10 10:34:10 -05:00
Chris Baker	401c9fdd16	taskrunner: removed some unnecessary config from a test	2019-04-10 10:34:10 -05:00
Chris Baker	20a3884559	docs: -vault-namespace, VAULT_NAMESPACE, and config agent: added VAULT_NAMESPACE env-based configuration	2019-04-10 10:34:10 -05:00
Chris Baker	e09badbe8b	client: gofmt	2019-04-10 10:34:10 -05:00
Chris Baker	3a28763455	taskrunner: pass configured Vault namespace into TaskTemplateConfig	2019-04-10 10:34:10 -05:00
Chris Baker	1349497152	config/docs: added `namespace` to vault config server/client: process `namespace` config, setting on the instantiated vault client	2019-04-10 10:34:10 -05:00
Michael Schurter	8caa1c5b0d	Bump to 0.9.1-dev	2019-04-09 09:01:48 -07:00
Nomad Release bot	18dd59056e	Generate files for 0.9.0 release	2019-04-09 01:56:00 +00:00
Michael Schurter	3bad050bf1	client: simplify kill logic Remove runLaunched tracking as Run is always called for killable TaskRunners. TaskRunners which fail before Run can be called (during NewTaskRunner or Restore) are not killable as they're never added to the client's alloc map.	2019-04-04 15:18:33 -07:00
Michael Schurter	b51e9e09fc	Remove 0.9.0-rc2 generated files	2019-04-03 07:41:09 -07:00


3707 Commits