nomad

mirror of https://github.com/kemko/nomad.git synced 2026-01-08 03:15:42 +03:00

Author	SHA1	Message	Date
Chris Baker	7050e14eb5	formatting and clarity	2019-06-18 14:00:57 +00:00
Chris Baker	240d68765c	metrics: add namespace label to allocation metrics	2019-06-17 20:50:26 +00:00
Mahmood Ali	7a01a96cc2	Fallback to `alloc.TaskResources` for old allocs When a client is running against an old server (e.g. running 0.8), `alloc.AllocatedResources` may be nil, and we need to check the deprecated `alloc.TaskResources` instead. Fixes https://github.com/hashicorp/nomad/issues/5810	2019-06-11 10:32:53 -04:00
Danielle Lancashire	92527c6b4e	client: Pass servers contacted ch to allocrunner This fixes an issue where batch and service workloads would never be restarted due to indefinitely blocking on a nil channel. It also raises the restoration logging message to `Info` to simplify log analysis.	2019-05-22 13:47:35 +02:00
Michael Schurter	abd809d60a	docs: changelog entry for #5669 and fix comment	2019-05-14 10:54:00 -07:00
Michael Schurter	6a2792ad90	client: do not restart dead tasks until server is contacted (try 2) Refactoring of 104067bc2b2002a4e45ae7b667a476b89addc162 Switch the MarkLive method for a chan that is closed by the client. Thanks to @notnoop for the idea! The old approach called a method on most existing ARs and TRs on every runAllocs call. The new approach does a once.Do call in runAllocs to accomplish the same thing with less work. Able to remove the gate abstraction that did much more than was needed.	2019-05-14 10:53:27 -07:00
Michael Schurter	e7042b674b	client: do not restart dead tasks until server is contacted Fixes #1795 Running restored allocations and pulling what allocations to run from the server happen concurrently. This means that if a client is rebooted, and has its allocations rescheduled, it may restart the dead allocations before it contacts the server and determines they should be dead. This commit makes tasks that fail to reattach on restore wait until the server is contacted before restarting.	2019-05-14 10:53:27 -07:00
Michael Schurter	3c19af4bfd	client: expose allocated memory per task Related to #4280 This PR adds `client.allocs.<job>.<group>.<alloc>.<task>.memory.allocated` as a gauge in bytes to metrics to ease calculating how close a task is to OOMing. ``` 'nomad.client.allocs.memory.allocated.example.cache.6d98cbaf-d6bc-2a84-c63f-bfff8905a9d8.redis.rusty': 268435456.000 'nomad.client.allocs.memory.cache.example.cache.6d98cbaf-d6bc-2a84-c63f-bfff8905a9d8.redis.rusty': 5677056.000 'nomad.client.allocs.memory.kernel_max_usage.example.cache.6d98cbaf-d6bc-2a84-c63f-bfff8905a9d8.redis.rusty': 0.000 'nomad.client.allocs.memory.kernel_usage.example.cache.6d98cbaf-d6bc-2a84-c63f-bfff8905a9d8.redis.rusty': 0.000 'nomad.client.allocs.memory.max_usage.example.cache.6d98cbaf-d6bc-2a84-c63f-bfff8905a9d8.redis.rusty': 8908800.000 'nomad.client.allocs.memory.rss.example.cache.6d98cbaf-d6bc-2a84-c63f-bfff8905a9d8.redis.rusty': 876544.000 'nomad.client.allocs.memory.swap.example.cache.6d98cbaf-d6bc-2a84-c63f-bfff8905a9d8.redis.rusty': 0.000 'nomad.client.allocs.memory.usage.example.cache.6d98cbaf-d6bc-2a84-c63f-bfff8905a9d8.redis.rusty': 8208384.000 ```	2019-05-10 11:12:12 -07:00
Mahmood Ali	979a6a1778	implement client endpoint of nomad exec Add a client streaming RPC endpoint for processing nomad exec tasks, by invoking the relevant task handler for execution.	2019-05-09 16:49:08 -04:00
Michael Schurter	3bad050bf1	client: simplify kill logic Remove runLaunched tracking as Run is always called for killable TaskRunners. TaskRunners which fail before Run can be called (during NewTaskRunner or Restore) are not killable as they're never added to the client's alloc map.	2019-04-04 15:18:33 -07:00
Michael Schurter	7970b224b0	client: emit event and call exited hooks during cleanup Builds upon earlier commit that cleans up restored handles of terminal allocs by also emitting terminated events and calling exited hooks when appropriate.	2019-03-05 15:12:02 -08:00
Michael Schurter	db9daf6631	client: ensure task is cleaned up when terminal This commit is a significant change. TR.Run is now always executed, even for terminal allocations. This was changed to allow TR.Run to clean up (run stop hooks) if a handle was recovered. This is intended to handle the case of Nomad receiving a DesiredStatus=Stop allocation update, persisting it, but crashing before stopping AR/TR. The commit also renames task runner hook data as it was very easy to accidentally set state on Requests instead of Responses using the old field names.	2019-03-01 14:00:23 -08:00
Michael Schurter	cf66e25e57	client: restart on recoverable StartTask errors Fixes restarting on recoverable errors from StartTask. Ports TestTaskRunner_Run_RecoverableStartError from 0.8 which discovered the bug.	2019-02-21 15:30:49 -08:00
Michael Schurter	1d17fbc681	simplify hcl2 parsing helper No need to pass in the entire eval context	2019-02-04 11:07:57 -08:00
Michael Schurter	158c74887e	goimports until make check is happy	2019-01-23 06:27:14 -08:00
Michael Schurter	0d61ff0fb9	move pluginutils -> helper/pluginutils I wanted a different color bikeshed, so I get to paint it	2019-01-22 15:50:08 -08:00
Alex Dadgar	72f8851254	Split hclspec	2019-01-22 15:43:34 -08:00
Alex Dadgar	8be7057177	move hclutils	2019-01-22 15:43:34 -08:00
Michael Schurter	7c45733ba4	Merge pull request #5211 from hashicorp/test-porting-08 Port some 0.8 TaskRunner tests	2019-01-22 14:05:53 -08:00
Michael Schurter	06119e2505	test: port TestTaskRunner_CheckWatcher_Restart Added ability to adjust the number of events the TaskRunner keeps as there's no way to observe all events otherwise. Task events differ slightly from 0.8 because 0.9 emits Terminated every time a task exits instead of only when it exits on its own (not due to restart or kill). 0.9 does not emit Killing/Killed for restarts like 0.8 which seems fine as `Restart Signaled/Terminated/Restarting` is more descriptive. Original v0.8 events emitted: ``` expected := []string{ "Received", "Task Setup", "Started", "Restart Signaled", "Killing", "Killed", "Restarting", "Started", "Restart Signaled", "Killing", "Killed", "Restarting", "Started", "Restart Signaled", "Killing", "Killed", "Not Restarting", } ```	2019-01-22 09:46:46 -08:00
Preetha Appan	c3f044291d	Rename TaskKillHook to TaskPreKillHook to more closely match usage Also added/fixed comments	2019-01-22 09:41:56 -06:00
Preetha Appan	bf9a2168e7	Rename TaskKillHook to TaskPreKillHook to more closely match usage Also added/fixed comments	2019-01-22 09:41:21 -06:00
Mahmood Ali	9f7619344e	Merge pull request #5190 from hashicorp/f-memory-usage Track Basic Memory Usage as reported by cgroups	2019-01-18 16:46:02 -05:00
Chris Baker	b43f803d36	set TaskGroupName in task_runner	2019-01-18 20:25:11 +00:00
Michael Schurter	4e4ecc949f	Merge pull request #5203 from hashicorp/b-terminated client: restore Terminated event on every exit	2019-01-18 08:54:15 -08:00
Preetha Appan	eb7663697b	Fix one more place that should be using taskResources taskResources handles new resource fields in a backwards compatible way	2019-01-17 15:52:51 -06:00
Michael Schurter	64e531e7bb	client: restore Terminated event on every exit v0.9.0-dev started emitting a Terminated event every time a task process exited. While this wasn't true in previous versions, it's a useful task event because it's the only place for job operators to view the task's exit code. This behavior is asserted in the e2e/taskevents tests.	2019-01-17 10:02:25 -08:00
Mahmood Ali	b5c20aa50b	Track Basic Memory Usage as reported by cgroups Track current memory usage, `memory.usage_in_bytes`, in addition to `memory.max_memory_usage_in_bytes` and friends. This number is closer to what Docker reports. Related to https://github.com/hashicorp/nomad/issues/5165 .	2019-01-14 18:47:52 -05:00
Alex Dadgar	c7fc39d38d	Merge pull request #5168 from hashicorp/b-kill-race Improve Kill handling on task runner	2019-01-09 12:05:10 -08:00
Michael Schurter	e44d51f4d0	Spelling fix Co-Authored-By: dadgar <alex@hashicorp.com>	2019-01-09 11:42:40 -08:00
Mahmood Ali	d1fbd735f3	Merge pull request #5157 from hashicorp/r-drivers-no-cstructs drivers: avoid referencing client/structs package	2019-01-09 13:06:46 -05:00
Alex Dadgar	a5ba15591a	Improve Kill handling on task runner This PR improves how killing a task is handled. Before the kill function directly orchestrated the killing and was only valid while the task was running. The new behavior is to mark the desired state and wait for the task runner to converge to that state.	2019-01-08 16:42:26 -08:00
Michael Schurter	1ae8261139	client: emit Killing/Killed task events We were just emitting Killed/Terminated events before. In v0.8 we emitted Killing/Killed, but lacked Terminated when explicitly stopping a task. This change makes it so Terminated is always included, whether explicitly stopping a task or it exiting on its own. New output: 2019-01-04T14:58:51-08:00 Killed Task successfully killed 2019-01-04T14:58:51-08:00 Terminated Exit Code: 130, Signal: 2 2019-01-04T14:58:51-08:00 Killing Sent interrupt 2019-01-04T14:58:51-08:00 Leader Task Dead Leader Task in Group dead 2019-01-04T14:58:49-08:00 Started Task started by client 2019-01-04T14:58:49-08:00 Task Setup Building Task Directory 2019-01-04T14:58:49-08:00 Received Task received by client Old (v0.8.6) output: 2019-01-04T22:14:54Z Killed Task successfully killed 2019-01-04T22:14:54Z Killing Sent interrupt. Waiting 5s before force killing 2019-01-04T22:14:54Z Leader Task Dead Leader Task in Group dead 2019-01-04T22:14:53Z Started Task started by client 2019-01-04T22:14:53Z Task Setup Building Task Directory 2019-01-04T22:14:53Z Received Task received by client	2019-01-08 07:20:54 -08:00
Mahmood Ali	c0162fab35	move cstructs.DeviceNetwork to drivers pkg	2019-01-08 09:11:47 -05:00
Alex Dadgar	6bb99c93d0	Review comments	2019-01-07 14:50:28 -08:00
Alex Dadgar	144866a87b	Mock driver has recovery, stats	2019-01-07 14:49:40 -08:00
Alex Dadgar	b300306c4a	comments	2019-01-07 14:49:40 -08:00
Alex Dadgar	437f03d877	recover	2019-01-07 14:49:40 -08:00
Mahmood Ali	17b7490891	taskrunner: emit TaskReceived event Preserve pre-0.9 behavior, where the task runner emits a `Received: Task received by client` event on task runner creation.	2019-01-04 14:32:29 -05:00
Alex Dadgar	99df4c98c7	Store device envs separately and pass to drivers	2018-12-19 14:23:09 -08:00
Nick Ethier	6951ca487d	drivermanager: use allocID and task name to route task events	2018-12-18 23:01:51 -05:00
Nick Ethier	331793e283	client: batch initial fingerprinting in plugin managers drivermanager: fix pr comments/feedback	2018-12-18 22:56:19 -05:00
Nick Ethier	39ca1b00dd	client/drivermanager: add driver manager The driver manager is modeled after the device manager and is started by the client. It's responsible for handling driver lifecycle and reattachment state, as well as processing the incoming fingerprint and task events from each driver. The manager exposes a method for registering event handlers for task events that is used by the task runner to update the server when a task has been updated with an event. Since driver fingerprinting has been implemented by the driver manager, it is no longer needed in the fingerprint manager and has been removed.	2018-12-18 22:55:18 -05:00
Alex Dadgar	517bf1c35f	Fix unit tests + upgrade pathing resources	2018-12-18 15:50:44 -08:00
Alex Dadgar	d5512c39f0	Lint	2018-12-18 15:50:44 -08:00
Alex Dadgar	7a0b73341a	LinuxResources doesn't use task.Resources	2018-12-18 15:50:44 -08:00
Alex Dadgar	cd6879409c	Drivers	2018-12-18 15:50:11 -08:00
Alex Dadgar	da6925bfc1	utilities	2018-12-18 15:48:52 -08:00
Danielle Tomlinson	b92bc1178d	taskrunner: Use a random suffix for Task Config The RestartCount is not really suitable for use as a source of uniqueness within task invocations as it is not monotonic, and interacts with the restart stanza in a user's config, so it conflates restarts due to task failures with restarts due to environmental changes, such as consul template or vault secrets changing. Here we instead use a substring from a uuid, which is more random than we strictly need, but is nicer than rolling our own random string generator here.	2018-12-19 00:38:54 +01:00
Alex Dadgar	8b624340ad	Fix various bugs with task events Fixes the following: * Emitting events when the task fails to start * Don't double emit events on task shutdown (nomad stop) * Don't emit an OOM kill metric unless actually OOM'd	2018-12-05 14:27:07 -08:00

95 Commits