nomad

mirror of https://github.com/kemko/nomad.git synced 2026-01-08 11:25:41 +03:00

Author	SHA1	Message	Date
Danielle Lancashire	1250d56333	csi: Add VolumeManager (#6920) This changeset is some pre-requisite boilerplate that is required for introducing CSI volume management for client nodes. It extracts out fingerprinting logic from the csi instance manager. This change is to facilitate reusing the csimanager to also manage the node-local CSI functionality, as it is the easiest place for us to guarantee health checking and to provide additional visibility into the running operations through the fingerprinter mechanism and goroutine. It also introduces the VolumeMounter interface that will be used to manage staging/publishing unstaging/unpublishing of volumes on the host.	2020-03-23 13:58:29 -04:00
Danielle Lancashire	cd0c2a6df0	csi: Setup gRPC Clients with a logger	2020-03-23 13:58:29 -04:00
Danielle Lancashire	d296efd2c6	CSI Plugin Registration (#6555) This changeset implements the initial registration and fingerprinting of CSI Plugins as part of #5378. At a high level, it introduces the following: * A `csi_plugin` stanza as part of a Nomad task configuration, to allow a task to expose that it is a plugin. * A new task runner hook: `csi_plugin_supervisor`. This hook does two things. When the `csi_plugin` stanza is detected, it will automatically configure the plugin task to receive bidirectional mounts to the CSI intermediary directory. At runtime, it will then perform an initial heartbeat of the plugin and handle submitting it to the new `dynamicplugins.Registry` for further use by the client, and then run a lightweight heartbeat loop that will emit task events when health changes. * The `dynamicplugins.Registry` for handling plugins that run as Nomad tasks, in contrast to the existing catalog that requires `go-plugin` type plugins and to know the plugin configuration in advance. * The `csimanager` which fingerprints CSI plugins, in a similar way to `drivermanager` and `devicemanager`. It currently only fingerprints the NodeID from the plugin, and assumes that all plugins are monolithic. Missing features: * We do not use the live updates of the `dynamicplugin` registry in the `csimanager` yet. * We do not deregister the plugins from the client when they shut down yet; they just become indefinitely marked as unhealthy. This is deliberate until we figure out how we should manage deploying new versions of plugins/transitioning them.	2020-03-23 13:58:28 -04:00
Mahmood Ali	83b08ab158	tr: proceed to mark other tasks as dead if alloc fails	2020-03-21 17:52:58 -04:00
Mahmood Ali	4558fa6aec	fix test	2020-03-21 17:52:57 -04:00
Jasmine Dahilig	6c1474398f	change jobspec lifecycle stanza to use sidecar attribute instead of block_until status	2020-03-21 17:52:57 -04:00
Jasmine Dahilig	dcd317745d	fix restart policy for system jobs with no lifecycle	2020-03-21 17:52:56 -04:00
Jasmine Dahilig	90fa242d83	fix failing ci test: TestTaskRunner_UnregisterConsul_Retries	2020-03-21 17:52:54 -04:00
Jasmine Dahilig	c6cd7b523b	clean up restart conditions and restart tests for task lifecycle	2020-03-21 17:52:50 -04:00
Jasmine Dahilig	262d204096	incorporate lifecycle into restart tracker	2020-03-21 17:52:40 -04:00
Mahmood Ali	5377b4cb58	Add a coordinator for alloc runners	2020-03-21 17:52:38 -04:00
Fredrik Hoem Grelland	26cca14f27	Update consul-template to v0.24.1 and remove deprecated vault_grace (#7170)	2020-02-23 16:24:53 +01:00
Mahmood Ali	a3b0b25acb	update rest of consul packages	2020-02-16 16:25:04 -06:00
Seth Hoenig	1f8e31770c	tests: set consul token for nomad client for testing SIDS TR hook	2020-01-31 19:06:15 -06:00
Seth Hoenig	04b526662c	e2e: setup consul ACLs a little more correctly	2020-01-31 19:06:11 -06:00
Seth Hoenig	0f285b840e	tests: skip some SIDS hook tests if running tests as root	2020-01-31 19:05:32 -06:00
Seth Hoenig	08951ac759	client: additional test cases around failures in SIDS hook	2020-01-31 19:05:27 -06:00
Seth Hoenig	91c7dbaa8d	client: PR cleanup - improved logging around kill task in SIDS hook	2020-01-31 19:05:23 -06:00
Seth Hoenig	f8949dde35	client: PR cleanup - shadow context variable	2020-01-31 19:05:19 -06:00
Seth Hoenig	0589b656b7	nomad: make TaskGroup.UsesConnect helper a public helper	2020-01-31 19:05:11 -06:00
Seth Hoenig	40de85867d	client: manage TR kill from parent on SI token derivation failure Re-orient the management of the tr.kill to happen in the parent of the spawned goroutine that is doing the actual token derivation. This makes the code a little more straightforward, making it easier to reason about not leaking the worker goroutine.	2020-01-31 19:05:02 -06:00
Seth Hoenig	1fca495a85	client: set context timeout around SI token derivation The derivation of an SI token needs to be safeguarded by a context timeout, otherwise an unresponsive Consul could cause the siHook to block forever on Prestart.	2020-01-31 19:04:56 -06:00
Seth Hoenig	bbedeb670d	nomad,client: apply more comment/style PR tweaks	2020-01-31 19:04:52 -06:00
Seth Hoenig	cc7b768907	nomad,client: apply smaller PR suggestions Apply smaller suggestions like doc strings, variable names, etc. Co-Authored-By: Nick Ethier <nethier@hashicorp.com> Co-Authored-By: Michael Schurter <mschurter@hashicorp.com>	2020-01-31 19:04:40 -06:00
Seth Hoenig	d24d470775	comments: cleanup some leftover debug comments and such	2020-01-31 19:04:35 -06:00
Seth Hoenig	e825a0f769	client: skip task SI token file load failure if testing as root The TestEnvoyBootstrapHook_maybeLoadSIToken test case only works when running as a non-privileged user, since it deliberately tries to read an un-readable file to simulate a failure loading the SI token file.	2020-01-31 19:04:30 -06:00
Seth Hoenig	4b4dfacda5	client: remove unused indirection for referencing consul executable Was thinking about using the testing pattern where you create executable shell scripts as test resources which "mock" the process a bit of code is meant to fork+exec. Turns out that wasn't really necessary in this case.	2020-01-31 19:04:25 -06:00
Seth Hoenig	d85cccc8d0	nomad: fixup token policy validation	2020-01-31 19:04:08 -06:00
Seth Hoenig	6bc6a52f99	client: enable envoy bootstrap hook to set SI token When creating the envoy bootstrap configuration, we should append the "-token=<token>" argument in the case where the sidsHook placed the token in the secrets directory.	2020-01-31 19:04:01 -06:00
Seth Hoenig	674ccaa122	nomad: proxy requests for Service Identity tokens between Clients and Consul Nomad jobs may be configured with a TaskGroup which contains a Service definition that is Consul Connect enabled. These service definitions end up establishing a Consul Connect Proxy Task (e.g. envoy, by default). In the case where Consul ACLs are enabled, a Service Identity token is required for these tasks to run & connect, etc. This changeset enables the Nomad Server to receive RPC requests for the derivation of SI tokens on behalf of instances of Consul Connect using Tasks. Those tokens are then relayed back to the requesting Client, which then injects the tokens in the secrets directory of the Task.	2020-01-31 19:03:53 -06:00
Seth Hoenig	f8666bb1f9	client: enable nomad client to request and set SI tokens for tasks When a job is configured with Consul Connect aware tasks (i.e. sidecar), the Nomad Client should be able to request from Consul (through Nomad Server) Service Identity tokens specific to those tasks.	2020-01-31 19:03:38 -06:00
Mahmood Ali	b789b507d1	Merge pull request #6922 from hashicorp/b-alloc-canoncalize Handle Upgrades and Alloc.TaskResources modification	2020-01-28 15:12:41 -05:00
Mahmood Ali	3291523d8c	address review comments	2020-01-15 08:57:05 -05:00
Nick Ethier	4b6f9e800b	Merge pull request #6816 from hashicorp/b-multiple-envoy connect: configure envoy to support multiple sidecars in the same alloc	2020-01-09 23:25:39 -05:00
Mahmood Ali	058076afd0	client: stop using alloc.TaskResources Now that alloc.Canonicalize() is called in all alloc sources in the client (i.e. on state restore and RPC fetching), we no longer need to check alloc.TaskResources. alloc.AllocatedResources is always non-nil through alloc runner. Though, early on, we check for alloc validity, so NewTaskRunner and TaskEnv must still check. `TestClient_AddAllocError` test validates that behavior.	2020-01-09 09:25:07 -05:00
Tim Gross	f31482ae8a	interpolate environment for services in script checks (#6916) In 0.10.2 (specifically `387b016`) we added interpolation to group service blocks and centralized the logic for task environment interpolation. This wasn't also added to script checks, which caused a regression where the IDs for script checks for services w/ interpolated fields (ex. the service name) didn't match the service ID that was registered with Consul. This changeset calls the same taskenv interpolation logic during `script_check` configuration, and adds tests to reduce the risk of future regressions by comparing the IDs of service hook and the check hook.	2020-01-09 08:12:54 -05:00
Nick Ethier	55217423c7	tr: initialize envoybootstrap prestart hook response.Env field	2020-01-08 13:41:38 -05:00
Nick Ethier	a44490182b	tr: expose envoy sidecar admin port as environment variable	2020-01-06 21:53:45 -05:00
Nick Ethier	04a1623b85	connect: configure envoy such that multiple sidecars can run in the same alloc	2020-01-06 11:26:27 -05:00
Mahmood Ali	20f8227c0a	Merge pull request #6820 from hashicorp/f-skip-docker-logging-knob driver: allow disabling log collection	2019-12-13 11:41:20 -05:00
Mahmood Ali	e82dad732b	address review comments	2019-12-13 11:21:00 -05:00
Chris Dickson	bbb6b2af09	client: expose allocated CPU per task (#6784)	2019-12-09 15:40:22 -05:00
Mahmood Ali	943854469d	driver: allow disabling log collection Operators commonly have docker logs aggregated using various tools and don't need nomad to manage their docker logs. Worse, Nomad uses a somewhat heavy docker api call to collect them and it seems to cause problems when a client runs hundreds of log collections. Here we add a knob to disable log aggregation completely for nomad. When log collection is disabled, we avoid running logmon and docker_logger for the docker tasks in this implementation. The downside here is once disabled, `nomad logs ...` commands and API no longer return logs and operators must correlate alloc-ids with their aggregated log info. This is meant as a stop gap measure. Ideally, we'd follow up with at least two changes: First, we should optimize behavior when we can such that operators don't need to disable docker log collection. Potentially by reverting to using pre-0.9 syslog aggregation in linux environments, though with different trade-offs. Second, when/if logs are disabled, nomad logs endpoints should lookup docker logs api on demand. This ensures that the cost of log collection is paid sparingly.	2019-12-08 14:15:03 -05:00
Preetha	d4f801d188	Merge pull request #6349 from hashicorp/b-host-stats client: Return empty values when host stats fail	2019-11-20 10:13:02 -06:00
Lang Martin	c47d52e865	getter: allow the gcs download scheme (#6692)	2019-11-19 09:10:56 -05:00
Nick Ethier	387b016ac4	client: improve group service stanza interpolation and check_re… (#6586) * client: improve group service stanza interpolation and check_restart support Interpolation can now be done on group service stanzas. Note that some task runtime specific information that was previously available when the service was registered poststart of a task is no longer available. The check_restart stanza for checks defined on group services will now properly restart the allocation upon check failures if configured.	2019-11-18 13:04:01 -05:00
Michael Schurter	0fcb0d4016	client: fix panic from 0.8 -> 0.10 upgrade makeAllocTaskServices did not do a nil check on AllocatedResources which causes a panic when upgrading directly from 0.8 to 0.10. While skipping 0.9 is not supported we intend to fix serious crashers caused by such upgrades to prevent cluster outages. I did a quick audit of the client package and everywhere else that accesses AllocatedResources appears to be properly guarded by a nil check.	2019-11-01 07:47:03 -07:00
Michael Schurter	43909b1374	Revert "Revert "Use joint context to cancel prestart hooks""	2019-10-08 11:34:09 -07:00
Michael Schurter	680e30457f	Revert "Use joint context to cancel prestart hooks"	2019-10-08 11:27:08 -07:00
Drew Bailey	12be12020e	simplify logic to check for vault read event defer shutdown to cleanup after failed run Co-Authored-By: Michael Schurter <mschurter@hashicorp.com> update comment to include ctx note for shutdown	2019-09-30 11:02:14 -07:00


290 Commits