nomad

mirror of https://github.com/kemko/nomad.git synced 2026-01-06 18:35:44 +03:00

Author	SHA1	Message	Date
Lang Martin	056a1dc2ee	csi add allocation context to fingerprinting results (#7133 ) * structs: CSIInfo include AllocID, CSIPlugins no Jobs * state_store: eliminate plugin Jobs, delete an empty plugin * nomad/structs/csi: detect empty plugins correctly * client/allocrunner/taskrunner/plugin_supervisor_hook: option AllocID * client/pluginmanager/csimanager/instance: allocID * client/pluginmanager/csimanager/fingerprint: set AllocID * client/node_updater: split controller and node plugins * api/csi: remove Jobs The CSI Plugin API will map plugins to allocations, which allows plugins to be defined by jobs in many configurations. In particular, multiple plugins can be defined in the same job, and multiple jobs can be used to define a single plugin. Because we now map the allocation context directly from the node, it's no longer necessary to track the jobs associated with a plugin directly. * nomad/csi_endpoint_test: CreateTestPlugin & register via fingerprint * client/dynamicplugins: lift AllocID into the struct from Options * api/csi_test: remove Jobs test * nomad/structs/csi: CSIPlugins has an array of allocs * nomad/state/state_store: implement CSIPluginDenormalize * nomad/state/state_store: CSIPluginDenormalize npe on missing alloc * nomad/csi_endpoint_test: defer deleteNodes for clarity * api/csi_test: disable this test awaiting mocks: https://github.com/hashicorp/nomad/issues/7123	2020-03-23 13:58:30 -04:00
Danielle Lancashire	08ad2b97e4	csi: Add /dev mounts to CSI Plugins CSI Plugins that manage devices need not just access to the CSI directory, but also to manage devices inside `/dev`. This commit introduces a `/dev:/dev` mount to the container so that they may do so.	2020-03-23 13:58:30 -04:00
Danielle Lancashire	8a8ed6036d	taskrunner/volume_hook: Cleanup arg order of prepareHostVolumes	2020-03-23 13:58:30 -04:00
Danielle Lancashire	9e1d99e49b	taskrunner/volume_hook: Mounts for CSI Volumes This commit implements support for creating driver mounts for CSI Volumes. It works by fetching the created mounts from the allocation resources and then iterates through the volume requests, creating driver mount configs as required. It's a little bit messy primarily because there's _so_ much terminology overlap and it's a bit difficult to follow.	2020-03-23 13:58:30 -04:00
Danielle Lancashire	01ff8960b5	volume_hook: Loosen validation in host volume prep	2020-03-23 13:58:30 -04:00
Danielle Lancashire	7d044a340f	allocrunner: Push state from hooks to taskrunners This commit is an initial (read: janky) approach to forwarding state from an allocrunner hook to a taskrunner using a similar `hookResources` approach that tr's use internally. It should eventually probably be replaced with something a little bit more message based, but for things that only come from pre-run hooks, and don't change, it's probably fine for now.	2020-03-23 13:58:30 -04:00
Danielle Lancashire	b3a1110c50	csi: Provide plugin-scoped paths during RPCs When providing paths to plugins, the path needs to be in the scope of the plugins container, rather than that of the host. Here we enable that by providing the mount point through the plugin registration and then use it when constructing request target paths.	2020-03-23 13:58:29 -04:00
Danielle Lancashire	1250d56333	csi: Add VolumeManager (#6920 ) This changeset is some pre-requisite boilerplate that is required for introducing CSI volume management for client nodes. It extracts out fingerprinting logic from the csi instance manager. This change is to facilitate reusing the csimanager to also manage the node-local CSI functionality, as it is the easiest place for us to guaruntee health checking and to provide additional visibility into the running operations through the fingerprinter mechanism and goroutine. It also introduces the VolumeMounter interface that will be used to manage staging/publishing unstaging/unpublishing of volumes on the host.	2020-03-23 13:58:29 -04:00
Danielle Lancashire	cd0c2a6df0	csi: Setup gRPC Clients with a logger	2020-03-23 13:58:29 -04:00
Danielle Lancashire	d296efd2c6	CSI Plugin Registration (#6555 ) This changeset implements the initial registration and fingerprinting of CSI Plugins as part of #5378. At a high level, it introduces the following: * A `csi_plugin` stanza as part of a Nomad task configuration, to allow a task to expose that it is a plugin. * A new task runner hook: `csi_plugin_supervisor`. This hook does two things. When the `csi_plugin` stanza is detected, it will automatically configure the plugin task to receive bidirectional mounts to the CSI intermediary directory. At runtime, it will then perform an initial heartbeat of the plugin and handle submitting it to the new `dynamicplugins.Registry` for further use by the client, and then run a lightweight heartbeat loop that will emit task events when health changes. * The `dynamicplugins.Registry` for handling plugins that run as Nomad tasks, in contrast to the existing catalog that requires `go-plugin` type plugins and to know the plugin configuration in advance. * The `csimanager` which fingerprints CSI plugins, in a similar way to `drivermanager` and `devicemanager`. It currently only fingerprints the NodeID from the plugin, and assumes that all plugins are monolithic. Missing features * We do not use the live updates of the `dynamicplugin` registry in the `csimanager` yet. * We do not deregister the plugins from the client when they shutdown yet, they just become indefinitely marked as unhealthy. This is deliberate until we figure out how we should manage deploying new versions of plugins/transitioning them.	2020-03-23 13:58:28 -04:00
Mahmood Ali	83b08ab158	tr: proceed to mark other tasks as dead if alloc fails	2020-03-21 17:52:58 -04:00
Mahmood Ali	4558fa6aec	fix test	2020-03-21 17:52:57 -04:00
Jasmine Dahilig	6c1474398f	change jobspec lifecycle stanza to use sidecar attribute instead of block_until status	2020-03-21 17:52:57 -04:00
Jasmine Dahilig	dcd317745d	fix restart policy for system jobs with no lifecycle	2020-03-21 17:52:56 -04:00
Jasmine Dahilig	90fa242d83	fix failing ci test: TestTaskRunner_UnregisterConsul_Retries	2020-03-21 17:52:54 -04:00
Jasmine Dahilig	c6cd7b523b	clean up restart conditions and restart tests for task lifecycle	2020-03-21 17:52:50 -04:00
Jasmine Dahilig	262d204096	incorporate lifecycle into restart tracker	2020-03-21 17:52:40 -04:00
Mahmood Ali	5377b4cb58	Add a coordinator for alloc runners	2020-03-21 17:52:38 -04:00
Fredrik Hoem Grelland	26cca14f27	Update consul-template to v0.24.1 and remove deprecated vault_grace (#7170 )	2020-02-23 16:24:53 +01:00
Mahmood Ali	a3b0b25acb	update rest of consul packages	2020-02-16 16:25:04 -06:00
Seth Hoenig	1f8e31770c	tests: set consul token for nomad client for testing SIDS TR hook	2020-01-31 19:06:15 -06:00
Seth Hoenig	04b526662c	e2e: setup consul ACLs a little more correctly	2020-01-31 19:06:11 -06:00
Seth Hoenig	0f285b840e	tests: skip some SIDS hook tests if running tests as root	2020-01-31 19:05:32 -06:00
Seth Hoenig	08951ac759	client: additional test cases around failures in SIDS hook	2020-01-31 19:05:27 -06:00
Seth Hoenig	91c7dbaa8d	client: PR cleanup - improved logging around kill task in SIDS hook	2020-01-31 19:05:23 -06:00
Seth Hoenig	f8949dde35	client: PR cleanup - shadow context variable	2020-01-31 19:05:19 -06:00
Seth Hoenig	0589b656b7	nomad: make TaskGroup.UsesConnect helper a public helper	2020-01-31 19:05:11 -06:00
Seth Hoenig	40de85867d	client: manage TR kill from parent on SI token derivation failure Re-orient the management of the tr.kill to happen in the parent of the spawned goroutine that is doing the actual token derivation. This makes the code a little more straightforward, making it easier to reason about not leaking the worker goroutine.	2020-01-31 19:05:02 -06:00
Seth Hoenig	1fca495a85	client: set context timeout around SI token derivation The derivation of an SI token needs to be safegaurded by a context timeout, otherwise an unresponsive Consul could cause the siHook to block forever on Prestart.	2020-01-31 19:04:56 -06:00
Seth Hoenig	bbedeb670d	nomad,client: apply more comment/style PR tweaks	2020-01-31 19:04:52 -06:00
Seth Hoenig	cc7b768907	nomad,client: apply smaller PR suggestions Apply smaller suggestions like doc strings, variable names, etc. Co-Authored-By: Nick Ethier <nethier@hashicorp.com> Co-Authored-By: Michael Schurter <mschurter@hashicorp.com>	2020-01-31 19:04:40 -06:00
Seth Hoenig	d24d470775	comments: cleanup some leftover debug comments and such	2020-01-31 19:04:35 -06:00
Seth Hoenig	e825a0f769	client: skip task SI token file load failure if testing as root The TestEnvoyBootstrapHook_maybeLoadSIToken test case only works when running as a non-priveleged user, since it deliberately tries to read an un-readable file to simulate a failure loading the SI token file.	2020-01-31 19:04:30 -06:00
Seth Hoenig	4b4dfacda5	client: remove unused indirection for referencing consul executable Was thinking about using the testing pattern where you create executable shell scripts as test resources which "mock" the process a bit of code is meant to fork+exec. Turns out that wasn't really necessary in this case.	2020-01-31 19:04:25 -06:00
Seth Hoenig	d85cccc8d0	nomad: fixup token policy validation	2020-01-31 19:04:08 -06:00
Seth Hoenig	6bc6a52f99	client: enable envoy bootstrap hook to set SI token When creating the envoy bootstrap configuration, we should append the "-token=<token>" argument in the case where the sidsHook placed the token in the secrets directory.	2020-01-31 19:04:01 -06:00
Seth Hoenig	674ccaa122	nomad: proxy requests for Service Identity tokens between Clients and Consul Nomad jobs may be configured with a TaskGroup which contains a Service definition that is Consul Connect enabled. These service definitions end up establishing a Consul Connect Proxy Task (e.g. envoy, by default). In the case where Consul ACLs are enabled, a Service Identity token is required for these tasks to run & connect, etc. This changeset enables the Nomad Server to recieve RPC requests for the derivation of SI tokens on behalf of instances of Consul Connect using Tasks. Those tokens are then relayed back to the requesting Client, which then injects the tokens in the secrets directory of the Task.	2020-01-31 19:03:53 -06:00
Seth Hoenig	f8666bb1f9	client: enable nomad client to request and set SI tokens for tasks When a job is configured with Consul Connect aware tasks (i.e. sidecar), the Nomad Client should be able to request from Consul (through Nomad Server) Service Identity tokens specific to those tasks.	2020-01-31 19:03:38 -06:00
Mahmood Ali	b789b507d1	Merge pull request #6922 from hashicorp/b-alloc-canoncalize Handle Upgrades and Alloc.TaskResources modification	2020-01-28 15:12:41 -05:00
Mahmood Ali	3291523d8c	address review comments	2020-01-15 08:57:05 -05:00
Nick Ethier	4b6f9e800b	Merge pull request #6816 from hashicorp/b-multiple-envoy connect: configure envoy to support multiple sidecars in the same alloc	2020-01-09 23:25:39 -05:00
Mahmood Ali	058076afd0	client: stop using alloc.TaskResources Now that alloc.Canonicalize() is called in all alloc sources in the client (i.e. on state restore and RPC fetching), we no longer need to check alloc.TaskResources. alloc.AllocatedResources is always non-nil through alloc runner. Though, early on, we check for alloc validity, so NewTaskRunner and TaskEnv must still check. `TestClient_AddAllocError` test validates that behavior.	2020-01-09 09:25:07 -05:00
Tim Gross	f31482ae8a	interpolate environment for services in script checks (#6916 ) In 0.10.2 (specifically `387b016`) we added interpolation to group service blocks and centralized the logic for task environment interpolation. This wasn't also added to script checks, which caused a regression where the IDs for script checks for services w/ interpolated fields (ex. the service name) didn't match the service ID that was registered with Consul. This changeset calls the same taskenv interpolation logic during `script_check` configuration, and adds tests to reduce the risk of future regressions by comparing the IDs of service hook and the check hook.	2020-01-09 08:12:54 -05:00
Nick Ethier	55217423c7	tr: initialize envoybootstrap prestart hook response.Env field	2020-01-08 13:41:38 -05:00
Nick Ethier	a44490182b	tr: expose envoy sidecar admin port as environment variable	2020-01-06 21:53:45 -05:00
Nick Ethier	04a1623b85	connect: configure envoy such that multiple sidecars can run in the same alloc	2020-01-06 11:26:27 -05:00
Mahmood Ali	20f8227c0a	Merge pull request #6820 from hashicorp/f-skip-docker-logging-knob driver: allow disabling log collection	2019-12-13 11:41:20 -05:00
Mahmood Ali	e82dad732b	address review comments	2019-12-13 11:21:00 -05:00
Chris Dickson	bbb6b2af09	client: expose allocated CPU per task (#6784 )	2019-12-09 15:40:22 -05:00
Mahmood Ali	943854469d	driver: allow disabling log collection Operators commonly have docker logs aggregated using various tools and don't need nomad to manage their docker logs. Worse, Nomad uses a somewhat heavy docker api call to collect them and it seems to cause problems when a client runs hundreds of log collections. Here we add a knob to disable log aggregation completely for nomad. When log collection is disabled, we avoid running logmon and docker_logger for the docker tasks in this implementation. The downside here is once disabled, `nomad logs ...` commands and API no longer return logs and operators must corrolate alloc-ids with their aggregated log info. This is meant as a stop gap measure. Ideally, we'd follow up with at least two changes: First, we should optimize behavior when we can such that operators don't need to disable docker log collection. Potentially by reverting to using pre-0.9 syslog aggregation in linux environments, though with different trade-offs. Second, when/if logs are disabled, nomad logs endpoints should lookup docker logs api on demand. This ensures that the cost of log collection is paid sparingly.	2019-12-08 14:15:03 -05:00

1 2 3 4 5 ...

297 Commits