nomad

mirror of https://github.com/kemko/nomad.git synced 2026-01-01 16:05:42 +03:00

Author	SHA1	Message	Date
Juana De La Cuesta	2944a34b58	Reuse token if it exists on client reconnect (#26604 ) Currently every time a client starts, it creates a new consul token per service or task,. This PR changes the behaviour , it persists consul ACL token to the client state and it starts by looking up a token before creating a new one. Fixes: #20184 Fixes: #20185	2025-09-04 15:27:57 +02:00
tehut	d709accaf5	Add nomad monitor export command (#26178 ) * Add MonitorExport command and handlers * Implement autocomplete * Require nomad in serviceName * Fix race in StreamReader.Read * Add and use framer.Flush() to coordinate function exit * Add LogFile to client/Server config and read NomadLogPath in rpcHandler instead of HTTPServer * Parameterize StreamFixed stream size	2025-08-01 10:26:59 -07:00
Daniel Bennett	49c147bcd7	dynamic host volumes: change env vars, fixup auto-delete (#24943 ) * plugin env: DHV_HOST_PATH->DHV_VOLUMES_DIR * client config: host_volumes_dir * plugin env: add namespace+nodepool * only auto-delete after error saving client state on initial create	2025-01-27 10:36:53 -06:00
Tim Gross	7add04eb0f	refactor: volume request modes to be generic between DHV/CSI (#24896 ) When we implemented CSI, the types of the fields for access mode and attachment mode on volume requests were defined with a prefix "CSI". This gets confusing now that we have dynamic host volumes using the same fields. Fortunately the original was a typedef on string, and the Go API in the `api` package just uses strings directly, so we can change the name of the type without breaking backwards compatibility for the msgpack wire format. Update the names to `VolumeAccessMode` and `VolumeAttachmentMode`. Keep the CSI and DHV specific value constant names for these fields (they aren't currently 1:1), so that we can easily differentiate in a given bit of code which values are valid. Ref: https://github.com/hashicorp/nomad/pull/24881#discussion_r1920702890	2025-01-24 10:37:48 -05:00
Tim Gross	4a65b21aab	dynamic host volumes: send register to client for fingerprint (#24802 ) When we register a volume without a plugin, we need to send a client RPC so that the node fingerprint can be updated. The registered volume also needs to be written to client state so that we can restore the fingerprint after a restart. Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>	2025-01-08 16:58:58 -05:00
Daniel Bennett	e76f5e0b4c	dynamic host volumes: volume fingerprinting (#24613 ) and expand the demo a bit	2024-12-19 09:25:54 -05:00
Daniel Bennett	05f1cda594	dynamic host volumes: client state (#24595 ) store dynamic host volume creations in client state, so they can be "restored" on agent restart. restore works by repeating the same Create operation as initial creation, and expecting the plugin to be idempotent. this is (potentially) especially important after host restarts, which may have dropped mount points or such.	2024-12-19 09:25:54 -05:00
Daniel Bennett	c2dd97dee7	HostVolumePlugin interface and two implementations (#24497 ) * mkdir: HostVolumePluginMkdir: just creates a directory * example-host-volume: HostVolumePluginExternal: plugin script that does mkfs and mount loopback Co-authored-by: Tim Gross <tgross@hashicorp.com>	2024-12-19 09:25:54 -05:00
Tim Gross	6a3803c31e	dynamic host volumes: RPC handlers (#24373 ) This changeset implements the RPC handlers for Dynamic Host Volumes, including the plumbing needed to forward requests to clients. The client-side implementation is stubbed and will be done under a separate PR. Ref: https://hashicorp.atlassian.net/browse/NET-11549	2024-12-19 09:25:54 -05:00
Tim Gross	65ae61249c	CSI: include volume namespace in staging path (#20532 ) CSI volumes are namespaced. But the client does not include the namespace in the staging mount path. This causes CSI volumes with the same volume ID but different namespace to collide if they happen to be placed on the same host. The per-allocation paths don't need to be namespaced, because an allocation can only mount volumes from its job's own namespace. Rework the CSI hook tests to have more fine-grained control over the mock on-disk state. Add tests covering upgrades from staging paths missing namespaces. Fixes: https://github.com/hashicorp/nomad/issues/18741	2024-05-13 11:24:09 -04:00
Michael Schurter	23e4b7c9d2	Upgrade go-msgpack to v2 (#20173 ) Replaces #18812 Upgraded with: ``` find . -name '.go' -exec sed -i s/"github.com\/hashicorp\/go-msgpack\/codec"/"github.com\/hashicorp\/go-msgpack\/v2\/codec/" '{}' ';' find . -name '.go' -exec sed -i s/"github.com\/hashicorp\/net-rpc-msgpackrpc"/"github.com\/hashicorp\/net-rpc-msgpackrpc\/v2/" '{}' ';' go get go get -v -u github.com/hashicorp/raft-boltdb/v2 go get -v github.com/hashicorp/serf@5d32001edfaa18d1c010af65db707cdb38141e80 ``` see https://github.com/hashicorp/go-msgpack/releases/tag/v2.1.0 for details	2024-03-21 11:44:23 -07:00
Tim Gross	45b2c34532	cni: add DNS set by CNI plugins to task configuration (#20007 ) CNI plugins may set DNS configuration, but this isn't threaded through to the task configuration so that we can write it to the `/etc/resolv.conf` file as needed. Add the `AllocNetworkStatus` to the alloc hook resources so they're accessible from the taskrunner. Any DNS entries provided by the user will override these values. Fixes: https://github.com/hashicorp/nomad/issues/11102	2024-02-20 10:17:27 -05:00
Tim Gross	c7c3b3ae33	revoke Consul tokens obtained via WI when alloc stops (#19034 ) Add a `Postrun` and `Destroy` hook to the allocrunner's `consul_hook` to ensure that Consul tokens we've created via WI get revoked via the logout API when we're done with them. Also add the logout to the `Prerun` hook if we've hit an error.	2023-11-09 10:08:09 -05:00
Luiz Aoqui	3ddf1ecf1d	actions: minor bug fixes and improvements (#18904 )	2023-10-31 17:06:02 -04:00
Phil Renaud	8902afe651	Nomad Actions (#18794 ) * Scaffolding actions (#18639) * Task-level actions for job submissions and retrieval * FIXME: Temporary workaround to get ember dev server to pass exec through to 4646 * Update api/tasks.go Co-authored-by: Tim Gross <tgross@hashicorp.com> * Update command/agent/job_endpoint.go Co-authored-by: Tim Gross <tgross@hashicorp.com> * Diff and copy implementations * Action structs get their own file, diff updates to behave like our other diffs * Test to observe actions changes in a version update * Tests migrated into structs/diff_test and modified with PR comments in mind * APIActionToSTructsAction now returns a new value * de-comment some plain parts, remove unused action lookup * unused param in action converter --------- Co-authored-by: Tim Gross <tgross@hashicorp.com> * New endpoint: job/:id/actions (#18690) * unused param in action converter * backing out of parse_job level and moved toward new endpoint level * Adds taskName and taskGroupName to actions at job level * Unmodified job mock actions tests * actionless job test * actionless job test * Multi group multi task actions test * HTTP method check for GET, cleaner errors in job_endpoint_test * decomment * Actions aggregated at job model level (#18733) * Removal of temporary fix to proxy to 4646 * Run Action websocket endpoint (#18760) * Working demo for review purposes * removal of cors passthru for websockets * Remove job_endpoint-specific ws handlers and aimed at existing alloc exec handlers instead * PR comments adressed, no need for taskGroup pass, better group and task lookups from alloc * early return in action validate and removed jobid from req args per PR comments * todo removal, we're checking later in the rpc * boolean style change on tty * Action CLI command (#18778) * Action command init and stuck-notes * Conditional reqpath to aim at Job action endpoint * De-logged * General CLI command cleanup, observe namespace, pass action as string, get random alloc w group adherence * tab and varname cleanup * Remove action param from Allocations().Exec calls * changelog * dont nil-check acl --------- Co-authored-by: Tim Gross <tgross@hashicorp.com>	2023-10-20 13:05:55 -04:00
James Rasell	ca9e08e6b5	monitor: add log include location option on monitor CLI and API (#18795 )	2023-10-20 07:55:22 +01:00
Piotr Kazmierczak	5dab41881b	client: new consul_hook (#18557 ) This PR introduces a new allocrunner-level consul_hook which iterates over services and tasks, if their provider is consul, fetches consul tokens for all of them, stores them in AllocHookResources and in task secret dirs. Ref: hashicorp/team-nomad#404 --------- Co-authored-by: Tim Gross <tgross@hashicorp.com>	2023-09-29 17:41:48 +02:00
Piotr Kazmierczak	86d2cdcf80	client: split identity_hook across allocrunner and taskrunner (#18431 ) This commit splits identity_hook between the allocrunner and taskrunner. The allocrunner-level part of the hook signs each task identity, and the taskrunner-level part picks it up and stores secrets for each task. The code revamps the WIDMgr, which is now split into 2 interfaces: IdentityManager which manages renewals of signatures and handles sending updates to subscribers via Watch method, and IdentitySigner which only does the signing. This work is necessary for having a unified Consul login workflow that comes with the new Consul integration. A new, allocrunner-level consul_hook will now be the only hook doing Consul authentication.	2023-09-21 17:31:27 +02:00
Daniel Bennett	4895d708b4	csi: implement NodeExpandVolume (#18522 ) following ControllerExpandVolume in `c6dbba7cde`, which expands the disk at e.g. a cloud vendor, the controller plugin may say that we also need to issue NodeExpandVolume for the node plugin to make the new disk space available to task(s) that have claims on the volume by e.g. expanding the filesystem on the node. csi spec: https://github.com/container-storage-interface/spec/blob/c918b7f/spec.md#nodeexpandvolume	2023-09-18 10:30:15 -05:00
Daniel Bennett	c6dbba7cde	csi: implement ControllerExpandVolume (#18359 ) the first half of volume expansion, this allows a user to update requested capacity ("capacity_min" and "capacity_max") in a volume specification file, and re-issue either Register or Create volume commands (or api calls). the requested capacity will now be "reconciled" with the current real capacity of the volume, issuing a ControllerExpandVolume RPC call to a running controller plugin, if requested "capacity_min" is higher than the current capacity on the volume in state. csi spec: https://github.com/container-storage-interface/spec/blob/c918b7f/spec.md#controllerexpandvolume note: this does not yet cover NodeExpandVolume	2023-09-14 14:13:04 -05:00
hashicorp-copywrite[bot]	2d35e32ec9	Update copyright file headers to BUSL-1.1	2023-08-10 17:27:15 -05:00
Seth Hoenig	a4cc76bd3e	numa: enable numa topology detection (#18146 ) * client: refactor cgroups management in client * client: fingerprint numa topology * client: plumb numa and cgroups changes to drivers * client: cleanup task resource accounting * client: numa client and config plumbing * lib: add a stack implementation * tools: remove ec2info tool * plugins: fixup testing for cgroups / numa changes * build: update makefile and package tests and cl	2023-08-10 17:05:30 -05:00
Tim Gross	deae9bb62e	client: send node secret with every client-to-server RPC (#16799 ) In Nomad 1.5.3 we fixed a security bug that allowed bypass of ACL checks if the request came thru a client node first. But this fix broke (knowingly) the identification of many client-to-server RPCs. These will be now measured as if they were anonymous. The reason for this is that many client-to-server RPCs do not send the node secret and instead rely on the protection of mTLS. This changeset ensures that the node secret is being sent with every client-to-server RPC request. In a future version of Nomad we can add enforcement on the server side, but this was left out of this changeset to reduce risks to the safe upgrade path. Sending the node secret as an auth token introduces a new problem during initial introduction of a client. Clients send many RPCs concurrently with `Node.Register`, but until the node is registered the node secret is unknown to the server and will be rejected as invalid. This causes permission denied errors. To fix that, this changeset introduces a gate on having successfully made a `Node.Register` RPC before any other RPCs can be sent (except for `Status.Ping`, which we need earlier but which also ignores the error because that handler doesn't do an authorization check). This ensures that we only send requests with a node secret already known to the server. This also makes client startup a little easier to reason about because we know `Node.Register` must succeed first, and it should make for a good place to hook in future plans for secure introduction of nodes. The tradeoff is that an existing client that has running allocs will take slightly longer (a second or two) to transition to ready after a restart, because the transition in `Node.UpdateStatus` is gated at the server by first submitting `Node.UpdateAlloc` with client alloc updates.	2023-06-22 11:06:49 -04:00
hashicorp-copywrite[bot]	8597061b52	[COMPLIANCE] Add Copyright and License Headers (#17429 ) Co-authored-by: hashicorp-copywrite[bot] <110428419+hashicorp-copywrite[bot]@users.noreply.github.com>	2023-06-05 13:23:59 -04:00
Tim Gross	893d4a77c8	prioritized client updates (#17354 ) The allocrunner sends several updates to the server during the early lifecycle of an allocation and its tasks. Clients batch-up allocation updates every 200ms, but experiments like the C2M challenge has shown that even with this batching, servers can be overwhelmed with client updates during high volume deployments. Benchmarking done in #9451 has shown that client updates can easily represent ~70% of all Nomad Raft traffic. Each allocation sends many updates during its lifetime, but only those that change the `ClientStatus` field are critical for progressing a deployment or kicking off a reschedule to recover from failures. Add a priority to the client allocation sync and update the `syncTicker` receiver so that we only send an update if there's a high priority update waiting, or on every 5th tick. This means when there are no high priority updates, the client will send updates at most every 1s instead of 200ms. Benchmarks have shown this can reduce overall Raft traffic by 10%, as well as reduce client-to-server RPC traffic. This changeset also switches from a channel-based collection of updates to a shared buffer, so as to split batching from sending and prevent backpressure onto the allocrunner when the RPC is slow. This doesn't have a major performance benefit in the benchmarks but makes the implementation of the prioritized update simpler. Fixes: #9451	2023-05-31 15:34:16 -04:00
Luiz Aoqui	ee5a08dbb2	Revert "hashicorp/go-msgpack v2 (#16810 )" (#17047 ) This reverts commit `8a98520d56`.	2023-05-01 17:18:34 -04:00
Ian Fijolek	8a98520d56	hashicorp/go-msgpack v2 (#16810 ) * Upgrade from hashicorp/go-msgpack v1.1.5 to v2.1.0 Fixes #16808 * Update hashicorp/net-rpc-msgpackrpc to v2 to match go-msgpack * deps: use go-msgpack v2.0.0 go-msgpack v2.1.0 includes some code changes that we will need to investigate furthere to assess its impact on Nomad, so keeping this dependency on v2.0.0 for now since it's no-op. --------- Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>	2023-04-17 17:02:05 -04:00
hashicorp-copywrite[bot]	f005448366	[COMPLIANCE] Add Copyright and License Headers	2023-04-10 15:36:59 +00:00
Tim Gross	f3fc54adcf	CSI: set mounts in alloc hook resources atomically (#16722 ) The allocrunner has a facility for passing data written by allocrunner hooks to taskrunner hooks. Currently the only consumers of this facility are the allocrunner CSI hook (which writes data) and the taskrunner volume hook (which reads that same data). The allocrunner hook for CSI volumes doesn't set the alloc hook resources atomically. Instead, it gets the current resources and then writes a new version back. Because the CSI hook is currently the only writer and all readers happen long afterwards, this should be safe but #16623 shows there's some sequence of events during restore where this breaks down. Refactor hook resources so that hook data is accessed via setters and getters that hold the mutex.	2023-04-03 11:03:36 -04:00
Seth Hoenig	b2861f2a9b	client: add support for checks in nomad services This PR adds support for specifying checks in services registered to the built-in nomad service provider. Currently only HTTP and TCP checks are supported, though more types could be added later.	2022-07-12 17:09:50 -05:00
Seth Hoenig	b242957990	ci: swap ci parallelization for unconstrained gomaxprocs	2022-03-15 12:58:52 -05:00
Tim Gross	03a8d72dba	CSI: implement support for topology (#12129 )	2022-03-01 10:15:46 -05:00
Alessandro De Blasis	759397533a	metrics: added `mapped_file` metric (#11500 ) Signed-off-by: Alessandro De Blasis <alex@deblasis.net> Co-authored-by: Nate <37554478+servusdei2018@users.noreply.github.com>	2022-01-10 15:35:19 -05:00
Nomad Release Bot	4cf5545532	remove generated files	2021-11-24 18:54:50 +00:00
Nomad Release bot	0e11f8d517	Generate files for 1.2.0 release	2021-11-15 23:00:30 +00:00
Florian Apolloner	5624603893	Fixed creation of ControllerCreateVolumeRequest. (#11238 )	2021-10-06 10:17:39 -04:00
Grant Griffiths	cba476eae6	CSI ListSnapshots secrets implementation Signed-off-by: Grant Griffiths <ggriffiths@purestorage.com>	2021-07-28 11:30:29 -07:00
Nomad Release Bot	9caf5e2881	remove generated files	2021-06-10 08:04:25 -04:00
Nomad Release bot	21465a592d	Generate files for 1.1.1 release	2021-06-10 08:04:25 -04:00
Tim Gross	9a8c68f6cd	csi: accept list of caps during validation in volume register When `nomad volume create` was introduced in Nomad 1.1.0, we changed the volume spec to take a list of capabilities rather than a single capability, to meet the requirements of the CSI spec. When a volume is registered via `nomad volume register`, we should be using the same fields to validate the volume with the controller plugin.	2021-06-04 07:57:26 -04:00
Ryan Sundberg	97af3e9a4c	CSI: Include MountOptions in capabilities sent to CSI for all RPCs Include the VolumeCapability.MountVolume data in ControllerPublishVolume, CreateVolume, and ValidateVolumeCapabilities RPCs sent to the CSI controller. The previous behavior was to only include the MountVolume capability in the NodeStageVolume request, which on some CSI implementations would be rejected since the Volume was not originally provisioned with the specific mount capabilities requested.	2021-05-24 10:59:54 -04:00
Tim Gross	730c22656b	CSI: volume snapshot	2021-04-01 11:16:52 -04:00
Tim Gross	d7e80380de	CSI: HTTP handlers for create/delete/list	2021-03-31 16:37:09 -04:00
Tim Gross	41258e54de	CSI: create/delete/list volume RPCs This commit implements the RPC handlers on the client that talk to the CSI plugins on that client for the Create/Delete/List RPC.	2021-03-31 16:37:09 -04:00
Kris Hicks	85ed8ddd4f	Add gosimple linter (#9590 )	2020-12-09 11:05:18 -08:00
Tim Gross	103d873ebe	csi: support for VolumeContext and VolumeParameters (#7957 ) The MVP for CSI in the 0.11.0 release of Nomad did not include support for opaque volume parameters or volume context. This changeset adds support for both. This also moves args for ControllerValidateCapabilities into a struct. The CSI plugin `ControllerValidateCapabilities` struct that we turn into a CSI RPC is accumulating arguments, so moving it into a request struct will reduce the churn of this internal API, make the plugin code more readable, and make this method consistent with the other plugin methods in that package.	2020-05-15 08:16:01 -04:00
Tim Gross	c514a5527a	csi: refactor internal client field name to ExternalID (#7958 ) The CSI plugins RPCs require the use of the storage provider's volume ID, rather than the user-defined volume ID. Although changing the RPCs to use the field name `ExternalID` risks breaking backwards compatibility, we can use the `ExternalID` name internally for the client and only use `VolumeID` at the RPC boundaries.	2020-05-14 11:56:07 -04:00
Tim Gross	a28f18ea1d	csi: support Secrets parameter in CSI RPCs (#7923 ) CSI plugins can require credentials for some publishing and unpublishing workflow RPCs. Secrets are configured at the time of volume registration, stored in the volume struct, and then passed around as an opaque map by Nomad to the plugins.	2020-05-11 17:12:51 -04:00
Mahmood Ali	88808b2d30	When serializing msgpack, only consider codec tag When serializing structs with msgpack, only consider type tags of `codec`. Hashicorp/go-msgpack (based on ugorji/go) defaults to interpretting `codec` tag if it's available, but falls to using `json` if `codec` isn't present. This behavior is surprising in cases where we want to serialize json differently from msgpack, e.g. serializing `ConsulExposeConfig`.	2020-05-11 14:14:10 -04:00
Mahmood Ali	1fd22623cd	Harmonize go-msgpack/codec/codecgen Use v1.1.5 of go-msgpack/codec/codecgen, so go-msgpack codecgen matches the library version. We branched off earlier to pick up `f51b518921` , but apparently that's not needed as we could customize the package via `-c` argument.	2020-04-28 17:12:31 -04:00

1 2 3 4

151 Commits