When a volume is updated, we merge the new definition into the old. But the
volume's context comes from the plugin and is likely not present in the user's
volume specification. This means that if the user re-submits the volume
specification to make an adjustment, we wipe out the context field, which might
be required for subsequent operations by the CSI plugin. This was
discovered to be a problem with the Terraform provider and fixed there, but it's
also a problem for users of the `volume create` and `volume register` commands.
Update the merge so that we only overwrite the value of the context if it's been
explicitly set by the user. We still need to support user-driven updates to
context for the `volume register` workflow.
Ref: https://github.com/hashicorp/terraform-provider-nomad/pull/503
Fixes: https://github.com/democratic-csi/democratic-csi/issues/438
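A minimal sketch of the intended merge behavior, using an illustrative volume
type with a `Context` map (the real Nomad structs differ):

```go
package volumes

// Volume is an illustrative stand-in for the stored volume definition.
type Volume struct {
	ID      string
	Context map[string]string // opaque data returned by the CSI plugin
	// ... other user-settable fields elided
}

// mergeVolumeUpdate merges a re-submitted specification into the existing
// volume, preserving the plugin-provided context unless the user explicitly
// set one (which keeps the `volume register` workflow able to update it).
func mergeVolumeUpdate(existing, updated *Volume) *Volume {
	merged := *updated
	if len(updated.Context) == 0 {
		merged.Context = existing.Context
	}
	return &merged
}
```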
The `volume init` command creates example volume specifications. But one of the
example values for `capability.access_mode` is not valid. Correct the example to
match the validation logic.
The default node configuration in the client should always set an empty
HostVolumes map. Otherwise callers can panic, e.g.:
goroutine 179 [running]:
github.com/hashicorp/nomad/client/hostvolumemanager.UpdateVolumeMap({0x36042b0, 0xc000c62a80}, 0x0, {0xc000a802a0, 0xd}, 0xc000691940)
github.com/hashicorp/nomad/client/hostvolumemanager/volume_fingerprint.go:43 +0x1b2
github.com/hashicorp/nomad/client.(*Client).batchFirstFingerprints.func1({0xc000a802a0, 0xd}, 0xc000691940)
github.com/hashicorp/nomad/client/node_updater.go:54 +0xd7
github.com/hashicorp/nomad/client.(*batchNodeUpdates).batchHostVolumeUpdates(0xc000912608?, 0xc0009f2f88)
github.com/hashicorp/nomad/client/node_updater.go:417 +0x152
github.com/hashicorp/nomad/client.(*Client).batchFirstFingerprints(0xc000c2d188)
github.com/hashicorp/nomad/client/node_updater.go:53 +0x1c5
created by github.com/hashicorp/nomad/client.NewClient in goroutine 1
github.com/hashicorp/nomad/client/client.go:557 +0x2069
The trace above is a panic in the host volume manager (HVM) when restarting a
client that has no static host volumes but does have a dynamic host volume.
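A minimal sketch of the fix described here, with illustrative type and field
names rather than the real client configuration:

```go
package config

// HostVolumeConfig is an illustrative stand-in for a static host volume entry.
type HostVolumeConfig struct {
	Name string
	Path string
}

// NodeConfig is an illustrative stand-in for the client's node configuration.
type NodeConfig struct {
	HostVolumes map[string]*HostVolumeConfig
}

// DefaultNodeConfig always returns an initialized HostVolumes map, so callers
// such as the host volume manager never have to guard against nil when they
// add fingerprinted dynamic host volumes to it.
func DefaultNodeConfig() *NodeConfig {
	return &NodeConfig{
		HostVolumes: map[string]*HostVolumeConfig{},
	}
}
```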
* func: make windows arch dependent
* func: unify keys and make them cluster-grouped
* Update README.md
* Update e2e/terraform/provision-infra/provision-nomad/variables.tf
Co-authored-by: Tim Gross <tgross@hashicorp.com>
* Update .gitignore
* style: add an output with the cluster identifier
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Some dynamic host volumes are claimed by allocations with the capability we
borrowed from CSI called `single-node-single-writer`, which says only one
allocation can use the volume, and it can use it in read/write mode. We enforce
this in the scheduler, but if evaluations for different jobs were to be
processed concurrently by the scheduler, it's possible to get plans that would
fail to enforce this requirement. Add a check in the plan applier to ensure that
non-terminal allocations have exclusive access when requested.
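A sketch of the kind of check the plan applier needs, using illustrative types;
the real code works against the plan and state store:

```go
package planner

// Alloc is an illustrative stand-in for an allocation claiming the volume.
type Alloc struct {
	ID             string
	ClientTerminal bool
}

// exclusiveClaimOK reports whether adding the incoming allocation keeps a
// single-node-single-writer volume claimed by at most one live allocation.
func exclusiveClaimOK(existingClaims []*Alloc, incoming *Alloc) bool {
	for _, a := range existingClaims {
		if a.ClientTerminal {
			continue // terminal allocations no longer hold the claim
		}
		if a.ID != incoming.ID {
			return false // another live allocation already holds the volume
		}
	}
	return true
}
```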
Let only one create/register/delete operation run at a time per volume ID:
* plugins can assume that Nomad will not run concurrent operations for the same volume
* we avoid interleaving client RPCs with raft writes
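A minimal sketch of one way to serialize these operations per volume ID (a
keyed mutex); the actual implementation may differ:

```go
package volumes

import "sync"

// volumeLocks serializes create/register/delete operations per volume ID.
type volumeLocks struct {
	mu    sync.Mutex
	locks map[string]*sync.Mutex
}

func newVolumeLocks() *volumeLocks {
	return &volumeLocks{locks: map[string]*sync.Mutex{}}
}

// withLock runs fn while holding the lock for the given volume ID, so a
// plugin never sees two concurrent operations for the same volume and we
// never interleave client RPCs with raft writes for that volume.
func (l *volumeLocks) withLock(volID string, fn func() error) error {
	l.mu.Lock()
	lock, ok := l.locks[volID]
	if !ok {
		lock = &sync.Mutex{}
		l.locks[volID] = lock
	}
	l.mu.Unlock()

	lock.Lock()
	defer lock.Unlock()
	return fn()
}
```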
The volume_mounts test is flaky due to slow starts from the exec-driver and some
incorrect wait code. Refactor the volume_mounts test to use the `e2e/v3` package
helpers, and use these to give it enough time to start the exec tasks.
The Nomad agent used a log filter to ensure logs were written at
the expected level. Since the adoption of hclog this is no longer
required, as hclog acts as the gatekeeper and filter for logging.
All log writers accept messages from hclog, which has already done
the filtering.
The agent syslog write handler was unable to handle JSON log lines
correctly, meaning all syslog entries showed as NOTICE level when
using the JSON log format.
This change adds a new handler to the Nomad agent which can parse
JSON log lines and correctly determine the log level of each
entry.
The change also removes the use of a filter from the default log
format handler. This is not needed as the logs are fed into the
syslog handler via hclog, which is responsible for level
filtering.
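A minimal sketch of the level parsing the new handler needs: hclog's JSON
output carries the level in the `@level` field, so the handler can decode each
line and map it to a syslog severity (the surrounding handler plumbing is
omitted here):

```go
package logging

import (
	"encoding/json"
	"strings"
)

// syslogLevelFromJSON extracts the hclog level from a JSON log line and maps
// it to a syslog-style severity name, falling back to NOTICE for lines that
// aren't valid JSON or carry an unknown level.
func syslogLevelFromJSON(line []byte) string {
	var entry struct {
		Level string `json:"@level"`
	}
	if err := json.Unmarshal(line, &entry); err != nil {
		return "NOTICE"
	}
	switch strings.ToLower(entry.Level) {
	case "trace", "debug":
		return "DEBUG"
	case "info":
		return "INFO"
	case "warn":
		return "WARNING"
	case "error":
		return "ERR"
	default:
		return "NOTICE"
	}
}
```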
We can reduce the amount of volume specification configuration many users will
need by setting a default capability on a dynamic host volume if none is
set. The default capability will allow using the volume in read/write mode on
its node, with no further restrictions except those that might be set in the
jobspec.
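A sketch of applying that default, with illustrative types; the exact default
access and attachment mode strings are assumptions here:

```go
package volumes

// HostVolumeCapability is an illustrative stand-in for a requested capability.
type HostVolumeCapability struct {
	AccessMode     string
	AttachmentMode string
}

// HostVolume is an illustrative stand-in for the dynamic host volume request.
type HostVolume struct {
	RequestedCapabilities []*HostVolumeCapability
}

// applyDefaultCapability fills in a single read/write-on-this-node capability
// when the user's specification doesn't request any.
func (v *HostVolume) applyDefaultCapability() {
	if len(v.RequestedCapabilities) == 0 {
		v.RequestedCapabilities = []*HostVolumeCapability{{
			AccessMode:     "single-node-writer", // assumed default mode
			AttachmentMode: "file-system",        // assumed default mode
		}}
	}
}
```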
Enterprise governance checks happen after dynamic host volumes are placed, so if
node pool governance is active and you don't set a node pool or node ID for a
volume, it's possible to get a placement that fails node pool governance even
though there might be other nodes in the cluster that would be valid placements.
Move the node pool governance for host volumes into the placement path, so that
we're checking a specific node pool when node pool or node ID are set, but
otherwise filtering out candidate nodes by node pool.
This changeset is the CE version of ENT/2200.
Ref: https://hashicorp.atlassian.net/browse/NET-11549
Ref: https://github.com/hashicorp/nomad-enterprise/pull/2200
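A rough sketch of the placement-path check, with illustrative names; the real
governance logic lives in the Enterprise code:

```go
package placement

// Node is an illustrative stand-in for a candidate client node.
type Node struct {
	ID       string
	NodePool string
}

// filterByNodePoolGovernance applies governance during placement: if the
// volume pins a node pool (or a node, which implies one), check that pool
// directly; otherwise drop candidate nodes whose pool fails governance.
func filterByNodePoolGovernance(candidates []*Node, requestedPool string,
	allowed func(pool string) bool) []*Node {

	if requestedPool != "" && !allowed(requestedPool) {
		return nil // the requested pool itself is not permitted
	}
	out := make([]*Node, 0, len(candidates))
	for _, n := range candidates {
		if requestedPool != "" && n.NodePool != requestedPool {
			continue
		}
		if allowed(n.NodePool) {
			out = append(out, n)
		}
	}
	return out
}
```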
Update dynamic host volume validation and update logic to allow for changes to
the node pool and plugin ID. If the client's node pool changes we'll sync up the
correct node pool for the volumes already placed on that client. We'll also
allow the plugin ID to be changed so that new versions of plugins can support
the same volume over time.
The nightly runs for E2E have been failing the recently added dynamic host
volumes tests for a number of reasons:
* Adding timing logs to the tests shows that it can take over 5s (the original
test timeout) for the client fingerprint to show up on the client. This seems
like a lot, but it appears to be host-dependent because it's much faster locally.
Extend the timeout and leave in the timing logs so that we can keep an eye on
this problem in the future.
* The register test doesn't wait for the dispatched job to complete, and the
dispatched job was actually broken when TLS was in use because we weren't using
the Task API socket. Fix the jobspec for the dispatched job and add waiting
for the dispatched allocation to be marked complete before checking for the
volume on the server.
I've also changed both of the mounter jobs to batch workloads, so that we don't have
to wait 10s for the deployment to complete.
In #24694 we did a major refactoring of the E2E Terraform configuration. After
deploying a cluster this morning, I noticed a few moved/removed files were not
reflected in the .gitignore files. This changeset updates the .gitignore to have
no unstaged files after applying.
When using the register workflow, `capacity_max` is ignored so it is likely
unset. If the volume is then updated later, the check we had for valid updates
assumes that the value was previously set. Only perform this check if the value
is set.
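A minimal sketch of the guarded check, with illustrative field names; the zero
value stands in for an unset `capacity_max`:

```go
package volumes

import "fmt"

// checkCapacityMaxUpdate validates an updated capacity_max only when one is
// actually set, since the register workflow leaves it at zero.
func checkCapacityMaxUpdate(existingCapacityBytes, requestedMaxBytes int64) error {
	if requestedMaxBytes == 0 {
		return nil // capacity_max was never set; nothing to validate
	}
	if existingCapacityBytes > requestedMaxBytes {
		return fmt.Errorf("existing capacity %d exceeds requested capacity_max %d",
			existingCapacityBytes, requestedMaxBytes)
	}
	return nil
}
```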
We changed the list of access modes available for dynamic host volumes in #24705
but neglected to change them in the API package. Update the API package to
match.
Ref: https://github.com/hashicorp/nomad/pull/24705
* func: move infra provisioning to a module and remove providers
* func: update paths
* func: update more paths
* func: update path inside bootstrap script
* style: remove debug prints on bootstrap scripts
* Delete e2e/terraform/csi/input/volume-efs.hcl
* fix: update keys path to use module path instead of root
* fix: add missing headers
* fix: update keys directory inside provision-nomad
* style: format hcl files
* Update compute.tf
* Update e2e/terraform/main.tf
Co-authored-by: Tim Gross <tgross@hashicorp.com>
* Update e2e/terraform/provision-infra/compute.tf
Co-authored-by: Tim Gross <tgross@hashicorp.com>
* fix: update more paths
* fix: fmt hcl files
* func: final paths revision for running e2e locally
* fix: make path of certs relative to module for the bootstrap
* func: final paths revision for running e2e locally
* Update network.tf
* fix: fix typo and add success message
* fix: remove the test name from token to avoid long names and use name for vol to avoid collisions
* func: unify the uploads folder
* func: make the uploads file one per cluster
* func: Add outputs with all data necessary to connect to the cluster
* fix: make nomad token a sensitive output
* Update bootstrap-nomad.sh
---------
Co-authored-by: Tim Gross <tgross@hashicorp.com>
In anticipation of having quotas for dynamic host volumes, we want the user
experience of the storage limits to feel integrated with the other resource
limits. This is currently prevented by reusing the `Resources` type instead of
having a specific type for `QuotaResources`.
Update the quota limit/usage types to use a `QuotaResources` that includes a new
storage resources quota block. The wire format for the two types is compatible
such that we can migrate the existing variables limit in the FSM.
Also fixes improper parallelism in the quota init test, where we change the
working directory to avoid file write conflicts; this breaks when multiple tests
are executed in the same process.
Ref: https://github.com/hashicorp/nomad-enterprise/pull/2096
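A rough sketch of the resulting shape; the field names are assumptions, not the
exact Nomad Enterprise types, but the important property is that the new
storage block hangs off an otherwise wire-compatible struct so the existing
variables limit can be migrated in the FSM:

```go
package structs

// QuotaStorageResources groups the storage-related limits, including the
// migrated variables limit and a new limit for host volume capacity.
type QuotaStorageResources struct {
	VariablesMB   int // existing variables limit, migrated from Resources
	HostVolumesMB int // new limit for dynamic host volume capacity
}

// QuotaResources replaces the reuse of the general Resources type for quota
// limits and usage.
type QuotaResources struct {
	CPU      int
	MemoryMB int
	Storage  *QuotaStorageResources
}
```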
The output of `GetDynamicHostVolumes` is a slice but that slice is constructed
from iterating over a map and isn't sorted. Sort the output in the test to
eliminate a test flake.
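The fix in the test is just a deterministic ordering before comparison, roughly:

```go
package state

import "sort"

// Volume is an illustrative stand-in for the fingerprinted volume type.
type Volume struct{ ID string }

// sortVolumesByID gives the map-derived slice a stable order so the test can
// compare it against an expected slice without flaking.
func sortVolumesByID(vols []*Volume) {
	sort.Slice(vols, func(i, j int) bool { return vols[i].ID < vols[j].ID })
}
```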
Some comment cleanups as we're wrapping up dynamic host volumes work:
* We're not going to implement mount_options for host volumes, as dynamic host
volumes don't have the equivalent of the stage/publish phase that CSI volumes
do. Users who want that sort of thing can pass them via the `parameter` field
during volume create/register.
* The scheduler feasibility check prevents a dynamic host volume from being
claimed by a job in the wrong namespace, but the comment incorrectly identifies
that code path as only being about the race between fingerprint and delete.
Update the comment to make the intent clear so that we don't accidentally remove
this behavior in the future.
* Update who-uses-nomad.mdx
Our new contract with Roblox states that we can't mention anywhere on our sites that they use us.
* Update who-uses-nomad.mdx
Edited the sentence above the companies list to more accurately reflect them.
Also added Target to the list with a link to their case study.
Initial end-to-end tests for dynamic host volumes. This includes tests for two
workflows:
* One where a dynamic host volume is created by a plugin and then mounted by a job.
* Another where a dynamic host volume is created out-of-band and registered by a
job, then mounted by another job.
This changeset also moves the existing `volumes` E2E test package to the
better-named `volume_mounts`.
Ref: https://hashicorp.atlassian.net/browse/NET-11551
When we register a volume without a plugin, we need to send a client RPC so that
the node fingerprint can be updated. The registered volume also needs to be
written to client state so that we can restore the fingerprint after a restart.
Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>
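A sketch of the client-side handling described here, with illustrative names:
persist the registered volume to client state first, then update the in-memory
volume map that feeds the node fingerprint, so the volume survives a restart:

```go
package hostvolumemanager

// HostVolume is an illustrative stand-in for a registered (plugin-less) volume.
type HostVolume struct {
	ID   string
	Name string
	Path string
}

// stateDB is an illustrative interface over the client's local state store.
type stateDB interface {
	PutDynamicHostVolume(vol *HostVolume) error
}

// registerVolume writes the volume to client state before adding it to the
// fingerprint map, so a restarted client can restore the fingerprint.
func registerVolume(db stateDB, fingerprint map[string]*HostVolume, vol *HostVolume) error {
	if err := db.PutDynamicHostVolume(vol); err != nil {
		return err // don't fingerprint a volume we couldn't persist
	}
	fingerprint[vol.Name] = vol
	return nil
}
```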