nomad

mirror of https://github.com/kemko/nomad.git synced 2026-01-01 16:05:42 +03:00

Author	SHA1	Message	Date
Tim Gross	3432b0a2d6	consul: only add fingerprint link if unique.consul.name is set (#26787 ) In Nomad Enterprise we can fingerprint multiple Consul datacenters. If neither is `"default"` then we end up with warning logs about adding a "link". The `Link` field on the `Node` struct is a map of attributes that only contributes to the node's computed hash. The `"consul"` key's value is derived from the `unique.consul.name` attribute, which only exists if there's a default Consul cluster. Update the fingerprint to skip setting the link field if there's no `unique.consul.name`, and lower the warning log for malformed fields to debug; this is a minor scheduling optimization largely captured by existing Consul fields in the node computed class. The only reason not to remove it entirely is to avoid changing computed classes on existing large clusters. Fixes: https://github.com/hashicorp/nomad/issues/26781 Ref: https://hashicorp.atlassian.net/browse/NMD-998	2025-09-17 13:23:01 -04:00
Olli Janatuinen	6398ef9475	secrets: Support custom plugins in Windows (#26751 ) Signed-off-by: Olli Janatuinen <olli.janatuinen@gmail.com>	2025-09-16 09:14:50 -04:00
Michael Smithhisler	10ed46cbd4	secrets: pass key/value config data to plugins as env (#26455 ) Co-authored-by: Michael Schurter <mschurter@hashicorp.com> Co-authored-by: Tim Gross <tgross@hashicorp.com>	2025-09-05 16:08:24 -04:00
Michael Smithhisler	9950ef515c	secrets: validate name and update client config (#26447 )	2025-09-05 16:08:23 -04:00
Michael Smithhisler	00ef9cacab	secrets: add common secrets plugins impl (#26335 ) Co-authored-by: Michael Schurter <mschurter@hashicorp.com>	2025-09-05 16:08:23 -04:00
Michael Smithhisler	ac32b0864d	scheduler: adds implicit constraint for secrets plugin node attributes (#26303 )	2025-09-05 16:08:23 -04:00
James Rasell	c85c723336	ci: Run core tests groups workflow on amd64 and arm64 runners. (#25695 )	2025-04-17 15:16:29 +01:00
Daniel Bennett	99c25fc635	dhv: mkdir plugin parameters: uid,guid,mode (#25533 ) also remove Error logs from client rpc and promote plugin Debug logs to Error (since they have more info in them)	2025-03-28 10:13:13 -05:00
Tim Gross	1788bfb42e	remove addresses from node class hash (#24942 ) When a node is fingerprinted, we calculate a "computed class" from a hash over a subset of its fields and attributes. In the scheduler, when a given node fails feasibility checking (before fit checking) we know that no other node of that same class will be feasible, and we add the hash to a map so we can reject them early. This hash cannot include any values that are unique to a given node, otherwise no other node will have the same hash and we'll never save ourselves the work of feasibility checking those nodes. In #4390 we introduced the `nomad.advertise.address` attribute and in #19969 we introduced the `consul.dns.addr` attribute. Both of these are unique per node and break the hash. Additionally, we were not correctly filtering attributes out when checking if a node escaped the class by not filtering for attributes that start with `unique.`. The test for this introduced in #708 had an inverted assertion, which allowed this to pass unnoticed since the early days of Nomad. Ref: https://github.com/hashicorp/nomad/pull/708 Ref: https://github.com/hashicorp/nomad/pull/4390 Ref: https://github.com/hashicorp/nomad/pull/19969	2025-03-03 09:28:32 -05:00
Tim Gross	e02ef73abf	fingerprint: only log failed Consul fingerprint once (#25182 ) In #24526 we updated Consul and Vault fingerprinting so that we no longer periodically fingerprint. In #25102 we made it so that we fingerprint periodically on start until the first fingerprint, in order to tolerate Consul or Vault not being available on start. For clusters not running Consul, this leads to a warn-level log every 15s. This same log exists for Vault, but Vault support is opt-in via `vault.enable = true` whereas you have to manually disable the fingerprinter for Consul. Make it so that we only log a failed Consul fingerprint once per Consul cluster. Reset the gate on this once we have a successful fingerprint, so that we get the logs after a reload if Consul is unavailable. Ref: https://github.com/hashicorp/nomad/pull/24526 Ref: https://github.com/hashicorp/nomad/pull/25102 Fixes: https://github.com/hashicorp/nomad/issues/25181	2025-02-21 13:09:34 -05:00
Tim Gross	8c57fd5eb0	fingerprint: initial fingerprint of Vault/Consul should be periodic (#25102 ) In #24526 we updated the Consul and Vault fingerprints so that they are no longer periodic. This fixed a problem that cluster admins reported where rolling updates of Vault or Consul would cause a thundering herd of fingerprint updates across the whole cluster. But if Consul/Vault is not available during the initial fingerprint, it will never get fingerprinted again. This is challenging for cluster updates and black starts because the implicit service startup ordering may require reloads. Instead, have the fingerprinter run periodically but mark that it has made its first successful fingerprint of all Consul/Vault clusters. At that point, we can skip further periodic updates. The `Reload` method will reset the mark and allow the subsequent fingerprint to run normally. Fixes: https://github.com/hashicorp/nomad/issues/25097 Ref: https://github.com/hashicorp/nomad/pull/24526 Ref: https://github.com/hashicorp/nomad/issues/24049	2025-02-13 14:26:04 -05:00
Jorge Marey	25426f0777	fingerprint: add config option to disable dmidecode (#25108 )	2025-02-13 11:20:48 -05:00
Matt Keeler	833e240597	Upgrade to using hashicorp/go-metrics@v0.5.4 (#24856 ) * Upgrade to using hashicorp/go-metrics@v0.5.4 This also requires bumping the dependencies for: * memberlist * serf * raft * raft-boltdb * (and indirectly hashicorp/mdns due to the memberlist or serf update) Unlike some other HashiCorp products, Nomad's root module is currently expected to be consumed by others. This means that it needs to be treated more like our libraries and upgrade to hashicorp/go-metrics by utilizing its compat packages. This allows those importing the root module to control the metrics module used via build tags.	2025-01-31 15:22:00 -05:00
Daniel Bennett	49c147bcd7	dynamic host volumes: change env vars, fixup auto-delete (#24943 ) * plugin env: DHV_HOST_PATH->DHV_VOLUMES_DIR * client config: host_volumes_dir * plugin env: add namespace+nodepool * only auto-delete after error saving client state on initial create	2025-01-27 10:36:53 -06:00
Seth Hoenig	1356880962	fingerprint: convert consul and vault fingerprinters to be reloadable (#24526 ) This PR changes the Consul and Vault fingerprint implementations to be reloadable rather than periodic. Reasons described in the issue.	2025-01-27 09:20:01 +00:00
Daniel Bennett	985eb53c65	dynamic host volumes: plugin spec tweaks (#24848 ) * prefix plugin env vars with DHV_ * add env: DHV_VOLUME_ID, DHV_PLUGIN_DIR * 5s timeout on fingerprint calls	2025-01-13 14:18:10 -06:00
Michael Smithhisler	606ce9dd90	deps: upgrade aws-sdk-go from v1 to v2 (#24720 )	2025-01-09 17:27:19 -05:00
Daniel Bennett	af967184a6	dynamic host volumes: tweak plugin fingerprint (#24711 ) Instead of a plugin `version` subcommand that responds with a string (established in #24497), respond to a `fingerprint` command with a data structure that we may extend in the future (such as plugin capabilities, like size constraint support?). In the immediate term, it's still just the version: `{"version": "0.0.1"}` In addition to leaving the door open for future expansion, I think it will also avoid false positives detecting executables that just happen to respond to a `version` command. This also reverses the ordering of the fingerprint string parts from `plugins.host_volume.version.mkdir` (which aligned with CNI) to `plugins.host_volume.mkdir.version` (makes more sense to me)	2024-12-19 09:25:55 -05:00
Daniel Bennett	46a39560bb	dynamic host volumes: fingerprint client plugins (#24589 )	2024-12-19 09:25:54 -05:00
Rodrigo Lourenço	cdebf96b0e	fingerprint gce: collect preemptibility	2024-10-23 15:19:20 +02:00
Tim Gross	b7f1800657	fingerprint: update landlock test to accept v4+ APIs (#23979 ) The landlock fingerprint test assumes there's no version of the landlock API >3. Update the test assertion to allow for the current v4 and any future versions.	2024-09-17 15:07:44 -04:00
Piotr Kazmierczak	0bc9796d3b	client: log an error message if total detected cpu is zero (#23827 )	2024-08-15 18:31:27 +02:00
Seth Hoenig	db0642099e	build: update golangci-lint to 1.60.1 (#23807 ) * build: update golangci-lint to 1.60.1 * ci: update golangci-lint to v1.60.1 Helps with go1.23 compatibility. Introduces some breaking changes / newly enforced linter patterns so those are fixed as well.	2024-08-14 10:09:31 -05:00
guifran001	1c44521543	client: Add a preferred address family option for network-interface (#23389 ) to prefer ipv4 or ipv6 when deducing IP from network interface Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>	2024-07-12 15:30:38 -05:00
Tim Gross	7d73065066	numa: fix scheduler panic due to topology serialization bug (#23284 ) The NUMA topology struct field `NodeIDs` is a `idset.Set`, which has no public members. As a result, this field is never serialized via msgpack and persisted in state. When `numa.affinity = "prefer"`, the scheduler dereferences this nil field and panics the scheduler worker. Ideally we would fix this by adding a msgpack serialization extension, but because the field already exists and is just always empty, this breaks RPC wire compatibility across upgrades. Instead, create a new field that's populated at the same time we populate the more useful `idset.Set`, and repopulate the set on demand. Fixes: https://hashicorp.atlassian.net/browse/NET-9924	2024-06-11 08:55:00 -04:00
Tim Gross	a74775814c	fingerprint: add DNS address and port to Consul fingerprint (#19969 ) In order to provide a DNS address and port to Connect tasks configured for transparent proxy, we need to fingerprint the Consul DNS address and port. The client will pass this address/port to the iptables configuration provided to the `consul-cni` plugin. Ref: https://github.com/hashicorp/nomad/issues/10628	2024-02-14 12:15:58 -05:00
Tim Gross	62c57d208b	fingerprint: eliminate spurious warning logs with Consul CE (#19923 ) Support for fingerprinting the Consul admin partition was added in #19485. But when the client fingerprints Consul CE, it gets a valid fingerprint and working Consul but with a warn-level log. Return "ok" from the partition extractor, but also ensure that we only add the Consul attribute if it actually has a value. Fixes: https://github.com/hashicorp/nomad/issues/19756	2024-02-09 08:19:00 -05:00
Tim Gross	2e33115c15	consul: fingerprint Consul Enterprise admin partitions (#19485 ) Consul Enterprise agents all belong to an admin partition. Fingerprint this attribute when available. When a Consul agent is not explicitly configured with "default" it is in the default partition but will not report this in its `/v1/agent/self` endpoint. Fallback to "default" when missing only for Consul Enterprise. This feature provides users the ability to add constraints for jobs to land on Nomad nodes that have a Consul in that partition. Or it can allow cluster administrators to pair Consul partitions 1:1 with Nomad node pools. We'll also have the option to implement a future `partition` field in the jobspec's `consul` block to create an implicit constraint. Ref: https://github.com/hashicorp/nomad/issues/13139#issuecomment-1856479581	2023-12-15 09:26:25 -05:00
Tim Gross	50f0ce5412	config: remove old Vault/Consul config blocks from client (#18994 ) Remove the now-unused original configuration blocks for Consul and Vault from the client. When the client needs to refer to a Consul or Vault block it will always be for a specific cluster for the task/service. Add a helper for accessing the default clusters (for the client's own use). This is two of three changesets for this work. The remainder will implement the same changes in the `command/agent` package. As part of this work I discovered and fixed two bugs: * The gRPC proxy socket that we create for Envoy is only ever created using the default Consul cluster's configuration. This will prevent Connect from being used with the non-default cluster. * The Consul configuration we use for templates always comes from the default Consul cluster's configuration, but will use the correct Consul token for the non-default cluster. This will prevent templates from being used with the non-default cluster. Ref: https://github.com/hashicorp/nomad/issues/18947 Ref: https://github.com/hashicorp/nomad/pull/18991 Fixes: https://github.com/hashicorp/nomad/issues/18984 Fixes: https://github.com/hashicorp/nomad/issues/18983	2023-11-07 09:15:37 -05:00
Seth Hoenig	951cde4e3b	numa: fix cpu topology conversion for non linux systems (#18843 )	2023-10-24 09:12:34 -05:00
Seth Hoenig	83720740f5	core: plumbing to support numa aware scheduling (#18681 ) * core: plumbing to support numa aware scheduling * core: apply node resources compatibility upon fsm restore Handle the case where an upgraded server dequeues an evaluation before a client triggers a new fingerprint - which would be needed to cause the compatibility fix to run. By running the compat fix on restore the server will immediately have the compatible pseudo topology to use. * lint: learn how to spell pseudo	2023-10-19 15:09:30 -05:00
Tim Gross	5001bf4547	consul: use constant instead of "default" literal (#18611 ) Use the constant `structs.ConsulDefaultCluster` instead of the string literal "default", as we've done for Vault.	2023-09-28 16:50:21 -04:00
Luiz Aoqui	868aba57bb	vault: update identity name to start with `vault_` (#18591 ) * vault: update identity name to start with `vault_` In the original proposal, workload identities used to derive Vault tokens were expected to be called just `vault`. But in order to support multiple Vault clusters it is necessary to associate identities with specific Vault cluster configuration. This commit implements a new proposal to have Vault identities named as `vault_<cluster>`.	2023-09-27 15:53:28 -03:00
Tim Gross	20eadc7b29	config: move Consul getter out of fingerprinter (#18556 )	2023-09-22 10:58:39 -04:00
Tim Gross	fdc6c2151d	vault: select Vault API client by cluster name (#18533 ) Nomad Enterprise will support configuring multiple Vault clients. Instead of having a single Vault client field in the Nomad client, we'll have a function that callers can parameterize by the Vault cluster name that returns the correctly configured Vault API client wrapper.	2023-09-19 14:35:01 -04:00
Seth Hoenig	591394fb62	drivers: plumb hardware topology via grpc into drivers (#18504 ) * drivers: plumb hardware topology via grpc into drivers This PR swaps out the temporary use of detecting system hardware manually in each driver for using the Client's detected topology by plumbing the data over gRPC. This ensures that Client configuration is taken into account consistently in all references to system topology. * cr: use enum instead of bool for core grade * cr: fix test slit tables to be possible	2023-09-18 08:58:07 -05:00
Seth Hoenig	2e1974a574	client: refactor cpuset partitioning (#18371 ) * client: refactor cpuset partitioning This PR updates the way Nomad client manages the split between tasks that make use of resources.cpus vs. resources.cores. Previously, each task was explicitly assigned which CPU cores they were able to run on. Every time a task was started or destroyed, all other tasks' cpusets would need to be updated. This was inefficient and would crush the Linux kernel when a client would try to run ~400 or so tasks. Now, we make use of cgroup hierarchy and cpuset inheritance to efficiently manage cpusets. * cr: tweaks for feedback	2023-09-12 09:11:11 -05:00
Tim Gross	b022346575	fingerprint: backoff on Consul fingerprint after initial success (#18426 ) In the original design of Consul fingerprinting, we would poll every period so that we could change the client's fingerprint if Consul became unavailable. As of 1.4.0 (ref #14673) we no longer update the fingerprint in order to avoid excessive `Node.Register` RPCs when someone's Consul cluster is flapping. This allows us to safely backoff Consul fingerprinting on success, just as we have with Vault.	2023-09-08 08:17:07 -04:00
Tim Gross	a8e68e6479	fingerprint: add support for fingerprinting multiple Consul clusters (#18392 ) fingerprint: add support for fingerprinting multiple Consul clusters Add fingerprinting we'll need to accept multiple Consul clusters in upcoming Nomad Enterprise features. The fingerprinter will create a map of Consul clients by cluster name. In Nomad CE, all but the default cluster will be ignored and there will be no visible behavior change. Ref: https://github.com/hashicorp/team-nomad/issues/404	2023-09-07 14:05:35 -04:00
Tim Gross	c145e8b30f	fingerprint: add warning in CE when there are multiple vaults (#18412 ) Nomad CE only supports a single (default) Vault cluster, so log a warning if the user has configured multiple Vaults.	2023-09-07 09:51:48 -04:00
Tim Gross	b51b2a2705	fingerprint: add support for fingerprinting multiple Vault clusters (#18253 ) Add fingerprinting we'll need to accept multiple Vault clusters in upcoming Nomad Enterprise features. The fingerprinter will create a map of Vault clients by cluster name. In Nomad CE, all but the default cluster will be ignored and there will be no visible behavior change.	2023-08-18 15:33:22 -04:00
James Rasell	6108f5c4c3	admin: rename _oss files to _ce (#18209 )	2023-08-18 07:47:24 +01:00
hashicorp-copywrite[bot]	2d35e32ec9	Update copyright file headers to BUSL-1.1	2023-08-10 17:27:15 -05:00
Seth Hoenig	a4cc76bd3e	numa: enable numa topology detection (#18146 ) * client: refactor cgroups management in client * client: fingerprint numa topology * client: plumb numa and cgroups changes to drivers * client: cleanup task resource accounting * client: numa client and config plumbing * lib: add a stack implementation * tools: remove ec2info tool * plugins: fixup testing for cgroups / numa changes * build: update makefile and package tests and cl	2023-08-10 17:05:30 -05:00
Kevin Schoonover	4841791c86	fingerprint: fix 'default' alias not added to interface specified by `network_interface` (#18096 )	2023-08-01 08:35:31 -04:00
Ville Vesilehto	2c463bb038	chore(lint): use Go stdlib variables for HTTP methods and status codes (#17968 )	2023-07-26 15:28:09 +01:00
Patric Stout	e190eae395	Use config "cpu_total_compute" (if set) for all CPU statistics (#17628 ) Before this commit, it was only used for fingerprinting, but not for CPU stats on nodes or tasks. This meant that if the auto-detection failed, setting the cpu_total_compute didn't resolve the issue. This issue was most noticeable on ARM64, as there auto-detection always failed.	2023-07-19 13:30:47 -05:00
Seth Hoenig	100c460467	env/aws: updates from ec2info (#17835 )	2023-07-07 10:12:05 -05:00
VishnuJin	102f73274b	fingerprint: added windows os.build attribute to host fingerprint (#17576 )	2023-06-21 10:53:50 -04:00
Jerome Eteve	0d41fb6747	client checks kernel module in /sys/module for WSL2 bridge networking (#17306 )	2023-06-06 10:26:50 -04:00
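
Commit 1788bfb42e above explains why a node's "computed class" hash must exclude per-node values: if any unique attribute is hashed, no two nodes share a class and the scheduler can never reject a whole class early. The following is a minimal Go sketch of that idea only, not Nomad's actual implementation. The function name `computedClass`, the use of SHA-256, and the sample attribute keys are assumptions made for illustration; the only detail taken from the commit message is that attributes starting with `unique.` are filtered out.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
	"strings"
)

// computedClass is a hypothetical helper (not Nomad's real code). It hashes
// a node's attributes while skipping any key that starts with "unique.",
// so per-node values cannot make every node its own class.
func computedClass(attrs map[string]string) string {
	keys := make([]string, 0, len(attrs))
	for k := range attrs {
		if strings.HasPrefix(k, "unique.") {
			continue // unique per node: must not contribute to the hash
		}
		keys = append(keys, k)
	}
	// Sort keys so the hash does not depend on map iteration order.
	sort.Strings(keys)

	h := sha256.New()
	for _, k := range keys {
		fmt.Fprintf(h, "%s=%s\n", k, attrs[k])
	}
	return fmt.Sprintf("%x", h.Sum(nil))
}

func main() {
	// Sample attribute keys are illustrative assumptions.
	nodeA := map[string]string{"kernel.name": "linux", "unique.hostname": "node-a"}
	nodeB := map[string]string{"kernel.name": "linux", "unique.hostname": "node-b"}

	// Both nodes differ only in a "unique." attribute, so they share a class.
	fmt.Println(computedClass(nodeA) == computedClass(nodeB))
}
```

In this sketch, two nodes that differ only in `unique.`-prefixed attributes hash to the same class, which is the property the commit restores by removing the address attributes from the hash.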


389 Commits