nomad

mirror of https://github.com/kemko/nomad.git synced 2026-01-03 17:05:43 +03:00

Author	SHA1	Message	Date
Tim Gross	bb062deadc	docs: update service mesh integration docs for transparent proxy (#20251 ) Update the service mesh integration docs to explain how Consul needs to be configured for transparent proxy. Update the walkthrough to assume that `transparent_proxy` mode is the best approach, and move the manually-configured `upstreams` to a separate section for users who don't want to use Consul DNS. Ref: https://github.com/hashicorp/nomad/pull/20175 Ref: https://github.com/hashicorp/nomad/pull/20241	2024-04-04 17:01:07 -04:00
Tim Gross	76009d89af	tproxy: networking hook changes (#20183 ) When `transparent_proxy` block is present and the network mode is `bridge`, use a different CNI configuration that includes the `consul-cni` plugin. Before invoking the CNI plugins, create a Consul SDK `iptables.Config` struct for the allocation. This includes: * Use all the `transparent_proxy` block fields * The reserved ports are added to the inbound exclusion list so the alloc is reachable from outside the mesh * The `expose` blocks and `check` blocks with `expose=true` are added to the inbound exclusion list so health checks work. The `iptables.Config` is then passed as a CNI argument to the `consul-cni` plugin. Ref: https://github.com/hashicorp/nomad/issues/10628	2024-04-04 17:01:07 -04:00
Tim Gross	e8d203e7ce	transparent proxy: add jobspec support (#20144 ) Add a transparent proxy block to the existing Connect sidecar service proxy block. This changeset is plumbing required to support transparent proxy configuration on the client. Ref: https://github.com/hashicorp/nomad/issues/10628	2024-04-04 17:01:07 -04:00
Tim Gross	648daceca1	E2E: skip Vault 1.16.1 for JWT compatibility test (#20301 ) Vault 1.16.1 has a known issue around the JWT auth configuration that will prevent this test from ever passing. Skip testing the JWT code path on 1.16.1. Once 1.16.2 ships it will no longer get skipped. Ref: https://github.com/hashicorp/nomad/issues/20298	2024-04-04 17:00:35 -04:00
Yorick Gersie	6124ee8afb	cpuset fixer: use correct cgroup path for updates (#20276 ) * cpuset fixer: use correct cgroup path for updates fixes #20275 * docker: flatten switch statement and add test cases * cl: add cl --------- Co-authored-by: Seth Hoenig <shoenig@duck.com>	2024-04-04 15:54:10 -05:00
Tim Gross	a71632e3a4	docs: recommendation for maximum number of template dependencies (#20259 )	2024-04-04 11:08:49 -04:00
Julien Castets	9b5eb26c83	doc nomad-autoscaler: add options for pass-through strategy (#20284 )	2024-04-04 10:54:34 -04:00
Tim Gross	c1f020d60f	E2E: refactor Connect tests to use stdlib testing (#20278 ) Migrate our E2E tests for Connect off the old framework in preparation for writing E2E tests for transparent proxy and the updated workload identity workflow. Mark the tests that cover the legacy Consul token submitted workflow. Ref: https://github.com/hashicorp/nomad/pull/20175	2024-04-04 10:48:10 -04:00
James Rasell	fd5a42a6ca	docs: clarify data dir default parameters and default creation. (#20268 )	2024-04-04 09:20:47 +01:00
Tim Gross	78f9f17867	api: add missing `AllocDirStats` field in Go API (#20261 ) The JSON response for the Read Stats client API includes an `AllocDirStats` field. This field is missing in the `api` package, so consumers of the Go API can't use it to read the values we're getting back from the HTTP server. Fixes: https://github.com/hashicorp/nomad/issues/20246	2024-04-03 08:54:05 -04:00
Tim Gross	4ce728afbd	E2E: make `vault.create_from_role` unique per cluster (#20267 ) If an E2E cluster is destroyed after a different one has been created, the role and policy we create in Vault for the cluster will be deleted and Vault-related tests will fail. Note that before 1.9, we should figure out a way to give HCP Vault access to the JWKS endpoint and have a different set of policies, but we'll need to have a role-per-cluster in that case as well. Fixes: https://github.com/hashicorp/nomad-e2e/issues/138 (internal)	2024-04-03 08:45:01 -04:00
Tim Gross	cf25cf5cd5	E2E: use a self-hosted Consul for easier WI testing (#20256 ) Our `consulcompat` tests exercise both the Workload Identity and legacy Consul token workflow, but they are limited to running single node tests. The E2E cluster is network isolated, so using our HCP Consul cluster runs into a problem validating WI tokens because it can't reach the JWKS endpoint. In real production environments, you'd solve this with a CNAME pointing to a public IP pointing to a proxy with a real domain name. But that's logistically impractical for our ephemeral nightly cluster. Migrate the HCP Consul to a single-node Consul cluster on AWS EC2 alongside our Nomad cluster. Bootstrap TLS and ACLs in Terraform and ensure all nodes can reach each other. This will allow us to update our Consul tests so they can use Workload Identity, in a separate PR. Ref: #19698	2024-04-02 15:24:51 -04:00
Tim Gross	31f53cec01	structs: fix test for empty DNS configuration (#20233 ) The `DNSConfig.IsZero` method incorrectly returns true if any of the fields are empty, rather than if all of them are empty. The only code path that consumes this method is on the client, where it's used as part of equality checks on the allocation network status to set the priority of allocation updates to the server. Hypothetically, if the network hook modified only the DNS configuration and no task states were emitted, it would be possible to miss an allocation update. In practice this appears to be impossible, but we should fix the bug so that there aren't errors in future consumers.	2024-03-29 10:47:53 -04:00
Seth Hoenig	6ad648bec8	networking: Inject implicit constraints on CNI plugins when using bridge mode (#15473 ) This PR adds a job mutator which injects constraints on the job taskgroups that make use of bridge networking. Creating a bridge network makes use of the CNI plugins: bridge, firewall, host-local, loopback, and portmap. Starting with Nomad 1.5 these plugins are fingerprinted on each node, and as such we can ensure jobs are correctly scheduled only on nodes where they are available, when needed.	2024-03-27 16:11:39 -04:00
Tim Gross	9c2286014f	docs: update Consul compatibility matrix (#20242 ) Versions of Nomad and Consul that were known not to be compatible are no longer supported in general. Update the compatibility matrix for Consul to match.	2024-03-27 16:11:14 -04:00
Tim Gross	c3e7b13d54	deps: update consul-template to 0.37.4 to fix resource leak (#20234 ) A Nomad user reported an issue where template runner `View.poll` goroutines were being leaked when using templates with many dependencies. This resource leak was fixed in consul-template 0.37.4. Fixes: https://github.com/hashicorp/nomad/issues/20163	2024-03-27 11:51:34 -04:00
Juana De La Cuesta	c7e7fdfa84	[f-gh-208] Force recreation and redeployment of task if volume label changes (#20074 ) Scheduler: Force recreation and redeployment of task if volume mount labels in the task definitions change	2024-03-27 11:43:31 +01:00
Seth Hoenig	bd2a809135	subproc: lazy lookup nomad binary in self call (#20231 )	2024-03-26 12:33:06 -05:00
Tim Gross	2fde4a0c93	namespace/node pool: forward RPCs cross-region if ACLs aren't enabled (#20220 ) Although it's not recommended, it's possible to federate regions without ACLs enabled. In this case, ACL-related objects such as namespaces and node pools can be written independently in each region and won't be replicated. If you use commands like `namespace apply` or `node pool delete`, the RPC is supposed to be forwarded to the authoritative region. But when ACLs are disabled, there is no authoritative region and so the RPC will always be applied to the local region even if the `-region` flag is passed. Remove the change to the RPC region for the namespace and node pool write RPC whenever ACLs are disabled, so that forwarding works. Fixes: https://github.com/hashicorp/nomad/issues/20197 Ref: https://github.com/hashicorp/nomad/issues/20128	2024-03-26 10:39:37 -04:00
Seth Hoenig	77889a16fb	exec2: more tweaks to driver harness (#20221 ) Also add an explicit exit code to subproc package for when a child process is instructed to run an unrunnable command (i.e. cannot be found or is not executable) - with the 127 return code folks using bash are familiar with	2024-03-26 08:02:41 -05:00
Tim Gross	a50e6267d0	cli: remove redundant `allocs` profile from `operator debug` (#20219 ) The pprof `allocs` profile is identical to the `heap` profile, just with a different default view. Collecting only one of the two is sufficient to view all of `alloc_objects`, `alloc_space`, `inuse_objects`, and `inuse_space`, and collecting only one means that both views will be of the same profile. Also improve the docstrings on the goroutine profiles explaining what's in each so that it's clear why we might want all of debug=0, debug=1, and debug=2.	2024-03-26 08:19:18 -04:00
Juana De La Cuesta	f2965cad36	[gh-19729] Fix logic for updating terminal allocs on clients with max client disconnect (#20181 ) Only ignore allocs on terminal states that are updated --------- Co-authored-by: Tim Gross <tgross@hashicorp.com>	2024-03-26 10:31:58 +01:00
Phil Renaud	fee242c53d	Namespace added to example test in exec window (#20218 )	2024-03-25 17:02:07 -04:00
James Rasell	facc3e8013	agent: allow configuration of in-memory telemetry sink. (#20166 ) This change adds configuration options for setting the in-memory telemetry sink collection and retention durations. This sink backs the metrics JSON API and previously had hard-coded default values. The new options are particularly useful when running development or debug environments, where metrics collection is desired at a fast and granular rate.	2024-03-25 15:00:18 +00:00
Tim Gross	02d98b9357	operator debug: fix pprof interval handling (#20206 ) The `nomad operator debug` command saves a CPU profile for each interval, and names these files based on the interval. The same function takes a goroutine profile, heap profile, etc. but is missing the logic to interpolate the file name with the interval. This results in the operator debug command making potentially many expensive profile requests, and then overwriting the data. Update the command to save every profile it scrapes, and number them similarly to the existing CPU profile. Additionally, the command flags for `-pprof-interval` and `-pprof-duration` were validated backwards, which meant that we always coerced the `-pprof-interval` to be the same as the `-pprof-duration`, which always resulted in a single profile being taken at the start of the bundle. Correct the check as well as change the defaults to be more sensible. Fixes: https://github.com/hashicorp/nomad/issues/20151	2024-03-25 09:01:06 -04:00
Tim Gross	bdf3ff301e	jobspec: add support for destination partition to `upstream` block (#20167 ) Adds support for specifying a destination Consul admin partition in the `upstream` block. Fixes: https://github.com/hashicorp/nomad/issues/19785	2024-03-22 16:15:22 -04:00
Tim Gross	de218d1919	E2E: change timing of `vaultsecrets` test to guarantee lease window (#20200 ) We've been getting a couple of errors from this test on nightly where the template hasn't rendered by the time we expect it to. I've run some tests locally and this may be a timing issue introduced by recent code changes to templates. Move the start of the timer to after we're guaranteed that we've got a secret lease TTL started, to eliminate this as a source of flakiness. In my tests this adds another ~5s to a test that already takes over a minute to run anyways.	2024-03-22 16:12:00 -04:00
Conor Mongey	48535abc2d	Add nomad-port-forward to community tools (#20190 )	2024-03-22 15:31:19 -04:00
Tim Gross	d3ddb0aa49	docs: make it clear that federation features require ACLs (#20196 ) Our documentation has a hidden assumption that users know that federation replication requires ACLs to be enabled and bootstrapped. Add notes at some of the places users are likely to look for it. A separate follow-up PR to the federation tutorial should point to the ACL multi-region tutorial as well. Fixes: https://github.com/hashicorp/nomad/issues/20128	2024-03-22 15:15:00 -04:00
Michael Schurter	976789b8de	Small docs updates: bai rkt, cya openapi, lol ephemeral_disk "examples" (#20198 ) * docs: rip openapi spec * docs: remove useless ephemeral_disk examples	2024-03-22 11:53:25 -07:00
Tim Gross	10dd738a03	jobspec: update `gateway.ingress.service` Consul API fields (#20176 ) Add support for further configuring `gateway.ingress.service` blocks to bring this block up-to-date with currently available Consul API fields (except for namespace and admin partition, which will need to be handled under a different PR). These fields are sent to Consul as part of the job endpoint submission hook for Connect gateways. Co-authored-by: Horacio Monsalvo <horacio.monsalvo@southworks.com>	2024-03-22 13:50:48 -04:00
Piotr Kazmierczak	2556ff9a0e	deps: update generate.sh script to msgpack v2 (#20186 )	2024-03-22 16:56:29 +01:00
Tim Gross	15162917c1	cni: fix regression in falling back to DNS owned by `dockerd` (#20189 ) In #20007 we fixed a bug where the DNS configuration set by CNI plugins was not threaded through to the task configuration. This resulted in a regression where a DNS override set by `dockerd` was not respected for `bridge` mode networking. Our existing handling of CNI DNS incorrectly assumed that the DNS field would be empty, when in fact it contains a single empty DNS struct. Handle this case correctly by checking whether the DNS struct we get back from CNI has any nameservers, and ignore it if it doesn't. Expand test coverage of this case. Fixes: https://github.com/hashicorp/nomad/issues/20174	2024-03-22 10:54:16 -04:00
Seth Hoenig	c36db1b005	drivers/testutil: set full filepath for envs when using unveil fs isolation (#20187 )	2024-03-22 09:46:17 -05:00
Michael Schurter	23e4b7c9d2	Upgrade go-msgpack to v2 (#20173 ) Replaces #18812 Upgraded with: ``` find . -name '.go' -exec sed -i s/"github.com\/hashicorp\/go-msgpack\/codec"/"github.com\/hashicorp\/go-msgpack\/v2\/codec/" '{}' ';' find . -name '.go' -exec sed -i s/"github.com\/hashicorp\/net-rpc-msgpackrpc"/"github.com\/hashicorp\/net-rpc-msgpackrpc\/v2/" '{}' ';' go get go get -v -u github.com/hashicorp/raft-boltdb/v2 go get -v github.com/hashicorp/serf@5d32001edfaa18d1c010af65db707cdb38141e80 ``` see https://github.com/hashicorp/go-msgpack/releases/tag/v2.1.0 for details	2024-03-21 11:44:23 -07:00
Luiz Aoqui	b5573b7470	docs: fix `invoke_scheduler` metrics (#20172 )	2024-03-21 10:57:30 -04:00
Tim Gross	7b9bce2d08	config: fix `client.template` config merging with defaults (#20165 ) When loading the client configuration, the user-specified `client.template` block was not properly merged with the default values. As a result, if the user set any `client.template` field, all the other fields defaulted to their zero values instead of the documented defaults. This changeset: * Adds the missing `Merge` method for the client template config and ensures it's called. * Makes a single source of truth for the default template configuration, instead of two different constructors. * Extends the tests to cover the merge of a partial block better. Fixes: https://github.com/hashicorp/nomad/issues/20164	2024-03-20 10:18:56 -04:00
Juana De La Cuesta	56bf253474	Add docs for disconnected block (#20147 ) Expand the job settings to include the disconnect block and set as deprecated the fields that will be replaced by it.	2024-03-20 10:08:16 +01:00
Charlie Voiselle	7b27bc344b	[refactor] Move task directory destroy logic from alloc_dir.go to task_dir.go (#20006 ) * Move task directory destroy logic from alloc_dir to task_dir * Update errors to wrap error cause * Use constants for file permissions * Make multierror handling consistent. * Make helpers for directory creation * Move mount dir unlink to task_dir Unlink method * Make constant for file mode 710 Co-authored-by: Tim Gross <tgross@hashicorp.com> Co-authored-by: Michael Schurter <mschurter@hashicorp.com>	2024-03-19 13:49:09 -04:00
Tim Gross	dc39c20e66	docs: make recommendation for collection interval vs scrape interval (#20056 ) Metrics tools that "pull" metrics, such as Prometheus, have a configurable interval for how frequently they scrape metrics. This should be greater than or equal to the Nomad `telemetry.collection_interval` to avoid re-scraping metrics that cannot have been updated in that interval. Fixes: https://github.com/hashicorp/nomad/issues/20055	2024-03-19 08:56:29 -04:00
Charlie Voiselle	4cc90c0b22	Fix LeadershipTransfer tests (#20154 ) Multi-node Nomad clusters under test must use the RPC calls to bootstrap the ACL subsystem. The original implementation of the testcluster tried naively supersizing the single node behavior in TestRaftRemovePeer, which mutates the single node's state directly. In the case of a multi-node cluster, the RPC calls are necessary to ensure that the data is replicated via Raft to all of the cluster members.	2024-03-18 18:03:23 -04:00
Tim Gross	c4253470a0	autopilot: add `operator autopilot health` command (#20156 ) Add a command line operation that reports Enterprise autopilot data from the `/operator/autopilot/health` API. I've pulled this feature out of @lindleywhite's PR in the Enterprise repo. Ref: https://github.com/hashicorp/nomad-enterprise/pull/1394 Co-authored-by: Lindley <lindley@hashicorp.com>	2024-03-18 14:46:18 -04:00
Tim Gross	5138c1c82f	autopilot: add Enterprise health information to API endpoint (#20153 ) Add information about autopilot health to the `/operator/autopilot/health` API in Nomad Enterprise. I've pulled the CE changes required for this feature out of @lindleywhite's PR in the Enterprise repo. A separate PR will include a new `operator autopilot health` command that can present this information at the command line. Ref: https://github.com/hashicorp/nomad-enterprise/pull/1394 Co-authored-by: Lindley <lindley@hashicorp.com>	2024-03-18 11:38:17 -04:00
Tim Gross	1cbddfa8ce	acl: remove unused nil ACL object handling (#20150 ) As of #18754 which shipped in Nomad 1.7, we no longer need to nil-check the object returned by `ResolveACL` if there's no error return, because in the case where ACLs are disabled we return a special "ACLs disabled" ACL object. Checking nil is not a bug but should be discouraged because it opens us up to future bugs that would bypass ACLs. While working on an unrelated feature @lindleywhite discovered that we missed removing the nil check from several endpoints with our semgrep linter. This changeset fixes that. Co-Author: Lindley <lindley@hashicorp.com>	2024-03-18 10:04:51 -04:00
Tim Gross	695bb7ffcf	docs: improve wording around autoconfiguration via Consul (#20139 ) Fixes: https://github.com/hashicorp/nomad/issues/20132	2024-03-15 08:44:58 -04:00
Tim Gross	db195726a5	cli: add options to help string for `acl policy info` (#20138 ) Fixes: https://github.com/hashicorp/nomad/issues/20117	2024-03-15 08:44:50 -04:00
Juana De La Cuesta	ff72248c86	func: add new picker dependency (#20029 ) This commit introduces the new options for reconciling a reconnecting allocation and its replacement: * Best score (current implementation) * Keep original * Keep replacement * Keep the one that has run the longest time. It is achieved by adding a new dependency to the allocReconciler that calls the corresponding function depending on the task group's disconnect strategy. For more detailed information, refer to the new stanza for disconnected clients RFC. It resolves 15144	2024-03-15 13:42:08 +01:00
Tim Gross	13617eee4b	template: improve internal documentation around shutdown (#20134 ) While investigating a report around possible consul-template shutdown issues, which didn't bear fruit, I found that some of the logic around template runner shutdown is unintuitive. * Add some doc strings to the places where someone might think we should be obviously stopping the runner or returning early. * Mark context argument for `Poststart`, `Stop`, and `Update` hooks as unused. No functional code changes.	2024-03-14 15:33:32 -04:00
Amir Abbas	40b8f17717	Support insecure flag on artifact (#20126 )	2024-03-14 10:59:20 -05:00
Seth Hoenig	bb54d16e4a	exec2: setup RPC plumbing for dynamic workload users (#20129 ) And pass the dynamic users pool from the client into the hook.	2024-03-13 14:06:52 -05:00
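
The `DNSConfig.IsZero` fix described in 31f53cec01 (#20233) comes down to an "any field empty" check versus an "all fields empty" check. A minimal sketch of that difference follows. The struct here is a simplified stand-in whose field names are assumptions, and `isZeroAny` is a hypothetical name for the buggy variant; neither is copied from the Nomad source.

```go
package main

import "fmt"

// DNSConfig is a simplified stand-in for illustration only.
// The field names are assumptions, not taken from the Nomad source.
type DNSConfig struct {
	Servers  []string
	Searches []string
	Options  []string
}

// isZeroAny (hypothetical name) reproduces the bug described in the
// commit message: it reports true if ANY field is empty.
func (d *DNSConfig) isZeroAny() bool {
	if d == nil {
		return true
	}
	return len(d.Servers) == 0 || len(d.Searches) == 0 || len(d.Options) == 0
}

// IsZero is the corrected check: it reports true only if ALL fields are empty.
func (d *DNSConfig) IsZero() bool {
	if d == nil {
		return true
	}
	return len(d.Servers) == 0 && len(d.Searches) == 0 && len(d.Options) == 0
}

func main() {
	// A config with only nameservers set is not empty, but the buggy
	// check treats it as if it were.
	cfg := &DNSConfig{Servers: []string{"10.0.0.2"}}
	fmt.Println("buggy:", cfg.isZeroAny(), "fixed:", cfg.IsZero())
}
```

With the "any" variant, a partially populated DNS configuration compares as empty, which is how an equality check built on it could in principle skip an update.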

25699 Commits