Commit Graph

202 Commits

Author SHA1 Message Date
James Rasell
8eb569faf4 job_hooks: add implicit constraint when using Consul for services. (#12602) 2022-04-20 14:09:13 +02:00
Derek Strickland
8ac3e642e6 reconciler: 2 phase reconnects and tests (#12333)
* structs: Add alloc.Expired & alloc.Reconnected functions. Add Reconnect eval trigger by.

* node_endpoint: Emit new eval for reconnecting unknown allocs.

* filterByTainted: handle 2 phase commit filtering rules.

* reconciler: Append AllocState on disconnect. Logic updates from testing and 2 phase reconnects.

* allocs: Set reconnect timestamp. Destroy if not DesiredStatusRun. Watch for unknown status.
2022-04-05 17:13:10 -04:00
DerekStrickland
042a07bcaf client: reconnect unknown allocations and sync state 2022-04-05 17:10:41 -04:00
James Rasell
d49cf2388a Merge branch 'main' into f-1.3-boogie-nights 2022-03-23 09:41:25 +01:00
Seth Hoenig
b242957990 ci: swap ci parallelization for unconstrained gomaxprocs 2022-03-15 12:58:52 -05:00
James Rasell
6e8f32a290 client: refactor common service registration objects from Consul.
This commit performs refactoring to pull out common service
registration objects into a new `client/serviceregistration`
package. This new package will form the base point for all
client specific service registration functionality.

The Consul specific implementation is not moved as it also
includes non-service registration implementations; this reduces
the blast radius of the changes as well.
2022-03-15 09:38:30 +01:00
Michael Schurter
c615870911 client: defensively log reserved ports
- Fix test broken due to being improperly set up.
- Include min/max ports in default client config.
2021-10-04 15:43:35 -07:00
Lars Lehtonen
cded17cbaf client: fix multiple imports (#10537) 2021-05-13 14:30:31 -04:00
Nick Ethier
4a25ec9410 testing fixes 2021-04-14 10:17:28 -04:00
Nick Ethier
698a014a42 client: fix failing test 2021-04-13 13:28:15 -04:00
Tim Gross
bb194cb91d test infrastructure for mock client RPCs (#10193)
This commit includes a new test client that allows overriding the RPC
protocols. Only the RPCs that are passed in are registered, which lets you
implement a mock RPC in the server tests. This commit includes an example of
this for the ClientCSI RPC server.
2021-03-31 16:37:09 -04:00
Mahmood Ali
b383f92188 oversubscription: set the linux memory limit
Use the MemoryMaxMB as the LinuxResources limit. This is intended to ease
drivers implementation and adoption of the features: drivers that use
`resources.LinuxResources.MemoryLimitBytes` don't need to be updated.

Drivers that use NomadResources will need to be updated to track the new
field value. Given that tasks aren't guaranteed to use up the excess
memory limit, this is a reasonable compromise.
2021-03-30 16:55:58 -04:00
Nick Ethier
ee5b13a77b api: add Resource.Canonicalize test and fix tests to handle ReservedCores field 2021-03-19 22:08:27 -04:00
Drew Bailey
7ce0b5017c Events/msgtype cleanup (#9117)
* use msgtype in upsert node

adds message type to signature for upsert node, update tests, remove placeholder method

* UpsertAllocs msg type test setup

* use upsertallocs with msg type in signature

update test usage of delete node

delete placeholder msgtype method

* add msgtype to upsert evals signature, update test call sites with test setup msg type

handle snapshot upsert eval outside of FSM and ignore eval event

remove placeholder upsertevalsmsgtype

handle job plan rpc and prevent event creation for plan

msgtype cleanup upsertnodeevents

updatenodedrain msgtype

msg type 0 is a node registration event, so set the default to the ignore type

* fix named import

* fix signature ordering on upsertnode to match
2020-10-19 09:30:15 -04:00
Seth Hoenig
bdeb73cd2c consul/connect: dynamically select envoy sidecar at runtime
As newer versions of Consul are released, the minimum version of Envoy
it supports as a sidecar proxy also gets bumped. Starting with the upcoming
Consul v1.9.X series, Envoy v1.11.X will no longer be supported. Current
versions of Nomad hardcode a version of Envoy v1.11.2 to be used as the
default implementation of Connect sidecar proxy.

This PR introduces a change such that each Nomad Client will query its
local Consul for a list of Envoy proxies that it supports (https://github.com/hashicorp/consul/pull/8545)
and then launch the Connect sidecar proxy task using the latest supported version
of Envoy. If the `SupportedProxies` API component is not available from
Consul, Nomad will fall back to the old version of Envoy supported by old
versions of Consul.

Setting the meta configuration option `meta.connect.sidecar_image` or
setting the `connect.sidecar_task` stanza will take precedence as is
the current behavior for sidecar proxies.

Setting the meta configuration option `meta.connect.gateway_image`
will take precedence as is the current behavior for connect gateways.

`meta.connect.sidecar_image` and `meta.connect.gateway_image` may make
use of the special `${NOMAD_envoy_version}` variable interpolation, which
resolves to the newest version of Envoy supported by the Consul agent.

Addresses #8585 #7665
2020-10-13 09:14:12 -05:00
Yoan Blanc
c14c616194 use allow/deny instead of the colored alternatives (#9019)
Signed-off-by: Yoan Blanc <yoan@dosimple.ch>
2020-10-12 08:47:05 -04:00
Nick Ethier
18ed6a7a85 test: fix up testing around host networks 2020-06-19 13:53:31 -04:00
Nick Ethier
33ce12cda9 CNI Implementation (#7518) 2020-06-18 11:05:29 -07:00
Seth Hoenig
f8666bb1f9 client: enable nomad client to request and set SI tokens for tasks
When a job is configured with Consul Connect aware tasks (i.e. sidecar),
the Nomad Client should be able to request from Consul (through Nomad Server)
Service Identity tokens specific to those tasks.
2020-01-31 19:03:38 -06:00
Seth Hoenig
94c60b4cfa tests: swap lib/freeport for tweaked helper/freeport
Copy the updated version of freeport (sdk/freeport), and tweak it for use
in Nomad tests. This means staying below port 10000 to avoid conflicts with
the lib/freeport that is still transitively used by the old version of
consul that we vendor. Also provide implementations to find ephemeral ports
of macOS and Windows environments.

Ports acquired through freeport are supposed to be returned to freeport,
which this change now also introduces. Many tests are modified to include
calls to a cleanup function for Server objects.

This should help quite a bit with some flakey tests, but not all of them.
Our port problems will not go away completely until we upgrade our vendor
version of consul. With Go modules, we'll probably do a 'replace' to swap
out other copies of freeport with the one now in 'nomad/helper/freeport'.
2019-12-09 08:37:32 -06:00
Mahmood Ali
c661d37ca2 fixup! tests: don't assume eth0 network is available 2019-11-21 08:28:20 -05:00
Mahmood Ali
bb81fce18e tests: don't assume eth0 network is available
TestClient_UpdateNodeFromFingerprintKeepsConfig checks a test node
network interface, which is hardcoded to `eth0` and is updated
asynchronously.  This causes flakiness when eth0 isn't available.

Here, we hardcode the value to an arbitrary network interface.
2019-11-20 20:37:30 -05:00
Mahmood Ali
b886e15487 tests: run TestClient_WatchAllocs in non-linux environments 2019-11-20 20:37:29 -05:00
Mahmood Ali
77a0064fcd tests: fix TestClient_RestoreError
When spinning up a second client, ensure that it uses new driver
instances, rather than reusing the already shut down, unhealthy drivers
from the first instance.

This speeds up tests significantly, cutting ~50 seconds or so: the
timeout in NewClient while waiting for drivers to fingerprint. They never
do because the drivers were already shut down.
2019-11-20 20:37:28 -05:00
Mahmood Ali
47bc949a8f tests: remove TestClient_RestoreError test
TestClient_RestoreError is very slow, taking ~81 seconds.

It has a few problematic patterns. It's unclear what it tests: it
simulates a failure condition where all state db lookups fail and
asserts that the alloc fails. However, starting from
https://github.com/hashicorp/nomad/pull/6216 , we don't fail allocs in
that condition but rather restart them.

Also, the drivers used in the second client `c2` are the same singleton
instances used in `c1` and are already shut down. We ought to start healthy
new driver instances.
2019-11-20 20:37:27 -05:00
Mahmood Ali
8b05f87140 rename to hasLocalState, and ignore clientstate
The ClientState being pending isn't a good criterion, as an alloc may
have been updated in-place before it was completed.

Also, updated the logic so we only check for task states.  If an alloc
has deployment state but no persisted tasks at all, restore will still
fail.
2019-08-28 11:44:48 -04:00
Mahmood Ali
493945a8a4 Alternative approach: avoid restoring
This uses an alternative approach where we avoid restoring the alloc
runner in the first place, if we suspect that the alloc may have been
completed already.
2019-08-27 17:30:55 -04:00
Jasmine Dahilig
ce55bf5fba Merge pull request #5664 from hashicorp/f-http-hcl-region
backfill region from hcl for jobUpdate and jobPlan
2019-06-13 12:25:01 -07:00
Jasmine Dahilig
c467a94e2b backfill region from job hcl in jobUpdate and jobPlan endpoints
- updated region in job metadata that gets persisted to nomad datastore
- fixed many unrelated unit tests that used an invalid region value
(they previously passed because hcl wasn't getting picked up and
the job would default to global region)
2019-06-13 08:03:16 -07:00
Lang Martin
a732cd1f06 Merge pull request #5642 from hashicorp/b-network-fingerprinting-ipv4
network fingerprinting multiple IPs on the configured network device
2019-05-13 11:46:53 -04:00
Chris Baker
4b54e27841 stale allocation data leads to incorrect (and even negative) metrics (#5637)
* client: was not using up-to-date client state in determining which allocs count towards allocated resources

* Update client/client.go

Co-Authored-By: cgbaker <cgbaker@hashicorp.com>
2019-05-07 15:54:36 -04:00
Lang Martin
5f2c6630a2 client_test new test fingerprinting can keep multi ips on a device 2019-05-02 18:11:28 -04:00
Lang Martin
583ae3722c client fingerprinter doesn't overwrite manual configuration
Revert "Revert accidental merge of pr #5482"
This reverts commit c45652ab8c.
2019-04-19 15:23:48 -04:00
Lang Martin
c45652ab8c Revert accidental merge of pr #5482
Revert "fingerprint Constraints and Affinities have Equals, as set"
This reverts commit 596f16fb5f.

Revert "client tests assert the independent handling of interface and speed"
This reverts commit 7857ac5993.

Revert "structs missed applying a style change from the review"
This reverts commit 658916e327.

Revert "client, structs comments"
This reverts commit be2838d6ba.

Revert "client fingerprint updateNetworks preserves the network configuration"
This reverts commit fc309cb430.

Revert "client_test cleanup comments from review"
This reverts commit bc0bf4efb9.

Revert "client Networks Equals is set equality"
This reverts commit f8d432345b.

Revert "struct cleanup indentation in RequestedDevice Equals"
This reverts commit f4746411ca.

Revert "struct Equals checks for identity before value checking"
This reverts commit 0767a4665e.

Revert "fix client-test, avoid hardwired platform dependecy on lo0"
This reverts commit e89dbb2ab1.

Revert "refactor error in client fingerprint to include the offending data"
This reverts commit a7fed726c6.

Revert "add client updateNodeResources to merge but preserve manual config"
This reverts commit 84bd433c7e.

Revert "refactor struts.RequestedDevice to have its own Equals"
This reverts commit 6897825240.

Revert "refactor structs.Resource.Networks to have its own Equals"
This reverts commit 49e2e6c77b.

Revert "refactor structs.Resource.Devices to have its own Equals"
This reverts commit 4ede9226bb.

Revert "add COMPAT(0.10): Remove in 0.10 notes to impl for structs.Resources"
This reverts commit 49fbaace52.

Revert "add structs.Resources Equals"
This reverts commit 8528a2a2a6.

Revert "test that fingerprint resources are updated, net not clobbered"
This reverts commit 8ee02ddd23.
2019-04-11 10:29:40 -04:00
Lang Martin
7857ac5993 client tests assert the independent handling of interface and speed 2019-04-11 09:56:22 -04:00
Lang Martin
bc0bf4efb9 client_test cleanup comments from review 2019-04-11 09:56:22 -04:00
Lang Martin
e89dbb2ab1 fix client-test, avoid hardwired platform dependecy on lo0 2019-04-11 09:56:22 -04:00
Lang Martin
8ee02ddd23 test that fingerprint resources are updated, net not clobbered 2019-04-11 09:56:21 -04:00
Michael Schurter
158c74887e goimports until make check is happy 2019-01-23 06:27:14 -08:00
Michael Schurter
0d61ff0fb9 move pluginutils -> helper/pluginutils
I wanted a different color bikeshed, so I get to paint it
2019-01-22 15:50:08 -08:00
Alex Dadgar
95297c608c goimports 2019-01-22 15:44:31 -08:00
Alex Dadgar
fe2fa21a7d gofmt 2019-01-22 15:43:34 -08:00
Alex Dadgar
b9f36134dc move catalog + grpcutils 2019-01-22 15:11:57 -08:00
Preetha Appan
19777b870b linting fixes 2019-01-12 10:38:20 -06:00
Preetha Appan
c66f2abefd Make unit test for allocrunner failure much nicer 2019-01-12 10:38:20 -06:00
Preetha Appan
a1a7a02e48 Add unit test to simulate alloc runner creation failure 2019-01-12 10:38:20 -06:00
Michael Schurter
a20ae598c7 Apply suggestions from code review
Co-Authored-By: preetapan <preetha@hashicorp.com>
2019-01-12 10:38:20 -06:00
Preetha Appan
72dead7448 Refactor statedb factory config to set it directly in client config 2019-01-12 10:38:20 -06:00
Preetha Appan
80919bf713 Modified destroy failure handling to rely on allocrunner's destroy method
Added a unit test with a custom statedb implementation that errors,
used to verify destroy errors
2019-01-12 10:37:12 -06:00
Mahmood Ali
1b7b70f47e tests: enable and fix tests requiring mock driver 2019-01-10 10:10:11 -05:00