Commit Graph

315 Commits

Author SHA1 Message Date
Seth Hoenig
96b6a8d985 build: update ec2 instance profiles
using tools/ec2info
2022-04-21 11:47:40 -05:00
James Rasell
443cea1806 Merge pull request #12368 from hashicorp/f-1.3-boogie-nights
service discovery: add initial MVP implementation
2022-03-25 18:04:47 +01:00
Hunter Morris
77bab76b3d client: Add AWS EC2 instance-life-cycle from metadata to client fingerprint (#12371) 2022-03-25 11:50:52 -04:00
James Rasell
2d8b370f35 Merge branch 'main' into f-1.3-boogie-nights 2022-03-25 16:40:32 +01:00
Seth Hoenig
5da1a31e94 client: enable support for cgroups v2
This PR introduces support for using Nomad on systems with cgroups v2 [1]
enabled as the cgroups controller mounted on /sys/fs/cgroups. Newer Linux
distros like Ubuntu 21.10 are shipping with cgroups v2 only, causing problems
for Nomad users.

Nomad mostly "just works" with cgroups v2 due to the indirection via libcontainer,
but not so for managing cpuset cgroups. Before, Nomad has been making use of
a feature in v1 where a PID could be a member of more than one cgroup. In v2
this is no longer possible, and so the logic around computing cpuset values
must be modified. When Nomad detects v2, it manages cpuset values in-process,
rather than making use of cgroup heirarchy inheritence via shared/reserved
parents.

Nomad will only activate the v2 logic when it detects cgroups2 is mounted at
/sys/fs/cgroups. This means on systems running in hybrid mode with cgroups2
mounted at /sys/fs/cgroups/unified (as is typical) Nomad will continue to
use the v1 logic, and should operate as before. Systems that do not support
cgroups v2 are also not affected.

When v2 is activated, Nomad will create a parent called nomad.slice (unless
otherwise configured in Client conifg), and create cgroups for tasks using
naming convention <allocID>-<task>.scope. These follow the naming convention
set by systemd and also used by Docker when cgroups v2 is detected.

Client nodes now export a new fingerprint attribute, unique.cgroups.version
which will be set to 'v1' or 'v2' to indicate the cgroups regime in use by
Nomad.

The new cpuset management strategy fixes #11705, where docker tasks that
spawned processes on startup would "leak". In cgroups v2, the PIDs are
started in the cgroup they will always live in, and thus the cause of
the leak is eliminated.

[1] https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html

Closes #11289
Fixes #11705 #11773 #11933
2022-03-23 11:35:27 -05:00
James Rasell
d49cf2388a Merge branch 'main' into f-1.3-boogie-nights 2022-03-23 09:41:25 +01:00
Seth Hoenig
b242957990 ci: swap ci parallelization for unconstrained gomaxprocs 2022-03-15 12:58:52 -05:00
James Rasell
8e2c71e76b client: add service discovery feature enabled attribute. 2022-03-14 12:42:01 +01:00
Kevin Schoonover
6633f8d908 fingerprint: remove metadata from digitalocean (#12032) 2022-02-09 07:31:45 -05:00
Tim Gross
79e8d394b4 fingerprint: digitalocean fingerprint test requires metadata header (#12028) 2022-02-08 16:35:13 -05:00
Seth Hoenig
652de761bf env: update aws cpu configs
By running the tools/ec2info tool
2022-02-08 12:44:00 -06:00
Kevin Schoonover
5cea36639d address comments
Co-authored-by: Seth Hoenig <seth.a.hoenig@gmail.com>
2022-02-07 09:03:48 -08:00
Kevin Schoonover
7b6f9540db small fixes 2022-02-05 22:23:43 -08:00
Kevin Schoonover
4d4c839796 add digitalocean fingerprinter 2022-02-05 22:17:36 -08:00
Luiz Aoqui
5e112a87d8 fix host network reserved port fingerprint (#11728) 2021-12-22 15:29:54 -05:00
Mahmood Ali
6c414cd5f9 gofmt all the files
mostly to handle build directives in 1.17.
2021-10-01 10:14:28 -04:00
Luiz Aoqui
2904a0710a Disable PowerShell profile and simplify fingerprinting link speed on Windows (#11183) 2021-09-22 11:17:47 -04:00
Luiz Aoqui
b4e222f6cc Log network device name during fingerprinting (#11184) 2021-09-16 10:48:31 -04:00
James Rasell
3bffe443ac chore: fix incorrect docstring formatting. 2021-08-30 11:08:12 +02:00
Seth Hoenig
ec4fa410a7 env/aws: update ec2 cpu data
using tools/ec2info

```
$ go run .
```
2021-07-22 09:32:46 -05:00
Seth Hoenig
4b3ed53511 consul: pr cleanup namespace probe function signatures 2021-06-07 15:41:01 -05:00
Seth Hoenig
0bc8a33084 consul: probe consul namespace feature before using namespace api
This PR changes Nomad's wrapper around the Consul NamespaceAPI so that
it will detect if the Consul Namespaces feature is enabled before making
a request to the Namespaces API. Namespaces are not enabled in Consul OSS,
and require a suitable license to be used with Consul ENT.

Previously Nomad would check for a 404 status code when makeing a request
to the Namespaces API to "detect" if Consul OSS was being used. This does
not work for Consul ENT with Namespaces disabled, which returns a 500.

Now we avoid requesting the namespace API altogether if Consul is detected
to be the OSS sku, or if the Namespaces feature is not licensed. Since
Consul can be upgraded from OSS to ENT, or a new license applied, we cache
the value for 1 minute, refreshing on demand if expired.

Fixes https://github.com/hashicorp/nomad-enterprise/issues/575

Note that the ticket originally describes using attributes from https://github.com/hashicorp/nomad/issues/10688.
This turns out not to be possible due to a chicken-egg situation between
bootstrapping the agent and setting up the consul client. Also fun: the
Consul fingerprinter creates its own Consul client, because there is no
[currently] no way to pass the agent's client through the fingerprint factory.
2021-06-07 12:19:25 -05:00
Seth Hoenig
b3254f618a client/fingerprint/consul: add new attributes to consul fingerprinter
This PR adds new probes for detecting these new Consul related attributes:

Consul namespaces are a Consul enterprise feature that may be disabled depending
on the enterprise license associated with the Consul servers. Having this attribute
available will enable Nomad to properly decide whether to query the Consul Namespace
API.

Consul connect must be explicitly enabled before Connect APIs will work. Currently
Nomad only checks for a minimum Consul version. Having this attribute available will
enable Nomad to properly schedule Connect tasks only on nodes with a Consul agent that
has Connect enabled.

Consul connect requires the grpc port to be explicitly set before Connect APIs will work.
Currently Nomad only checks for a minimal Consul version. Having this attribute available
will enable Nomad to schedule Connect tasks only on nodes with a Consul agent that has
the grpc listener enabled.
2021-06-03 12:49:22 -05:00
Seth Hoenig
0479167252 client/fingerprint/consul: refactor the consul fingerprinter to test individual attributes
This PR refactors the ConsulFingerprint implementation, breaking individual attributes
into individual functions to make testing them easier. This is in preparation for
additional extractors about to be added. Behavior should be otherwise unchanged.

It adds the attribute consul.sku, which can be used to differentiate between Consul
OSS vs Consul ENT.
2021-06-03 12:48:39 -05:00
Seth Hoenig
21885d70c1 aws_env: update ec2 instances
Generate updated list using tools/ec2info
2021-04-22 11:33:51 -06:00
Nick Ethier
f897ac79e8 client/ar: thread through cpuset manager 2021-04-13 13:28:36 -04:00
Nick Ethier
38bc1b1a31 client/fingerprint: move existing cgroup concerns to cgutil 2021-04-13 13:28:36 -04:00
Nick Ethier
03d6eb8205 client: only fingerprint reservable cores via cgroups, allowing manual override for other platforms 2021-04-13 13:28:15 -04:00
Nick Ethier
b8397a712d fingerprint: implement client fingerprinting of reservable cores
on Linux systems this is derived from the configure cpuset cgroup parent (defaults to /nomad)
for non Linux systems and Linux systems where cgroups are not enabled, the client defaults to using all cores
2021-04-13 13:28:15 -04:00
Andrii Chubatiuk
bab0d39d51 support multiple host network aliases for the same interface 2021-04-13 09:33:33 -04:00
Yoan Blanc
a814f0253f chore: bump golangci-lint from v1.24 to v1.39
Signed-off-by: Yoan Blanc <yoan@dosimple.ch>
2021-04-03 09:50:23 +02:00
Tim Gross
14568b3e00 deps: bump gopsutil to v3.21.2 2021-03-30 16:02:51 -04:00
Florian Apolloner
deb835792c Properly detect unloaded dynamic modules on RHEL derivates. Fixes #9776
The modules.dep file on RHEL includes .xz for compressed kernel modules.
2021-01-12 18:28:00 +01:00
Joel May
2e17610406 Allow client.cpu_total_compute to override attr.cpu.totalcompute 2021-01-07 15:31:11 -05:00
Kris Hicks
85ed8ddd4f Add gosimple linter (#9590) 2020-12-09 11:05:18 -08:00
Seth Hoenig
da1235f35b client/fingerprint/cpu: use fallback total compute value if cpu not detected
Previously, Nomad would fail to startup if the CPU fingerprinter could
not detect the cpu total compute (i.e. cores * mhz). This is common on
some EC2 instance types (graviton class), where the env_aws fingerprinter
will override the detected CPU performance with a more accurate value
anyway.

Instead of crashing on startup, have Nomad use a low default for available
cpu performance of 1000 ticks (e.g. 1 core * 1 GHz). This enables Nomad
to get past the useless cpu fingerprinting on those EC2 instances. The
crashing error message is now a log statement suggesting the setting of
cpu_total_compute in client config.

Fixes #7989
2020-12-09 10:35:58 -06:00
Seth Hoenig
6c8ea087d6 env_aws: run ec2info to update ec2 info
Use `tools/ec2info` to update the generated table of instance types.
`$ go run .`
2020-12-02 09:35:03 -06:00
Roman Vynar
4bbe50bc53 Add compute/zone to Azure fingerprinting 2020-11-26 13:26:51 +02:00
Seth Hoenig
52cef27176 client/fingerprint: detect unloaded dynamic bridge kernel module
In Nomad v0.12.0, the client added additional fingerprinting around the
presense of the bridge kernel module. The fingerprinter only checked in
`/proc/modules` which is a list of loaded modules. In some cases, the
bridge kernel module is builtin rather than dynamically loaded. The fix
for that case is in #8721. However we were still missing the case where
the bridge module is dynamically loaded, but not yet loaded during the
startup of the Nomad agent. In this case the fingerprinter would believe
the bridge module was unavailable when really it gets loaded on demand.

This PR now has the fingerprinter scan the kernel module dependency file,
which will contain an entry for the bridge module even if it is not yet
loaded.

In summary, the client now looks for the bridge kernel module in
 - /proc/modules
 - /lib/modules/<kernel>/modules.builtin
 - /lib/modules/<kernel>/modules.dep

Closes #8423
2020-11-09 13:56:14 -06:00
Seth Hoenig
080b2c4415 env_aws: fixup test case node attr detection 2020-10-08 12:59:07 -05:00
Seth Hoenig
53ab30870b env_aws: get ec2 cpu perf data from AWS API
Previously, Nomad was using a hand-made lookup table for looking
up EC2 CPU performance characteristics (core count + speed = ticks).

This data was incomplete and incorrect depending on region. The AWS
API has the correct data but requires API keys to use (i.e. should not
be queried directly from Nomad).

This change introduces a lookup table generated by a small command line
tool in Nomad's tools module which uses the Amazon AWS API.

Running the tool requires AWS_* environment variables set.
  $ # in nomad/tools/cpuinfo
  $ go run .

Going forward, Nomad can incorporate regeneration of the lookup table
somewhere in the CI pipeline so that we remain up-to-date on the latest
offerings from EC2.

Fixes #7830
2020-10-08 12:01:09 -05:00
Landan Cheruka
e40b9a40b2 fingerprint: changed unique.platform.azure.hostname to unique.platform.azure.name (#9016) 2020-10-02 16:50:12 -04:00
Javier Heredia
5fd9f1b5f5 Add consul segment fingerprint (#7214) 2020-10-02 15:15:59 -04:00
Landan Cheruka
89558ead34 client: added azure fingerprinting support (#8979) 2020-10-01 09:10:27 -04:00
Joel May
767e8909d7 fingerprinting: add AWS MAC and public-ipv6 (#8887) 2020-09-17 09:03:01 -04:00
Mark Lee
893d4d102f refactor lookup code 2020-08-24 12:24:16 +09:00
Mark Lee
08dfc80724 lookup kernel builtin modules too 2020-08-24 11:09:13 +09:00
Shengjing Zhu
274bf2ee1c Adjust cgroup change in libcontainer 2020-08-20 00:31:07 +08:00
Mahmood Ali
6bf248c790 honor config.NetworkInterface in NodeNetworks 2020-07-21 15:43:45 -04:00
Mahmood Ali
13a4c56c5a tests: avoid os.Exit in a test 2020-07-01 15:25:13 -04:00