Commit Graph

290 Commits

Author SHA1 Message Date
Nick Ethier
f897ac79e8 client/ar: thread through cpuset manager 2021-04-13 13:28:36 -04:00
Nick Ethier
38bc1b1a31 client/fingerprint: move existing cgroup concerns to cgutil 2021-04-13 13:28:36 -04:00
Nick Ethier
03d6eb8205 client: only fingerprint reservable cores via cgroups, allowing manual override for other platforms 2021-04-13 13:28:15 -04:00
Nick Ethier
b8397a712d fingerprint: implement client fingerprinting of reservable cores
on Linux systems this is derived from the configure cpuset cgroup parent (defaults to /nomad)
for non Linux systems and Linux systems where cgroups are not enabled, the client defaults to using all cores
2021-04-13 13:28:15 -04:00
Andrii Chubatiuk
bab0d39d51 support multiple host network aliases for the same interface 2021-04-13 09:33:33 -04:00
Yoan Blanc
a814f0253f chore: bump golangci-lint from v1.24 to v1.39
Signed-off-by: Yoan Blanc <yoan@dosimple.ch>
2021-04-03 09:50:23 +02:00
Tim Gross
14568b3e00 deps: bump gopsutil to v3.21.2 2021-03-30 16:02:51 -04:00
Florian Apolloner
deb835792c Properly detect unloaded dynamic modules on RHEL derivates. Fixes #9776
The modules.dep file on RHEL includes .xz for compressed kernel modules.
2021-01-12 18:28:00 +01:00
Joel May
2e17610406 Allow client.cpu_total_compute to override attr.cpu.totalcompute 2021-01-07 15:31:11 -05:00
Kris Hicks
85ed8ddd4f Add gosimple linter (#9590) 2020-12-09 11:05:18 -08:00
Seth Hoenig
da1235f35b client/fingerprint/cpu: use fallback total compute value if cpu not detected
Previously, Nomad would fail to startup if the CPU fingerprinter could
not detect the cpu total compute (i.e. cores * mhz). This is common on
some EC2 instance types (graviton class), where the env_aws fingerprinter
will override the detected CPU performance with a more accurate value
anyway.

Instead of crashing on startup, have Nomad use a low default for available
cpu performance of 1000 ticks (e.g. 1 core * 1 GHz). This enables Nomad
to get past the useless cpu fingerprinting on those EC2 instances. The
crashing error message is now a log statement suggesting the setting of
cpu_total_compute in client config.

Fixes #7989
2020-12-09 10:35:58 -06:00
Seth Hoenig
6c8ea087d6 env_aws: run ec2info to update ec2 info
Use `tools/ec2info` to update the generated table of instance types.
`$ go run .`
2020-12-02 09:35:03 -06:00
Roman Vynar
4bbe50bc53 Add compute/zone to Azure fingerprinting 2020-11-26 13:26:51 +02:00
Seth Hoenig
52cef27176 client/fingerprint: detect unloaded dynamic bridge kernel module
In Nomad v0.12.0, the client added additional fingerprinting around the
presense of the bridge kernel module. The fingerprinter only checked in
`/proc/modules` which is a list of loaded modules. In some cases, the
bridge kernel module is builtin rather than dynamically loaded. The fix
for that case is in #8721. However we were still missing the case where
the bridge module is dynamically loaded, but not yet loaded during the
startup of the Nomad agent. In this case the fingerprinter would believe
the bridge module was unavailable when really it gets loaded on demand.

This PR now has the fingerprinter scan the kernel module dependency file,
which will contain an entry for the bridge module even if it is not yet
loaded.

In summary, the client now looks for the bridge kernel module in
 - /proc/modules
 - /lib/modules/<kernel>/modules.builtin
 - /lib/modules/<kernel>/modules.dep

Closes #8423
2020-11-09 13:56:14 -06:00
Seth Hoenig
080b2c4415 env_aws: fixup test case node attr detection 2020-10-08 12:59:07 -05:00
Seth Hoenig
53ab30870b env_aws: get ec2 cpu perf data from AWS API
Previously, Nomad was using a hand-made lookup table for looking
up EC2 CPU performance characteristics (core count + speed = ticks).

This data was incomplete and incorrect depending on region. The AWS
API has the correct data but requires API keys to use (i.e. should not
be queried directly from Nomad).

This change introduces a lookup table generated by a small command line
tool in Nomad's tools module which uses the Amazon AWS API.

Running the tool requires AWS_* environment variables set.
  $ # in nomad/tools/cpuinfo
  $ go run .

Going forward, Nomad can incorporate regeneration of the lookup table
somewhere in the CI pipeline so that we remain up-to-date on the latest
offerings from EC2.

Fixes #7830
2020-10-08 12:01:09 -05:00
Landan Cheruka
e40b9a40b2 fingerprint: changed unique.platform.azure.hostname to unique.platform.azure.name (#9016) 2020-10-02 16:50:12 -04:00
Javier Heredia
5fd9f1b5f5 Add consul segment fingerprint (#7214) 2020-10-02 15:15:59 -04:00
Landan Cheruka
89558ead34 client: added azure fingerprinting support (#8979) 2020-10-01 09:10:27 -04:00
Joel May
767e8909d7 fingerprinting: add AWS MAC and public-ipv6 (#8887) 2020-09-17 09:03:01 -04:00
Mark Lee
893d4d102f refactor lookup code 2020-08-24 12:24:16 +09:00
Mark Lee
08dfc80724 lookup kernel builtin modules too 2020-08-24 11:09:13 +09:00
Shengjing Zhu
274bf2ee1c Adjust cgroup change in libcontainer 2020-08-20 00:31:07 +08:00
Mahmood Ali
6bf248c790 honor config.NetworkInterface in NodeNetworks 2020-07-21 15:43:45 -04:00
Mahmood Ali
13a4c56c5a tests: avoid os.Exit in a test 2020-07-01 15:25:13 -04:00
Nick Ethier
ad8ced3873 multi-interface network support 2020-06-19 09:42:10 -04:00
Nick Ethier
33ce12cda9 CNI Implementation (#7518) 2020-06-18 11:05:29 -07:00
Seth Hoenig
a869394a03 env_aws: combine 3 log lines into 1 2020-04-29 10:47:36 -06:00
Seth Hoenig
0d5d1781d3 env_aws: downgrade log line
Co-Authored-By: Mahmood Ali <mahmood@hashicorp.com>
2020-04-29 10:34:26 -06:00
Seth Hoenig
f47c57fa2d env_aws: fixup log line
Co-Authored-By: Mahmood Ali <mahmood@hashicorp.com>
2020-04-29 10:33:53 -06:00
Seth Hoenig
9230fa9eff env_aws: use best-effort lookup table for CPU performance in EC2
Fixes #7681

The current behavior of the CPU fingerprinter in AWS is that it
reads the **current** speed from `/proc/cpuinfo` (`CPU MHz` field).

This is because the max CPU frequency is not available by reading
anything on the EC2 instance itself. Normally on Linux one would
look at e.g. `sys/devices/system/cpu/cpuN/cpufreq/cpuinfo_max_freq`
or perhaps parse the values from the `CPU max MHz` field in
`/proc/cpuinfo`, but those values are not available.

Furthermore, no metadata about the CPU is made available in the
EC2 metadata service.
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-categories.html

Since `go-psutil` cannot determine the max CPU speed it defaults to
the current CPU speed, which could be basically any number between
0 and the true max. This is particularly bad on large, powerful
reserved instances which often idle at ~800 MHz while Nomad does
its fingerprinting (typically IO bound), which Nomad then uses as
the max, which results in severe loss of available resources.

Since the CPU specification is unavailable programmatically (at least
not without sudo) use a best-effort lookup table. This table was
generated by going through every instance type in AWS documentation
and copy-pasting the numbers.
https://aws.amazon.com/ec2/instance-types/

This approach obviously is not ideal as future instance types will
need to be added as they are introduced to AWS. However, using the
table should only be an improvement over the status quo since right
now Nomad miscalculates available CPU resources on all instance types.
2020-04-28 19:01:33 -06:00
Mahmood Ali
6199e96972 fixup! tests: Add tests for EC2 Metadata immitation cases 2020-03-26 11:37:54 -04:00
Mahmood Ali
0c1dd0e75b fixup! tests: Add tests for EC2 Metadata immitation cases 2020-03-26 11:33:44 -04:00
Mahmood Ali
e37f7af811 fingerprint: handle incomplete AWS immitation APIs
Fix a regression where we accidentally started treating non-AWS
environments as AWS environments, resulting in bad networking settings.

Two factors some at play:

First, in [1], we accidentally switched the ultimate AWS test from
checking `ami-id` to `instance-id`.  This means that nomad started
treating more environments as AWS; e.g. Hetzner implements `instance-id`
but not `ami-id`.

Second, some of these environments return empty values instead of
errors!  Hetzner returns empty 200 response for `local-ipv4`, resulting
into bad networking configuration.

This change fix the situation by restoring the check to `ami-id` and
ensuring that we only set network configuration when the ip address is
not-empty.  Also, be more defensive around response whitespace input.

[1] https://github.com/hashicorp/nomad/pull/6779
2020-03-26 11:23:15 -04:00
Mahmood Ali
500c3c2d87 tests: Add tests for EC2 Metadata immitation cases
Test that nomad doesn't set empty/bad network configuration when in an
environment that does incomplete immitation of EC2 Metadata API.
2020-03-26 11:13:21 -04:00
Danielle
59eb882197 Update client/fingerprint/env_aws.go
Co-Authored-By: Mahmood Ali <mahmood@hashicorp.com>
2019-12-16 14:48:52 +01:00
Danielle Lancashire
544807ba79 env_aws: Disable Retries and set Session cfg
Previously, Nomad used hand rolled HTTP requests to interact with the
EC2 metadata API. Recently however, we switched to using the AWS SDK for
this fingerprinting.

The default behaviour of the AWS SDK is to perform retries with
exponential backoff when a request fails. This is problematic for Nomad,
because interacting with the EC2 API is in our client start path.

Here we revert to our pre-existing behaviour of not performing retries
in the fast path, as if the metadata service is unavailable, it's likely
that nomad is not running in AWS.
2019-12-16 10:56:32 +01:00
Mahmood Ali
b55bc6443e fingerprint code refactor
Some code cleanup:

* Use a field for setting EC2 metadata instead of env-vars in testing;
but keep environment variables for backward compatibility reasons

* Update tests to use testify
2019-11-26 10:51:28 -05:00
Mahmood Ali
f24dd5bec9 fingerprint: avoid api query if config overrides it 2019-11-26 10:51:28 -05:00
Mahmood Ali
34b9ee5433 fingerprint: use ec2metadata package 2019-11-26 10:51:27 -05:00
Mahmood Ali
816f8fbb20 Revert "lint: ignore generated windows syscall wrappers"
This reverts commit 482862e6ab.
2019-10-22 08:23:44 -04:00
Michael Schurter
aed2691ca7 Remove 0.10.0-rc1 generated files 2019-10-10 13:31:42 -07:00
Mahmood Ali
97448cf33d Merge pull request #6260 from hashicorp/c-circleci-tweak-20190903
ci: ignore nested pkgs in GOTEST_PKGS_EXCLUDE
2019-09-11 11:17:10 -07:00
Danielle
b56b9c60a2 fingerprint: Add backwards compatibility comment
Co-Authored-By: Michael Schurter <mschurter@hashicorp.com>
2019-09-04 17:38:35 +02:00
Danielle Lancashire
e9a8e9ea0a fingerprint: Restore support for legacy memory fingerprint 2019-09-04 17:10:28 +02:00
Mahmood Ali
482862e6ab lint: ignore generated windows syscall wrappers 2019-09-03 10:59:58 -04:00
Danielle Lancashire
c20e604d2d client: Fix memory fingerprinting on 32bit
Also introduce regression ci for 32 bit fingerprinting
2019-08-31 18:33:59 +02:00
Lang Martin
33f550fb52 Merge pull request #5553 from hashicorp/b-fingerprinter-manual-config
client fingerprinter doesn't overwrite manual configuration
2019-04-26 12:55:34 -04:00
Lang Martin
583ae3722c client fingerprinter doesn't overwrite manual configuration
Revert "Revert accidental merge of pr #5482"
This reverts commit c45652ab8c.
2019-04-19 15:23:48 -04:00
Yorick Gersie
77a8fda87c fix nil pointer in fingerprinting AWS env leading to crash
HTTP Client returns a nil response if an error has occured. We first
  need to check for an error before being able to check the HTTP response
  code.
2019-04-19 11:07:13 +02:00