25496 Commits

Author SHA1 Message Date
Marvin Chin
d75293d2ab Add OOM detection for exec driver (#19563)
* Add OomKilled field to executor proto format

* Teach linux executor to detect and report OOMs

* Teach exec driver to propagate OOMKill information

* Fix data race

* use tail /dev/zero to create oom condition

* use new test framework

* minor tweaks to executor test

* add cl entry

* remove type conversion

---------

Co-authored-by: Marvin Chin <marvinchin@users.noreply.github.com>
Co-authored-by: Seth Hoenig <shoenig@duck.com>
2024-01-03 09:50:27 -06:00
Tim Gross
f2630add91 acl: remove timestamps from WhoAmI response (#19578)
In Nomad 1.7 we updated our JWT library to go-jose, but this changed the wire
format of the embedded struct we have in the `IdentityClaims` struct that we
return as part of the `WhoAmI` RPC response. This wasn't originally intended to
be sent over the wire but other changes in Nomad 1.5+ added a caller to the
client. The library change causes a deserialization error on Nomad 1.5 and 1.6
clients, which prevents access to Nomad Variables and SD via template blocks.

Removed the incompatible fields from the response, which are unused by any
current caller. In a future version of Nomad, we'll likely remove the `WhoAmI`
callers from the client in lieu of using the public keys the clients have to
check auth.

Fixes: https://github.com/hashicorp/nomad/issues/19555
2024-01-03 08:24:38 -05:00
James Rasell
91cba75f5c copywrite: fix and add copywrite config enterprise comments. (#19590)
Nomad CI checks for copywrite headers using multiple config files
for specific exemption paths. This means the top-level config file
does not take effect when running the copywrite script within
these sub-folders. Exempt files therefore need to be added to the
sub-config files, along with the top level.
2024-01-03 08:58:53 +00:00
Piotr Kazmierczak
a87aa71f55 e2e: fix typo in Consul e2e (#19589) 2024-01-03 09:34:38 +01:00
Tim Gross
e7ca2b51ad vault: ignore allow_unauthenticated config if identity is set (#19585)
When the server's `vault` block has a default identity, we don't check the
user's Vault token (and in fact, we warn them on job submit if they've provided
one). But the validation hook still checks for a token if
`allow_unauthenticated` is set to true. This is a misconfiguration but there's
no reason for Nomad not to do the expected thing here.

Fixes: https://github.com/hashicorp/nomad/issues/19565
2024-01-02 16:46:34 -05:00
Luiz Aoqui
cd8a03431c docs: add scale_in_protection to AWS Autoscaler (#19546)
Document new `scale_in_protection` configuration of the AWS ASG
Autoscaler target plugin.
2024-01-02 14:48:56 -05:00
Luiz Aoqui
0bef6f05a2 docs: add note about * namespace on autoscaling (#19547)
Explain the behaviour when the wildcard namespace value `*` is used to
configure the Nomad Autoscaler agent.
2024-01-02 14:48:20 -05:00
Matt Robenolt
656bb5cafa drivers/executor: set oom_score_adj for raw_exec (#19515)
* drivers/executor: set oom_score_adj for raw_exec

This might not be wholly true since I don't know all configurations of
Nomad, but in our use cases, we run some of our tasks as `raw_exec` for
reasons.

We observed that our tasks were running with `oom_score_adj = -1000`,
which prevents them from being OOM'd. This value is being inherited from
the nomad agent parent process, as configured by systemd.

Similar to #10698, we also were shocked to have this value inherited
down to every child process and believe that we should also set this
value to 0 explicitly.

I have no idea if there are other paths that might leverage this or
other ways that `raw_exec` can manifest, but this is how I was able to
observe and fix in one of our configurations.

In production, we have been running our tasks wrapped in a script that
does: `echo 0 > /proc/self/oom_score_adj` to avoid this issue.

* drivers/executor: minor cleanup of setting oom adjustment

* e2e: add test for raw_exec oom adjust score

* e2e: set oom score adjust to -999

* cl: add cl

---------

Co-authored-by: Seth Hoenig <shoenig@duck.com>
2024-01-02 13:35:09 -06:00
Seth Hoenig
c06f804cea build: make copywrite thing happy (#19577) 2024-01-02 13:33:45 -06:00
Luiz Aoqui
7eecca65ec docs: add autoscaler AWS retry_attempts config (#19549)
Document the Nomad Autoscaler AWS target plugin config `retry_attempts`.
2024-01-02 14:08:10 -05:00
Luiz Aoqui
56b1bf3240 docs: add policy_id and target_name metric labels (#19551) 2024-01-02 14:06:37 -05:00
Luiz Aoqui
1694e69b77 docs: clarify the behaviour of lower_bound and upper_bound (#19552) 2024-01-02 14:06:07 -05:00
hc-github-team-es-release-engineering
a4ecc2fbc8 Merge pull request #19283 from hashicorp/RELENG-960-EOY-license-fixes
[DO NOT MERGE UNTIL EOY] update year in LICENSE and copywrite files
2024-01-02 09:38:54 -08:00
Seth Hoenig
23e5ffbfd0 build: bump setup-golang action version to v2 (#19568) 2024-01-02 09:41:50 -06:00
Luiz Aoqui
09731442e4 docs: add node_pool autoscaler node selector (#19548)
Document the `node_pool` node selector configuration.
2024-01-02 10:19:58 -05:00
Piotr Kazmierczak
bb3d2227a2 e2e: add a test for checking default WI Consul workflow for services and tasks (#19500) 2024-01-02 16:02:32 +01:00
James Rasell
76ba3e10e7 docs: add Nomad Autoscaler HA configuration details. (#19010)
Co-authored-by: Piotr Kazmierczak <470696+pkazmierczak@users.noreply.github.com>
2023-12-27 08:00:07 +00:00
Mike Nomitch
dd15bdff9c Adds vault role to JWT claims if specified in jobspec (#19535) 2023-12-20 15:51:34 -08:00
Piotr Kazmierczak
84115d732d docs: correct Nomad Autoscaler example link in HA vars documentation (#19537) 2023-12-20 16:26:35 +01:00
Phil Renaud
005147f850 [ui] Mask token secret when logged in (#19529)
* Sign-in page now hides token secret by default (toggleable) and updates components to Helios

* General helios-ification

* All the notifications get dismissal buttons

* token-details grid for spacing
2023-12-20 10:04:53 -05:00
Phil Renaud
e26c2e243c [ui] node eligibility taken into consideration when clients list filtered to "ready" (#18607)
* node eligibility taken into consideration when clients list filtered to 'ready'

* A working draft of complex positive querying

* tags and filter badge

* CompositeStatus -> Status

* Buttons within a Helios SegmentedGroup

* Convert the other dropdowns to helios on clients index

* A bunch of client index test fixes

* Remaining clients list acceptance tests for State facet modified
2023-12-19 16:40:56 -05:00
Luiz Aoqui
e4e70b086a ci: run linter in ./api package (#19513) 2023-12-19 15:59:47 -05:00
Luiz Aoqui
95766aaa1b docs: add Submission parameter to job update (#19516) 2023-12-19 10:09:16 -05:00
Luiz Aoqui
859606a54a consul: fix parsing of service.cluster field (#19510) 2023-12-19 09:55:41 -05:00
dependabot[bot]
b2f640346d build(deps): bump golang.org/x/crypto from 0.14.0 to 0.17.0 (#19514) 2023-12-19 11:17:48 +00:00
Etienne Bruines
f18d5c7c32 docs: fix migration to workload identity links (#19508)
Fixes #19507
2023-12-18 21:27:38 -05:00
Luiz Aoqui
dfce76e511 ui: fix AllocationRow for job without action (#19505)
The allocation table header sometimes conditionally renders the
`Actions` table column, but the allocation row would render it
unconditionally, resulting in broken tables when rendering allocations
for jobs without actions, where rows had more columns than the header.

Also fix the conditional class for the deployments allocation table to
read `length` from the right value.
2023-12-18 11:30:20 -05:00
Phil Renaud
7a87049eab Merge pull request #18823 from Sanskar531/ui-logs-disabled-message
UI: Show message for when log collection is disabled
2023-12-18 09:20:51 -05:00
Sanskar Gauchan
e0e8357661 Merge branch 'hashicorp:main' into ui-logs-disabled-message 2023-12-16 10:49:26 +11:00
Sanskar Gauchan
2df79becaf Merge pull request #1 from philrenaud/ui-logs-disabled-message
Changelog added
2023-12-16 09:07:01 +11:00
Phil Renaud
a6e164673e Adds a copy button to Action output (#19496) 2023-12-15 15:57:31 -05:00
Tim Gross
14200a800f docs: note replacement of - characters in meta env vars (#19501)
The keys of `meta` fields have all characters outside of `[A-Za-z0-9_.]`
replaced by underscores when we create `NOMAD_META` environment variables. Make
sure this replacement is documented.

Fixes: https://github.com/hashicorp/nomad/issues/15359
2023-12-15 15:48:23 -05:00
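The replacement rule documented in 14200a800f can be illustrated with a short Go sketch. It covers only the character substitution described in the commit message (everything outside `[A-Za-z0-9_.]` becomes an underscore); `cleanMetaKey` is a hypothetical name, and how Nomad assembles the final `NOMAD_META` variable name is not shown here.

```go
package main

import (
	"fmt"
	"regexp"
)

// invalidMetaChars matches every character outside [A-Za-z0-9_.],
// the set named in the commit message.
var invalidMetaChars = regexp.MustCompile(`[^A-Za-z0-9_.]`)

// cleanMetaKey is a hypothetical helper that replaces each disallowed
// character in a meta key with an underscore.
func cleanMetaKey(key string) string {
	return invalidMetaChars.ReplaceAllString(key, "_")
}

func main() {
	for _, key := range []string{"owner-team", "build.id", "cost center"} {
		fmt.Printf("%q -> %q\n", key, cleanMetaKey(key))
	}
}
```

One consequence worth noting: distinct keys such as `owner-team` and `owner_team` map to the same cleaned key under this rule.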
Mike Nomitch
e39b39e656 Surfaces errors from namespace delete properly (#19483)
2023-12-15 08:25:55 -08:00
David Ventura
fb43b14fb0 Mark CGroups as off when missing essential controllers (#19176) 2023-12-15 11:20:52 -05:00
Piotr Kazmierczak
f1fb51422b client: consul hook not called for templates (#19490)
Due to a refactoring mishap, the task-level Consul hook was never triggered and
thus never wrote any secrets to task secret dirs.
2023-12-15 17:16:00 +01:00
Phil Renaud
01372c17ec Changelog added 2023-12-15 09:54:10 -05:00
Tim Gross
2e33115c15 consul: fingerprint Consul Enterprise admin partitions (#19485)
Consul Enterprise agents all belong to an admin partition. Fingerprint this
attribute when available. When a Consul agent is not explicitly configured with
"default" it is in the default partition but will not report this in its
`/v1/agent/self` endpoint. Fall back to "default" when missing only for Consul
Enterprise.

This feature provides users the ability to add constraints for jobs to land on
Nomad nodes that have a Consul in that partition. Or it can allow cluster
administrators to pair Consul partitions 1:1 with Nomad node pools. We'll also
have the option to implement a future `partition` field in the jobspec's
`consul` block to create an implicit constraint.

Ref: https://github.com/hashicorp/nomad/issues/13139#issuecomment-1856479581
2023-12-15 09:26:25 -05:00
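The fallback rule described in 2e33115c15 can be sketched as a small decision function. This is an illustration under stated assumptions, not the fingerprinter from the commit: `consulPartition` and its parameters are hypothetical, with `reported` standing for whatever partition value the agent's `/v1/agent/self` response carried (empty when absent) and `enterprise` for whether the agent is Consul Enterprise.

```go
package main

import "fmt"

// consulPartition is a hypothetical helper. It returns the partition
// value to fingerprint and whether an attribute should be set at all.
func consulPartition(reported string, enterprise bool) (string, bool) {
	// Use the reported partition whenever the agent provides one.
	if reported != "" {
		return reported, true
	}
	// Only Consul Enterprise agents fall back to "default".
	if enterprise {
		return "default", true
	}
	// No partition attribute otherwise.
	return "", false
}

func main() {
	fmt.Println(consulPartition("team-a", true))
	fmt.Println(consulPartition("", true))
	fmt.Println(consulPartition("", false))
}
```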
Luiz Aoqui
a8d1447550 docs: update Consul and Vault integration (#19424) 2023-12-14 15:14:55 -05:00
Mike Nomitch
31f4296826 Adds support for failures before warning to Consul service checks (#19336)
Adds support for failures before warning and failures before critical
to the automatically created Nomad client and server services in Consul
2023-12-14 11:33:31 -08:00
Tim Gross
0e42569ffb docs: note that 1.7.2 yanks 1.7.0-1.7.1 due to CPU fingerprint bug (#19474) 2023-12-14 11:32:13 -05:00
Mitch Pronschinske
a0fc269e8f docs: update auth-methods API docs to comply with style guide (#19435)
Lowercased the title and headings in line with our company-wide style, since this page is linked in an upcoming blog post I was editing. I also lowercased terms such as "Auth Method" and other primitives/components when mentioned in prose. This follows our style guide as well: we don't capitalize auth method, and we only capitalize components that are SKU/product-like in their separateness/importance.

https://docs.google.com/document/d/1MRvGd6tS5JkIwl_GssbyExkMJqOXKeUE00kSEtFi8m8/edit

Adam Trujilo should be in agreement with changes like this based on our past discussions, but feel free to bring in stakeholders if you're not sure about accepting and we can discuss.
2023-12-14 11:28:42 -05:00
James Rasell
94b8b7769a docs: add reporting config block documentation. (#19470) 2023-12-14 15:11:29 +00:00
Piotr Kazmierczak
b2357e7cf0 Merge pull request #19469 from hashicorp/post-1.7.2-release
Post 1.7.2 release
2023-12-14 13:05:23 +01:00
Piotr Kazmierczak
250440a8bb Merge release 1.7.2 files 2023-12-14 11:25:59 +01:00
hc-github-team-nomad-core
aaaa69a2b9 Prepare for next release 2023-12-14 11:23:56 +01:00
hc-github-team-nomad-core
b777013ff9 Generate files for 1.7.2 release 2023-12-14 11:23:55 +01:00
Phil Renaud
49b4996d46 overflow-anchor to hold view to bottom of scrollable action output (#19452) 2023-12-13 15:19:22 -05:00
Grant Griffiths
9b2e8ae20f CSI: prevent stage_publish_base_dir from being subdir of mount_dir (#19441) 2023-12-13 14:31:40 -05:00
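The kind of check named in 9b2e8ae20f (rejecting a `stage_publish_base_dir` that sits under `mount_dir`) can be sketched with `path/filepath`. This is a generic containment test, not the validation from the commit: `isWithin` is a hypothetical helper, and treating two identical paths as nested is an assumption made here.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// isWithin is a hypothetical helper that reports whether child is the
// same directory as parent or located somewhere beneath it.
// Assumption: equal paths count as nested.
func isWithin(parent, child string) bool {
	rel, err := filepath.Rel(filepath.Clean(parent), filepath.Clean(child))
	if err != nil {
		// For example, one path is absolute and the other relative.
		return false
	}
	// A relative path that climbs out of parent starts with "..".
	return rel != ".." && !strings.HasPrefix(rel, ".."+string(filepath.Separator))
}

func main() {
	// The directory names below are made up for illustration.
	fmt.Println(isWithin("/opt/csi/mount", "/opt/csi/mount/stage"))
	fmt.Println(isWithin("/opt/csi/mount", "/opt/csi/mount-stage"))
}
```

Comparing cleaned relative paths avoids the false positive a plain string-prefix test would give for sibling directories such as `/opt/csi/mount-stage`. Symlinks are not resolved in this sketch.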
Seth Hoenig
7e43317e37 core: account for linux systems with no reservable cores (#19458)
* core: account for linux systems with no reservable cores

* cl: add cl

* core: remove condition on reservable cores for legacy empty check
2023-12-13 13:06:45 -06:00
Seth Hoenig
6e4d57b330 numalib: provide a fallback for topology scanning on linux (#19457)
* numalib: provide a fallback for topology scanning on linux

* numalib: better package var names

* cl: add cl

* lint: fix my sloppy code

* cl: fixup wording
2023-12-13 13:06:30 -06:00