* drivers/executor: set oom_score_adj for raw_exec
This might not hold for every Nomad configuration, but in our use cases we run
some of our tasks as `raw_exec` for reasons.
We observed that our tasks were running with `oom_score_adj = -1000`,
which prevents them from being OOM-killed. This value is inherited from
the Nomad agent parent process, as configured by systemd.
Similar to #10698, we were surprised to see this value inherited by
every child process and believe that we should set it to 0 explicitly.
I don't know whether there are other paths that might hit this or other
ways that `raw_exec` can manifest it, but this is how I was able to
observe and fix the issue in one of our configurations.
To work around it in production, we have been wrapping our tasks in a
script that runs `echo 0 > /proc/self/oom_score_adj`. A sketch of the
executor-side approach is included after this change list.
* drivers/executor: minor cleanup of setting oom adjustment
* e2e: add test for raw_exec oom adjust score
* e2e: set oom score adjust to -999
* cl: add cl
---------
Co-authored-by: Seth Hoenig <shoenig@duck.com>
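For reference, a minimal sketch of the executor-side fix, assuming the adjustment is written to the child's `/proc/<pid>/oom_score_adj` after the process is started (the exact hook point in the executor code is not shown here):

```go
package main

import (
	"fmt"
	"os"
)

// setOOMScoreAdj explicitly resets the OOM score adjustment for a spawned
// raw_exec child so it does not inherit the agent's value (e.g. -1000 set
// by systemd). A score of 0 restores the kernel default.
func setOOMScoreAdj(pid, score int) error {
	path := fmt.Sprintf("/proc/%d/oom_score_adj", pid)
	return os.WriteFile(path, []byte(fmt.Sprintf("%d", score)), 0o644)
}

func main() {
	// Demo: reset our own adjustment, mirroring the
	// `echo 0 > /proc/self/oom_score_adj` workaround described above.
	if err := setOOMScoreAdj(os.Getpid(), 0); err != nil {
		fmt.Println("failed to set oom_score_adj:", err)
	}
}
```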
* Sign-in page now hides token secret by default (toggleable) and updates components to Helios
* General helios-ification
* All the notifications get dismissal buttons
* token-details grid for spacing
* node eligibility taken into consideration when clients list filtered to 'ready'
* A working draft of complex positive querying
* tags and filter badge
* CompositeStatus -> Status
* Buttons within a Helios SegmentedGroup
* Convert the other dropdowns to helios on clients index
* A bunch of client index test fixes
* Remaining clients list acceptance tests for State facet modified
The allocation table header sometimes conditionally renders the
`Actions` table column, but the allocation row would render it
unconditionally, resulting in broken tables when rendering allocations
for jobs without actions, where rows had more columns than the header.
Also fix the conditional class for the deployments allocation table to
read `length` from the right value.
The keys of `meta` fields have all characters outside of `[A-Za-z0-9_.]`
replaced by underscores when we create `NOMAD_META` environment variables. Make
sure this replacement is documented.
Fixes: https://github.com/hashicorp/nomad/issues/15359
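For illustration, a rough sketch of the replacement being documented, assuming a simple regex over the meta key (the exact implementation in Nomad's env builder may differ):

```go
package main

import (
	"fmt"
	"regexp"
)

// invalidMetaKeyChars matches any character outside the documented
// allowed set [A-Za-z0-9_.].
var invalidMetaKeyChars = regexp.MustCompile(`[^A-Za-z0-9_.]`)

// metaEnvKey converts a meta key into its NOMAD_META environment
// variable name, replacing disallowed characters with underscores.
func metaEnvKey(key string) string {
	return "NOMAD_META_" + invalidMetaKeyChars.ReplaceAllString(key, "_")
}

func main() {
	// "rate-limit" -> "NOMAD_META_rate_limit" (hyphen replaced)
	fmt.Println(metaEnvKey("rate-limit"))
}
```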
Consul Enterprise agents all belong to an admin partition. Fingerprint this
attribute when available. When a Consul agent is not explicitly configured with
the "default" partition, it still belongs to the default partition but does not
report this in its `/v1/agent/self` endpoint. Fall back to "default" when the
value is missing, but only for Consul Enterprise.
This feature lets users add constraints so that jobs land on Nomad nodes that
have a Consul agent in the desired partition. It also allows cluster
administrators to pair Consul partitions 1:1 with Nomad node pools. We'll also
have the option to implement a future `partition` field in the jobspec's
`consul` block to create an implicit constraint.
Ref: https://github.com/hashicorp/nomad/issues/13139#issuecomment-1856479581
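A rough sketch of the fallback described above, using the Consul `api` client's `Agent().Self()` call; the field names read from the self payload and the "+ent" version check are assumptions for illustration:

```go
package main

import (
	"fmt"
	"strings"

	consulapi "github.com/hashicorp/consul/api"
)

// fingerprintPartition returns the agent's admin partition. When the agent
// is Consul Enterprise but does not report a partition (it was not
// explicitly configured with one), fall back to "default".
func fingerprintPartition(client *consulapi.Client) (string, error) {
	self, err := client.Agent().Self()
	if err != nil {
		return "", err
	}

	if cfg, ok := self["Config"]; ok {
		// The partition is reported under the Config section when set.
		if p, ok := cfg["Partition"].(string); ok && p != "" {
			return p, nil
		}
		// Enterprise builds carry a "+ent" suffix in their version string.
		if v, ok := cfg["Version"].(string); ok && strings.Contains(v, "+ent") {
			return "default", nil
		}
	}

	// Not Enterprise: partitions do not apply.
	return "", nil
}

func main() {
	client, err := consulapi.NewClient(consulapi.DefaultConfig())
	if err != nil {
		panic(err)
	}
	fmt.Println(fingerprintPartition(client))
}
```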
Lowercased the title and headings in line with our company-wide style, since this is being linked from an upcoming blog post I was editing. I also lowercased terms such as "auth method" and other primitives/components when mentioned in prose. This matches our style guide as well: we don't capitalize "auth method", and we only capitalize components that are SKU/product-like in their separateness/importance.
https://docs.google.com/document/d/1MRvGd6tS5JkIwl_GssbyExkMJqOXKeUE00kSEtFi8m8/edit
Adam Trujilo should be in agreement with changes like this based on our past discussions, but feel free to bring in stakeholders if you're not sure about accepting, and we can discuss.
* numalib: provide a fallback for topology scanning on linux
* numalib: better package var names
* cl: add cl
* lint: fix my sloppy code
* cl: fixup wording
Unsupported environments like containers or guest OSs inside LXD can report an
incorrect number of available cores, which leads to numalib having trouble
detecting cores and panicking. This change adds tests for the Linux sysfs
detection methods and fixes the panic.
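A minimal sketch of the fallback idea, assuming a hypothetical helper used when sysfs does not yield a usable core count; the real numalib code paths differ:

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"strconv"
	"strings"
)

// availableCores reads the online core list from sysfs and falls back to
// the Go runtime's view when sysfs is missing or unparseable (as can
// happen inside containers or LXD guests), instead of panicking.
func availableCores() int {
	data, err := os.ReadFile("/sys/devices/system/cpu/online")
	if err == nil {
		// The file contains ranges like "0-3" or "0,2-5"; take the last
		// index listed as a rough upper bound.
		s := strings.TrimSpace(string(data))
		last := s
		if idx := strings.LastIndexAny(s, "-,"); idx >= 0 {
			last = s[idx+1:]
		}
		if n, err := strconv.Atoi(last); err == nil && n >= 0 {
			return n + 1
		}
	}
	// Fallback: trust the runtime rather than crashing.
	return runtime.NumCPU()
}

func main() {
	fmt.Println("cores:", availableCores())
}
```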
It is often expected that a task that needs access to Vault defines a
`vault` block to specify the Vault policy to use to derive a token.
But in some scenarios, like when the Nomad client is connected to a
local Vault agent that is responsible for authn/authz, the task is not
required to define a `vault` block.
In these situations, the `default` Vault cluster should be used to
render the template.
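A hedged sketch of the selection logic described above; the type and field names here (`Task`, `VaultBlock`, `Cluster`) are simplified stand-ins for Nomad's real structs, used only for illustration:

```go
package main

import "fmt"

// Task is a trimmed-down stand-in for Nomad's task struct, carrying only
// the optional vault block relevant to this change.
type Task struct {
	Vault *VaultBlock
}

// VaultBlock names the Vault cluster a task wants tokens from.
type VaultBlock struct {
	Cluster string
}

// templateVaultCluster picks which Vault cluster the template runner
// should use: the one from the task's vault block when present,
// otherwise the "default" cluster.
func templateVaultCluster(t *Task) string {
	if t.Vault != nil && t.Vault.Cluster != "" {
		return t.Vault.Cluster
	}
	return "default"
}

func main() {
	// A task with no vault block still renders against "default".
	fmt.Println(templateVaultCluster(&Task{}))
}
```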
The Connect-related constraints injected by the Connect job mutating hook do not
account for non-default `consul` blocks (in Nomad Enterprise). This works when
both the default and non-default clusters are available and are the same
version, but not when their versions differ.
Fixes: https://github.com/hashicorp/nomad/issues/19442
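For illustration, a sketch of making the injected constraint cluster-aware; the attribute naming convention used here for non-default Consul clusters (`${attr.consul.<name>.version}`) is an assumption, not a confirmed detail of the fix:

```go
package main

import "fmt"

// connectVersionAttr returns the node attribute the Connect mutating hook
// would constrain on, taking the group's consul cluster into account
// instead of always targeting the default cluster's attribute.
func connectVersionAttr(cluster string) string {
	if cluster == "" || cluster == "default" {
		return "${attr.consul.version}"
	}
	// Assumed convention: non-default clusters are fingerprinted under a
	// name-scoped attribute.
	return fmt.Sprintf("${attr.consul.%s.version}", cluster)
}

func main() {
	fmt.Println(connectVersionAttr(""))        // default cluster
	fmt.Println(connectVersionAttr("nonprod")) // non-default cluster
}
```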