The pprof `allocs` profile is identical to the `heap` profile, just with a
different default view. Collecting either one is sufficient to view all of
`alloc_objects`, `alloc_space`, `inuse_objects`, and `inuse_space`, and
collecting only one means both views come from the same profile.
Also improve the docstrings on the goroutine profiles explaining what's in each
so that it's clear why we might want all of debug=0, debug=1, and debug=2.
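To illustrate the first point, a minimal sketch using the standard `runtime/pprof` profile registry (error handling elided for brevity):
```
package main

import (
	"os"
	"runtime/pprof"
)

func main() {
	// "allocs" and "heap" write the same underlying profile data;
	// only the default sample index shown by `go tool pprof` differs.
	f, _ := os.Create("heap.prof")
	defer f.Close()
	pprof.Lookup("heap").WriteTo(f, 0)
	// All four indexes are viewable from this one file, e.g.:
	//   go tool pprof -sample_index=alloc_objects heap.prof
	//   go tool pprof -sample_index=inuse_space heap.prof
}
```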
This change adds configuration options for setting the in-memory
telemetry sink collection and retention durations. This sink backs
the metrics JSON API and previously had hard-coded default values.
The new options are particularly useful when running development or
debug environments, where metrics collection is desired at a fast
and granular rate.
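For context, the in-memory sink comes from the go-metrics library, whose constructor takes exactly these two durations; a minimal sketch of what the options control (the service name here is illustrative):
```
package main

import (
	"time"

	metrics "github.com/armon/go-metrics"
)

func main() {
	// NewInmemSink(interval, retention): aggregate metrics into
	// 10-second buckets and keep one minute of history. These are
	// the two durations the new telemetry options make configurable.
	sink := metrics.NewInmemSink(10*time.Second, 1*time.Minute)
	metrics.NewGlobal(metrics.DefaultConfig("nomad"), sink)
}
```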
The `nomad operator debug` command saves a CPU profile for each interval, and
names these files based on the interval.
The same function handles the goroutine profile, heap profile, etc., but is missing
the logic to interpolate the file name with the interval. This results in the
operator debug command making potentially many expensive profile requests, and
then overwriting the data. Update the command to save every profile it scrapes,
and number them similarly to the existing CPU profile.
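A sketch of the numbering scheme (the function and file-name pattern are hypothetical, not the command's actual code):
```
package main

import "fmt"

// profileFilename numbers each scraped profile by interval so
// successive scrapes don't overwrite each other, mirroring the
// existing CPU profile naming.
func profileFilename(profile string, interval int) string {
	return fmt.Sprintf("%s_%04d.prof", profile, interval)
}

func main() {
	fmt.Println(profileFilename("goroutine", 3)) // goroutine_0003.prof
}
```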
Additionally, the command flags for `-pprof-interval` and `-pprof-duration` were
validated backwards, which meant that we always coerced the `-pprof-interval` to
be the same as the `-pprof-duration`, which always resulted in a single profile
being taken at the start of the bundle. Correct the check as well as change the
defaults to be more sensible.
Fixes: https://github.com/hashicorp/nomad/issues/20151
Add support for further configuring `gateway.ingress.service` blocks to bring
this block up-to-date with currently available Consul API fields (except for
namespace and admin partition, which will need to be handled under a different
PR). These fields are sent to Consul as part of the job endpoint submission hook
for Connect gateways.
Co-authored-by: Horacio Monsalvo <horacio.monsalvo@southworks.com>
Replaces #18812
Upgraded with:
```
find . -name '*.go' -exec sed -i s/"github.com\/hashicorp\/go-msgpack\/codec"/"github.com\/hashicorp\/go-msgpack\/v2\/codec/" '{}' ';'
find . -name '*.go' -exec sed -i s/"github.com\/hashicorp\/net-rpc-msgpackrpc"/"github.com\/hashicorp\/net-rpc-msgpackrpc\/v2/" '{}' ';'
go get
go get -v -u github.com/hashicorp/raft-boltdb/v2
go get -v github.com/hashicorp/serf@5d32001edfaa18d1c010af65db707cdb38141e80
```
See https://github.com/hashicorp/go-msgpack/releases/tag/v2.1.0 for details.
When loading the client configuration, the user-specified `client.template`
block was not properly merged with the default values. As a result, if the user
set any `client.template` field, all the other fields defaulted to their zero
values instead of the documented defaults.
This changeset:
* Adds the missing `Merge` method for the client template config and ensures
  it's called (see the sketch below).
* Makes a single source of truth for the default template configuration,
instead of two different constructors.
* Extends the tests to cover the merge of a partial block better.
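Roughly, the merge pattern looks like this (a sketch; the field names stand in for the real struct's many fields):
```
package config

import "time"

// ClientTemplateConfig fields here are illustrative; the real struct
// has many more.
type ClientTemplateConfig struct {
	MaxStale           *time.Duration
	BlockQueryWaitTime *time.Duration
}

// Merge returns a copy of c with any fields set in o applied, so a
// partial user block overrides only what it specifies instead of
// zeroing everything else.
func (c *ClientTemplateConfig) Merge(o *ClientTemplateConfig) *ClientTemplateConfig {
	if c == nil {
		return o
	}
	result := *c
	if o == nil {
		return &result
	}
	if o.MaxStale != nil {
		result.MaxStale = o.MaxStale
	}
	if o.BlockQueryWaitTime != nil {
		result.BlockQueryWaitTime = o.BlockQueryWaitTime
	}
	return &result
}
```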
Fixes: https://github.com/hashicorp/nomad/issues/20164
Add information about autopilot health to the `/operator/autopilot/health` API
in Nomad Enterprise.
I've pulled the CE changes required for this feature out of @lindleywhite's PR
in the Enterprise repo. A separate PR will include a new `operator autopilot
health` command that can present this information at the command line.
Ref: https://github.com/hashicorp/nomad-enterprise/pull/1394
Co-authored-by: Lindley <lindley@hashicorp.com>
* exec2: add client support for unveil filesystem isolation mode
This PR adds support for a new filesystem isolation mode, "Unveil". The
mode introduces an `alloc_mounts` directory where each task gets a user-owned
directory structure that is bind-mounted into the real alloc directory
structure. This enables a task driver to use Landlock (and maybe the
real unveil on OpenBSD one day) to confine a task to the task-owned
directory structure, providing sandboxing.
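For illustration, a minimal sketch of how a driver might apply Landlock to confine a task to its user-owned mount tree, using the go-landlock library (the paths are hypothetical):
```
package main

import (
	"log"

	"github.com/landlock-lsm/go-landlock/landlock"
)

func main() {
	// Restrict this process (and its children) to the task's own
	// bind-mounted directory tree plus read-only system paths;
	// everything else becomes inaccessible, much like OpenBSD's unveil.
	err := landlock.V1.BestEffort().RestrictPaths(
		landlock.RWDirs("/opt/nomad/alloc_mounts/task-dir"), // hypothetical path
		landlock.RODirs("/usr", "/lib"),
	)
	if err != nil {
		log.Fatal(err)
	}
}
```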
* actually create alloc-mounts-dir directory
* fix doc strings about alloc mount dir paths
* exec: add a client.users configuration block
For now just add min/max dynamic user values; soon we can also absorb
the "user.denylist" and "user.checked_drivers" options from the
deprecated client.options map.
* give the no-op pool implementation a better name
* use explicit error types to make referencing them cleaner in tests
* use import alias to not shadow package name
stopAlloc() checks if an allocation represents a system job like this:
```
if *alloc.Job.Type == api.JobTypeSystem {
	...
}
```
This caused the CLI to crash:
```
==> 2024-02-29T08:45:53+01:00: Restarting 2 allocations
2024-02-29T08:45:54+01:00: Rescheduling allocation "6a9da11a" for group "redacted-group"
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x2 addr=0x20 pc=0x10686affc]
goroutine 36 [running]:
github.com/hashicorp/nomad/command.(*JobRestartCommand).stopAlloc(0x14000b11040, {0x14000996dc0?, 0x0?})
github.com/hashicorp/nomad/command/job_restart.go:968 +0x25c
github.com/hashicorp/nomad/command.(*JobRestartCommand).handleAlloc(0x14000b11040, {0x14000996dc0?, 0x0?})
github.com/hashicorp/nomad/command/job_restart.go:868 +0x34
github.com/hashicorp/nomad/command.(*JobRestartCommand).Run.(*JobRestartCommand).Run.func1.func2()
github.com/hashicorp/nomad/command/job_restart.go:392 +0x28
github.com/hashicorp/go-multierror.(*Group).Go.func1()
github.com/hashicorp/go-multierror@v1.1.1/group.go:23 +0x60
created by github.com/hashicorp/go-multierror.(*Group).Go in goroutine 1
github.com/hashicorp/go-multierror@v1.1.1/group.go:20 +0x84
```
Attaching a debugger revealed that `alloc.Job` was set, but
`alloc.Job.Type` was nil. After guarding the `.Type` check with an
`alloc.Job.Type != nil` check, it still crashed. This time, `alloc.Job` was
nil.
I was scrambling to get the job running again, so I didn't have the
opportunity to find out why those values were nil, but this change
ensures the CLI does not crash in these situations.
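The resulting guard looks roughly like this (a sketch of the defensive check described above):
```
// Guard every pointer on the path before dereferencing: alloc.Job
// and alloc.Job.Type were each observed to be nil in the wild.
if alloc.Job != nil && alloc.Job.Type != nil && *alloc.Job.Type == api.JobTypeSystem {
	...
}
```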
Fixes #20048
* tests: swap testify for test in plugins/csi/client_test.go
* tests: swap testify for test in testutil/
* tests: swap testify for test in host_test.go
* tests: swap testify for test in plugin_test.go
* tests: swap testify for test in utils_test.go
* tests: swap testify for test in scheduler/
* tests: swap testify for test in parse_test.go
* tests: swap testify for test in attribute_test.go
* tests: swap testify for test in plugins/drivers/
* tests: swap testify for test in command/
* tests: fixup some test usages
* go: run go mod tidy
* windows: cpuset test only on linux
This PR is the first of two that will implement the new Disconnect block. In this PR the new block is introduced in a way that is backwards compatible with the fields it will replace. For more information, refer to this RFC and this ticket.
In #19172 we added a check on websocket errors to see if they were one of
several benign "close" messages. This change inadvertently assumed that other
messages used for close would not implement `HTTPCodedError`. When errors like
the following are received:
> msgpack decode error [pos 0]: io: read/write on closed pipe
they are sent from the inner loop as though they were a "real" error, but the
channel is already being closed with a "close" message.
This allowed many more attempts to pass through a previously-undiscovered race
condition in the two goroutines that stream RPC responses to the websocket. When
the input stream returns an error for any reason (for example, the command we're
executing has exited), it will unblock the "outer" goroutine and cause a write
to the websocket. If we're concurrently writing the "close error" discussed
above, this results in a panic from the websocket library.
This changeset includes two fixes:
* Catch "closed pipe" error correctly so that we're not sending unnecessary
error messages.
* Move all writes to the websocket into the same response streaming
goroutine. The main handler goroutine will block on a results channel, and the
response streaming goroutine will send on that channel with the final error when
it's done so it can be reported to the user.
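A minimal sketch of the single-writer pattern (the names are hypothetical, not the handler's actual code): all websocket writes happen in one goroutine, so a close frame can never race a payload write, and the handler blocks on a result channel.
```
package stream

import "github.com/gorilla/websocket"

func streamResponses(ws *websocket.Conn, frames <-chan []byte) error {
	errCh := make(chan error, 1)
	go func() {
		// Sole writer: payload frames and the final close frame are
		// serialized here, so they can never race each other.
		for frame := range frames {
			if err := ws.WriteMessage(websocket.TextMessage, frame); err != nil {
				errCh <- err
				return
			}
		}
		errCh <- ws.WriteMessage(websocket.CloseMessage,
			websocket.FormatCloseMessage(websocket.CloseNormalClosure, ""))
	}()
	// The main handler just blocks for the final error to report to the user.
	return <-errCh
}
```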
On Windows, Nomad uses `syscall.NewLazyDLL` and `syscall.LoadDLL` functions to
load a few system DLL files, which does not prevent DLL hijacking
attacks. Hypothetically a local attacker on the client host that can place an
abusive library in a specific location could use this to escalate privileges to
the Nomad process. Although this attack does not fall within the Nomad security
model, it doesn't hurt to follow good practices here.
We can remove two of these DLL loads by using the wrapper functions provided
in `golang.org/x/sys/windows`.
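For example, `windows.NewLazySystemDLL` restricts the load to the Windows System directory (the DLL and procedure names here are illustrative):
```
//go:build windows

package main

import "golang.org/x/sys/windows"

// NewLazySystemDLL only searches the system directory, so a DLL
// planted next to the binary or in the working directory is never
// picked up, unlike syscall.NewLazyDLL's default search order.
var kernel32 = windows.NewLazySystemDLL("kernel32.dll")

func main() {
	proc := kernel32.NewProc("GetTickCount64")
	ret, _, _ := proc.Call()
	_ = ret
}
```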
Co-authored-by: dduzgun-security <deniz.duzgun@hashicorp.com>
The current implementation of the `nomad tls ca create` command
overrides the value of the `-domain` flag with `"nomad"` if no
additional customization is provided.
This results in a certificate for the wrong domain or an error if the
`-name-constraint` flag is also used.
The logic for `IsCustom()` also seemed reversed: if all custom fields
are empty then the certificate is _not_ customized, so `IsCustom()`
should return false.
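With the logic corrected, `IsCustom()` reads roughly like this (the field names are stand-ins for the command's flags):
```
package tlsutil

// CAOpts fields here are illustrative stand-ins for the real flags.
type CAOpts struct {
	Domain          string
	CommonName      string
	NameConstraints []string
}

// IsCustom reports whether the user supplied any CA customization.
// An all-empty options struct means the stock "nomad" CA is wanted,
// so it must return false.
func (o *CAOpts) IsCustom() bool {
	return o.Domain != "" ||
		o.CommonName != "" ||
		len(o.NameConstraints) > 0
}
```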
When a job eval is blocked due to missing capacity, the `nomad job run`
command will monitor the deployment, which may succeed once additional
capacity is made available.
But the current implementation would exit with code `2` even when the deployment
succeeded, because it only took the first eval status into account.
This commit updates the eval monitoring logic to reset the scheduling
error state if the deployment eventually succeeds.
Add a new configuration option to the task's `volume_mount` block to give fine-grained control over the SELinux "z" label.
* Update website/content/docs/job-specification/volume_mount.mdx
Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
* fix: typo
* func: make volume mount verification happen even on mounts with no volume
---------
Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
Co-authored-by: Tim Gross <tgross@hashicorp.com>
The new `nomad setup vault -check` command can be used to retrieve
information about the changes required before a cluster is migrated from
the deprecated legacy authentication flow with Vault to use only
workload identities.
Some users with batch workloads or short-lived prestart tasks want to derive a
Vault token, use it, and then allow it to expire without requiring a constant
refresh. Add the `vault.allow_token_expiration` field, which works only with the
Workload Identity workflow and not the legacy workflow.
When set to true, this disables the client's renewal loop in the
`vault_hook`. When Vault revokes the token lease, the token will no longer be
valid. The client will also now automatically detect if the Vault auth
configuration does not allow renewals and will disable the renewal loop
automatically.
Note this should only be used when a secret is requested from Vault once at the
start of a task or in a short-lived prestart task. Long-running tasks should
never set `allow_token_expiration=true` if they obtain Vault secrets via
`template` blocks, as the Vault token will expire and the template runner will
continue to make failing requests to Vault until the `vault_retry` attempts are
exhausted.
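A sketch of the renewal-loop gating (the function name is hypothetical; `Renewable` is the real field on the Vault API's `SecretAuth`):
```
package vaulthook

import vaultapi "github.com/hashicorp/vault/api"

// shouldRenew reports whether the client should run its token renewal
// loop: skip it when the task opted in to expiration, or when Vault
// issued a non-renewable token in the first place.
func shouldRenew(auth *vaultapi.SecretAuth, allowTokenExpiration bool) bool {
	return !allowTokenExpiration && auth.Renewable
}
```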
Fixes: https://github.com/hashicorp/nomad/issues/8690
Add support for Consul Enterprise admin partitions. We added fingerprinting in
https://github.com/hashicorp/nomad/pull/19485. This PR adds a `consul.partition`
field. The expectation is that most users will create a mapping of Nomad node
pool to Consul admin partition. But we'll also create an implicit constraint for
the fingerprinted value.
Fixes: https://github.com/hashicorp/nomad/issues/13139
Add new optional `OIDCDisableUserInfo` setting for OIDC auth provider which
disables a request to the identity provider to get OIDC UserInfo.
This option is helpful when your identity provider doesn't send any additional
claims from the UserInfo endpoint, such as Microsoft AD FS OIDC Provider:
> The AD FS UserInfo endpoint always returns the subject claim as specified in the
> OpenID standards. AD FS doesn't support additional claims requested via the
> UserInfo endpoint
Fixes #19318
The `defaultVault` variable is a pointer to the Vault configuration
named `default`. Initially, this variable points to the Vault
configuration that is used to load CLI flag values, but after those are
merged with the default and config file values, the pointer reference
must be updated before mutating the config with environment variable
values.
The `-dev-consul` and `-dev-vault` flags add default identities and
configuration to the Nomad agent to connect and use the workload
identity integration with Consul and Vault.
When a Connect service is registered with Consul, Nomad includes the nested
`Connect.SidecarService` field that includes health checks for the Envoy
proxy. Because these are not part of the job spec, the alloc health tracker
created by `health_hook` doesn't know to read the value of these checks.
In many circumstances this won't be noticed, but if the Envoy health check
happens to take longer than the `update.min_healthy_time` (perhaps because it's
been set low), it's possible for a deployment to progress too early such that
there will briefly be no healthy instances of the service available in Consul.
Update the Consul service client to find the nested sidecar service in the
service catalog and attach it to the results provided to the tracker. The
tracker can then check the sidecar health checks.
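A sketch of the lookup (the `-sidecar-proxy` ID suffix is Consul's convention for `Connect.SidecarService` registrations; the rest is illustrative):
```
package health

import capi "github.com/hashicorp/consul/api"

// findSidecar returns the Envoy sidecar registration for a service,
// if one exists, so its checks can be fed to the alloc health tracker.
func findSidecar(client *capi.Client, parentID string) (*capi.AgentService, error) {
	svcs, err := client.Agent().Services()
	if err != nil {
		return nil, err
	}
	// Consul registers Connect.SidecarService under "<parent-id>-sidecar-proxy".
	return svcs[parentID+"-sidecar-proxy"], nil
}
```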
Fixes: https://github.com/hashicorp/nomad/issues/19269
This commit introduces the parameter preventRescheduleOnLost, which indicates that the task group can't afford to have multiple instances running at the same time. When a node goes down, its allocations will be registered as unknown, but no replacements will be rescheduled. If the lost node comes back up, the allocs will reconnect and continue to run.
If max_client_disconnect is also enabled and the task group has a reschedule policy, an error is returned.
Implements issue #10366
Co-authored-by: Dom Lavery <dom@circleci.com>
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
Update the `nomad setup consul` command to include a `Selector` for the
`NamespaceRule` so the logic is only applied when the token has a claim
for `consul_namespace`.
Jobs without an explicit `consul.namespace` value receive a JWT without
the `consul_namespace` claim because Nomad is unable to determine which
Consul namespace should be used.
By using `NamespaceRules`, cluster operators are able to set a default
value for these jobs.
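A sketch using the Consul API types (the selector and bind-name syntax shown are assumptions, not necessarily what the command emits):
```
package setup

import capi "github.com/hashicorp/consul/api"

// namespaceRule only fires when the workload identity carries the
// consul_namespace claim; jobs without one fall through to whatever
// default the operator configures. Selector syntax is illustrative.
var namespaceRule = &capi.ACLAuthMethodNamespaceRule{
	Selector:      `"consul_namespace" in value`,
	BindNamespace: "${value.consul_namespace}",
}
```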
* API command and jobspec docs
* PR comments addressed
* API docs for job/jobid/action socket
* Removing a perhaps incorrect origin of job_id across the jobs api doc
* PR comments addressed