nomad

mirror of https://github.com/kemko/nomad.git synced 2026-01-01 16:05:42 +03:00

Author	SHA1	Message	Date
James Rasell	facc3e8013	agent: allow configuration of in-memory telemetry sink. (#20166 ) This change adds configuration options for setting the in-memory telemetry sink collection and retention durations. This sink backs the metrics JSON API and previously had hard-coded default values. The new options are particularly useful when running development or debug environments, where metrics collection is desired at a fast and granular rate.	2024-03-25 15:00:18 +00:00
Seth Hoenig	05937ab75b	exec2: add client support for unveil filesystem isolation mode (#20115 ) * exec2: add client support for unveil filesystem isolation mode This PR adds support for a new filesystem isolation mode, "Unveil". The mode introduces a "alloc_mounts" directory where tasks have user-owned directory structure which are bind mounts into the real alloc directory structure. This enables a task driver to use landlock (and maybe the real unveil on openbsd one day) to isolate a task to the task owned directory structure, providing sandboxing. * actually create alloc-mounts-dir directory * fix doc strings about alloc mount dir paths	2024-03-13 08:24:17 -05:00
James Rasell	41555b6370	cli: Fix minor help formatting issue in agent command. (#19743 )	2024-01-17 12:18:00 +00:00
Mike Nomitch	31f4296826	Adds support for failures before warning to Consul service checks (#19336 ) Adds support for failures before warning and failures before critical to the automatically created Nomad client and server services in Consul	2023-12-14 11:33:31 -08:00
Luiz Aoqui	099ee06a60	Revert "deps: update go-metrics to v0.5.3 (#19190 )" (#19374 ) * Revert "deps: update go-metrics to v0.5.3 (#19190)" This reverts commit `ddb060d8b3`. * changelog: add entry for #19374	2023-12-08 08:46:55 -05:00
Luiz Aoqui	c624dc2121	config: fix loading Vault token from env var (#19349 ) The `defaultVault` variable is a pointer to the Vault configuration named `default`. Initially, this variable points to the Vault configuration that is used to load CLI flag values, but after those are merged with the default and config file values the pointer reference must be updated before mutating the config with environment variable values.	2023-12-07 11:56:53 -05:00
Luiz Aoqui	27d2ad1baf	cli: add `-dev-consul` and `-dev-vault` agent mode (#19327 ) The `-dev-consul` and `-dev-vault` flags add default identities and configuration to the Nomad agent to connect and use the workload identity integration with Consul and Vault.	2023-12-07 11:51:20 -05:00
Luiz Aoqui	ddb060d8b3	deps: update go-metrics to v0.5.3 (#19190 ) Update `go-metrics` to v0.5.3 to pick https://github.com/hashicorp/go-metrics/pull/146.	2023-11-28 12:37:57 -05:00
Adriano Caloiaro	f66eb83fc0	Add `go-netaddrs` support to `retry_join` (#18745 )	2023-11-15 10:07:18 -05:00
Tim Gross	9d075c44b2	config: remove old Vault/Consul config blocks from parser (#18997 ) Remove the now-unused original configuration blocks for Consul and Vault from the agent configuration parsing. When the agent needs to refer to a Consul or Vault block it will always be for a specific cluster for the task/service (or the default cluster for the agent's own use). This is third of three changesets for this work. Fixes: https://github.com/hashicorp/nomad/issues/18947 Ref: https://github.com/hashicorp/nomad/pull/18991 Ref: https://github.com/hashicorp/nomad/pull/18994	2023-11-08 09:30:08 -05:00
Tim Gross	8f8265fa6d	add deprecation warning for Vault/Consul token usage (#18863 ) Submitting a Consul or Vault token with a job is deprecated in Nomad 1.7 and intended for removal in Nomad 1.9. Add a deprecation warning to the CLI when the user passes in the appropriate flag or environment variable. Nomad agents will no longer need a Vault token when configured with workload identity, and we'll ignore Vault tokens in the agent config after Nomad 1.9. Log a warning at agent startup. Ref: https://github.com/hashicorp/nomad/issues/15617 Ref: https://github.com/hashicorp/nomad/issues/15618	2023-10-26 10:46:02 -04:00
Michael Schurter	a806363f6d	OpenID Configuration Discovery Endpoint (#18691 ) Added the [OIDC Discovery](https://openid.net/specs/openid-connect-discovery-1_0.html) `/.well-known/openid-configuration` endpoint to Nomad, but it is only enabled if the `server.oidc_issuer` parameter is set. Documented the parameter, but without a tutorial trying to actually _use_ this will be very hard. I intentionally did not use https://github.com/hashicorp/cap for the OIDC configuration struct because it's built to be a compliant OIDC provider. Nomad is not trying to be compliant initially because compliance to the spec does not guarantee it will actually satisfy the requirements of third parties. I want to avoid the problem where in an attempt to be standards compliant we ship configuration parameters that lock us in to a certain behavior that we end up regretting. I want to add parameters and behaviors as there's a demonstrable need. Users always have the escape hatch of providing their own OIDC configuration endpoint. Nomad just needs to know the Issuer so that the JWTs match the OIDC configuration. There's no reason the actual OIDC configuration JSON couldn't live in S3 and get served directly from there. Unlike JWKS the OIDC configuration should be static, or at least change very rarely. This PR is just the endpoint extracted from #18535. The `RS256` algorithm still needs to be added in hopes of supporting third parties such as [AWS IAM OIDC Provider](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_providers_create_oidc.html). Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>	2023-10-20 17:11:41 -07:00
James Rasell	1ffdd576bb	agent: add config option to enable file and line log detail. (#18768 )	2023-10-16 15:59:16 +01:00
Tim Gross	5001bf4547	consul: use constant instead of "default" literal (#18611 ) Use the constant `structs.ConsulDefaultCluster` instead of the string literal "default", as we've done for Vault.	2023-09-28 16:50:21 -04:00
Luiz Aoqui	868aba57bb	vault: update identity name to start with `vault_` (#18591 ) * vault: update identity name to start with `vault_` In the original proposal, workload identities used to derive Vault tokens were expected to be called just `vault`. But in order to support multiple Vault clusters it is necessary to associate identities with specific Vault cluster configuration. This commit implements a new proposal to have Vault identities named as `vault_<cluster>`.	2023-09-27 15:53:28 -03:00
Juana De La Cuesta	124272c050	server: Add reporting option to agent (#18572 ) * func: add reporting option to agent * func: add test for merge and fix comments * Update config_ce.go * Update config_ce.go * Update config_ce.go * fix: add reporting config to default configuration and update to use must over require * Update command/agent/config_parse.go Co-authored-by: Luiz Aoqui <luiz@hashicorp.com> * Update nomad/structs/config/reporting.go Co-authored-by: Luiz Aoqui <luiz@hashicorp.com> * Update nomad/structs/config/reporting.go Co-authored-by: Luiz Aoqui <luiz@hashicorp.com> * style: rename license and reporting config * fix: use default function instead of empty struct --------- Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>	2023-09-27 00:11:32 +02:00
Tim Gross	3ee6c31241	ACLs: allow/deny/default config for Consul/Vault clusters by namespace (#18425 ) In Nomad Enterprise when multiple Vault/Consul clusters are configured, cluster admins can control access to clusters for jobs via namespace ACLs, similar to how we've done so for node pools. This changeset updates the ACL configuration structs, but doesn't wire them up.	2023-09-08 11:37:20 -04:00
Tim Gross	a8bad048b6	config: parsing support for multiple Consul clusters in agent config (#18255 ) Add the plumbing we need to accept multiple Consul clusters in Nomad agent configuration, to support upcoming Nomad Enterprise features. The `consul` blocks are differentiated by a new `name` field, and if the `name` is omitted it becomes the "default" Consul configuration. All blocks with the same name are merged together, as with the existing behavior. As with the `vault` block, we're still using HCL1 for parsing configuration and the `Decode` method doesn't parse multiple blocks differentiated only by a field name without a label. So we've had to add an extra parsing pass, similar to what we've done for HCL1 jobspecs. This also revealed a subtle bug in the `vault` block handling of extra keys when there are multiple `vault` blocks, which I've fixed here. For now, all existing consumers will use the "default" Consul configuration, so there's no user-facing behavior change in this changeset other than the contents of the agent self API. Ref: https://github.com/hashicorp/team-nomad/issues/404	2023-08-18 15:25:16 -04:00
Tim Gross	74b796e6d0	config: parsing support for multiple Vault clusters in agent config (#18224 ) Add the plumbing we need to accept multiple Vault clusters in Nomad agent configuration, to support upcoming Nomad Enterprise features. The `vault` blocks are differentiated by a new `name` field, and if the `name` is omitted it becomes the "default" Vault configuration. All blocks with the same name are merged together, as with the existing behavior. Unfortunately we're still using HCL1 for parsing configuration and the `Decode` method doesn't parse multiple blocks differentiated only by a field name without a label. So we've had to add an extra parsing pass, similar to what we've done for HCL1 jobspecs. For now, all existing consumers will use the "default" Vault configuration, so there's no user-facing behavior change in this changeset other than the contents of the agent self API. Ref: https://github.com/hashicorp/team-nomad/issues/404	2023-08-17 14:10:32 -04:00
hashicorp-copywrite[bot]	a9d61ea3fd	Update copyright file headers to BUSL-1.1	2023-08-10 17:27:29 -05:00
Luiz Aoqui	5db9e64cdd	node pool: node pool upsert on multiregion node register (#17503 ) When registering a node with a new node pool in a non-authoritative region we can't create the node pool because this new pool will not be replicated to other regions. This commit modifies the node registration logic to only allow automatic node pool creation in the authoritative region. In non-authoritative regions, the client is registered, but the node pool is not created. The client is kept in the `initialing` status until its node pool is created in the authoritative region and replicated to the client's region.	2023-06-13 11:28:28 -04:00
Dao Thanh Tung	67e39d5d24	Add check for missing `path` in client `host_volume` config (#17393 )	2023-06-05 19:31:19 -04:00
Luiz Aoqui	81f0b359dd	node pools: register a node in a node pool (#17405 )	2023-06-02 17:50:50 -04:00
Bram Vogelaar	89a4930b1d	agent: display node id on start up for servers (#17084 ) Signed-off-by: Bram Vogelaar <bram@attachmentgenie.com>	2023-05-18 11:23:12 -04:00
hashicorp-copywrite[bot]	f005448366	[COMPLIANCE] Add Copyright and License Headers	2023-04-10 15:36:59 +00:00
Tim Gross	b08edf385a	agent: add top-level warning if mTLS is not configured (#16800 ) Nomad's security model requires mTLS in order to secure client-to-server and server-to-server communications. Configuring ACLs alone is not enough. Loudly warn the user if mTLS is not configured in non-dev modes.	2023-04-05 14:43:45 -04:00
jmwilkinson	46f3977db2	Allow wildcard datacenters to be specified in job file (#11170 ) Also allows for default value of `datacenters = ["*"]`	2023-02-02 09:57:45 -05:00
Piotr Kazmierczak	949a6f60c7	renamed stanza to block for consistency with other projects (#15941 )	2023-01-30 15:48:43 +01:00
Dao Thanh Tung	f89ac80801	agent: Make agent syslog log level inherit from Nomad agent log (#15625 )	2023-01-04 09:38:06 -05:00
Michael Schurter	01648e615a	client: fix data races in config handling (#14139 ) Before this change, Client had 2 copies of the config object: config and configCopy. There was no guidance around which to use where (other than configCopy's comment to pass it to alloc runners), both are shared among goroutines and mutated in data racy ways. At least at one point I think the idea was to have `config` be mutable and then grab a lock to overwrite `configCopy`'s pointer atomically. This would have allowed alloc runners to read their config copies in data race safe ways, but this isn't how the current implementation worked. This change takes the following approach to safely handling configs in the client: 1. `Client.config` is the only copy of the config and all access must go through the `Client.configLock` mutex 2. Since the mutex only protects the config pointer itself and not fields inside the Config struct: all config mutation must be done on a copy of the config, and then Client's config pointer is overwritten while the mutex is acquired. Alloc runners and other goroutines with the old config pointer will not see config updates. 3. Deep copying is implemented on the Config struct to satisfy the previous approach. The TLS Keyloader is an exception because it has its own internal locking to support mutating in place. An unfortunate complication but one I couldn't find a way to untangle in a timely fashion. 4. To facilitate deep copying I made an internally backward incompatible API change: our `helper/funcs` used to turn containers (slices and maps) with 0 elements into nils. This probably saves a few memory allocations but makes it very easy to cause panics. Since my new config handling approach uses more copying, it became very difficult to ensure all code that used containers on configs could handle nils properly. Since this code has caused panics in the past, I fixed it: nil containers are copied as nil, but 0-element containers properly return a new 0-element container. No more "downgrading to nil!"	2022-08-18 16:32:04 -07:00
Michael Schurter	f998a2b77b	core: merge reserved_ports into host_networks (#13651 ) Fixes #13505 This fixes #13505 by treating reserved_ports like we treat a lot of jobspec settings: merging settings from more global stanzas (client.reserved.reserved_ports) "down" into more specific stanzas (client.host_networks[].reserved_ports). As discussed in #13505 there are other options, and since it's totally broken right now we have some flexibility: Treat overlapping reserved_ports on addresses as invalid and refuse to start agents. However, I'm not sure there's a cohesive model we want to publish right now since so much 0.9-0.12 compat code still exists! We would have to explain to folks that if their -network-interface and host_network addresses overlapped, they could only specify reserved_ports in one place or the other?! It gets ugly. Use the global client.reserved.reserved_ports value as the default and treat host_network[].reserverd_ports as overrides. My first suggestion in the issue, but @groggemans made me realize the addresses on the agent's interface (as configured by -network-interface) may overlap with host_networks, so you'd need to remove the global reserved_ports from addresses shared with a shared network?! This seemed really confusing and subtle for users to me. So I think "merging down" creates the most expressive yet understandable approach. I've played around with it a bit, and it doesn't seem too surprising. The only frustrating part is how difficult it is to observe the available addresses and ports on a node! However that's a job for another PR.	2022-07-12 14:40:25 -07:00
Karan Sharma	2d95f06026	feat: Warn if bootstrap_expect is even number (#12961 )	2022-06-06 15:22:59 +02:00
Michael Schurter	3968509886	artifact: fix numerous go-getter security issues Fix numerous go-getter security issues: - Add timeouts to http, git, and hg operations to prevent DoS - Add size limit to http to prevent resource exhaustion - Disable following symlinks in both artifacts and `job run` - Stop performing initial HEAD request to avoid file corruption on retries and DoS opportunities. Approach Since Nomad has no ability to differentiate a DoS-via-large-artifact vs a legitimate workload, all of the new limits are configurable at the client agent level. The max size of HTTP downloads is also exposed as a node attribute so that if some workloads have large artifacts they can specify a high limit in their jobspecs. In the future all of this plumbing could be extended to enable/disable specific getters or artifact downloading entirely on a per-node basis.	2022-05-24 16:29:39 -04:00
Will Jordan	304d0cf595	Don't buffer json logs on agent startup (#13076 ) There's no reason to buffer json logs on agent startup since logs in this format already aren't reordered.	2022-05-19 15:40:30 -04:00
James Rasell	bab219a8ba	agent: fix panic when logging about protocol version config use. (#12962 ) The log line comes before the agent logger has been setup, therefore we need to use the UI logging to avoid panic.	2022-05-13 09:28:43 +02:00
Yoan Blanc	bda7b1ece0	feat: remove dependency to consul/lib Signed-off-by: Yoan Blanc <yoan@dosimple.ch>	2022-04-09 13:22:44 +02:00
James Rasell	9dc0b88cb5	client: add Nomad template service functionality to runner. (#12458 ) This change modifies the template task runner to utilise the new consul-template which includes Nomad service lookup template funcs. In order to provide security and auth to consul-template, we use a custom HTTP dialer which is passed to consul-template when setting up the runner. This method follows Vault implementation. Co-authored-by: Michael Schurter <mschurter@hashicorp.com>	2022-04-06 19:17:05 +02:00
Michael Schurter	2411d3afd2	core: remove all traces of unused protocol version Nomad inherited protocol version numbering configuration from Consul and Serf, but unlike those projects Nomad has never used it. Nomad's `protocol_version` has always been `1`. While the code is effectively unused and therefore poses no runtime risks to leave, I felt like removing it was best because: 1. Nomad's RPC subsystem has been able to evolve extensively without needing to increment the version number. 2. Nomad's HTTP API has evolved extensively without increment `API{Major,Minor}Version`. If we want to version the HTTP API in the future, I doubt this is the mechanism we would choose. 3. The presence of the `server.protocol_version` configuration parameter is confusing since `server.raft_protocol` is an important parameter for operators to consider. Even more confusing is that there is a distinct Serf protocol version which is included in `nomad server members` output under the heading `Protocol`. `raft_protocol` is the only protocol version relevant to Nomad developers and operators. The other protocol versions are either deadcode or have never changed (Serf). 4. If we were to need to version the RPC, HTTP API, or Serf protocols, I don't think these configuration parameters and variables are the best choice. If we come to that point we should choose a versioning scheme based on the use case and modern best practices -- not this 6+ year old dead code.	2022-02-18 16:12:36 -08:00
Thomas Lefebvre	41f84c657a	Add config command and config validate subcommand to nomad CLI (#9198 )	2022-02-08 16:52:35 -05:00
James Rasell	c08a036655	Merge pull request #11403 from hashicorp/f-gh-11059 agent/docs: add better clarification when top-level data dir needs setting	2022-01-13 16:41:35 +01:00
Luiz Aoqui	3bf78fde47	Fix log level parsing from lines that include a timestamp (#11838 )	2022-01-13 09:56:35 -05:00
Michael Schurter	b4d3a610db	agent: validate reserved_ports are valid Goal is to fix at least one of the causes that can cause a node to be ineligible to receive work: https://github.com/hashicorp/nomad/issues/9506#issuecomment-1002880600	2022-01-12 14:21:47 -08:00
Kevin Schoonover	0873e08af1	agent: support multiple http address in addresses.http (#11582 )	2022-01-03 09:33:53 -05:00
James Rasell	0a571b7389	agent: clarify error info when data dir needs setting.	2021-10-28 15:05:56 +02:00
Mahmood Ali	3ce89b75f4	logging: Log the cause behind agent startup failure (#11353 ) Log the failure error when the agent fails to start. Previously, the agent startup failure error would be emitted to the command UI but not logged. So it doesn't get emitted to syslog or `log_file` if they are set, and it makes debugging much harder. Also, logging the error again before exit makes the error more visible: previously, the operator needed to scroll to the top to find the error. On a sample failure, the output will look like: ``` ==> WARNING: Bootstrap mode enabled! Potentially unsafe operation. ==> Loaded configuration from sample-configs/config-bad ==> Starting Nomad agent... ==> Error starting agent: setting up server node ID failed: mkdir /path-without-permission: read-only file system 2021-10-20T14:38:51.179-0400 [WARN] agent.plugin_loader: skipping external plugins since plugin_dir doesn't exist: plugin_dir=/path-without-permission/plugins 2021-10-20T14:38:51.181-0400 [DEBUG] agent.plugin_loader.docker: using client connection initialized from environment: plugin_dir=/path-without-permission/plugins 2021-10-20T14:38:51.181-0400 [DEBUG] agent.plugin_loader.docker: using client connection initialized from environment: plugin_dir=/path-without-permission/plugins 2021-10-20T14:38:51.181-0400 [INFO] agent: detected plugin: name=java type=driver plugin_version=0.1.0 2021-10-20T14:38:51.181-0400 [INFO] agent: detected plugin: name=docker type=driver plugin_version=0.1.0 2021-10-20T14:38:51.181-0400 [INFO] agent: detected plugin: name=mock_driver type=driver plugin_version=0.1.0 2021-10-20T14:38:51.181-0400 [INFO] agent: detected plugin: name=raw_exec type=driver plugin_version=0.1.0 2021-10-20T14:38:51.181-0400 [INFO] agent: detected plugin: name=exec type=driver plugin_version=0.1.0 2021-10-20T14:38:51.181-0400 [INFO] agent: detected plugin: name=qemu type=driver plugin_version=0.1.0 2021-10-20T14:38:51.181-0400 [ERROR] agent: error starting agent: error="setting up server node ID failed: mkdir /path-without-permission: read-only file system" ``` This change adds the final `ERROR` message. It's easy to miss the `==> Error starting agent` above.	2021-10-27 10:41:17 -07:00
Michael Schurter	6a0dede9b6	Merge pull request #11167 from a-zagaevskiy/master Support configurable dynamic port range	2021-10-13 16:47:38 -07:00
Michael Schurter	fc89835daf	client: improve errors & tests for dynamic ports	2021-10-13 16:25:25 -07:00
Luiz Aoqui	713094ffb7	wrap `log` messages with `hclog` (#11291 )	2021-10-12 14:38:44 -04:00
Aleksandr Zagaevskiy	0620bb04a5	fixup! Support configurable dynamic port range	2021-10-11 14:13:59 +03:00
Aleksandr Zagaevskiy	e3b6f62198	Support configurable dynamic port range	2021-09-10 11:52:47 +03:00

1 2 3 4 5

240 Commits