Commit Graph

616 Commits

Danielle Lancashire
ab5ba7aa9b config: Hoist volume.config.source into volume
Currently, using a Volume in a job uses the following configuration:

```
volume "alias-name" {
  type = "volume-type"
  read_only = true

  config {
    source = "host_volume_name"
  }
}
```

This commit migrates to the following:

```
volume "alias-name" {
  type = "volume-type"
  source = "host_volume_name"
  read_only = true
}
```

The original design was chosen because we were uncertain about the future of
storage plugins and wanted to allow maximum flexibility.

However, this causes a few issues, namely:
- We frequently need to parse this configuration during submission,
scheduling, and mounting
- It complicates the configuration from an end user's perspective
- It complicates validation

As we understand the problem space of CSI a little better, it has become
clear that `source` doesn't need to live in `config`, as it will be
used in the majority of cases:

- Host Volumes: Always need a source
- Preallocated CSI Volumes: Always need a source from a volume or claim name
- Dynamic Persistent CSI Volumes*: Always need a source to attach the volumes
                                   to, for managing upgrades and avoiding
                                   dangling volumes.
- Dynamic Ephemeral CSI Volumes*: Less thought out, but `source` will probably
                                  point to the plugin name, and a `config` block
                                  will allow you to pass metadata to the plugin,
                                  or it will point to a pre-configured ephemeral
                                  config.
*If implemented

The new design simplifies this by merging the source into the volume
stanza to solve the above issues with usability, performance, and error
handling.
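
For reference, a simplified sketch of the flattened request shape this
results in (illustrative only; the real struct lives in nomad/structs and
may differ):

```
// VolumeRequest sketches the flattened shape: source is a top-level
// field rather than a nested config entry.
type VolumeRequest struct {
	Name     string // the "alias-name" label
	Type     string // e.g. "host"
	Source   string // e.g. "host_volume_name", now top-level
	ReadOnly bool
}
```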
2019-09-13 04:37:59 +02:00
Preetha Appan
654c72a7b4 update comment 2019-09-05 18:43:30 -05:00
Preetha Appan
87e998d043 Fix inplace updates bug with group level networks
During inplace updates, we should be using network information
from the previous allocation being updated.
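
Roughly (hypothetical, simplified types; not Nomad's real structs), the fix
amounts to carrying the old assignment forward:

```
type networkResource struct {
	IP    string
	MBits int
}

type allocation struct {
	ID       string
	Networks []networkResource
}

// inplaceUpdate keeps the network assignment of the allocation being
// replaced instead of asking the network index for a fresh one.
func inplaceUpdate(prev allocation, updated *allocation) {
	updated.Networks = prev.Networks
}
```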
2019-09-05 18:37:24 -05:00
Jasmine Dahilig
c346a47b5b add default update stanza and max_parallel=0 disables deployments (#6191) 2019-09-02 10:30:09 -07:00
Mahmood Ali
8a0647c9cf schedulers: check all drivers on node
When checking driver feasibility for an alloc with multiple drivers, we
must check that all drivers are detected and healthy.

Nomad 0.8 and 0.9 have a bug where we may check only a single driver,
and which driver gets checked depends on map traversal order, which is
unspecified by the Go spec.
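
A minimal sketch of the corrected check (hypothetical types): feasibility
must hold for every driver the task group uses, not just whichever one map
iteration happens to visit first:

```
type driverInfo struct {
	Detected bool
	Healthy  bool
}

// driversFeasible returns true only if every required driver is both
// detected and healthy on the node.
func driversFeasible(node map[string]driverInfo, required []string) bool {
	for _, d := range required {
		info, ok := node[d]
		if !ok || !info.Detected || !info.Healthy {
			return false
		}
	}
	return true
}
```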
2019-08-29 09:03:31 -04:00
Mahmood Ali
542d17e745 scheduler: tests for multiple drivers in TG 2019-08-29 09:03:31 -04:00
Danielle Lancashire
41292055de scheduler: Implicit constraint on readonly hostvol
When a client declares a volume as read-only, we should only schedule it
for read-only requests. With this change, if a host exposes a read-only
volume, we validate that the group-level requests for that volume are all
read-only for that host.
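
Sketched out (hypothetical types, not the actual checker in
scheduler/feasible.go):

```
type hostVolume struct {
	ReadOnly bool
}

type volumeRequest struct {
	Source   string
	ReadOnly bool
}

// volumesFeasible rejects a node when any request targets a missing
// volume, or asks for read-write access to a read-only host volume.
func volumesFeasible(host map[string]hostVolume, reqs []volumeRequest) bool {
	for _, req := range reqs {
		vol, ok := host[req.Source]
		if !ok {
			return false
		}
		if vol.ReadOnly && !req.ReadOnly {
			return false
		}
	}
	return true
}
```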
2019-08-21 20:57:05 +02:00
Danielle Lancashire
af5d42c058 structs: Unify Volume and VolumeRequest 2019-08-12 15:39:08 +02:00
Danielle
0f5cf5fa91 Update scheduler/feasible.go
Co-Authored-By: Mahmood Ali <mahmood@hashicorp.com>
2019-08-12 15:39:08 +02:00
Danielle Lancashire
709abbc675 scheduler: Add a feasability checker for Host Vols 2019-08-12 15:39:08 +02:00
Preetha Appan
5a1dd79179 Code review feedback 2019-07-31 01:04:08 -04:00
Preetha Appan
b561816343 Scheduler changes to support network at task group level
Also includes unit tests for the binpacker and preemption.
The tests verify that network resources specified at the
task group level are properly accounted for.
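
As a rough illustration of the accounting (hypothetical types, not the
scheduler's actual code), group-level networks are summed together with
task-level ones when bin-packing:

```
type networkResource struct {
	MBits int
}

// totalMBits sums bandwidth across any number of network lists, so
// group-level networks count against a node alongside task-level ones.
func totalMBits(lists ...[]networkResource) int {
	sum := 0
	for _, nets := range lists {
		for _, n := range nets {
			sum += n.MBits
		}
	}
	return sum
}
```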
2019-07-31 01:04:08 -04:00
Nick Ethier
4cb99a1112 scheduler: fix disk constraints 2019-07-31 01:04:08 -04:00
Nick Ethier
e910fdbb32 fix failing tests 2019-07-31 01:04:07 -04:00
Nick Ethier
e15005bdcb networking: Add new bridge networking mode implementation 2019-07-31 01:04:06 -04:00
Nick Ethier
c742f8b580 ar: cleanup lint errors 2019-07-31 01:03:18 -04:00
Nick Ethier
e20fa7ccc1 Add network lifecycle management
Adds new Prerun and Postrun hooks to manage setup of network namespaces
on Linux. Work still needs to be done to make the code platform agnostic
and to support Docker-style network initialization.
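
The hook shape is roughly as follows (an illustrative interface; Nomad's
actual hook interfaces may differ):

```
// A network hook would implement both interfaces: Prerun creates the
// network namespace before any task starts, and Postrun tears it down
// after all tasks have exited.
type prerunHook interface {
	Name() string
	Prerun() error
}

type postrunHook interface {
	Name() string
	Postrun() error
}
```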
2019-07-31 01:03:17 -04:00
Lang Martin
2d8bfb8d11 system_sched submits failed evals as blocked 2019-07-18 10:32:12 -04:00
Preetha Appan
bead05f05f Fix more tests 2019-06-26 16:30:53 -05:00
Preetha Appan
913427428a Remove compat code associated with many previous versions of nomad
This removes compat code for namespaces (0.7), Drain (0.8), and other
features from older Nomad releases.
2019-06-25 19:05:25 -05:00
Mahmood Ali
2899991ccd Merge pull request #5790 from hashicorp/b-reschedule-desired-state
Mark rescheduled allocs as stopped.
2019-06-13 17:28:59 -04:00
Mahmood Ali
25b44b18db Test behavior no reschedule for service/batch jobs 2019-06-13 16:41:19 -04:00
Mahmood Ali
34a66835db Don't stop rescheduleLater allocations
When an alloc is due to be rescheduled later (rescheduleLater), it goes
through the reconciler twice: once to be ignored with a follow-up eval,
and once again when processing the follow-up eval, where it appears as
rescheduleNow.

Here, we ignore it in the first run and mark it as stopped in the second
iteration, rather than stopping it twice.
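
In pseudocode form (illustrative only, not the reconciler's real API):

```
// decide sketches the reconciler's handling of a failed alloc: on the
// first pass a rescheduleLater alloc is ignored (a follow-up eval is
// created); the follow-up eval sees it as rescheduleNow and stops it.
func decide(rescheduleNow, rescheduleLater bool) string {
	switch {
	case rescheduleNow:
		return "stop" // stopped exactly once, by the follow-up eval
	case rescheduleLater:
		return "ignore" // follow-up eval created; don't stop yet
	default:
		return "ignore"
	}
}
```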
2019-06-13 09:44:41 -04:00
Mahmood Ali
d342a24ba0 Only preempt for network when there is a network
When examining preemption for networks, only consider allocs that have
networks.
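
Roughly (hypothetical types):

```
type allocation struct {
	ID       string
	Networks []string
}

// networkPreemptionCandidates filters out allocs with no networks;
// preempting them can never free network resources.
func networkPreemptionCandidates(allocs []allocation) []allocation {
	var out []allocation
	for _, a := range allocs {
		if len(a.Networks) > 0 {
			out = append(out, a)
		}
	}
	return out
}
```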

Fixes https://github.com/hashicorp/nomad/issues/5793
2019-06-07 18:55:55 -04:00
Mahmood Ali
2808674fac test: add tests for network devices and preemption 2019-06-07 18:55:02 -04:00
Mahmood Ali
c62c246ad9 Stop allocs to be rescheduled
Currently, when an alloc fails and is rescheduled, the alloc's desired
state remains "run" and the Nomad client may not free its resources.

Here, we ensure that an alloc is marked as stopped when it's
rescheduled.
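
In essence (field names mirror the CLI output below; the real update
happens inside the scheduler and may differ):

```
type allocation struct {
	DesiredStatus      string
	DesiredDescription string
}

// markRescheduled records that the alloc was replaced, so the client
// knows to stop it and free its resources.
func markRescheduled(a *allocation) {
	a.DesiredStatus = "stop"
	a.DesiredDescription = "alloc was rescheduled because it failed"
}
```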

Notice the Desired Status and Description before and after this change:

Before:
```
mars-2:nomad notnoop$ nomad alloc status 02aba49e
ID                   = 02aba49e
Eval ID              = bb9ed1d2
Name                 = example-reschedule.nodes[0]
Node ID              = 5853d547
Node Name            = mars-2.local
Job ID               = example-reschedule
Job Version          = 0
Client Status        = failed
Client Description   = Failed tasks
Desired Status       = run
Desired Description  = <none>
Created              = 10s ago
Modified             = 5s ago
Replacement Alloc ID = d6bf872b

Task "payload" is "dead"
Task Resources
CPU        Memory          Disk     Addresses
0/100 MHz  24 MiB/300 MiB  300 MiB

Task Events:
Started At     = 2019-06-06T21:12:45Z
Finished At    = 2019-06-06T21:12:50Z
Total Restarts = 0
Last Restart   = N/A

Recent Events:
Time                       Type            Description
2019-06-06T17:12:50-04:00  Not Restarting  Policy allows no restarts
2019-06-06T17:12:50-04:00  Terminated      Exit Code: 1
2019-06-06T17:12:45-04:00  Started         Task started by client
2019-06-06T17:12:45-04:00  Task Setup      Building Task Directory
2019-06-06T17:12:45-04:00  Received        Task received by client

```

After:

```
ID                   = 5001ccd1
Eval ID              = 53507a02
Name                 = example-reschedule.nodes[0]
Node ID              = a3b04364
Node Name            = mars-2.local
Job ID               = example-reschedule
Job Version          = 0
Client Status        = failed
Client Description   = Failed tasks
Desired Status       = stop
Desired Description  = alloc was rescheduled because it failed
Created              = 13s ago
Modified             = 3s ago
Replacement Alloc ID = 7ba7ac20

Task "payload" is "dead"
Task Resources
CPU         Memory          Disk     Addresses
21/100 MHz  24 MiB/300 MiB  300 MiB

Task Events:
Started At     = 2019-06-06T21:22:50Z
Finished At    = 2019-06-06T21:22:55Z
Total Restarts = 0
Last Restart   = N/A

Recent Events:
Time                       Type            Description
2019-06-06T17:22:55-04:00  Not Restarting  Policy allows no restarts
2019-06-06T17:22:55-04:00  Terminated      Exit Code: 1
2019-06-06T17:22:50-04:00  Started         Task started by client
2019-06-06T17:22:50-04:00  Task Setup      Building Task Directory
2019-06-06T17:22:50-04:00  Received        Task received by client
```
2019-06-06 17:27:12 -04:00
Mahmood Ali
5574b2f3d0 tests: Migrated allocs aren't lost
Fix `TestServiceSched_NodeDown` so it checks that migrated allocs are
actually marked to be stopped.

The boolean logic in the test made it skip checking client status as
long as the desired status was stop.

Here, we mark some jobs for migration while leaving others running, and
we check that the lost flag is only set for non-migrated allocs.
2019-06-06 16:05:07 -04:00
Lang Martin
21dccdf8dd describe a pending deployment with auto_promote accurately 2019-05-22 12:32:08 -04:00
Lang Martin
2165f8be94 sched reconcile copy AutoPromote to DeploymentState 2019-05-22 12:32:08 -04:00
Preetha Appan
566dd71486 Fix comment and assert score in test case 2019-05-15 12:35:57 -05:00
Nick Ethier
5709bf7b54 fix missing brace 2019-05-15 13:02:04 -04:00
Nick Ethier
ea843a507a scheduler: add check to prohibit returning inf during spread boost calculation 2019-05-15 13:00:24 -04:00
Lang Martin
b1228536e8 system_sched & test cleanup comments 2019-05-01 12:25:26 -04:00
Lang Martin
d1308420b2 system_sched_test extend the test to check ineligible nodes 2019-05-01 12:25:26 -04:00
Lang Martin
8cc8fc6200 system_sched when a node is filtered, don't mark failure 2019-05-01 12:25:26 -04:00
Lang Martin
178092591b system_sched_test create partially constrained job 2019-05-01 12:25:26 -04:00
Arshneet Singh
ee268a58db Add comments to functions, and use require instead of assert 2019-04-23 09:57:21 -07:00
Arshneet Singh
97686e371f Remove allowPlanOptimization from schedulers 2019-04-23 09:18:02 -07:00
Arshneet Singh
02b832c3ff Compat tags 2019-04-23 09:18:01 -07:00
Arshneet Singh
f75c6b4bdb Add tests for plan normalization 2019-04-23 09:18:01 -07:00
Arshneet Singh
4eedab18a7 Add code for plan normalization 2019-04-23 09:18:01 -07:00
Danielle
9a4fe5e98f Merge pull request #5512 from hashicorp/dani/f-alloc-stop
alloc-lifecycle: nomad alloc stop
2019-04-23 13:05:08 +02:00
Danielle Lancashire
bb142af5d6 allocs: Add nomad alloc stop
This adds a `nomad alloc stop` command that can be used to stop and
force-migrate an allocation to a different node.

This is built on top of the AllocUpdateDesiredTransitionRequest and
explicitly limits the scope of access to that transition to expose it
under the alloc-lifecycle ACL.

The API returns the follow-up eval that can be used as part of
monitoring in the CLI or parsed and used by an external tool.
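
For example (the alloc ID is illustrative):

```
$ nomad alloc stop 02aba49e
```

The returned follow-up eval can then be monitored in the CLI or consumed
by an external tool.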
2019-04-23 12:50:23 +02:00
Preetha Appan
a134c16c22 remove stray new line 2019-04-12 10:32:48 -05:00
Preetha Appan
4743561396 Refactor scheduler package to enable preemption for batch/service jobs 2019-04-10 20:24:01 -05:00
James Rasell
ee92bf86d8 Add NodeName to the alloc/job status outputs.
Currently, when operators need to log onto a machine where an alloc is
running, they have to perform an alloc/job status call and then a second
call to discover the node name from the node list.

This updates both the job status and alloc status output to include the
node name, making this workflow easier for operators.

Closes #2359
Closes #1180
2019-04-10 10:34:10 -05:00
Preetha Appan
1323b4d5cc Fix bug where scoring metadata would be overridden during an inplace upgrade. 2019-03-12 23:36:46 -05:00
Alex Dadgar
bc42873e07 Change types of weights on spread/affinity 2019-01-30 12:20:38 -08:00
Nick Ethier
80a04052b6 scheduler: fix NPE when deployment is nil, but placement is a canary 2019-01-28 20:22:59 -06:00
Alex Dadgar
8264f50c52 convert driver to device for device constraint/attributes 2019-01-23 10:58:45 -08:00