docker: stop network pause container of lost alloc after node restart (#17455)

This PR fixes a bug where the docker network pause container would not be
stopped and removed in the case where a node is restarted, the alloc is
moved to another node, the node comes back up. See the issue below for
full repro conditions.

Basically in the DestroyNetwork PostRun hook we would depend on the
NetworkIsolationSpec field not being nil - which is only the case
if the Client stays alive all the way from network creation to network
teardown. If the node is rebooted we lose that state and previously
would not be able to find the pause container to remove. Now, we manually
find the pause container by scanning them and looking for the associated
allocID.

Fixes #17299
This commit is contained in:
Seth Hoenig
2023-06-09 08:46:29 -05:00
committed by GitHub
parent 408ab828f7
commit 89ce092b20
7 changed files with 142 additions and 38 deletions

View File

@@ -490,6 +490,10 @@ func (d *driverPluginClient) CreateNetwork(allocID string, _ *NetworkCreateReque
}
func (d *driverPluginClient) DestroyNetwork(allocID string, spec *NetworkIsolationSpec) error {
if spec == nil {
return nil
}
req := &proto.DestroyNetworkRequest{
AllocId: allocID,
IsolationSpec: NetworkIsolationSpecToProto(spec),