-
Bug
-
Resolution: Obsolete
-
Major
-
None
-
4.11
-
Moderate
-
None
-
3
-
SDN Sprint 231
-
1
-
Rejected
-
False
-
Description of problem:
When I have a Windows worker on an AWS instance that I power down and then power on again (i.e. I power off all nodes overnight and turn them on the next day), pods are unable to schedule on the nodes, even though the nodes have rejoined the cluster and report Ready.
Version-Release number of selected component (if applicable):
OCP 4.11
WMCO 6.0.0
How reproducible:
Not 100% reproducible, but I saw it on two different days with two different clusters.
Steps to Reproduce:
1. Create a cluster with Windows workers on AWS.
2. Power off all nodes
3. Power on all nodes (~12 hours later)
4. Schedule some workload to the Windows workers
Actual results:
Pods are stuck in ContainerCreating on every Windows node, with the error:
Reason:"FailedCreatePodSandBox", Message:"(combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox \"cab04ed4ba7ceb30ca41949b4e6ef08ded85a93b81daadf6d8575c9a1eec8265\": plugin type=\"win-overlay\" name=\"OVNKubernetesHybridOverlayNetwork\" failed (add): error while hcn.GetNetworkByName(OVNKubernetesHybridOverlayNetwork): Network name \"OVNKubernetesHybridOverlayNetwork\" not found",
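For triage, the event message above can be matched mechanically to confirm that a stuck pod is hitting this specific failure (a missing HNS network) rather than some other sandbox error. The following is a minimal sketch, not part of WMCO or OVN-Kubernetes; the function name `missing_hns_network` is hypothetical, and the pattern is derived only from the wording of the message in this report, which may differ in other versions.

```python
import re
from typing import Optional

# Pattern based solely on the error text quoted in this report.
# The quotes around the network name may or may not be backslash-escaped,
# depending on how the event was printed, so the backslash is optional.
_MISSING_NETWORK = re.compile(r'Network name \\?"(?P<name>[^"\\]+)\\?" not found')


def missing_hns_network(message: str) -> Optional[str]:
    """Return the network name reported as missing, or None.

    Only messages about a failed pod sandbox creation are considered.
    """
    if "Failed to create pod sandbox" not in message:
        return None
    match = _MISSING_NETWORK.search(message)
    return match.group("name") if match else None


if __name__ == "__main__":
    # Shortened copy of the message from this report.
    sample = (
        'Failed to create pod sandbox: rpc error: code = Unknown desc = '
        'failed to setup network for sandbox: plugin type=\\"win-overlay\\" '
        'failed (add): Network name \\"OVNKubernetesHybridOverlayNetwork\\" not found'
    )
    print(missing_hns_network(sample))
```

A non-None result naming OVNKubernetesHybridOverlayNetwork indicates the pod matches the failure described here.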
Expected results:
Pods are able to schedule and run without errors after a node is rebooted.
Additional info:
Deleting the Machines, which recreates the instance entirely, does provision a functional node that can accept pods.
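The workaround above can be scripted roughly as follows. This is a sketch under assumptions: the Machine names are placeholders, the Machines are assumed to be owned by a MachineSet so that replacements are created automatically, and `openshift-machine-api` is assumed to be the namespace holding the Machine objects. By default the script only prints the `oc` commands instead of running them.

```python
import subprocess
from typing import Iterable, List

# Assumption: Machine objects live in this namespace.
MACHINE_NAMESPACE = "openshift-machine-api"


def delete_machine_cmd(machine_name: str) -> List[str]:
    """Build the `oc` command line that deletes one Machine."""
    return ["oc", "delete", "machine", machine_name, "-n", MACHINE_NAMESPACE]


def recreate_machines(machine_names: Iterable[str], dry_run: bool = True) -> List[List[str]]:
    """Delete each Machine so that its MachineSet provisions a fresh instance.

    With dry_run=True the commands are printed but not executed.
    Returns the list of commands that were built.
    """
    commands = []
    for name in machine_names:
        cmd = delete_machine_cmd(name)
        commands.append(cmd)
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if `oc` exits non-zero.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical Machine name, for illustration only.
    recreate_machines(["example-windows-worker-0"])
```

Note that this replaces the instance rather than repairing the rebooted node, so it does not address the underlying failure.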