OpenShift Bugs / OCPBUGS-225

Pods cannot schedule to a node that has been powered off


    • Bug
    • Resolution: Obsolete
    • Major
    • None
    • 4.11
    • Windows Containers
    • Moderate
    • None
    • 3
    • SDN Sprint 231
    • 1
    • Rejected
    • False

      Description of problem:
      When I have a Windows worker on an AWS instance that I power off and then power on again (for example, I power off all nodes overnight and power them on the next day), pods are unable to schedule on the nodes, even though the nodes have joined the cluster and report Ready.

      Version-Release number of selected component (if applicable):
      OCP 4.11
      WMCO 6.0.0

      How reproducible:
      Not 100% of the time, but I saw it on two different days with two different clusters.

      Steps to Reproduce:
      1. Create a cluster with Windows workers on AWS.
      2. Power off all nodes.
      3. Power on all nodes (~12 hours later).
      4. Schedule some workload to the Windows workers.
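
For step 4, a minimal sketch of a test workload that targets the Windows workers is shown below. It is an illustration only, not the workload used in this report: the helper name `windows_pod_manifest`, the pod name, the placeholder image, and the `os=Windows:NoSchedule` toleration are assumptions that need to be adjusted to the cluster. The `kubernetes.io/os` node label is the standard Kubernetes OS label.

```python
import json


def windows_pod_manifest(name: str, image: str) -> dict:
    """Build a minimal Pod manifest that targets Windows nodes (hypothetical helper)."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Standard Kubernetes OS label; restricts the pod to Windows nodes.
            "nodeSelector": {"kubernetes.io/os": "windows"},
            # Assumption: the Windows workers carry an os=Windows:NoSchedule taint.
            "tolerations": [
                {"key": "os", "value": "Windows", "effect": "NoSchedule"}
            ],
            "containers": [
                {
                    "name": "main",
                    "image": image,
                    # Keep the container alive long enough to observe its status.
                    "command": ["powershell.exe", "-Command", "Start-Sleep -Seconds 3600"],
                }
            ],
        },
    }


if __name__ == "__main__":
    # Placeholder image: replace with a Windows image matching the node's OS version.
    manifest = windows_pod_manifest("win-sched-test", "registry.example.invalid/windows-test:latest")
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with `oc apply -f`, which accepts JSON as well as YAML manifests.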

      Actual results:
      Pods are stuck in ContainerCreating and unable to schedule to any Windows nodes. The pod events show this error:

      Reason:"FailedCreatePodSandBox", Message:"(combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox \"cab04ed4ba7ceb30ca41949b4e6ef08ded85a93b81daadf6d8575c9a1eec8265\": plugin type=\"win-overlay\" name=\"OVNKubernetesHybridOverlayNetwork\" failed (add): error while hcn.GetNetworkByName(OVNKubernetesHybridOverlayNetwork): Network name \"OVNKubernetesHybridOverlayNetwork\" not found",
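
To check many pod events for this failure, the missing network name can be pulled out of the event message. The sketch below is a hypothetical helper, not part of WMCO or any OpenShift tooling. It assumes the message keeps the `Network name "<name>" not found` wording shown above, with or without backslash-escaped quotes.

```python
import re
from typing import Optional

# Matches: Network name "<name>" not found
# The quotes may be backslash-escaped, as in the event text quoted in this report.
_MISSING_NETWORK = re.compile(r'Network name \\?"([^"\\]+)\\?" not found')


def missing_network_name(message: str) -> Optional[str]:
    """Return the network name the CNI plugin could not find, or None if absent."""
    match = _MISSING_NETWORK.search(message)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Fragment of the event message from this report.
    sample = (
        r'error while hcn.GetNetworkByName(OVNKubernetesHybridOverlayNetwork): '
        r'Network name \"OVNKubernetesHybridOverlayNetwork\" not found'
    )
    print(missing_network_name(sample))
```

A `None` result means the event does not carry this specific error, so other sandbox failures are not misclassified.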
      

      Expected results:
      Pods are able to schedule and run without errors after a node is rebooted.

      Additional info:
      Deleting the Machines, which recreates the instance entirely, does provision a functional node that can accept pods.

              team-winc Team WinC
              ancollin@redhat.com Andrew Collins
              Votes: 0
              Watchers: 9
