  1. OpenShift Bugs
  2. OCPBUGS-63432

MicroShift: Pods fail to start and hit "failed to destroy network for pod sandbox" after node reboot

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • 4.18.z
    • Node / CRI-O
    • Quality / Stability / Reliability
    • Important
    • Production

      Description of problem:

         In this partner's MicroShift deployment, two postgres pods fail to start again after the node reboots. The kubelet log shows:
      
      Oct 20 09:25:58 spgttmicro-os.schuler.de microshift[3534]: kubelet E1020 09:25:58.438702    3534 log.go:32] "StopPodSandbox from runtime service failed" err=<
      Oct 20 09:25:58 spgttmicro-os.schuler.de microshift[3534]:         rpc error: code = Unknown desc = failed to destroy network for pod sandbox k8s_dps-infra-postgres-repo-host-0_dps_8b4cd4e8-32db-48a3-891b-26ae090370f0_0(d21853b5e0c87d279d08f2484bd5d7b8c1a13788d625cc2f2301fcc83532c2e3): error removing pod dps-infra-postgres-repo-host-0 from CNI network "ovn-kubernetes": plugin type="ovn-k8s-cni-overlay" name="ovn-kubernetes" failed (delete): CNI request failed with status 400: '[<namespace>/dps-infra-postgres-repo-host-0 d21853b5e0c87d279d08f2484bd5d7b8c1a13788d625cc2f2301fcc83532c2e3 network default NAD default] [<namespace>/dps-infra-postgres-repo-host-0 d21853b5e0c87d279d08f2484bd5d7b8c1a13788d625cc2f2301fcc83532c2e3 network default NAD default] failed to get container namespace for pod <namespace>/dps-infra-postgres-repo-host-0 NAD default: failed to Statfs "": no such file or directory
      Oct 20 09:25:58 spgttmicro-os.schuler.de microshift[3534]:         ': stat netns path "": stat : no such file or directory
      Oct 20 09:25:58 spgttmicro-os.schuler.de microshift[3534]:  > podSandboxID="d21853b5e0c87d279d08f2484bd5d7b8c1a13788d625cc2f2301fcc83532c2e3"
      Oct 20 09:25:58 spgttmicro-os.schuler.de microshift[3534]: kubelet E1020 09:25:58.438773    3534 kuberuntime_manager.go:1479] "Failed to stop sandbox" podSandboxID={"Type":"cri-o","ID":"d21853b5e0c87d279d08f2484bd5d7b8c1a13788d625cc2f2301fcc83532c2e3"}
      Oct 20 09:25:58 spgttmicro-os.schuler.de microshift[3534]: kubelet E1020 09:25:58.438817    3534 kuberuntime_manager.go:1079] "killPodWithSyncResult failed" err="failed to \"KillPodSandbox\" for \"8b4cd4e8-32db-48a3-891b-26ae090370f0\" with KillPodSandboxError: \"rpc error: code = Unknown desc = failed to destroy network for pod sandbox k8s_dps-infra-postgres-repo-host-0-dps_8b4cd4e8-32db-48a3-891b-26ae090370f0_0(d21853b5e0c87d279d08f2484bd5d7b8c1a13788d625cc2f2301fcc83532c2e3): error removing pod dps-infra-postgres-repo-host-0 from CNI network \\\"ovn-kubernetes\\\": plugin type=\\\"ovn-k8s-cni-overlay\\\" name=\\\"ovn-kubernetes\\\" failed (delete): CNI request failed with status 400: '[<namespace>/dps-infra-postgres-repo-host-0 d21853b5e0c87d279d08f2484bd5d7b8c1a13788d625cc2f2301fcc83532c2e3 network default NAD default] [<namespace>/dps-infra-postgres-repo-host-0 d21853b5e0c87d279d08f2484bd5d7b8c1a13788d625cc2f2301fcc83532c2e3 network default NAD default] failed to get container namespace for pod <namespace>/dps-infra-postgres-repo-host-0 NAD default: failed to Statfs \\\"\\\": no such file or directory\\n': stat netns path \\\"\\\": stat : no such file or directory\""
      Oct 20 09:25:58 spgttmicro-os.schuler.de microshift[3534]: kubelet E1020 09:25:58.438855    3534 pod_workers.go:1301] "Error syncing pod, skipping" err="failed to \"KillPodSandbox\" for \"8b4cd4e8-32db-48a3-891b-26ae090370f0\" with KillPodSandboxError: \"rpc error: code = Unknown desc = failed to destroy network for pod sandbox k8s_dps-infra-postgres-repo-host-0_8b4cd4e8-32db-48a3-891b-26ae090370f0_0(d21853b5e0c87d279d08f2484bd5d7b8c1a13788d625cc2f2301fcc83532c2e3): error removing pod dps-infra-postgres-repo-host-0 from CNI network \\\"ovn-kubernetes\\\": plugin type=\\\"ovn-k8s-cni-overlay\\\" name=\\\"ovn-kubernetes\\\" failed (delete): CNI request failed with status 400: '[<namespace>/dps-infra-postgres-repo-host-0 d21853b5e0c87d279d08f2484bd5d7b8c1a13788d625cc2f2301fcc83532c2e3 network default NAD default] [<namespace>/dps-infra-postgres-repo-host-0 d21853b5e0c87d279d08f2484bd5d7b8c1a13788d625cc2f2301fcc83532c2e3 network default NAD default] failed to get container namespace for pod <namespace>/dps-infra-postgres-repo-host-0 NAD default: failed to Statfs \\\"\\\": no such file or directory\\n': stat netns path \\\"\\\": stat : no such file or directory\"" pod="<namespace>/dps-infra-postgres-repo-host-0" podUID="8b4cd4e8-32db-48a3-891b-26ae090370f0"
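      For triage, the failure signature in the log above (a sandbox teardown that fails because the netns path is empty) can be picked out of a journal dump mechanically. The following is only a minimal sketch: the function name find_stuck_sandboxes and the two marker strings are assumptions chosen for illustration from the lines quoted in this report, not part of MicroShift, CRI-O, or OVN-Kubernetes.

```python
import re

# Pattern derived from the log lines quoted in this report (an assumption,
# not an official log format): sandbox name followed by a 64-hex sandbox ID.
SANDBOX_RE = re.compile(
    r"failed to destroy network for pod sandbox "
    r"(?P<sandbox_name>\S+?)\((?P<sandbox_id>[0-9a-f]{64})\)"
)

# Substrings that indicate the netns path handed to the CNI plugin was empty.
EMPTY_NETNS_MARKERS = ('failed to Statfs ""', 'stat netns path ""')


def find_stuck_sandboxes(lines):
    """Return {sandbox_id: sandbox_name} for teardowns that hit an empty netns path."""
    stuck = {}
    for line in lines:
        # Nested kubelet errors escape their quotes; drop backslashes so the
        # same markers match both the plain and the escaped form.
        flat = line.replace("\\", "")
        match = SANDBOX_RE.search(flat)
        if match and any(marker in flat for marker in EMPTY_NETNS_MARKERS):
            stuck.setdefault(match.group("sandbox_id"), match.group("sandbox_name"))
    return stuck
```

      Feeding it the lines of a saved journal file (for example, open("journal.txt") where journal.txt is a hypothetical export) yields one entry per affected sandbox, however many times kubelet retried the teardown.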

      Version-Release number of selected component (if applicable):

          cri-o: 1.31.11-2.rhaos4.18.git65ec77a.el9
          microshift: 4.18

      How reproducible:

          Always

      Steps to Reproduce:

          1. With all pods up and running, reboot the node.
          2. Wait for the node to come back up and for the pods to be recreated. The impacted pods never come back up.
          
          

      Actual results:

          The impacted pods never come back up on their own and must be forcefully deleted so that new ones can be created.
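          The manual intervention described above can be sketched as follows. This assumes that "forcefully deleted" means oc delete pod with --grace-period=0 --force; the report does not state the exact command used. The helper name force_delete_command and the namespace value are hypothetical, and the sketch only builds and prints the command rather than running it.

```python
import shlex


def force_delete_command(namespace, pod):
    """Build the argv for force-deleting one stuck pod (nothing is executed)."""
    return [
        "oc", "delete", "pod", pod,
        "--namespace", namespace,
        "--grace-period=0", "--force",
    ]


# "example-namespace" is a placeholder; the real namespace is redacted in the logs.
cmd = force_delete_command("example-namespace", "dps-infra-postgres-repo-host-0")
print(shlex.join(cmd))
```

          Printing the command instead of executing it keeps the step reviewable; it is a workaround for the symptom only and does not address why the sandbox teardown fails.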

      Expected results:

          Pods should be able to come back up without intervention

      Additional info:

          This looks similar to OCPBUGS-58229; however, that bug was reported against cri-o 1.33, whereas this deployment runs 1.31.11.

              pehunt@redhat.com Peter Hunt
              rh-ee-mnicolae Marius Paulica Nicolae
              John George
              None
              Votes: 0
              Watchers: 2