Bug
Resolution: Unresolved
Major
4.18.z
Quality / Stability / Reliability
Important
Production
Description of problem:
In this partner's MicroShift deployment they have noticed that after the node reboots, two postgres pods fail to start back up properly and hit:
Oct 20 09:25:58 spgttmicro-os.schuler.de microshift[3534]: kubelet E1020 09:25:58.438702 3534 log.go:32] "StopPodSandbox from runtime service failed" err=<
Oct 20 09:25:58 spgttmicro-os.schuler.de microshift[3534]: rpc error: code = Unknown desc = failed to destroy network for pod sandbox k8s_dps-infra-postgres-repo-host-0_dps_8b4cd4e8-32db-48a3-891b-26ae090370f0_0(d21853b5e0c87d279d08f2484bd5d7b8c1a13788d625cc2f2301fcc83532c2e3): error removing pod dps-infra-postgres-repo-host-0 from CNI network "ovn-kubernetes": plugin type="ovn-k8s-cni-overlay" name="ovn-kubernetes" failed (delete): CNI request failed with status 400: '[<namespace>/dps-infra-postgres-repo-host-0 d21853b5e0c87d279d08f2484bd5d7b8c1a13788d625cc2f2301fcc83532c2e3 network default NAD default] [<namespace>/dps-infra-postgres-repo-host-0 d21853b5e0c87d279d08f2484bd5d7b8c1a13788d625cc2f2301fcc83532c2e3 network default NAD default] failed to get container namespace for pod <namespace>/dps-infra-postgres-repo-host-0 NAD default: failed to Statfs "": no such file or directory
Oct 20 09:25:58 spgttmicro-os.schuler.de microshift[3534]: ': stat netns path "": stat : no such file or directory
Oct 20 09:25:58 spgttmicro-os.schuler.de microshift[3534]: > podSandboxID="d21853b5e0c87d279d08f2484bd5d7b8c1a13788d625cc2f2301fcc83532c2e3"
Oct 20 09:25:58 spgttmicro-os.schuler.de microshift[3534]: kubelet E1020 09:25:58.438773 3534 kuberuntime_manager.go:1479] "Failed to stop sandbox" podSandboxID={"Type":"cri-o","ID":"d21853b5e0c87d279d08f2484bd5d7b8c1a13788d625cc2f2301fcc83532c2e3"}
Oct 20 09:25:58 spgttmicro-os.schuler.de microshift[3534]: kubelet E1020 09:25:58.438817 3534 kuberuntime_manager.go:1079] "killPodWithSyncResult failed" err="failed to \"KillPodSandbox\" for \"8b4cd4e8-32db-48a3-891b-26ae090370f0\" with KillPodSandboxError: \"rpc error: code = Unknown desc = failed to destroy network for pod sandbox k8s_dps-infra-postgres-repo-host-0-dps_8b4cd4e8-32db-48a3-891b-26ae090370f0_0(d21853b5e0c87d279d08f2484bd5d7b8c1a13788d625cc2f2301fcc83532c2e3): error removing pod dps-infra-postgres-repo-host-0 from CNI network \\\"ovn-kubernetes\\\": plugin type=\\\"ovn-k8s-cni-overlay\\\" name=\\\"ovn-kubernetes\\\" failed (delete): CNI request failed with status 400: '[<namespace>/dps-infra-postgres-repo-host-0 d21853b5e0c87d279d08f2484bd5d7b8c1a13788d625cc2f2301fcc83532c2e3 network default NAD default] [<namespace>/dps-infra-postgres-repo-host-0 d21853b5e0c87d279d08f2484bd5d7b8c1a13788d625cc2f2301fcc83532c2e3 network default NAD default] failed to get container namespace for pod <namespace>/dps-infra-postgres-repo-host-0 NAD default: failed to Statfs \\\"\\\": no such file or directory\\n': stat netns path \\\"\\\": stat : no such file or directory\""
Oct 20 09:25:58 spgttmicro-os.schuler.de microshift[3534]: kubelet E1020 09:25:58.438855 3534 pod_workers.go:1301] "Error syncing pod, skipping" err="failed to \"KillPodSandbox\" for \"8b4cd4e8-32db-48a3-891b-26ae090370f0\" with KillPodSandboxError: \"rpc error: code = Unknown desc = failed to destroy network for pod sandbox k8s_dps-infra-postgres-repo-host-0_8b4cd4e8-32db-48a3-891b-26ae090370f0_0(d21853b5e0c87d279d08f2484bd5d7b8c1a13788d625cc2f2301fcc83532c2e3): error removing pod dps-infra-postgres-repo-host-0 from CNI network \\\"ovn-kubernetes\\\": plugin type=\\\"ovn-k8s-cni-overlay\\\" name=\\\"ovn-kubernetes\\\" failed (delete): CNI request failed with status 400: '[<namespace>/dps-infra-postgres-repo-host-0 d21853b5e0c87d279d08f2484bd5d7b8c1a13788d625cc2f2301fcc83532c2e3 network default NAD default] [<namespace>/dps-infra-postgres-repo-host-0 d21853b5e0c87d279d08f2484bd5d7b8c1a13788d625cc2f2301fcc83532c2e3 network default NAD default] failed to get container namespace for pod <namespace>/dps-infra-postgres-repo-host-0 NAD default: failed to Statfs \\\"\\\": no such file or directory\\n': stat netns path \\\"\\\": stat : no such file or directory\"" pod="<namespace>/dps-infra-postgres-repo-host-0" podUID="8b4cd4e8-32db-48a3-891b-26ae090370f0"
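One way to confirm the stale sandbox state on the node after the reboot is to inspect it with crictl. This is only a diagnostic sketch; the pod name and sandbox ID below are taken from the log excerpt above and should be substituted with the values for the affected pod:
# List the pod sandboxes recorded for the affected pod
sudo crictl pods --name dps-infra-postgres-repo-host-0
# Dump the sandbox state; the network namespace path CRI-O recorded for it
# should be visible in the verbose JSON output (empty/missing in the failing case)
sudo crictl inspectp d21853b5e0c87d279d08f2484bd5d7b8c1a13788d625cc2f2301fcc83532c2e3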
Version-Release number of selected component (if applicable):
cri-o: 1.31.11-2.rhaos4.18.git65ec77a.el9
microshift: 4.18
How reproducible:
Always
Steps to Reproduce:
1. With all the pods up & running, reboot the node
2. Wait for the node to come back up and for the pods to be recreated; the impacted pods never come back up (see the sketch after this list)
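A rough sketch of the reproduction from the host, assuming shell access to the node and a kubeconfig (the namespace is a placeholder, as in the log excerpts above):
# 1. With all pods Running, reboot the node
sudo reboot
# 2. After the node is back, watch the pods in the affected namespace;
#    the impacted postgres pods never become Ready again
oc get pods -n <namespace> -w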
Actual results:
Impacted pods never manage to come back up on their own and need to be forcefully deleted so new ones can be created
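For reference, the forced deletion mentioned above is roughly the following (namespace and pod name are placeholders taken from the log excerpt; the pod is presumably recreated by its controller once the stuck one is gone):
# Force-delete the stuck pod so its controller can recreate it
oc delete pod dps-infra-postgres-repo-host-0 -n <namespace> --force --grace-period=0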
Expected results:
Pods should be able to come back up without intervention
Additional info:
Looks to be similar to OCPBUGS-58229; however, in that bug the cri-o version was 1.33, while in our case it is 1.31.11
is related to: OCPBUGS-58229 MicroShift: Pod in offline scenario does not start after reboot after bumping CRIO to 1.33.1 (Verified)