Bug
Resolution: Not a Bug
Major
4.12.z
Quality / Stability / Reliability
Critical
Rejected
OCPNODE Sprint 238 (Blue), OCPNODE Sprint 239 (Blue)
Description of problem:
Containers fail to start and end up with this status, exiting without showing any logs:

Last State:  Terminated
  Reason:    Error
  Exit Code: 139
  Started:   Fri, 09 Jun 2023 10:01:37 -0400
  Finished:  Fri, 09 Jun 2023 10:01:37 -0400

For example, restarting a previously "Running" pod puts it into this state:

❯ k get po -n hypershift
NAME                            READY   STATUS             RESTARTS         AGE
external-dns-7cc4b775d9-t558s   1/1     Running            0                2d20h
operator-58f644b4bb-2rbt4       0/1     CrashLoopBackOff   17 (4m53s ago)   66m
operator-58f644b4bb-w4mkf       1/1     Running            0                2d16h
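For reference, exit code 139 is 128 + 11, i.e. the container process was killed by SIGSEGV. A minimal, generic way to read the recorded termination state straight from the pod status (pod and namespace names taken from the output above):

# Print the last termination state of the crashing container; an exit code of
# 139 indicates the process received signal 11 (SIGSEGV).
oc get pod operator-58f644b4bb-2rbt4 -n hypershift \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated}{"\n"}'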
Version-Release number of selected component (if applicable):
4.12.12
How reproducible:
Unsure
Steps to Reproduce:
Unsure
Actual results:
What we ended up doing on this cluster was replacing the worker machines, which created new nodes. By forcing pods to reschedule onto the new nodes, all pods were able to start successfully (a rough sketch of the commands involved is included at the end of this section).
The behavior was confusingly consistent for some pods but not all. For example, I was able to run

kubectl run ubuntu --image ubuntu --rm -it

and that worked just fine, as did deleting/recreating an ovnkube-node pod scheduled on the same worker node, and even "oc debug"ing onto an affected worker node and successfully running

podman run --rm -it --entrypoint=bash ${CONTAINER_IMAGE}

with the image of a pod that wouldn't start up.
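A minimal sketch of that node-replacement workaround, under the assumption that the workers are managed by a MachineSet; the node and machine names here are placeholders, not values from the affected cluster:

# Keep new pods off the affected node and evict the ones stuck in CrashLoopBackOff.
oc adm cordon <affected-node>
oc adm drain <affected-node> --ignore-daemonsets --delete-emptydir-data

# Delete the backing Machine so the MachineSet provisions a replacement node;
# the drained pods then reschedule onto the new node and start successfully.
oc delete machine <affected-machine> -n openshift-machine-api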
Expected results:
On the one hand, that pods are able to start successfully (we do not understand what caused this bug); on the other hand, that the new and old nodes have the same configuration. It is strange that the new replacement machines were viable, but the existing ones were not.
Additional info:
Must gather link: https://drive.google.com/file/d/10m7TpJEdmBbLec35PD9vpHqoIV8bYzW-/view?usp=sharing
We have live clusters where this bug is still occurring and have cordoned nodes for live investigation if needed. Please feel free to reach out if you would like to; we can screenshare!