Details
- Bug
- Resolution: Can't Do
- Normal
- None
- 4.11
- Moderate
- OCPNODE Sprint 230 (Blue), OCPNODE Sprint 233 (Blue)
- 2
- False
Description
Description of problem:
A 4.11.20 to 4.11.21 update was progressing happily until the machine-config operator began rolling the control-plane nodes. Draining master-0 was stalled by an installer-9-...-master-0 pod in openshift-etcd that was nominally still Terminating despite being 15 days old; library-go installer-... pods usually make quick work of installing static-pod assets and then exit. Manually deleting the pod unstuck the drain, and the update proceeded to complete successfully without further excitement.
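For reference, the stuck state described above is recognizable from the pod object itself (e.g. from `oc get pod -o json` output): a `deletionTimestamp` set far longer ago than any normal grace period, combined with no running backing container. A minimal sketch of that heuristic in Python; the helper name and the one-hour threshold are illustrative only, not part of any OpenShift tooling:

```python
from datetime import datetime, timedelta, timezone

def is_stuck_terminating(pod, max_age=timedelta(hours=1)):
    """Flag a pod that is nominally Terminating (deletionTimestamp set)
    far past a normal grace window and has no running backing container."""
    ts = pod.get("metadata", {}).get("deletionTimestamp")
    if ts is None:
        return False  # pod is not terminating at all
    deleted_at = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    if datetime.now(timezone.utc) - deleted_at < max_age:
        return False  # still within a plausible grace period
    statuses = pod.get("status", {}).get("containerStatuses", [])
    # Stuck: Terminating for a long time with no container actually running.
    return not any("running" in (s.get("state") or {}) for s in statuses)
```

A pod flagged this way matches the symptom seen here (Terminating for 15 days with no backing container) and was cleared manually with an ordinary pod deletion.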
Version-Release number of selected component (if applicable):
The cluster was transitioning from 4.11.20 to 4.11.21. At the time of the issue, the outgoing node was still running the 4.11.20 kubelet and CRI-O, although 4.11.20 and 4.11.21 ship the same RHCOS build 411.86.202212072103-0 anyway.
How reproducible:
Unknown. The other two control-plane nodes in this cluster drained without incident, so expected reproducibility is low.
Steps to Reproduce:
Unknown.
Actual results:
The node failed to drain, with a pod stuck in Terminating despite having no backing container.
Expected results:
A successful drain, with pods reported as Terminating only while they have associated containers that are also terminating.