Type: Bug
Resolution: Unresolved
Priority: Critical
Fix Version: None
Affects Versions: 4.18, 4.19
Description of problem:
We're seeing another issue related to the DNS restart that happens during the SNO reboot: the test `[sig-cluster-lifecycle] pathological event should not see excessive Back-off restarting failed containers` is failing.
This payload failed because two of its jobs hit this excessive-restart error.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1.
2.
3.
Actual results:
event [namespace/openshift-catalogd node/ip-10-0-105-161.us-west-1.compute.internal pod/catalogd-controller-manager-59d9788859-n5sf7 hmsg/e0d473c239 - Back-off restarting failed container manager in pod catalogd-controller-manager-59d9788859-n5sf7_openshift-catalogd(78fff808-fe29-48eb-a031-0ba3650a3d84)] happened 89 times
event [namespace/openshift-operator-controller node/ip-10-0-105-161.us-west-1.compute.internal pod/operator-controller-controller-manager-7b46748475-v9d7m hmsg/96a44e679e - Back-off restarting failed container manager in pod operator-controller-controller-manager-7b46748475-v9d7m_openshift-operator-controller(242e6d7e-cedc-42c2-9606-94848608f244)] happened 88 times
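The events above follow a fixed textual pattern ending in "happened N times", so the pathological-event check can be approximated by parsing that count and comparing it against a threshold. This is a minimal sketch, not origin's actual implementation; the threshold value of 20 and the helper names `parse_event_count` / `is_pathological` are assumptions for illustration.

```python
import re

# Assumed threshold standing in for the limit used by origin's
# pathological-events check; not a value taken from this report.
PATHOLOGICAL_THRESHOLD = 20

# Matches the repetition count at the end of an event summary line.
EVENT_COUNT_RE = re.compile(r"happened (\d+) times")

def parse_event_count(event_line: str) -> int:
    """Extract the 'happened N times' count from an event line (0 if absent)."""
    match = EVENT_COUNT_RE.search(event_line)
    return int(match.group(1)) if match else 0

def is_pathological(event_line: str, threshold: int = PATHOLOGICAL_THRESHOLD) -> bool:
    """Flag events that repeated more often than the allowed threshold."""
    return parse_event_count(event_line) > threshold

if __name__ == "__main__":
    events = [
        "event [namespace/openshift-catalogd ...] happened 89 times",
        "event [namespace/openshift-operator-controller ...] happened 88 times",
    ]
    for line in events:
        print(parse_event_count(line), is_pathological(line))
```

Both events in this report (89 and 88 repetitions) are flagged under any plausible threshold, which is why the payload failed.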
Expected results:
Test passes
Additional info:
Unpacking the logs in loki shows a spike of errors during the DNS outage caused by rolling out the new DNS pod during the upgrade. This is the case both for the catalogd-controller-manager-59d9788859-n5sf7_openshift-catalogd pod in ns/openshift-catalogd and for operator-controller-controller-manager-7b46748475-v9d7m_openshift-operator-controller in ns/openshift-operator-controller.
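The loki inspection described above can be reproduced with a LogQL query along these lines. This is a sketch under assumptions: the stream label names (`namespace`, `pod`) and the error substrings filtered for (typical Go DNS failure messages) are guesses about how the cluster logs are indexed, not details taken from this report.

```
{namespace="openshift-catalogd", pod="catalogd-controller-manager-59d9788859-n5sf7"}
  |~ "(?i)no such host|i/o timeout|connection refused"
```

Restricting the query's time range to the upgrade window should show the error spike lining up with the DNS pod rollout.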
Related issues:
- is related to OCPBUGS-42777: SNO Regression for [sig-network-edge] Verify DNS availability during and after upgrade success (ASSIGNED)