- Bug
- Resolution: Unresolved
- Undefined
- None
- 4.14.z, 4.15.z, 4.17.z, 4.16.z, 4.18.0
- None
- Critical
- None
- Proposed
- False
Description of problem:
We have an automated test case that runs in a Prow CI cluster on OpenStack. After the KCM (kube-controller-manager) leader master node is stopped and started again, an etcd pod goes into CrashLoopBackOff status.
Version-Release number of selected component (if applicable):
    $ oc get clusterversion
    NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
    version   4.18.0-0.nightly-2024-11-14-090045   True        False         5h37m   Cluster version is 4.18.0-0.nightly-2024-11-14-090045
How reproducible:
always
Steps to Reproduce:
1. Shut down the KCM leader master node:
       $ oc debug node/<master node name>
       sh-4.2# chroot /host
       sh-4.2# poweroff
   Make sure the master node has been powered off:
       $ oc get node
2. To verify that the cluster works fine with two master nodes, run:
       $ oc login
       $ oc new-project test-project-1
       $ oc new-app aosqe/hello-openshift -n test-project-1
3. Start the powered-off master again, then check the cluster status:
       $ oc get no
       $ oc get co
       $ oc get po -A
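Step 1 presumes you already know which master is the KCM leader. A minimal sketch for finding it, assuming the leader election record is the Lease `kube-controller-manager` in the `kube-system` namespace and that its `spec.holderIdentity` has the form `<node-name>_<uuid>` (both are assumptions to confirm on the cluster); the function names are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: locate the current kube-controller-manager (KCM) leader node.
# ASSUMPTION: the KCM leader election record is the Lease
# "kube-controller-manager" in namespace "kube-system", and its
# spec.holderIdentity looks like "<node-name>_<uuid>".
# leader_from_identity and kcm_leader_node are hypothetical helpers.

# Strip everything from the first "_" onward, leaving the node name.
leader_from_identity() {
  local identity="$1"
  printf '%s\n' "${identity%%_*}"
}

# Read the lease holder from the cluster (needs a logged-in `oc`)
# and print the node name part of it.
kcm_leader_node() {
  local identity
  identity=$(oc get lease kube-controller-manager -n kube-system \
    -o jsonpath='{.spec.holderIdentity}') || return 1
  leader_from_identity "$identity"
}
```

For example, `oc debug "node/$(kcm_leader_node)" -- chroot /host poweroff` would combine this with the commands in step 1; running the interactive steps by hand is equivalent.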
Actual results:
1. etcd operator is degraded.

    $ oc get co
    ...
    etcd   4.18.0-0.nightly-2024-11-14-090045   True   False   True   4h23m   EtcdMembersDegraded: 2 of 3 members are available, qizv777c-4bb48-4vjc4-master-0 is unhealthy...

    $ oc describe co/etcd
    ...
    Status:
      Conditions:
        Last Transition Time:  2024-11-15T09:07:28Z
        Message:               EtcdMembersDegraded: 2 of 3 members are available, qizv777c-4bb48-4vjc4-master-0 is unhealthy
                               StaticPodsDegraded: pod/etcd-qizv777c-4bb48-4vjc4-master-0 container "etcd" is waiting: CrashLoopBackOff: back-off 5m0s restarting failed container=etcd pod=etcd-qizv777c-4bb48-4vjc4-master-0_openshift-etcd(ab2c484f207208c9eb9adb8c2f9b65e4)
        Reason:                EtcdMembers_UnhealthyMembers::StaticPods_Error
        Status:                True
        Type:                  Degraded
        Last Transition Time:  2024-11-15T05:04:14Z
        Message:               NodeInstallerProgressing: 3 nodes are at revision 8
                               EtcdMembersProgressing: No unstarted etcd members found
        Reason:                AsExpected
        Status:                False
        Type:                  Progressing
        Last Transition Time:  2024-11-15T04:47:13Z
        Message:               StaticPodsAvailable: 3 nodes are active; 3 nodes are at revision 8
                               EtcdMembersAvailable: 2 of 3 members are available, qizv777c-4bb48-4vjc4-master-0 is unhealthy
        Reason:                AsExpected
        Status:                True
        Type:                  Available
        Last Transition Time:  2024-11-15T04:43:44Z
        Message:               All is well
        Reason:                AsExpected
        Status:                True
        Type:                  Upgradeable
        Last Transition Time:  2024-11-15T04:43:44Z
        Reason:                NoData
        Status:                Unknown
        Type:                  EvaluationConditionsDetected

    $ oc get pod -n openshift-etcd
    NAME                                 READY   STATUS             RESTARTS        AGE
    ...
    etcd-qizv777c-4bb48-4vjc4-master-0   4/5     CrashLoopBackOff   11 (119s ago)   4h12m
    ...
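To detect this state in automation rather than by eye, the `oc get co` and `oc get pod` output above can be filtered by column. A minimal sketch, assuming the default column layout seen in the outputs above (DEGRADED is the 5th column of `oc get co`, STATUS the 3rd of `oc get pod`) and that every ClusterOperator row has a non-empty VERSION; the function names are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: filter `oc get ... --no-headers` output by column.
# ASSUMPTION: default column layout, i.e.
#   oc get co:  NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE
#   oc get pod: NAME READY STATUS RESTARTS AGE
# A ClusterOperator row with an empty VERSION would shift the columns.
# degraded_operators and crashlooping_pods are hypothetical helpers.

# Print ClusterOperator names whose DEGRADED column (5th field) is True.
degraded_operators() {
  awk '$5 == "True" { print $1 }'
}

# Print pod names whose STATUS column (3rd field) is CrashLoopBackOff.
crashlooping_pods() {
  awk '$3 == "CrashLoopBackOff" { print $1 }'
}
```

For the state reported here, `oc get co --no-headers | degraded_operators` would be expected to include `etcd`, and `oc get pod -n openshift-etcd --no-headers | crashlooping_pods` to include `etcd-qizv777c-4bb48-4vjc4-master-0`.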
Expected results:
The etcd operator should not be degraded.
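An automated check for this expected result could poll the `Degraded` condition of the etcd ClusterOperator until it reports `False`. A minimal sketch; `wait_not_degraded`, the attempt count, and the interval are hypothetical, and the status command is passed in as arguments so the loop can be exercised without a cluster:

```shell
#!/usr/bin/env bash
# Sketch: poll until a status command prints "False" (for example the
# Degraded condition of co/etcd), or give up after a fixed number of tries.
# wait_not_degraded is a hypothetical helper, not part of oc.
#   $1     number of attempts
#   $2     seconds to sleep between attempts
#   $3...  command that prints the condition status (True/False/Unknown)
wait_not_degraded() {
  local attempts="$1" interval="$2"
  shift 2
  local i status
  for ((i = 1; i <= attempts; i++)); do
    status=$("$@")
    if [ "$status" = "False" ]; then
      return 0   # condition has cleared
    fi
    sleep "$interval"
  done
  return 1       # still True or Unknown after all attempts
}
```

Against a cluster it might be called as `wait_not_degraded 60 10 oc get co etcd -o jsonpath='{.status.conditions[?(@.type=="Degraded")].status}'`; `oc wait co/etcd --for=condition=Degraded=False --timeout=10m` may achieve the same with a single built-in command.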
Additional info:
Workaround: https://access.redhat.com/solutions/6962106