OpenShift Bugs / OCPBUGS-44608

Etcd pod enters CrashLoopBackOff after the KCM leader master node is stopped and started


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Undefined
    • None
    • Affects Version/s: 4.14.z, 4.15.z, 4.16.z, 4.17.z, 4.18.0
    • Component/s: Etcd
    • None
    • Critical
    • None
    • Proposed
    • False

      Description of problem:

      We have an automated test case that runs in a Prow CI cluster on OpenStack.
      After the master node running the kube-controller-manager (KCM) leader is stopped and started, an etcd pod goes into CrashLoopBackOff status.
          

      Version-Release number of selected component (if applicable):

      $ oc get clusterversion
      NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.18.0-0.nightly-2024-11-14-090045   True        False         5h37m   Cluster version is 4.18.0-0.nightly-2024-11-14-090045
      
          

      How reproducible:

      Always
          

      Steps to Reproduce:

          1. Shut down the master node that runs the KCM leader:
      $ oc debug node/<master node name>
      sh-4.2# chroot /host
      sh-4.2# poweroff
      Confirm that the master node is powered off:
      $ oc get node
          2. Verify that the cluster still works with two master nodes:
      $ oc login
      $ oc new-project test-project-1
      $ oc new-app aosqe/hello-openshift -n test-project-1
          3. Start the powered-off master node again, then check the cluster status:
      $ oc get no
      $ oc get co
      $ oc get po -A
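      To make step 1 repeatable, the following sketch (not part of the original automation) looks up the node that holds the KCM leader lease and reports a node's Ready condition. It assumes the leader is recorded in the upstream-default Lease "kube-controller-manager" in the "kube-system" namespace, and that the lease's holderIdentity has the form "<hostname>_<uuid>" with the hostname equal to the node name. All function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the reproduction steps (not from the original report).
# ASSUMPTION: the KCM leader is recorded in Lease "kube-controller-manager"
# in namespace "kube-system", with holderIdentity "<hostname>_<uuid>".

# Strip the "_<uuid>" suffix. Node names cannot contain "_", so everything
# before the first underscore is taken as the node name.
leader_node_from_identity() {
  printf '%s\n' "${1%%_*}"
}

# Print the node that currently holds the KCM leader lease (needs a logged-in oc).
kcm_leader_node() {
  local identity
  identity=$(oc get lease kube-controller-manager -n kube-system \
    -o jsonpath='{.spec.holderIdentity}') || return 1
  leader_node_from_identity "$identity"
}

# Print the status of a node's Ready condition (True, False, or Unknown).
node_ready_status() {
  oc get node "$1" -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
}
```

      For example, node=$(kcm_leader_node) gives the name to pass to oc debug node/"$node". After the poweroff, node_ready_status "$node" is expected to change from True to Unknown once the node controller notices that the kubelet has stopped posting status.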
          

      Actual results:

        1. The etcd cluster operator is degraded:
      $ oc get co
      ...
      etcd                                       4.18.0-0.nightly-2024-11-14-090045   True        False         True       4h23m   EtcdMembersDegraded: 2 of 3 members are available, qizv777c-4bb48-4vjc4-master-0 is unhealthy...
      
      $ oc describe co/etcd
      ...
      Status:
        Conditions:
          Last Transition Time:  2024-11-15T09:07:28Z
          Message:               EtcdMembersDegraded: 2 of 3 members are available, qizv777c-4bb48-4vjc4-master-0 is unhealthy
      StaticPodsDegraded: pod/etcd-qizv777c-4bb48-4vjc4-master-0 container "etcd" is waiting: CrashLoopBackOff: back-off 5m0s restarting failed container=etcd pod=etcd-qizv777c-4bb48-4vjc4-master-0_openshift-etcd(ab2c484f207208c9eb9adb8c2f9b65e4)
          Reason:                EtcdMembers_UnhealthyMembers::StaticPods_Error
          Status:                True
          Type:                  Degraded
          Last Transition Time:  2024-11-15T05:04:14Z
          Message:               NodeInstallerProgressing: 3 nodes are at revision 8
      EtcdMembersProgressing: No unstarted etcd members found
          Reason:                AsExpected
          Status:                False
          Type:                  Progressing
          Last Transition Time:  2024-11-15T04:47:13Z
          Message:               StaticPodsAvailable: 3 nodes are active; 3 nodes are at revision 8
      EtcdMembersAvailable: 2 of 3 members are available, qizv777c-4bb48-4vjc4-master-0 is unhealthy
          Reason:                AsExpected
          Status:                True
          Type:                  Available
          Last Transition Time:  2024-11-15T04:43:44Z
          Message:               All is well
          Reason:                AsExpected
          Status:                True
          Type:                  Upgradeable
          Last Transition Time:  2024-11-15T04:43:44Z
          Reason:                NoData
          Status:                Unknown
          Type:                  EvaluationConditionsDetected
      
      $ oc get pod -n openshift-etcd
      NAME                                              READY   STATUS             RESTARTS        AGE
      ...
      etcd-qizv777c-4bb48-4vjc4-master-0                4/5     CrashLoopBackOff   11 (119s ago)   4h12m
      ...
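      For triage, here is a sketch of commands that list crashlooping etcd pods, collect the crashed container's previous logs, and query member health from a healthy peer. It assumes the OpenShift etcd static pod layout: an "etcd" container, plus an "etcdctl" container whose environment already points etcdctl at the cluster endpoints and certificates. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical triage helpers (not from the original report).

# Read `oc get pod --no-headers` output on stdin and print the names of pods
# whose STATUS column (3rd field) is CrashLoopBackOff.
crashlooping_pods() {
  awk '$3 == "CrashLoopBackOff" { print $1 }'
}

# $1 = crashlooping etcd pod, $2 = healthy etcd pod.
# ASSUMPTION: the "etcdctl" container has etcdctl preconfigured via its env.
collect_etcd_diagnostics() {
  local bad_pod="$1" good_pod="$2"
  # Logs of the etcd container instance that crashed most recently.
  oc logs -n openshift-etcd "$bad_pod" -c etcd --previous
  # Membership and per-endpoint health, as seen from a healthy member.
  oc exec -n openshift-etcd -c etcdctl "$good_pod" -- etcdctl member list -w table
  oc exec -n openshift-etcd -c etcdctl "$good_pod" -- etcdctl endpoint health --cluster
}
```

      For example: oc get pod -n openshift-etcd --no-headers | crashlooping_pods, then pass one of the printed names and a Running etcd pod to collect_etcd_diagnostics.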
      
          

      Expected results:

      The etcd cluster operator should not be degraded after the master node is started again.
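      An automated check for the expected result could parse the etcd row of oc get co. This sketch assumes the default column order NAME, VERSION, AVAILABLE, PROGRESSING, DEGRADED, SINCE, MESSAGE, and treats the operator as healthy only when AVAILABLE is True and DEGRADED is False (PROGRESSING is deliberately ignored). co_is_healthy is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical check (not from the original report).
# Reads one `oc get co <name> --no-headers` row on stdin; exits 0 only if
# AVAILABLE ($3) is True and DEGRADED ($5) is False. Empty input fails.
co_is_healthy() {
  awk '
    NR == 1 { seen = 1; ok = ($3 == "True" && $5 == "False") }
    END     { exit !(seen && ok) }
  '
}
```

      On a live cluster this would be used as oc get co etcd --no-headers | co_is_healthy. To block until the condition clears instead of polling, oc wait --for=condition=Degraded=False clusteroperator/etcd --timeout=15m is an alternative.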
        

      Additional info:

      Workaround: https://access.redhat.com/solutions/6962106
      

              dwest@redhat.com Dean West
              wk2019 Ke Wang
              Ge Liu Ge Liu