OCPBUGS-3042 (OpenShift Bugs)

SNO etcd operator is degraded, blocking cluster upgrade

      Description of problem:

      While upgrading 1241 SNOs using TALM, 3 clusters were unable to upgrade because their etcd operator was degraded.

      Version-Release number of selected component (if applicable):

      Deployed SNO OCP 4.10.32
      Attempted to upgrade to 4.11.5

      How reproducible:

      Rare: only 3 out of 1241 clusters hit this issue.
      These 3 clusters account for 3 of the 26 upgrade failures overall.
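
For scale, the two ratios above can be worked out as follows. This is only an arithmetic sketch; the variable names are illustrative and the counts are the ones quoted in this report.

```python
# Counts taken from this report.
total_clusters = 1241   # SNOs upgraded with TALM
failed_upgrades = 26    # all upgrade failures
etcd_degraded = 3       # failures caused by a degraded etcd operator

share_of_fleet = etcd_degraded / total_clusters
share_of_failures = etcd_degraded / failed_upgrades

print(f"{share_of_fleet:.2%} of all clusters")              # 0.24% of all clusters
print(f"{share_of_failures:.2%} of all upgrade failures")   # 11.54% of all upgrade failures
```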

      Steps to Reproduce:

      1.
      2.
      3.
      

      Actual results:

      [root@e27-h01-000-r650 common-and-group]# oc --kubeconfig=/root/hv-vm/sno/manifests/sno00423/kubeconfig get clusterversion
      NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.10.32   True        False         41h     Error while reconciling 4.10.32: the cluster operator etcd is degraded
      [root@e27-h01-000-r650 common-and-group]# oc --kubeconfig=/root/hv-vm/sno/manifests/sno00454/kubeconfig get clusterversion
      NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.10.32   True        False         41h     Error while reconciling 4.10.32: the cluster operator etcd is degraded
      [root@e27-h01-000-r650 common-and-group]# oc --kubeconfig=/root/hv-vm/sno/manifests/sno01049/kubeconfig get clusterversion
      NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.10.32   True        False         39h     Error while reconciling 4.10.32: the cluster operator etcd is degraded
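
With this many clusters, checking each clusterversion by hand does not scale. The sketch below shows one way to pull the degraded operator name out of the STATUS text shown above. The function name `degraded_operator` is hypothetical, and how the `oc ... get clusterversion` output is collected for each kubeconfig is left out; the sample text is copied from this report.

```python
import re
from typing import Optional

# Matches the STATUS wording seen in the clusterversion output in this report.
_DEGRADED = re.compile(r"the cluster operator (\S+) is degraded")

def degraded_operator(clusterversion_output: str) -> Optional[str]:
    """Return the operator named as degraded in the STATUS column, or None."""
    match = _DEGRADED.search(clusterversion_output)
    return match.group(1) if match else None

sample = """\
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.32   True        False         41h     Error while reconciling 4.10.32: the cluster operator etcd is degraded
"""

print(degraded_operator(sample))  # etcd
```

The regular expression only covers the exact wording seen here; other STATUS messages would return None.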

      Expected results:

      The etcd operator on SNO clusters should not become degraded and block the upgrade.

      Additional info:

        - lastTransitionTime: "2022-10-26T04:36:26Z"
          message: |-
            EtcdEndpointsDegraded: no etcd members are present
            UpgradeBackupControllerDegraded: etcdmembers.etcd.operator.openshift.io "sno01049" not found
          reason: EtcdEndpoints_ErrorUpdatingEtcdEndpoints::UpgradeBackupController_Error
          status: "True"
          type: Degraded
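
The Degraded message above bundles two separate failures, one per line, each prefixed with the name of the check that raised it. A minimal sketch for splitting such a message into its parts, assuming every line follows the `Prefix: detail` shape seen here (the helper name `split_degraded_message` is hypothetical):

```python
from typing import Dict

def split_degraded_message(message: str) -> Dict[str, str]:
    """Map each 'Prefix' of a 'Prefix: detail' line to its detail text."""
    details: Dict[str, str] = {}
    for line in message.splitlines():
        line = line.strip()
        if not line:
            continue
        prefix, sep, detail = line.partition(": ")
        if sep:  # skip lines that do not follow the 'Prefix: detail' shape
            details[prefix] = detail
    return details

# Message text copied from the condition in this report.
message = """\
EtcdEndpointsDegraded: no etcd members are present
UpgradeBackupControllerDegraded: etcdmembers.etcd.operator.openshift.io "sno01049" not found
"""

parts = split_degraded_message(message)
for name, detail in parts.items():
    print(f"{name} -> {detail}")
```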
      
      [root@sno00423 ~]# netstat -ntlp | egrep "2380|2379|9978"
      tcp6       0      0 :::2379                 :::*                    LISTEN      9047/etcd           
      tcp6       0      0 :::2380                 :::*                    LISTEN      9047/etcd           
      tcp6       0      0 :::9978                 :::*                    LISTEN      9047/etcd  
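
The netstat output shows the etcd process itself is still listening on all three ports checked, even though the operator reports no members. The same check could be scripted roughly as below. This is a sketch that assumes the `netstat -ntlp` column layout shown above; the function name `listening_ports` is hypothetical.

```python
from typing import Set

def listening_ports(netstat_output: str, program: str = "etcd") -> Set[int]:
    """Collect local ports in LISTEN state owned by the given program name."""
    ports: Set[int] = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed columns: proto, recv-q, send-q, local, foreign, state, pid/program
        if len(fields) < 7 or fields[5] != "LISTEN":
            continue
        if fields[6].split("/", 1)[-1] != program:
            continue
        ports.add(int(fields[3].rsplit(":", 1)[-1]))
    return ports

# Output copied from sno00423 in this report.
sample = """\
tcp6       0      0 :::2379                 :::*                    LISTEN      9047/etcd
tcp6       0      0 :::2380                 :::*                    LISTEN      9047/etcd
tcp6       0      0 :::9978                 :::*                    LISTEN      9047/etcd
"""

print(sorted(listening_ports(sample)))  # [2379, 2380, 9978]
```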

      I opened this against etcd; however, it is not yet clear which component is actually responsible for the failures.

        1. sno00644.tar.gz
          21.70 MB
        2. sno00454.tar.gz
          32.81 MB
        3. sno00423.tar.gz
          36.80 MB
        4. sosreport-sno00945-2023-01-17-qiadjng.tar.xz
          28.80 MB
        5. sosreport-sno01084-2023-01-17-efeyylb.tar.xz
          29.20 MB
        6. must-gather-sno00945.tar.gz
          58.03 MB
        7. must-gather-sno01084.tar.gz
          58.11 MB
        8. must-gather-sno01433.tar.gz
          58.27 MB
        9. must-gather-sno01821.tar.gz
          57.40 MB

            rphillip@redhat.com Ryan Phillips
            akrzos@redhat.com Alex Krzos
            ge liu ge liu