OpenShift Bugs: OCPBUGS-3042

SNO etcd operator is degraded, blocking cluster upgrade

      Description of problem:

      While upgrading 1241 SNO clusters with TALM, 3 clusters failed to upgrade because their etcd cluster operator was degraded.

      Version-Release number of selected component (if applicable):

      Deployed SNO OCP 4.10.32
      Attempted to upgrade to 4.11.5

      How reproducible:

      Rare: only 3 of 1241 clusters hit this issue.
      It accounted for 3 of the 26 upgrade failures.

      Steps to Reproduce:

      1.
      2.
      3.
      

      Actual results:

      [root@e27-h01-000-r650 common-and-group]# oc --kubeconfig=/root/hv-vm/sno/manifests/sno00423/kubeconfig get clusterversion
      NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.10.32   True        False         41h     Error while reconciling 4.10.32: the cluster operator etcd is degraded
      [root@e27-h01-000-r650 common-and-group]# oc --kubeconfig=/root/hv-vm/sno/manifests/sno00454/kubeconfig get clusterversion
      NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.10.32   True        False         41h     Error while reconciling 4.10.32: the cluster operator etcd is degraded
      [root@e27-h01-000-r650 common-and-group]# oc --kubeconfig=/root/hv-vm/sno/manifests/sno01049/kubeconfig get clusterversion
      NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.10.32   True        False         39h     Error while reconciling 4.10.32: the cluster operator etcd is degraded

      Expected results:

      The etcd operator on SNO clusters should not become degraded, and the upgrade should not be blocked.

      Additional info:

        - lastTransitionTime: "2022-10-26T04:36:26Z"
          message: |-
            EtcdEndpointsDegraded: no etcd members are present
            UpgradeBackupControllerDegraded: etcdmembers.etcd.operator.openshift.io "sno01049" not found
          reason: EtcdEndpoints_ErrorUpdatingEtcdEndpoints::UpgradeBackupController_Error
          status: "True"
          type: Degraded
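When triaging many SNOs, the Degraded condition shown above can be pulled from each cluster without dumping the full ClusterOperator YAML. The following is a minimal sketch, not part of the original report. It relies only on `oc get clusteroperator etcd` with a standard jsonpath filter expression. The function name `check_etcd_degraded` and the `KUBECONFIG_ROOT` variable are hypothetical, and the kubeconfig layout is assumed from the `/root/hv-vm/sno/manifests/<name>/kubeconfig` paths used in the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical triage helper (sketch): report the etcd ClusterOperator
# Degraded condition for one SNO cluster.
# ASSUMPTION: kubeconfigs live under <KUBECONFIG_ROOT>/<cluster>/kubeconfig,
# mirroring the paths used in the report; override KUBECONFIG_ROOT otherwise.
KUBECONFIG_ROOT="${KUBECONFIG_ROOT:-/root/hv-vm/sno/manifests}"

# Print "<cluster> Degraded=<status>: <message>" for the given cluster name.
# Returns non-zero if either oc call fails (e.g. API unreachable).
check_etcd_degraded() {
  local name="$1"
  local kc="${KUBECONFIG_ROOT}/${name}/kubeconfig"
  local status message
  # Status field ("True"/"False") of the condition whose type is Degraded.
  status="$(oc --kubeconfig="$kc" get clusteroperator etcd \
    -o jsonpath='{.status.conditions[?(@.type=="Degraded")].status}')" || return 1
  # Human-readable message of the same condition (may span several lines).
  message="$(oc --kubeconfig="$kc" get clusteroperator etcd \
    -o jsonpath='{.status.conditions[?(@.type=="Degraded")].message}')" || return 1
  printf '%s Degraded=%s: %s\n' "$name" "$status" "$message"
}
```

For example, `for sno in sno00423 sno00454 sno01049; do check_etcd_degraded "$sno"; done` would check the three clusters listed under Actual results. Two separate jsonpath queries are used so the status and the possibly multi-line message stay unambiguous.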
      
      [root@sno00423 ~]# netstat -ntlp | egrep "2380|2379|9978"
      tcp6       0      0 :::2379                 :::*                    LISTEN      9047/etcd           
      tcp6       0      0 :::2380                 :::*                    LISTEN      9047/etcd           
      tcp6       0      0 :::9978                 :::*                    LISTEN      9047/etcd  

      I opened this against etcd; however, it is not yet clear which component is actually responsible for the failures.

        1. must-gather-sno00945.tar.gz
          58.03 MB
        2. must-gather-sno01084.tar.gz
          58.11 MB
        3. must-gather-sno01433.tar.gz
          58.27 MB
        4. must-gather-sno01821.tar.gz
          57.40 MB
        5. sno00423.tar.gz
          36.80 MB
        6. sno00454.tar.gz
          32.81 MB
        7. sno00644.tar.gz
          21.70 MB
        8. sosreport-sno00945-2023-01-17-qiadjng.tar.xz
          28.80 MB
        9. sosreport-sno01084-2023-01-17-efeyylb.tar.xz
          29.20 MB

            rphillip@redhat.com Ryan Phillips
            akrzos@redhat.com Alex Krzos
            ge liu