OpenShift Bugs / OCPBUGS-31053

4.14 z stream rollback is stuck due to pending CSR


    • Type: Bug
    • Resolution: Obsolete
    • Priority: Undefined
    • None
    • 4.14, 4.14.z
    • kube-apiserver
    • Important
    • No
    • False

      Description of problem:

      I'm testing 4.14 z-stream rollback by installing 4.14.0, upgrading to 4.14.0-0.nightly-2024-03-13-015516, and then rolling back to 4.14.0. During the rollback to 4.14.0 the cluster gets stuck, with etcd and kube-apiserver reporting Degraded.
      
      # oc get node
      NAME                                        STATUS                        ROLES                  AGE   VERSION
      ip-10-0-58-12.us-east-2.compute.internal    NotReady,SchedulingDisabled   control-plane,master   11h   v1.27.11+d8e449a
      ip-10-0-62-170.us-east-2.compute.internal   NotReady,SchedulingDisabled   worker                 10h   v1.27.11+d8e449a
      ip-10-0-64-20.us-east-2.compute.internal    Ready                         worker                 10h   v1.27.11+d8e449a
      ip-10-0-68-151.us-east-2.compute.internal   Ready                         control-plane,master   11h   v1.27.11+d8e449a
      ip-10-0-79-33.us-east-2.compute.internal    Ready                         control-plane,master   11h   v1.27.11+d8e449a
      ip-10-0-79-89.us-east-2.compute.internal    Ready                         worker                 10h   v1.27.11+d8e449a
      
      # oc get co
      NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      authentication                             4.14.0                               True        False         True       2m7s    APIServerDeploymentDegraded: 1 of 3 requested instances are unavailable for apiserver.openshift-oauth-apiserver ()...
      baremetal                                  4.14.0                               True        False         False      11h     
      cloud-controller-manager                   4.14.0                               True        False         False      11h     
      cloud-credential                           4.14.0                               True        False         False      11h     
      cluster-autoscaler                         4.14.0                               True        False         False      11h     
      config-operator                            4.14.0                               True        False         False      11h     
      console                                    4.14.0                               True        False         False      4h5m    
      control-plane-machine-set                  4.14.0                               False       True          False      6h39m   Missing 1 available replica(s)
      csi-snapshot-controller                    4.14.0                               True        False         False      11h     
      dns                                        4.14.0                               True        True          False      11h     DNS "default" reports Progressing=True: "Have 4 available node-resolver pods, want 6."
      etcd                                       4.14.0                               True        False         True       11h     NodeControllerDegraded: The master nodes not ready: node "ip-10-0-58-12.us-east-2.compute.internal" not ready since 2024-03-19 07:39:35 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.)
      image-registry                             4.14.0                               True        True          False      10h     Progressing: The deployment has not completed...
      ingress                                    4.14.0                               True        False         False      10h     
      insights                                   4.14.0                               True        False         False      10h     
      kube-apiserver                             4.14.0                               True        False         True       11h     NodeControllerDegraded: The master nodes not ready: node "ip-10-0-58-12.us-east-2.compute.internal" not ready since 2024-03-19 07:39:35 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.)
      kube-controller-manager                    4.14.0                               True        False         True       11h     NodeControllerDegraded: The master nodes not ready: node "ip-10-0-58-12.us-east-2.compute.internal" not ready since 2024-03-19 07:39:35 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.)
      kube-scheduler                             4.14.0                               True        False         True       11h     NodeControllerDegraded: The master nodes not ready: node "ip-10-0-58-12.us-east-2.compute.internal" not ready since 2024-03-19 07:39:35 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.)
      kube-storage-version-migrator              4.14.0                               True        False         False      9h      
      machine-api                                4.14.0                               True        False         False      10h     
      machine-approver                           4.14.0                               True        False         False      11h     
      machine-config                             4.14.0-0.nightly-2024-03-13-015516   False       True          True       6h2m    Cluster not available for [{operator 4.14.0-0.nightly-2024-03-13-015516}]: failed to apply machine config daemon manifests: error during waitForDaemonsetRollout: [context deadline exceeded, daemonset machine-config-daemon is not ready. status: (desired: 6, updated: 6, ready: 4, unavailable: 2)]
      marketplace                                4.14.0                               True        False         False      11h     
      monitoring                                 4.14.0                               False       True          True       6h29m   reconciling node-exporter DaemonSet failed: updating DaemonSet object failed: waiting for DaemonSetRollout of openshift-monitoring/node-exporter: context deadline exceeded
      network                                    4.14.0                               True        True          False      11h     DaemonSet "/openshift-multus/multus-additional-cni-plugins" is not available (awaiting 2 nodes)...
      node-tuning                                4.14.0                               True        True          False      6h55m   Working towards "4.14.0"
      openshift-apiserver                        4.14.0                               True        False         True       36m     APIServerDeploymentDegraded: 1 of 3 requested instances are unavailable for apiserver.openshift-apiserver ()
      openshift-controller-manager               4.14.0                               True        False         False      10h     
      openshift-samples                          4.14.0                               True        False         False      6h56m   
      operator-lifecycle-manager                 4.14.0                               True        False         False      11h     
      operator-lifecycle-manager-catalog         4.14.0                               True        False         False      11h     
      operator-lifecycle-manager-packageserver   4.14.0                               True        False         False      64m     
      service-ca                                 4.14.0                               True        False         False      11h     
      storage                                    4.14.0                               True        True          False      11h     AWSEBSCSIDriverOperatorCRProgressing: AWSEBSDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods
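
The operator table above is long, so a small script can help pull out the rows that matter. The following is a minimal sketch, not part of the original report: it parses `oc get co` text and lists operators that are unavailable or degraded, plus operators whose VERSION differs from the rollback target. The sample rows are copied from the output above with their messages trimmed, and the helper names (`parse_operators`, `unhealthy`, `not_at_version`) are hypothetical.

```python
# Rows copied from the `oc get co` output above; messages are trimmed for brevity.
CO_SAMPLE = """\
NAME             VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
etcd             4.14.0                               True        False         True       11h     NodeControllerDegraded: The master nodes not ready
ingress          4.14.0                               True        False         False      10h
machine-config   4.14.0-0.nightly-2024-03-13-015516   False       True          True       6h2m    Cluster not available
monitoring       4.14.0                               False       True          True       6h29m   reconciling node-exporter DaemonSet failed
"""


def parse_operators(text):
    """Parse `oc get co` text into a list of dicts, one per operator row."""
    operators = []
    for line in text.splitlines()[1:]:  # skip the header row
        # The first six columns never contain spaces; MESSAGE (optional) may.
        parts = line.split(None, 6)
        if len(parts) < 6:
            continue  # blank or malformed line
        operators.append({
            "name": parts[0],
            "version": parts[1],
            "available": parts[2],
            "progressing": parts[3],
            "degraded": parts[4],
            "since": parts[5],
            "message": parts[6] if len(parts) > 6 else "",
        })
    return operators


def unhealthy(operators):
    """Names of operators that are not Available or are Degraded."""
    return [o["name"] for o in operators
            if o["available"] != "True" or o["degraded"] == "True"]


def not_at_version(operators, target):
    """Names of operators whose reported VERSION differs from the target."""
    return [o["name"] for o in operators if o["version"] != target]


if __name__ == "__main__":
    ops = parse_operators(CO_SAMPLE)
    print("unhealthy:", unhealthy(ops))
    print("not at 4.14.0:", not_at_version(ops, "4.14.0"))
```

Run against the full table, this kind of filter would show that machine-config is the only operator still reporting the nightly version.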
      
          

      Version-Release number of selected component (if applicable):

          4.14.0-0.nightly-2024-03-13-015516
          

      How reproducible:

          100%
          

      Steps to Reproduce:

          1. Install a 4.14.0 cluster on AWS
          2. Upgrade to 4.14.0-0.nightly-2024-03-13-015516
          3. Roll back to 4.14.0
          

      Actual results:

          Rollback gets stuck: two nodes remain NotReady,SchedulingDisabled and their kubelet client CSRs stay Pending
          

      Expected results:

          Rollback passes
          

      Additional info:

      # oc get csr
      NAME        AGE     SIGNERNAME                                    REQUESTOR                                               REQUESTEDDURATION   CONDITION
      csr-47mw2   6h25m   kubernetes.io/kube-apiserver-client-kubelet   system:node:ip-10-0-62-170.us-east-2.compute.internal   <none>              Pending
      csr-48wvf   151m    kubernetes.io/kube-apiserver-client-kubelet   system:node:ip-10-0-58-12.us-east-2.compute.internal    <none>              Pending
      csr-492bl   4h53m   kubernetes.io/kube-apiserver-client-kubelet   system:node:ip-10-0-62-170.us-east-2.compute.internal   <none>              Pending
      csr-5fx9c   46m     kubernetes.io/kube-apiserver-client-kubelet   system:node:ip-10-0-62-170.us-east-2.compute.internal   <none>              Pending
      csr-5sftr   28m     kubernetes.io/kube-apiserver-client-kubelet   system:node:ip-10-0-58-12.us-east-2.compute.internal    <none>              Pending
      csr-5szpc   138m    kubernetes.io/kube-apiserver-client-kubelet   system:node:ip-10-0-62-170.us-east-2.compute.internal   <none>              Pending
      csr-6n8bx   6h40m   kubernetes.io/kube-apiserver-client-kubelet   system:node:ip-10-0-62-170.us-east-2.compute.internal   <none>              Pending
      csr-6t5jk   3h51m   kubernetes.io/kube-apiserver-client-kubelet   system:node:ip-10-0-62-170.us-east-2.compute.internal   <none>              Pending
      csr-78fjh   3h18m   kubernetes.io/kube-apiserver-client-kubelet   system:node:ip-10-0-58-12.us-east-2.compute.internal    <none>              Pending
      csr-8gnsc   58m     kubernetes.io/kube-apiserver-client-kubelet   system:node:ip-10-0-58-12.us-east-2.compute.internal    <none>              Pending
      csr-8j7vb   74m     kubernetes.io/kube-apiserver-client-kubelet   system:node:ip-10-0-58-12.us-east-2.compute.internal    <none>              Pending
      csr-8vffh   169m    kubernetes.io/kube-apiserver-client-kubelet   system:node:ip-10-0-62-170.us-east-2.compute.internal   <none>              Pending
      csr-9hm5l   5h8m    kubernetes.io/kube-apiserver-client-kubelet   system:node:ip-10-0-62-170.us-east-2.compute.internal   <none>              Pending
      csr-9l797   167m    kubernetes.io/kube-apiserver-client-kubelet   system:node:ip-10-0-58-12.us-east-2.compute.internal    <none>              Pending
      csr-9tdkw   89m     kubernetes.io/kube-apiserver-client-kubelet   system:node:ip-10-0-58-12.us-east-2.compute.internal    <none>              Pending
      csr-blg96   6h10m   kubernetes.io/kube-apiserver-client-kubelet   system:node:ip-10-0-62-170.us-east-2.compute.internal   <none>              Pending
      csr-bpkss   6h38m   kubernetes.io/kube-apiserver-client-kubelet   system:node:ip-10-0-58-12.us-east-2.compute.internal    <none>              Pending
      csr-btzxc   4h7m    kubernetes.io/kube-apiserver-client-kubelet   system:node:ip-10-0-62-170.us-east-2.compute.internal   <none>              Pending
      csr-cjhqr   76m     kubernetes.io/kube-apiserver-client-kubelet   system:node:ip-10-0-62-170.us-east-2.compute.internal   <none>              Pending
      csr-d68f7   43m     kubernetes.io/kube-apiserver-client-kubelet   system:node:ip-10-0-58-12.us-east-2.compute.internal    <none>              Pending
      csr-dw6d6   4h35m   kubernetes.io/kube-apiserver-client-kubelet   system:node:ip-10-0-58-12.us-east-2.compute.internal    <none>              Pending
      csr-fbd7r   6h37m   kubernetes.io/kube-apiserver-client-kubelet   system:node:ip-10-0-58-12.us-east-2.compute.internal    <none>              Pending
      csr-fh4km   3h2m    kubernetes.io/kube-apiserver-client-kubelet   system:node:ip-10-0-58-12.us-east-2.compute.internal    <none>              Pending
      csr-g5w56   5h6m    kubernetes.io/kube-apiserver-client-kubelet   system:node:ip-10-0-58-12.us-east-2.compute.internal    <none>              Pending
      csr-gbz75   30m     kubernetes.io/kube-apiserver-client-kubelet   system:node:ip-10-0-62-170.us-east-2.compute.internal   <none>              Pending
      csr-gjs2v   4h20m   kubernetes.io/kube-apiserver-client-kubelet   system:node:ip-10-0-58-12.us-east-2.compute.internal    <none>              Pending
      csr-glggp   61m     kubernetes.io/kube-apiserver-client-kubelet   system:node:ip-10-0-62-170.us-east-2.compute.internal   <none>              Pending
      csr-h64nk   15m     kubernetes.io/kube-apiserver-client-kubelet   system:node:ip-10-0-62-170.us-east-2.compute.internal   <none>              Pending
      csr-jnq8l   3h20m   kubernetes.io/kube-apiserver-client-kubelet   system:node:ip-10-0-62-170.us-east-2.compute.internal   <none>              Pending
      csr-jvbr6   4h50m   kubernetes.io/kube-apiserver-client-kubelet   system:node:ip-10-0-58-12.us-east-2.compute.internal    <none>              Pending
      csr-k74ht   6h40m   kubernetes.io/kube-apiserver-client-kubelet   system:node:ip-10-0-62-170.us-east-2.compute.internal   <none>              Pending
      csr-l5jr9   12m     kubernetes.io/kube-apiserver-client-kubelet   system:node:ip-10-0-58-12.us-east-2.compute.internal    <none>              Pending
      csr-lc9cm   5h52m   kubernetes.io/kube-apiserver-client-kubelet   system:node:ip-10-0-58-12.us-east-2.compute.internal    <none>              Pending
      csr-mvnhm   154m    kubernetes.io/kube-apiserver-client-kubelet   system:node:ip-10-0-62-170.us-east-2.compute.internal   <none>              Pending
      csr-nsvhr   4h38m   kubernetes.io/kube-apiserver-client-kubelet   system:node:ip-10-0-62-170.us-east-2.compute.internal   <none>              Pending
      csr-pgc8m   5h37m   kubernetes.io/kube-apiserver-client-kubelet   system:node:ip-10-0-58-12.us-east-2.compute.internal    <none>              Pending
      csr-pxm7m   136m    kubernetes.io/kube-apiserver-client-kubelet   system:node:ip-10-0-58-12.us-east-2.compute.internal    <none>              Pending
      csr-r5tbv   105m    kubernetes.io/kube-apiserver-client-kubelet   system:node:ip-10-0-58-12.us-east-2.compute.internal    <none>              Pending
      csr-r5zss   120m    kubernetes.io/kube-apiserver-client-kubelet   system:node:ip-10-0-58-12.us-east-2.compute.internal    <none>              Pending
      csr-r7lcz   6h22m   kubernetes.io/kube-apiserver-client-kubelet   system:node:ip-10-0-58-12.us-east-2.compute.internal    <none>              Pending
      csr-rhh8n   4h22m   kubernetes.io/kube-apiserver-client-kubelet   system:node:ip-10-0-62-170.us-east-2.compute.internal   <none>              Pending
      csr-rnw4c   3h36m   kubernetes.io/kube-apiserver-client-kubelet   system:node:ip-10-0-62-170.us-east-2.compute.internal   <none>              Pending
      csr-rqlj8   107m    kubernetes.io/kube-apiserver-client-kubelet   system:node:ip-10-0-62-170.us-east-2.compute.internal   <none>              Pending
      csr-rspvk   3h5m    kubernetes.io/kube-apiserver-client-kubelet   system:node:ip-10-0-62-170.us-east-2.compute.internal   <none>              Pending
      csr-tqmb9   5h21m   kubernetes.io/kube-apiserver-client-kubelet   system:node:ip-10-0-58-12.us-east-2.compute.internal    <none>              Pending
      csr-vc66v   5h39m   kubernetes.io/kube-apiserver-client-kubelet   system:node:ip-10-0-62-170.us-east-2.compute.internal   <none>              Pending
      csr-vtxc7   92m     kubernetes.io/kube-apiserver-client-kubelet   system:node:ip-10-0-62-170.us-east-2.compute.internal   <none>              Pending
      csr-wdk2p   5h24m   kubernetes.io/kube-apiserver-client-kubelet   system:node:ip-10-0-62-170.us-east-2.compute.internal   <none>              Pending
      csr-ws4wb   123m    kubernetes.io/kube-apiserver-client-kubelet   system:node:ip-10-0-62-170.us-east-2.compute.internal   <none>              Pending
      csr-wsnk5   3h49m   kubernetes.io/kube-apiserver-client-kubelet   system:node:ip-10-0-58-12.us-east-2.compute.internal    <none>              Pending
      csr-x9vkx   5h55m   kubernetes.io/kube-apiserver-client-kubelet   system:node:ip-10-0-62-170.us-east-2.compute.internal   <none>              Pending
      csr-xpgxj   6h7m    kubernetes.io/kube-apiserver-client-kubelet   system:node:ip-10-0-58-12.us-east-2.compute.internal    <none>              Pending
      csr-zf479   3h33m   kubernetes.io/kube-apiserver-client-kubelet   system:node:ip-10-0-58-12.us-east-2.compute.internal    <none>              Pending
      csr-zz4hc   4h4m    kubernetes.io/kube-apiserver-client-kubelet   system:node:ip-10-0-58-12.us-east-2.compute.internal    <none>              Pending
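
To check which nodes the pending requests belong to, the listing above can be tallied by requestor. The following is a minimal sketch, not part of the original report: it counts Pending CSRs per node from `oc get csr` text. Only the first three rows of the listing are embedded as sample data, and the helper names (`pending_csrs`, `pending_by_node`) are hypothetical.

```python
from collections import Counter

# First three rows of the `oc get csr` listing above; a real run would read
# the complete command output instead.
CSR_SAMPLE = """\
NAME        AGE     SIGNERNAME                                    REQUESTOR                                               REQUESTEDDURATION   CONDITION
csr-47mw2   6h25m   kubernetes.io/kube-apiserver-client-kubelet   system:node:ip-10-0-62-170.us-east-2.compute.internal   <none>              Pending
csr-48wvf   151m    kubernetes.io/kube-apiserver-client-kubelet   system:node:ip-10-0-58-12.us-east-2.compute.internal    <none>              Pending
csr-492bl   4h53m   kubernetes.io/kube-apiserver-client-kubelet   system:node:ip-10-0-62-170.us-east-2.compute.internal   <none>              Pending
"""

NODE_PREFIX = "system:node:"


def pending_csrs(text):
    """Return (csr_name, requestor) pairs for rows whose CONDITION is Pending."""
    rows = []
    for line in text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        # Expected columns: NAME AGE SIGNERNAME REQUESTOR REQUESTEDDURATION CONDITION
        if len(fields) == 6 and fields[5] == "Pending":
            rows.append((fields[0], fields[3]))
    return rows


def pending_by_node(text):
    """Count Pending CSRs per node, stripping the system:node: prefix."""
    counts = Counter()
    for _, requestor in pending_csrs(text):
        if requestor.startswith(NODE_PREFIX):
            requestor = requestor[len(NODE_PREFIX):]
        counts[requestor] += 1
    return counts


if __name__ == "__main__":
    for node, count in sorted(pending_by_node(CSR_SAMPLE).items()):
        print(f"{node}: {count} pending")
```

In the listing above, every requestor is one of the two nodes shown as NotReady,SchedulingDisabled in the `oc get node` output, so the tally has exactly two keys.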
          

      The must-gather is available here: https://drive.google.com/file/d/17gTjlws_S45LMOFSBkTdrFL66PTuEdnx/view?usp=sharing

              Unassigned
              Yang Yang (yanyang@redhat.com)
              Ke Wang