OpenShift Bugs / OCPBUGS-18468

Upgrade Recovery Fails on etcd Redeployment


    • Important
    • No
    • False
      8/15: fix merged, should be ON_QA & off this list soon
      8/8: fix tested with the upgrade-recovery QE pipeline (status: passed); awaiting a lab from QE to re-run the test before proceeding with the merge.

      This is a clone of issue OCPBUGS-16032. The following is the description of the original issue:

      Description of problem:

      Upgrade recovery fails during the second phase due to a deprecated command in upgrade-recovery.sh. Summary of errors:
      
      #/var/recovery/upgrade-recovery.sh --resume
      ...
      Deprecated: Use `etcdutl snapshot status`
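
The notice above names `etcdutl snapshot status` as the replacement for the deprecated call. As a minimal sketch of one way a script could cope with hosts that ship either tool, and not necessarily the fix that was merged, the helpers below prefer `etcdutl` and fall back to `etcdctl`. The function names `pick_snapshot_tool` and `snapshot_status_json` are hypothetical and do not come from upgrade-recovery.sh.

```shell
#!/bin/bash
# Hypothetical helper: choose which binary to use for `snapshot status`.
# Prefers etcdutl (the replacement named in the deprecation notice) and
# falls back to etcdctl on hosts where etcdutl is not installed.
pick_snapshot_tool() {
    if command -v etcdutl >/dev/null 2>&1; then
        echo etcdutl
    elif command -v etcdctl >/dev/null 2>&1; then
        echo etcdctl
    else
        echo "neither etcdutl nor etcdctl found in PATH" >&2
        return 1
    fi
}

# Hypothetical wrapper: print snapshot metadata as JSON using the chosen tool.
snapshot_status_json() {
    local snapshot_file="$1" tool
    tool="$(pick_snapshot_tool)" || return 1
    "${tool}" snapshot status "${snapshot_file}" -w json
}
```

A caller would then run `snapshot_status_json <snapshot file>` instead of invoking `etcdctl snapshot status` directly.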
      
      
      

      Version-Release number of selected component (if applicable):

      Upgrade 4.13.z to 4.14

      How reproducible:

      Every Time

      Steps to Reproduce:

      1. Hub cluster running OCP 4.14, TALM 4.14, ZTP 4.14
      2. Spoke Cluster running 4.13
      3. Trigger platform-upgrade (ocp-far-edge-vran-upgrade-recovery pipeline can be used for this)
         3a. While upgrade is progressing, interrupt process by running /var/recovery/upgrade-recovery.sh
         3b. upgrade-recovery.sh first phase succeeds.
         3c. Manually reboot spoke cluster as per upgrade-recovery.sh output.
         3d. Run /var/recovery/upgrade-recovery.sh --resume
      4. upgrade-recovery.sh fails with: Deprecated: Use `etcdutl snapshot status`

      Actual results:

      upgrade-recovery.sh fails when redeploying etcd.

      Expected results:

      upgrade-recovery.sh succeeds. 

      Additional info:

      Summary of output from /var/recovery/upgrade-recovery.sh
      
      "##### Tue Jul  4 22:10:27 UTC 2023: Completed restoring /var/lib/kubelet content",
              "##### Tue Jul  4 22:10:27 UTC 2023: Starting crio.service",
              "##### Tue Jul  4 22:10:27 UTC 2023: Restoring cluster",
              "etcdctl is already installed",
              "{\"hash\":4055618727,\"revision\":62187,\"totalKey\":13255,\"totalSize\":93892608}",
              "...stopping kube-apiserver-pod.yaml",
              "...stopping kube-controller-manager-pod.yaml",
              "...stopping kube-scheduler-pod.yaml",
              "...stopping etcd-pod.yaml",
              "Waiting for container etcd to stop",
              "complete",
              "Waiting for container etcdctl to stop",
              "complete",
              "Waiting for container etcd-metrics to stop",
              "complete",
              "Waiting for container kube-controller-manager to stop",
              "complete",
              "Waiting for container kube-apiserver to stop",
              "complete",
              "Waiting for container kube-scheduler to stop",
              "complete",
              "Moving etcd data-dir /var/lib/etcd/member to /var/lib/etcd-backup",
              "starting restore-etcd static pod",
              "starting kube-apiserver-pod.yaml",
              "static-pod-resources/kube-apiserver-pod-6/kube-apiserver-pod.yaml",
              "starting kube-controller-manager-pod.yaml",
              "static-pod-resources/kube-controller-manager-pod-10/kube-controller-manager-pod.yaml",
              "starting kube-scheduler-pod.yaml",
              "static-pod-resources/kube-scheduler-pod-7/kube-scheduler-pod.yaml",
              "##### Tue Jul  4 22:10:29 UTC 2023: Restarting kubelet.service",
              "##### Tue Jul  4 22:10:30 UTC 2023: Restarting crio.service",
              "##### Tue Jul  4 22:10:31 UTC 2023: Waiting for required container restarts",
              "##### Tue Jul  4 22:10:31 UTC 2023: Waiting for etcd container to restart",
              ".",
              "##### Tue Jul  4 22:10:42 UTC 2023: etcd container restarted",
              "##### Tue Jul  4 22:10:42 UTC 2023: Waiting for etcd-operator container to restart",
              "........",
              "##### Tue Jul  4 22:12:08 UTC 2023: etcd-operator container restarted",
              "##### Tue Jul  4 22:12:08 UTC 2023: Waiting for kube-apiserver-operator container to restart",
              "",
              "##### Tue Jul  4 22:12:10 UTC 2023: kube-apiserver-operator container restarted",
              "##### Tue Jul  4 22:12:10 UTC 2023: Waiting for kube-controller-manager-operator container to restart",
              "",
              "##### Tue Jul  4 22:12:11 UTC 2023: kube-controller-manager-operator container restarted",
              "##### Tue Jul  4 22:12:11 UTC 2023: Waiting for kube-scheduler-operator-container container to restart",
              "",
              "##### Tue Jul  4 22:12:11 UTC 2023: kube-scheduler-operator-container container restarted",
              "##### Tue Jul  4 22:12:11 UTC 2023: Required containers have restarted",
              "##### Tue Jul  4 22:12:11 UTC 2023: Triggering redeployments",
              "##### Tue Jul  4 22:12:11 UTC 2023: Triggering etcd redeployment"
      
       Deprecated: Use `etcdutl snapshot status`   
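
The output above contains both a JSON snapshot status line and the deprecation notice; the report does not say how the notice leads to the failure. Purely as an illustration, and assuming a script reads the status output and needs to tolerate extra notice lines, the sketch below keeps only the JSON object line and extracts a numeric field from it. The helpers `json_line_only` and `status_field` are hypothetical names, not part of upgrade-recovery.sh.

```shell
#!/bin/bash
# Hypothetical helper: keep only lines that are a single JSON object,
# dropping notices such as "Deprecated: Use `etcdutl snapshot status`".
json_line_only() {
    grep -E '^[[:space:]]*\{.*\}[[:space:]]*$'
}

# Hypothetical helper: read one numeric field from the flat status object,
# for example "revision" or "totalSize". Reads the JSON line on stdin.
status_field() {
    local field="$1"
    sed -n "s/.*\"${field}\":\([0-9][0-9]*\).*/\1/p"
}
```

For example, piping the combined output through `json_line_only | status_field revision` prints only the revision number from the status line.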

              jche@redhat.com Jun Chen
              openshift-crt-jira-prow OpenShift Prow Bot
              Periyamaruthu Mohanraj Periyamaruthu Mohanraj