OCPBUGS-5956

Operators fail to upgrade because "bundle unpacking failed. Reason: DeadlineExceeded, and Message: Job was active longer than specified deadline"

      Description of problem:

      While upgrading both the platform and the operators of 3423 single-node OpenShift (SNO) clusters, 9 clusters failed to upgrade any of their operators because the installplan for each pending operator upgrade reports "bundle unpacking failed. Reason: DeadlineExceeded, and Message: Job was active longer than specified deadline".

      Version-Release number of selected component (if applicable):

      SNO OCP 4.10.32 (Clusters with issue) attempting to be upgraded to 4.11.5
      Hub OCP 4.11.19
      ACM Version - 2.7.0-DOWNSTREAM-2023-01-12-20-55-01
      Operators being upgraded from the v4.10 to the v4.11 operators catalog

      How reproducible:

      9 out of 3423 cluster upgrades
      9 of the 84 total failures

      Steps to Reproduce:

      1.
      2.
      3.
      

      Actual results:

       

      Expected results:

       

      Additional info:

       

      Example cluster sno00801:

      # oc --kubeconfig=/root/hv-vm/sno/manifests/sno00801/kubeconfig get installplan -A
      NAMESPACE                          NAME            CSV                                          APPROVAL   APPROVED
      openshift-local-storage            install-7pj97   local-storage-operator.4.10.0-202212061900   Manual     true
      openshift-local-storage            install-xdwsj   local-storage-operator.4.11.0-202212070335   Manual     true
      openshift-logging                  install-nbxlj   cluster-logging.5.5.5                        Manual     true
      openshift-ptp                      install-2mlg4   ptp-operator.4.11.0-202301031954             Manual     true
      openshift-ptp                      install-m77t6   ptp-operator.4.10.0-202212072254             Manual     true
      openshift-sriov-network-operator   install-n2rvh   sriov-network-operator.4.10.0-202212061900   Manual     true
      openshift-sriov-network-operator   install-rffhx   sriov-network-operator.4.11.0-202212071535   Manual     true
      # oc --kubeconfig=/root/hv-vm/sno/manifests/sno00801/kubeconfig get csv -A
      NAMESPACE                              NAME                                         DISPLAY                     VERSION               REPLACES   PHASE
      openshift-local-storage                local-storage-operator.4.10.0-202212061900   Local Storage               4.10.0-202212061900              Succeeded
      openshift-logging                      cluster-logging.5.5.5                        Red Hat OpenShift Logging   5.5.5                            Succeeded
      openshift-operator-lifecycle-manager   packageserver                                Package Server              0.19.0                           Succeeded
      openshift-ptp                          ptp-operator.4.10.0-202212072254             PTP Operator                4.10.0-202212072254              Succeeded
      openshift-sriov-network-operator       sriov-network-operator.4.10.0-202212061900   SR-IOV Network Operator     4.10.0-202212061900              Succeeded
      

      Note that all installplans are approved; however, none of the operators' CSVs are at the expected version.
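The comparison above (approved installplans versus installed CSVs) can be sketched as a small shell helper. This is only an illustration, not part of the original report: the function name `pending_csvs` and the two file names in the usage comment are hypothetical, and it assumes the default table layout printed by `oc get installplan -A` and `oc get csv -A` as shown above.

```shell
# pending_csvs INSTALLPLANS_FILE CSVS_FILE   (hypothetical helper)
# Prints "namespace csv" for every CSV referenced by an installplan
# that does not appear in the list of installed CSVs.
pending_csvs() {
  awk '
    FNR == 1 || NF == 0 { next }              # skip each header row and blank lines
    NR == FNR { want[$1 " " $3] = 1; next }   # 1st file: installplan NAMESPACE + CSV
    { have[$1 " " $2] = 1 }                   # 2nd file: csv NAMESPACE + NAME
    END { for (k in want) if (!(k in have)) print k }
  ' "$1" "$2" | sort
}

# Usage (file names are assumptions):
#   oc get installplan -A > installplans.txt
#   oc get csv -A > csvs.txt
#   pending_csvs installplans.txt csvs.txt
```

Run against the output shown for sno00801, this should list the three 4.11 CSVs (local-storage, ptp, sriov-network) that never got installed.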

      Then, looking at the openshift-marketplace namespace:

       

      # oc --kubeconfig=/root/hv-vm/sno/manifests/sno00801/kubeconfig get po,job -n openshift-marketplace
      NAME                                                                  READY   STATUS      RESTARTS   AGE
      pod/5884db547fb0aebac3a93dece7eb6effaf706e25a01d46ce23f25ff2cffvg8r   0/1     Completed   0          3d17h
      pod/7ff2113420a370bd4ca107c3800ace5ad581a2335a355d77de3b9b6f5bqw6d5   0/1     Completed   0          3d17h
      pod/9d47955b1a539d0aa8d707cf941bfeef574528d0cb67b9270e0eb0aafcl25gj   0/1     Completed   0          3d17h
      pod/bffb5564f8c0b54d67e6a72a609648bfb70e05d03a6a7f9fc970a57451vdv8t   0/1     Completed   0          3d17h
      pod/e77265ee6f6e18fe1204e4bbec687b3929f866b3eb14dde98599ce3f74frhr5   0/1     Completed   0          3d17h
      pod/marketplace-operator-6fd78976f6-xfkzk                             1/1     Running     2          2d5h
      pod/rh-du-operators-kjq9t                                             1/1     Running     0          2d4h

      NAME                                                                        COMPLETIONS   DURATION   AGE
      job.batch/37e8a8637099e9504d1b0862d0efa22ad127781f6cd58ca0b950e996b853552   0/1           2d4h       2d4h
      job.batch/5884db547fb0aebac3a93dece7eb6effaf706e25a01d46ce23f25ff2cf5dcd2   1/1           11s        3d17h
      job.batch/7ff2113420a370bd4ca107c3800ace5ad581a2335a355d77de3b9b6f5b62769   1/1           44s        3d17h
      job.batch/9d47955b1a539d0aa8d707cf941bfeef574528d0cb67b9270e0eb0aafc30b21   1/1           9s         3d17h
      job.batch/bffb5564f8c0b54d67e6a72a609648bfb70e05d03a6a7f9fc970a574518c652   1/1           30s        3d17h
      job.batch/cf5c1fe37d69824891c1587cbe87cf59ff99e134dd96107ce8ad26b8af2c4b2   0/1           2d4h       2d4h
      job.batch/e4ec62041f18f79682d92d06da3b06b8a4aee8a1e30ecdf502839389363354b   0/1           2d4h       2d4h
      job.batch/e77265ee6f6e18fe1204e4bbec687b3929f866b3eb14dde98599ce3f74e7f8c   1/1           40s        3d17h

      There are 3 incomplete jobs (COMPLETIONS 0/1), one for each of the 3 operators we expect to be upgraded.
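To pick out the incomplete jobs without reading the table by eye, a filter along these lines could be used. It is a minimal sketch: the function name `incomplete_jobs` is hypothetical, and it assumes the default `oc get job` (or `oc get po,job`) table layout shown above, where the second column of a job row is COMPLETIONS in `done/desired` form.

```shell
# incomplete_jobs (hypothetical helper): reads `oc get job` or `oc get po,job`
# table output on stdin and prints the names of jobs whose COMPLETIONS
# column shows fewer completions than desired.
incomplete_jobs() {
  awk '
    $1 == "NAME" || NF < 2 { next }   # skip header rows and blank lines
    $1 ~ /^pod\//          { next }   # skip pod rows (their 2nd column is READY)
    {
      split($2, c, "/")               # COMPLETIONS looks like "0/1"
      if (c[1] + 0 < c[2] + 0) print $1
    }
  '
}

# Usage:
#   oc get job -n openshift-marketplace | incomplete_jobs
```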

      And if we inspect the installplans we see:

      # oc --kubeconfig=/root/hv-vm/sno/manifests/sno00801/kubeconfig get installplan -A -o json | jq '.items[] | "\(.metadata.namespace) \(.metadata.name)  \(.status.conditions[] | .message)"'
      "openshift-local-storage install-7pj97  null"
      "openshift-local-storage install-xdwsj  bundle unpacking failed. Reason: DeadlineExceeded, and Message: Job was active longer than specified deadline"
      "openshift-logging install-nbxlj  null"
      "openshift-ptp install-2mlg4  bundle unpacking failed. Reason: DeadlineExceeded, and Message: Job was active longer than specified deadline"
      "openshift-ptp install-m77t6  null"
      "openshift-sriov-network-operator install-n2rvh  null"
      "openshift-sriov-network-operator install-rffhx  bundle unpacking failed. Reason: DeadlineExceeded, and Message: Job was active longer than specified deadline"
      
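When checking many clusters, it may help to reduce the installplan listing to just the failed ones. The sketch below is an assumption-laden illustration rather than the reporter's method: the function name `failed_unpacks` is hypothetical, and it expects on stdin the quoted one-line-per-condition strings produced by the `jq` expression above.

```shell
# failed_unpacks (hypothetical helper): reads the quoted
# "namespace name  message" lines emitted by the jq expression above
# and prints "namespace name" for installplans whose bundle unpack failed.
failed_unpacks() {
  awk '
    /bundle unpacking failed/ {
      gsub(/"/, "")     # drop the quotes jq puts around each string
      print $1, $2      # namespace and installplan name
    }
  '
}

# Usage: append `| failed_unpacks` to the `oc ... get installplan -A -o json | jq ...`
# pipeline shown above.
```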

       

            ankithom Ankita Thomas
            akrzos@redhat.com Alex Krzos
            Jian Zhang Jian Zhang