  1. OpenShift Bugs
  2. OCPBUGS-29194

Operator installation/upgrade fails with "Bundle unpacking failed. Reason: DeadlineExceeded, and Message: Job was active longer than specified deadline" - 4.14.z


    • Moderate
    • None
    • Rejected
    • False

      Description of problem:

      Operator installation/upgrade fails stating: "Bundle unpacking failed. Reason: DeadlineExceeded, and Message: Job was active longer than specified deadline"

      Version-Release number of selected component (if applicable):

      4.10

      How reproducible:

      oc -n openshift-marketplace get job 14359dfdd866df54d278e75b42202a5af9ce0cefdf416216dd11e09e4dec25e -o yaml
      apiVersion: batch/v1
      kind: Job
      metadata:
        creationTimestamp: "2022-08-04T12:54:19Z"
        generation: 1
        labels:
          controller-uid: e236f157-ab03-4153-b095-b6b1a97ef3c8
          job-name: 14359dfdd866df54d278e75b42202a5af9ce0cefdf416216dd11e09e4dec25e
        name: 14359dfdd866df54d278e75b42202a5af9ce0cefdf416216dd11e09e4dec25e
        namespace: openshift-marketplace
        ownerReferences:
        - apiVersion: v1
          blockOwnerDeletion: false
          controller: false
          kind: ConfigMap
          name: 14359dfdd866df54d278e75b42202a5af9ce0cefdf416216dd11e09e4dec25e
          uid: 2d6d332d-e680-4828-b97f-e6024b34575b
        resourceVersion: "1299311475"
        uid: e236f157-ab03-4153-b095-b6b1a97ef3c8
      spec:
        activeDeadlineSeconds: 600
        backoffLimit: 3
        completionMode: NonIndexed
        completions: 1
        parallelism: 1
        selector:
          matchLabels:
            controller-uid: e236f157-ab03-4153-b095-b6b1a97ef3c8
        suspend: false
        template:
          metadata:
            creationTimestamp: null
            labels:
              controller-uid: e236f157-ab03-4153-b095-b6b1a97ef3c8
              job-name: 14359dfdd866df54d278e75b42202a5af9ce0cefdf416216dd11e09e4dec25e
            name: 14359dfdd866df54d278e75b42202a5af9ce0cefdf416216dd11e09e4dec25e
          spec:
            containers:
            - command:
              - opm
              - alpha
              - bundle
              - extract
              - -m
              - /bundle/
              - -n
              - openshift-marketplace
              - -c
              - 14359dfdd866df54d278e75b42202a5af9ce0cefdf416216dd11e09e4dec25e
              - -z
              env:
              - name: CONTAINER_IMAGE
                value: registry.redhat.io/openshift-logging/cluster-logging-operator-bundle@sha256:d19c4b7b67a70b46b6b3ac43b2f285cc19c52f2795c8dfbea4315bd06e7485ca
              image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:8de7a35f7ca26e678b8e3d8bf5fa6aa80b84287413247dc031a785d0d139698c
              imagePullPolicy: IfNotPresent
              name: extract
              resources:
                requests:
                  cpu: 10m
                  memory: 50Mi
              terminationMessagePath: /dev/termination-log
              terminationMessagePolicy: File
              volumeMounts:
              - mountPath: /bundle
                name: bundle
            dnsPolicy: ClusterFirst
            initContainers:
            - command:
              - /bin/cp
              - -Rv
              - /bin/cpb
              - /util/cpb
              image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cc477d763835d8c874b050223261dde5bcd73429f0cb55aa7f7cde3df892ce0f
              imagePullPolicy: IfNotPresent
              name: util
              resources:
                requests:
                  cpu: 10m
                  memory: 50Mi
              terminationMessagePath: /dev/termination-log
              terminationMessagePolicy: File
              volumeMounts:
              - mountPath: /util
                name: util
            - command:
              - /util/cpb
              - /bundle
              image: registry.redhat.io/openshift-logging/cluster-logging-operator-bundle@sha256:d19c4b7b67a70b46b6b3ac43b2f285cc19c52f2795c8dfbea4315bd06e7485ca
              imagePullPolicy: Always
              name: pull
              resources:
                requests:
                  cpu: 10m
                  memory: 50Mi
              terminationMessagePath: /dev/termination-log
              terminationMessagePolicy: File
              volumeMounts:
              - mountPath: /bundle
                name: bundle
              - mountPath: /util
                name: util
            restartPolicy: Never
            schedulerName: default-scheduler
            securityContext: {}
            terminationGracePeriodSeconds: 30
            volumes:
            - emptyDir: {}
              name: bundle
            - emptyDir: {}
              name: util
      status:
        conditions:
        - lastProbeTime: "2022-08-04T13:04:19Z"
          lastTransitionTime: "2022-08-04T13:04:19Z"
          message: Job was active longer than specified deadline
          reason: DeadlineExceeded
          status: "True"
          type: Failed
        failed: 1
        startTime: "2022-08-04T12:54:19Z"
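
The Job status above can be checked against its own deadline. The following is a minimal sketch, not part of the original report: it takes the `startTime`, the `lastTransitionTime` of the `Failed` condition, and `activeDeadlineSeconds` from the output above, and confirms that the Job was terminated exactly when the 600-second deadline expired. The helper names `parse_k8s_time` and `seconds_active` are hypothetical.

```python
from datetime import datetime, timezone


def parse_k8s_time(ts: str) -> datetime:
    """Parse a UTC timestamp in the form used by the Job status above."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def seconds_active(start_time: str, failed_at: str) -> int:
    """Return how many seconds passed between Job start and failure."""
    delta = parse_k8s_time(failed_at) - parse_k8s_time(start_time)
    return int(delta.total_seconds())


# Values copied from the Job shown above.
ACTIVE_DEADLINE_SECONDS = 600
elapsed = seconds_active("2022-08-04T12:54:19Z", "2022-08-04T13:04:19Z")
hit_deadline = elapsed >= ACTIVE_DEADLINE_SECONDS

print(f"active for {elapsed}s, deadline {ACTIVE_DEADLINE_SECONDS}s, exceeded: {hit_deadline}")
```

In this case the elapsed time equals the deadline, which is consistent with the `DeadlineExceeded` reason: the unpack pod never finished within the allotted window.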
      
      
      oc -n openshift-logging get installplan install-qzrfp -o yaml
      apiVersion: operators.coreos.com/v1alpha1
      kind: InstallPlan
      metadata:
        creationTimestamp: "2022-08-04T12:54:19Z"
        generateName: install-
        generation: 1
        labels:
          operators.coreos.com/cluster-logging.openshift-logging: ""
        name: install-qzrfp
        namespace: openshift-logging
        ownerReferences:
        - apiVersion: operators.coreos.com/v1alpha1
          blockOwnerDeletion: false
          controller: false
          kind: Subscription
          name: cluster-logging-subscription
          uid: 48580ca3-bd57-449e-84ec-84efc8c8035d
        resourceVersion: "1299311512"
        uid: cd93ba60-b8db-448f-9239-1c8b15059eef
      spec:
        approval: Automatic
        approved: true
        clusterServiceVersionNames:
        - cluster-logging.5.4.4
        generation: 26
      status:
        bundleLookups:
        - catalogSourceRef:
            name: redhat-operators
            namespace: openshift-marketplace
          conditions:
          - message: bundle contents have not yet been persisted to installplan status
            reason: BundleNotUnpacked
            status: "True"
            type: BundleLookupNotPersisted
          - lastTransitionTime: "2022-08-04T12:54:19Z"
            message: 'unpack job not completed: Unpack pod(openshift-marketplace/14359dfdd866df54d278e75b42202a5af9ce0cefdf416216dd11e09e4d5l7rv)
              container(pull) is pending. Reason: ImagePullBackOff, Message: Back-off pulling
              image "registry.redhat.io/openshift-logging/cluster-logging-operator-bundle@sha256:d19c4b7b67a70b46b6b3ac43b2f285cc19c52f2795c8dfbea4315bd06e7485ca"'
            reason: JobIncomplete
            status: "True"
            type: BundleLookupPending
          - lastTransitionTime: "2022-08-04T13:04:20Z"
            message: Job was active longer than specified deadline
            reason: DeadlineExceeded
            status: "True"
            type: BundleLookupFailed
          identifier: cluster-logging.5.4.4
          path: registry.redhat.io/openshift-logging/cluster-logging-operator-bundle@sha256:d19c4b7b67a70b46b6b3ac43b2f285cc19c52f2795c8dfbea4315bd06e7485ca
          properties: '{"properties":[{"type":"olm.package","value":{"packageName":"cluster-logging","version":"5.4.4"}},{"type":"olm.maxOpenShiftVersion","value":"4.11"},{"type":"olm.gvk","value":{"group":"logging.openshift.io","kind":"ClusterLogForwarder","version":"v1"}},{"type":"olm.gvk","value":{"group":"logging.openshift.io","kind":"ClusterLogging","version":"v1"}}]}'
          replaces: cluster-logging.5.4.3
        catalogSources: []
        conditions:
        - lastTransitionTime: "2022-08-04T13:04:20Z"
          lastUpdateTime: "2022-08-04T13:04:20Z"
          message: 'Bundle unpacking failed. Reason: DeadlineExceeded, and Message: Job
            was active longer than specified deadline'
          reason: InstallCheckFailed
          status: "False"
          type: Installed
        phase: Failed
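
The InstallPlan above already records the underlying cause (`ImagePullBackOff` on the `pull` container) in its `BundleLookupPending` condition, even though the top-level condition only reports `DeadlineExceeded`. The following is a rough sketch of how the two could be shown together. It assumes the InstallPlan has been exported as JSON (for example with `-o json` instead of `-o yaml`) and loaded into a dictionary. The function name `summarize_bundle_lookups` and the trimmed `INSTALL_PLAN` sample are illustrative and not part of OLM.

```python
from typing import Any, Dict, List


def summarize_bundle_lookups(install_plan: Dict[str, Any]) -> List[Dict[str, str]]:
    """For each failed bundle lookup, report the failure reason next to the
    pending condition recorded for the same lookup."""
    summaries = []
    for lookup in install_plan.get("status", {}).get("bundleLookups", []):
        # Index the conditions of this lookup by their type.
        by_type = {c.get("type"): c for c in lookup.get("conditions", [])}
        failed = by_type.get("BundleLookupFailed")
        if failed is None or failed.get("status") != "True":
            continue  # Only failed lookups are of interest here.
        pending = by_type.get("BundleLookupPending", {})
        summaries.append({
            "bundle": lookup.get("identifier", ""),
            "failure": failed.get("reason", ""),
            "pending_reason": pending.get("reason", ""),
            "pending_message": pending.get("message", ""),
        })
    return summaries


# Trimmed copy of the status shown above; only the fields the sketch reads.
INSTALL_PLAN = {
    "status": {
        "bundleLookups": [{
            "identifier": "cluster-logging.5.4.4",
            "conditions": [
                {"type": "BundleLookupNotPersisted", "status": "True",
                 "reason": "BundleNotUnpacked"},
                {"type": "BundleLookupPending", "status": "True",
                 "reason": "JobIncomplete",
                 "message": "unpack job not completed: Unpack pod("
                            "openshift-marketplace/14359dfdd866df54d278e75b42202a5af9ce0cefdf416216dd11e09e4d5l7rv) "
                            "container(pull) is pending. Reason: ImagePullBackOff, "
                            "Message: Back-off pulling image "
                            "\"registry.redhat.io/openshift-logging/cluster-logging-operator-bundle"
                            "@sha256:d19c4b7b67a70b46b6b3ac43b2f285cc19c52f2795c8dfbea4315bd06e7485ca\""},
                {"type": "BundleLookupFailed", "status": "True",
                 "reason": "DeadlineExceeded",
                 "message": "Job was active longer than specified deadline"},
            ],
        }],
    },
}

summaries = summarize_bundle_lookups(INSTALL_PLAN)
for s in summaries:
    print(f"{s['bundle']}: {s['failure']} (while pending: {s['pending_message']})")
```

A summary like this could make it quicker to tell an image pull problem apart from a genuinely slow unpack when the same failure has to be triaged on many clusters.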
      
      The solution from https://access.redhat.com/solutions/6459071 works and helps to eventually complete the Operator upgrade. However, applying it by hand is cumbersome when it has to be repeated on 10+ OpenShift Container Platform 4 clusters. It is therefore requested that the root cause be investigated further and the overall process be made more robust.

      Steps to Reproduce:

      Seen often when upgrading Operators

      Actual results:

      The Operator upgrade fails, and the steps from https://access.redhat.com/solutions/6459071 need to be applied to resume and eventually complete the upgrade.
      

      Expected results:

      The Operator upgrade should complete as expected without hitting this problem, even when there are resource or networking constraints. The timeout should be large enough to cope with a wide range of situations and conditions; when it is still exceeded, the failure should report what is causing the problem.

      Additional info:

      https://access.redhat.com/solutions/6459071
      
      More than 100 cases have used the above article to resolve this issue, so a large number of people are affected.

              ankithom Ankita Thomas
              rhn-support-dahernan David Hernandez Fernandez
              Xia Zhao Xia Zhao
              Daniel Messer
