  1. OpenShift Bugs
  2. OCPBUGS-9463

Bundle unpacking failed. Reason: DeadlineExceeded, and Message: Job was active longer than specified deadline


    • Moderate
    • OPECO 233
    • 1
    • Rejected
    • x86_64

      Description of problem:

      Checking https://bugzilla.redhat.com/show_bug.cgi?id=1921264 and https://bugzilla.redhat.com/show_bug.cgi?id=2014308 it seems that the problem with `Bundle unpacking failed. Reason: DeadlineExceeded, and Message: Job was active longer than specified deadline` during Operator upgrade should be resolved or no longer happening.

      But we are still seeing the error reported on OpenShift Container Platform 4.10.15 and 4.10.24, recently especially during Cluster Logging 5.4.4 updates.

      oc -n openshift-marketplace get job 14359dfdd866df54d278e75b42202a5af9ce0cefdf416216dd11e09e4dec25e -o yaml
      apiVersion: batch/v1
      kind: Job
      metadata:
        creationTimestamp: "2022-08-04T12:54:19Z"
        generation: 1
        labels:
          controller-uid: e236f157-ab03-4153-b095-b6b1a97ef3c8
          job-name: 14359dfdd866df54d278e75b42202a5af9ce0cefdf416216dd11e09e4dec25e
        name: 14359dfdd866df54d278e75b42202a5af9ce0cefdf416216dd11e09e4dec25e
        namespace: openshift-marketplace
        ownerReferences:
        - apiVersion: v1
          blockOwnerDeletion: false
          controller: false
          kind: ConfigMap
          name: 14359dfdd866df54d278e75b42202a5af9ce0cefdf416216dd11e09e4dec25e
          uid: 2d6d332d-e680-4828-b97f-e6024b34575b
        resourceVersion: "1299311475"
        uid: e236f157-ab03-4153-b095-b6b1a97ef3c8
      spec:
        activeDeadlineSeconds: 600
        backoffLimit: 3
        completionMode: NonIndexed
        completions: 1
        parallelism: 1
        selector:
          matchLabels:
            controller-uid: e236f157-ab03-4153-b095-b6b1a97ef3c8
        suspend: false
        template:
          metadata:
            creationTimestamp: null
            labels:
              controller-uid: e236f157-ab03-4153-b095-b6b1a97ef3c8
              job-name: 14359dfdd866df54d278e75b42202a5af9ce0cefdf416216dd11e09e4dec25e
            name: 14359dfdd866df54d278e75b42202a5af9ce0cefdf416216dd11e09e4dec25e
          spec:
            containers:
            - command:
              - opm
              - alpha
              - bundle
              - extract
              - -m
              - /bundle/
              - -n
              - openshift-marketplace
              - -c
              - 14359dfdd866df54d278e75b42202a5af9ce0cefdf416216dd11e09e4dec25e
              - -z
              env:
              - name: CONTAINER_IMAGE
                value: registry.redhat.io/openshift-logging/cluster-logging-operator-bundle@sha256:d19c4b7b67a70b46b6b3ac43b2f285cc19c52f2795c8dfbea4315bd06e7485ca
              image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:8de7a35f7ca26e678b8e3d8bf5fa6aa80b84287413247dc031a785d0d139698c
              imagePullPolicy: IfNotPresent
              name: extract
              resources:
                requests:
                  cpu: 10m
                  memory: 50Mi
              terminationMessagePath: /dev/termination-log
              terminationMessagePolicy: File
              volumeMounts:
              - mountPath: /bundle
                name: bundle
            dnsPolicy: ClusterFirst
            initContainers:
            - command:
              - /bin/cp
              - -Rv
              - /bin/cpb
              - /util/cpb
              image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cc477d763835d8c874b050223261dde5bcd73429f0cb55aa7f7cde3df892ce0f
              imagePullPolicy: IfNotPresent
              name: util
              resources:
                requests:
                  cpu: 10m
                  memory: 50Mi
              terminationMessagePath: /dev/termination-log
              terminationMessagePolicy: File
              volumeMounts:
              - mountPath: /util
                name: util
            - command:
              - /util/cpb
              - /bundle
              image: registry.redhat.io/openshift-logging/cluster-logging-operator-bundle@sha256:d19c4b7b67a70b46b6b3ac43b2f285cc19c52f2795c8dfbea4315bd06e7485ca
              imagePullPolicy: Always
              name: pull
              resources:
                requests:
                  cpu: 10m
                  memory: 50Mi
              terminationMessagePath: /dev/termination-log
              terminationMessagePolicy: File
              volumeMounts:
              - mountPath: /bundle
                name: bundle
              - mountPath: /util
                name: util
            restartPolicy: Never
            schedulerName: default-scheduler
            securityContext: {}
            terminationGracePeriodSeconds: 30
            volumes:
            - emptyDir: {}
              name: bundle
            - emptyDir: {}
              name: util
      status:
        conditions:
        - lastProbeTime: "2022-08-04T13:04:19Z"
          lastTransitionTime: "2022-08-04T13:04:19Z"
          message: Job was active longer than specified deadline
          reason: DeadlineExceeded
          status: "True"
          type: Failed
        failed: 1
        startTime: "2022-08-04T12:54:19Z"
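The timestamps in the Job status above appear to line up exactly with the configured deadline. The following is a minimal Python sketch of that arithmetic, not part of the original report: the helper names `parse_k8s_time` and `active_seconds` are hypothetical, and the values are copied from `startTime`, the `Failed` condition's `lastTransitionTime`, and `spec.activeDeadlineSeconds`.

```python
from datetime import datetime, timezone


def parse_k8s_time(value: str) -> datetime:
    """Parse a Kubernetes RFC 3339 UTC timestamp such as 2022-08-04T12:54:19Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def active_seconds(start_time: str, failed_at: str) -> float:
    """Return how long the Job was active before it was marked Failed."""
    return (parse_k8s_time(failed_at) - parse_k8s_time(start_time)).total_seconds()


# Values taken from the Job shown in this report.
deadline = 600  # spec.activeDeadlineSeconds
elapsed = active_seconds("2022-08-04T12:54:19Z", "2022-08-04T13:04:19Z")

print(f"active for {elapsed:.0f}s, deadline {deadline}s, exceeded: {elapsed >= deadline}")
```

If this reading is right, the Job was terminated at exactly the 600-second mark, so the `DeadlineExceeded` condition by itself only says that the deadline was reached, not why the unpack pod made no progress.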

      oc -n openshift-logging get installplan install-qzrfp -o yaml
      apiVersion: operators.coreos.com/v1alpha1
      kind: InstallPlan
      metadata:
        creationTimestamp: "2022-08-04T12:54:19Z"
        generateName: install-
        generation: 1
        labels:
          operators.coreos.com/cluster-logging.openshift-logging: ""
        name: install-qzrfp
        namespace: openshift-logging
        ownerReferences:
        - apiVersion: operators.coreos.com/v1alpha1
          blockOwnerDeletion: false
          controller: false
          kind: Subscription
          name: cluster-logging-subscription
          uid: 48580ca3-bd57-449e-84ec-84efc8c8035d
        resourceVersion: "1299311512"
        uid: cd93ba60-b8db-448f-9239-1c8b15059eef
      spec:
        approval: Automatic
        approved: true
        clusterServiceVersionNames:
        - cluster-logging.5.4.4
        generation: 26
      status:
        bundleLookups:
        - catalogSourceRef:
            name: redhat-operators
            namespace: openshift-marketplace
          conditions:
          - message: bundle contents have not yet been persisted to installplan status
            reason: BundleNotUnpacked
            status: "True"
            type: BundleLookupNotPersisted
          - lastTransitionTime: "2022-08-04T12:54:19Z"
            message: 'unpack job not completed: Unpack pod(openshift-marketplace/14359dfdd866df54d278e75b42202a5af9ce0cefdf416216dd11e09e4d5l7rv)
              container(pull) is pending. Reason: ImagePullBackOff, Message: Back-off pulling
              image "registry.redhat.io/openshift-logging/cluster-logging-operator-bundle@sha256:d19c4b7b67a70b46b6b3ac43b2f285cc19c52f2795c8dfbea4315bd06e7485ca"'
            reason: JobIncomplete
            status: "True"
            type: BundleLookupPending
          - lastTransitionTime: "2022-08-04T13:04:20Z"
            message: Job was active longer than specified deadline
            reason: DeadlineExceeded
            status: "True"
            type: BundleLookupFailed
          identifier: cluster-logging.5.4.4
          path: registry.redhat.io/openshift-logging/cluster-logging-operator-bundle@sha256:d19c4b7b67a70b46b6b3ac43b2f285cc19c52f2795c8dfbea4315bd06e7485ca
          properties: '{"properties":[{"type":"olm.package","value":{"packageName":"cluster-logging","version":"5.4.4"}},{"type":"olm.maxOpenShiftVersion","value":"4.11"},{"type":"olm.gvk","value":{"group":"logging.openshift.io","kind":"ClusterLogForwarder","version":"v1"}},{"type":"olm.gvk","value":{"group":"logging.openshift.io","kind":"ClusterLogging","version":"v1"}}]}'
          replaces: cluster-logging.5.4.3
        catalogSources: []
        conditions:
        - lastTransitionTime: "2022-08-04T13:04:20Z"
          lastUpdateTime: "2022-08-04T13:04:20Z"
          message: 'Bundle unpacking failed. Reason: DeadlineExceeded, and Message: Job
            was active longer than specified deadline'
          reason: InstallCheckFailed
          status: "False"
          type: Installed
        phase: Failed
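The top-level InstallPlan condition only reports `DeadlineExceeded`, while the more specific hint (`ImagePullBackOff` on the `pull` container) sits in `status.bundleLookups[].conditions`. As a rough sketch of how those nested conditions could be surfaced, the Python below walks a dict shaped like the output of `oc get installplan -o json`. The function name `summarize_bundle_lookups` is hypothetical, and the sample dict is a hand-abbreviated copy of the InstallPlan in this report (messages shortened), not real command output.

```python
def summarize_bundle_lookups(install_plan: dict) -> list:
    """Return one line per active bundle lookup condition in an InstallPlan."""
    lines = []
    for lookup in install_plan.get("status", {}).get("bundleLookups", []):
        identifier = lookup.get("identifier", "<unknown>")
        for cond in lookup.get("conditions", []):
            # Only conditions with status "True" describe the current state.
            if cond.get("status") != "True":
                continue
            lines.append(
                f"{identifier}: {cond.get('type')} "
                f"({cond.get('reason')}): {cond.get('message', '')}"
            )
    return lines


# Abbreviated copy of the InstallPlan status from this report (assumption:
# same field layout as the YAML above, with messages shortened by hand).
sample = {
    "status": {
        "bundleLookups": [
            {
                "identifier": "cluster-logging.5.4.4",
                "conditions": [
                    {"type": "BundleLookupNotPersisted", "reason": "BundleNotUnpacked",
                     "status": "True",
                     "message": "bundle contents have not yet been persisted to installplan status"},
                    {"type": "BundleLookupPending", "reason": "JobIncomplete",
                     "status": "True",
                     "message": "unpack job not completed: container(pull) is pending. "
                                "Reason: ImagePullBackOff"},
                    {"type": "BundleLookupFailed", "reason": "DeadlineExceeded",
                     "status": "True",
                     "message": "Job was active longer than specified deadline"},
                ],
            }
        ]
    }
}

lines = summarize_bundle_lookups(sample)
for line in lines:
    print(line)
```

Run against the data in this report, the `BundleLookupPending` line would show that the unpack pod was stuck in `ImagePullBackOff` on the bundle image, which suggests the deadline was consumed by failed image pulls rather than by the extraction itself.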

      The solution from https://access.redhat.com/solutions/6459071 works and eventually lets the Operator upgrade complete. But it is rather cumbersome when this kind of activity has to be repeated on more than 10 OpenShift Container Platform 4 clusters. We therefore request further investigation of the root cause and a more robust overall process.

      Version-Release number of selected component (if applicable):

      • OpenShift Container Platform 4.10.15 and 4.10.24

      How reproducible:

      • Random/unclear

      Steps to Reproduce:
      1. Was seen rather often when Cluster Logging 5.4.4 was made available

      Actual results:

      The Operator upgrade fails, and the steps from https://access.redhat.com/solutions/6459071 need to be applied to resume and eventually complete the upgrade.

      Expected results:

      The Operator upgrade should complete as expected without hitting problems, even when there are certain resource or networking constraints. The timeout should be large enough to cope with many different situations and conditions; when it is still exceeded, the failure should report what is causing the problem.

      Additional info:

            anik120 Anik Bhattacharjee
            rhn-support-sreber Simon Reber
            Xia Zhao Xia Zhao
            Red Hat Employee
            Per Goncalves da Silva