OpenShift Bugs / OCPBUGS-31391

The certified-operators pod crashes because the computed digest differs from the cached digest


    • Bug
    • Resolution: Done
    • None
    • 4.15.z
    • OLM / Registry
    • None
    • Critical
    • No
    • Rejected
    • False

      Description of problem:

      As shown below, the certified-operators pod crashed because the digest computed at startup differs from the digest recorded in the cache.

      jiazha-mac:~ jiazha$ oc get pods 
      NAME                                                              READY   STATUS             RESTARTS       AGE
      14fc5dc0a1c98939d5fc4cd6cba7cc131eb09d8dce7edb3c3d54b652d7zscj2   0/1     Completed          0              5d14h
      430fe5b8321219f6673acdda7ac585c22d07f7278ed6e600065efccfc7fqxlz   0/1     Completed          0              5d14h
      4b765aa05d90c72fca6cea04c38cf149984837330f9cc3758f7b5c3e05lm729   0/1     Completed          0              5d14h
      4ce145ddbb1fe97e760101e0dd9e1472b635d115f8ce6881c2601b2061fn9w5   0/1     Completed          0              5d14h
      7153fb6f3e3e3bfae32ab0c103e2cbbf7ad65e4539b5932f56ca89a936mdjsv   0/1     Completed          0              5d14h
      89bb8aa67c480a90ad7936189826bb350918f4d1a81e6791a5f27b4fcc88t4s   0/1     Completed          0              5d14h
      a469edf7affc72c93461a430266d0f5d0e0e0a6b75c65977defdac375f4f6sd   0/1     Completed          0              5d14h
      a7a11e5db913fa0346dbd13fa3f737b35a0d6f533bb99b65cd030d782bfjxzm   0/1     Completed          0              5d14h
      b347e6457c33871c9a8bf642b41abf19d9c058b95380118e39f9bd51bbzs2r2   0/1     Completed          0              5d14h
      ba2b3abb8aa34b6a1a774e0639d7c9584701c28d66a49e18c344c916f52pxp2   0/1     Completed          0              5d14h
      certified-operators-jxnpp                                         0/1     CrashLoopBackOff   35 (47s ago)   5d14h
      community-operators-n55vz                                         1/1     Running            1              5d14h
      marketplace-operator-fc999f7db-p8wgs                              1/1     Running            2 (154m ago)   5d15h
      redhat-marketplace-45mcm                                          1/1     Running            1              5d14h
      redhat-operators-mpvzm                                            1/1     Running            1              5d14h
      
      jiazha-mac:~ jiazha$ oc logs certified-operators-jxnpp 
      Defaulted container "registry-server" out of: registry-server, extract-utilities (init), extract-content (init)
      time="2024-03-26T06:51:24Z" level=info msg="starting pprof endpoint" address="localhost:6060"
      time="2024-03-26T06:51:25Z" level=fatal msg="cache requires rebuild: cache reports digest as \"6e88e679aef6d1a8\", but computed digest is \"0fb07a8d3e69c464\""
      jiazha-mac:~ jiazha$ 
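The two digests in the fatal message are the key data points for triage, so it can help to pull them out of the log programmatically. The following is a minimal sketch: the message text is copied from the log above, while the regular expression and the function name `parse_digest_mismatch` are illustrative assumptions, not part of any OLM tooling.

```python
import re

# Matches the fatal message seen in the registry-server log. The optional
# backslashes cover the escaped quotes that appear inside msg="...".
# This pattern is an assumption based only on the log line in this report.
DIGEST_RE = re.compile(
    r'cache reports digest as \\?"(?P<cached>[0-9a-f]+)\\?", '
    r'but computed digest is \\?"(?P<computed>[0-9a-f]+)\\?"'
)


def parse_digest_mismatch(line):
    """Return (cached, computed) digests from a log line, or None if absent."""
    match = DIGEST_RE.search(line)
    if match is None:
        return None
    return match.group("cached"), match.group("computed")


line = (
    'time="2024-03-26T06:51:25Z" level=fatal msg="cache requires rebuild: '
    'cache reports digest as \\"6e88e679aef6d1a8\\", '
    'but computed digest is \\"0fb07a8d3e69c464\\""'
)
print(parse_digest_mismatch(line))
```

Comparing the extracted pair across restarts shows whether the mismatch is stable; in this report both values are identical in every captured log.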
      
      

      Version-Release number of selected component (if applicable):

      jiazha-mac:~ jiazha$ oc get clusterversion 
      NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.15.2    True        False         5d16h   Cluster version is 4.15.2

      How reproducible:

          Not always

      Steps to Reproduce:

          1. Install OCP 4.15.2 and let the cluster run for several days.
          2. Check the pods of the default catalog sources.
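Step 2 can be automated by scanning the tabular output of `oc get pods`. The sketch below assumes the default column layout shown in this report (NAME, READY, STATUS, ...); the helper name `failing_pods` is hypothetical, and the sample rows are trimmed from the listing in the description.

```python
def failing_pods(oc_get_pods_output):
    """Return (name, status) for pods that are neither Running nor Completed."""
    lines = oc_get_pods_output.strip().splitlines()
    failing = []
    for row in lines[1:]:  # skip the header row
        fields = row.split()
        if len(fields) < 3:
            continue  # ignore blank or malformed rows
        name, status = fields[0], fields[2]
        if status not in ("Running", "Completed"):
            failing.append((name, status))
    return failing


sample = """\
NAME                                   READY   STATUS             RESTARTS       AGE
certified-operators-jxnpp              0/1     CrashLoopBackOff   35 (47s ago)   5d14h
community-operators-n55vz              1/1     Running            1              5d14h
marketplace-operator-fc999f7db-p8wgs   1/1     Running            2 (154m ago)   5d15h
redhat-marketplace-45mcm               1/1     Running            1              5d14h
redhat-operators-mpvzm                 1/1     Running            1              5d14h
"""
print(failing_pods(sample))
```

Because the issue is intermittent, running a check like this periodically on a long-lived cluster may be more practical than inspecting the pods by hand.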

      Actual results:

      The certified-operators pod crashes, while the redhat-operators pod starts and serves its catalog normally:

      jiazha-mac:~ jiazha$ oc logs certified-operators-jxnpp 
      Defaulted container "registry-server" out of: registry-server, extract-utilities (init), extract-content (init)
      time="2024-03-26T08:23:47Z" level=info msg="starting pprof endpoint" address="localhost:6060"
      time="2024-03-26T08:23:48Z" level=fatal msg="cache requires rebuild: cache reports digest as \"6e88e679aef6d1a8\", but computed digest is \"0fb07a8d3e69c464\""
      
      jiazha-mac:~ jiazha$ oc logs redhat-operators-mpvzm 
      Defaulted container "registry-server" out of: registry-server, extract-utilities (init), extract-content (init)
      time="2024-03-26T04:21:41Z" level=info msg="starting pprof endpoint" address="localhost:6060"
      time="2024-03-26T04:21:43Z" level=info msg="serving registry" configs=/extracted-catalog/catalog port=50051
      time="2024-03-26T04:21:43Z" level=info msg="stopped caching cpu profile data" address="localhost:6060" 
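The fatal message suggests a startup integrity check: a digest recorded alongside the cache is compared with a digest recomputed from the content on disk, and the server refuses to start when they differ. The sketch below illustrates that general pattern only. The hashing scheme (SHA-256 truncated to 16 hex characters), the function names, and the file layout are all assumptions made for illustration; the algorithm the registry server actually uses is not shown in this report.

```python
import hashlib
import os
import tempfile


def directory_digest(root):
    """Hash relative paths and file contents in a deterministic order."""
    hasher = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # fix the traversal order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            hasher.update(os.path.relpath(path, root).encode("utf-8"))
            with open(path, "rb") as handle:
                hasher.update(handle.read())
    return hasher.hexdigest()[:16]  # truncated for readability (assumption)


def check_integrity(root, recorded_digest):
    """Raise if the recorded digest no longer matches the content on disk."""
    computed = directory_digest(root)
    if computed != recorded_digest:
        raise RuntimeError(
            f'cache requires rebuild: cache reports digest as "{recorded_digest}", '
            f'but computed digest is "{computed}"'
        )


with tempfile.TemporaryDirectory() as catalog:
    catalog_file = os.path.join(catalog, "catalog.json")
    with open(catalog_file, "w") as handle:
        handle.write('{"schema": "olm.package", "name": "example"}\n')
    recorded = directory_digest(catalog)
    check_integrity(catalog, recorded)  # content unchanged: no error

    # Simulate content that changed after the digest was recorded.
    with open(catalog_file, "a") as handle:
        handle.write('{"schema": "olm.channel", "name": "stable"}\n')
    try:
        check_integrity(catalog, recorded)
    except RuntimeError as err:
        print(err)
```

Under this model, a mismatch means either the content or the recorded digest changed after the cache was built, which is consistent with one catalog pod failing while the others keep serving.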

      Expected results:

      All catalog source pods start and run normally.

      Additional info:

      I tried to inspect the `/extracted-catalog` volume directory on the host, but found nothing, as shown below:

      jiazha-mac:~ jiazha$ oc get pods certified-operators-jxnpp -o wide 
      NAME                        READY   STATUS             RESTARTS       AGE     IP            NODE                            NOMINATED NODE   READINESS GATES
      certified-operators-jxnpp   0/1     CrashLoopBackOff   47 (90s ago)   5d15h   10.129.0.45   qe3-vmware-ibm-z9gs6-master-1   <none>           <none>
      
      jiazha-mac:~ jiazha$ oc get pods certified-operators-jxnpp -o yaml
      apiVersion: v1
      kind: Pod
      ...
          volumeMounts:
          - mountPath: /utilities
            name: utilities
          - mountPath: /extracted-catalog
            name: catalog-content
      ...
        containerStatuses:
        - containerID: cri-o://f653ed968d723882fdfafe358460ef62ded9b4be91291a4ab6eb1ad6a00b990b
          image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:965fe452763fd402ca8d8b4a3fdb13587673c8037f215c0ffcd76b6c4c24635e
          imageID: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:965fe452763fd402ca8d8b4a3fdb13587673c8037f215c0ffcd76b6c4c24635e
          lastState:
            terminated:
              containerID: cri-o://f653ed968d723882fdfafe358460ef62ded9b4be91291a4ab6eb1ad6a00b990b
              exitCode: 1
              finishedAt: "2024-03-26T07:53:08Z"
              message: |
                time="2024-03-26T07:53:07Z" level=info msg="starting pprof endpoint" address="localhost:6060"
                time="2024-03-26T07:53:08Z" level=fatal msg="cache requires rebuild: cache reports digest as \"6e88e679aef6d1a8\", but computed digest is \"0fb07a8d3e69c464\""
              reason: Error
              startedAt: "2024-03-26T07:53:07Z"
          name: registry-server
      ...
      
      jiazha-mac:~ jiazha$ oc debug node/qe3-vmware-ibm-z9gs6-master-1
      Temporary namespace openshift-debug-v6xk5 is created for debugging node...
      Starting pod/qe3-vmware-ibm-z9gs6-master-1-debug-b6656 ...
      To use host binaries, run `chroot /host`
      Pod IP: 150.240.97.241
      If you don't see a command prompt, try pressing enter.
      sh-4.4# chroot /host
      sh-5.1# crictl attach f653ed968d723882fdfafe358460ef62ded9b4be91291a4ab6eb1ad6a00b990b
      FATA[0000] attaching running container failed: Internal error occurred: error attaching to container: rpc error: code = NotFound desc = could not find container "f653ed968d723882fdfafe358460ef62ded9b4be91291a4ab6eb1ad6a00b990b": container with ID starting with f653ed968d723882fdfafe358460ef62ded9b4be91291a4ab6eb1ad6a00b990b not found: ID does not exist 
      sh-5.1# ls -l /var/run/containers/storage/overlay |grep f653ed968d723882
      sh-5.1#
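Since the container ID from the pod status could no longer be found on the node, the pod status itself may be the most reliable record of the crash. The sketch below reads the `lastState.terminated` entries from pod JSON such as `oc get pods certified-operators-jxnpp -o json` would return. The field names follow the Kubernetes Pod status schema; the helper name `last_terminations` is hypothetical, and the sample object is trimmed from the YAML above.

```python
import json


def last_terminations(pod):
    """Collect the last terminated state of each container in a pod object."""
    results = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        terminated = status.get("lastState", {}).get("terminated")
        if terminated:
            results.append(
                {
                    "container": status.get("name"),
                    "exitCode": terminated.get("exitCode"),
                    "reason": terminated.get("reason"),
                    "finishedAt": terminated.get("finishedAt"),
                    "message": terminated.get("message", "").strip(),
                }
            )
    return results


# Trimmed sample built from the pod status shown in this report.
pod_json = json.dumps(
    {
        "status": {
            "containerStatuses": [
                {
                    "name": "registry-server",
                    "lastState": {
                        "terminated": {
                            "exitCode": 1,
                            "reason": "Error",
                            "finishedAt": "2024-03-26T07:53:08Z",
                            "message": 'level=fatal msg="cache requires rebuild"\n',
                        }
                    },
                }
            ]
        }
    }
)

for entry in last_terminations(json.loads(pod_json)):
    print(entry["container"], entry["exitCode"], entry["reason"])
```

This avoids depending on the container still existing in the runtime, which was not the case when `crictl attach` was attempted here.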

            jlanford@redhat.com Joe Lanford
            rhn-support-jiazha Jian Zhang
            Jia Fan Jia Fan