  1. OpenShift Bugs
  2. OCPBUGS-31438

Default catalog source pod never gets updates


    • Bug
    • Resolution: Done-Errata
    • Critical
    • 4.16.0
    • 4.15.z, 4.16.0
    • OLM
    • None
    • Critical
    • No
    • Quality OLM Sprint 251
    • 1
    • Approved
    • False
    • * Previously, the default catalog source pod would not receive updates, requiring users to manually recreate it to get updates. This was caused by image IDs for catalog pods not getting detected correctly. This bug fix updates {olm} to correctly detect catalog pod image IDs, and as a result, default catalog sources are updated as expected. (link:https://issues.redhat.com/browse/OCPBUGS-31438[*OCPBUGS-31438*])
    • Bug Fix
    • Done

      Description of problem:

      The default catalog source pod never gets updated; users have to recreate it manually to pick up updates. A must-gather log is available for debugging: https://drive.google.com/file/d/16_tFq5QuJyc_n8xkDFyK83TdTkrsVFQe/view?usp=drive_link 

      I went through the code and found that the `updateStrategy` handling depends on the `ImageID`; see:

      https://github.com/openshift/operator-framework-olm/blob/master/staging/operator-lifecycle-manager/pkg/controller/registry/reconciler/grpc.go#L527-L534 

      // imageID returns the ImageID of the primary catalog source container or an empty string if the image ID isn't available yet.
      // Note: the pod must be running and the container in a ready status to return a valid ImageID.
      func imageID(pod *corev1.Pod) string {
          if len(pod.Status.ContainerStatuses) < 1 {
              logrus.WithField("CatalogSource", pod.GetName()).Warn("pod status unknown")
              return ""
          }
          return pod.Status.ContainerStatuses[0].ImageID
      }
      
      

      However, for the default catalog source pods, `pod.Status.ContainerStatuses[0].ImageID` never changes, because that container runs the `opm` image, not the index image:

      jiazha-mac:~ jiazha$ oc get pods redhat-operators-mpvzm -o=jsonpath={.status.containerStatuses} |jq
      [
        {
          "containerID": "cri-o://115bd207312c7c8c36b63bfd251c085a701c58df2a48a1232711e15d7595675d",
          "image": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:965fe452763fd402ca8d8b4a3fdb13587673c8037f215c0ffcd76b6c4c24635e",
          "imageID": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:965fe452763fd402ca8d8b4a3fdb13587673c8037f215c0ffcd76b6c4c24635e",
          "lastState": {},
          "name": "registry-server",
          "ready": true,
          "restartCount": 1,
          "started": true,
          "state": {
            "running": {
              "startedAt": "2024-03-26T04:21:41Z"
            }
          }
        }
      ] 

      For these default catalog sources, the `imageID()` function should instead return the index image ID, which is reported in the init container status:

      jiazha-mac:~ jiazha$ oc get pods redhat-operators-mpvzm -o=jsonpath={.status.initContainerStatuses[1]} |jq
      {
        "containerID": "cri-o://4cd6e1f45e23aadc27b8152126eb2761a37da61c4845017a06bb6f2203659f5c",
        "image": "registry.redhat.io/redhat/redhat-operator-index:v4.15",
        "imageID": "registry.redhat.io/redhat/redhat-operator-index@sha256:19010760d38e1a898867262698e22674d99687139ab47173e2b4665e588635e1",
        "lastState": {},
        "name": "extract-content",
        "ready": true,
        "restartCount": 1,
        "started": false,
        "state": {
          "terminated": {
            "containerID": "cri-o://4cd6e1f45e23aadc27b8152126eb2761a37da61c4845017a06bb6f2203659f5c",
            "exitCode": 0,
            "finishedAt": "2024-03-26T04:21:39Z",
            "reason": "Completed",
            "startedAt": "2024-03-26T04:21:27Z"
          }
        }
      } 
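
The difference between the two image IDs above can be sketched as follows. This is a minimal, self-contained illustration, not the actual OLM change: `Pod`, `PodStatus`, and `ContainerStatus` are simplified stand-ins for the `corev1` types, `currentImageID` and `catalogImageID` are hypothetical names, and looking the init container up by the name `extract-content` is an assumption based on the output above. The name of the first init container is not shown in this report, so `first-init` is a placeholder.

```go
package main

import "fmt"

// ContainerStatus, PodStatus and Pod are simplified stand-ins for the corev1
// types, reduced to the fields that appear in this report.
type ContainerStatus struct {
	Name    string
	ImageID string
}

type PodStatus struct {
	InitContainerStatuses []ContainerStatus
	ContainerStatuses     []ContainerStatus
}

type Pod struct {
	Name   string
	Status PodStatus
}

// extractContentContainer is the init container name seen in the oc output above.
const extractContentContainer = "extract-content"

// currentImageID mirrors the behaviour quoted in this report: it always reads
// the first regular container, which runs the opm image for default catalogs.
func currentImageID(pod *Pod) string {
	if len(pod.Status.ContainerStatuses) < 1 {
		return ""
	}
	return pod.Status.ContainerStatuses[0].ImageID
}

// catalogImageID (hypothetical) prefers the image ID of the extract-content
// init container, which holds the index image, and falls back to the first
// regular container for pods that have no such init container.
func catalogImageID(pod *Pod) string {
	for _, s := range pod.Status.InitContainerStatuses {
		if s.Name == extractContentContainer {
			return s.ImageID
		}
	}
	return currentImageID(pod)
}

func main() {
	pod := &Pod{
		Name: "redhat-operators-mpvzm",
		Status: PodStatus{
			InitContainerStatuses: []ContainerStatus{
				// Placeholder: the first init container is not shown in the report.
				{Name: "first-init"},
				{Name: extractContentContainer, ImageID: "registry.redhat.io/redhat/redhat-operator-index@sha256:19010760d38e1a898867262698e22674d99687139ab47173e2b4665e588635e1"},
			},
			ContainerStatuses: []ContainerStatus{
				{Name: "registry-server", ImageID: "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:965fe452763fd402ca8d8b4a3fdb13587673c8037f215c0ffcd76b6c4c24635e"},
			},
		},
	}

	fmt.Println("current: ", currentImageID(pod))
	fmt.Println("proposed:", catalogImageID(pod))
}
```

With the pod status from this report, the first function yields the `opm` image digest, which stays the same across index updates, while the second yields the index image digest, which changes when a new index is published under the same tag.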

      Version-Release number of selected component (if applicable):

          4.15.2

      How reproducible:

          always

      Steps to Reproduce:

          1. Install OCP 4.16.0.
          2. Wait for the redhat-operators catalog source to update.
          

      Actual results:

      The redhat-operators catalog source never gets updated.

      Expected results:

      The default catalog sources should be updated according to their `updateStrategy`.

      jiazha-mac:~ jiazha$ oc get catalogsource redhat-operators -o yaml
      apiVersion: operators.coreos.com/v1alpha1
      kind: CatalogSource
      metadata:
        annotations:
          operatorframework.io/managed-by: marketplace-operator
          target.workload.openshift.io/management: '{"effect": "PreferredDuringScheduling"}'
        creationTimestamp: "2024-03-20T15:48:59Z"
        generation: 1
        name: redhat-operators
        namespace: openshift-marketplace
        resourceVersion: "12217605"
        uid: cc0fc420-c9d8-4c7d-997e-f0893b4c497f
      spec:
        displayName: Red Hat Operators
        grpcPodConfig:
          extractContent:
            cacheDir: /tmp/cache
            catalogDir: /configs
          memoryTarget: 30Mi
          nodeSelector:
            kubernetes.io/os: linux
            node-role.kubernetes.io/master: ""
          priorityClassName: system-cluster-critical
          securityContextConfig: restricted
          tolerations:
          - effect: NoSchedule
            key: node-role.kubernetes.io/master
            operator: Exists
          - effect: NoExecute
            key: node.kubernetes.io/unreachable
            operator: Exists
            tolerationSeconds: 120
          - effect: NoExecute
            key: node.kubernetes.io/not-ready
            operator: Exists
            tolerationSeconds: 120
        icon:
          base64data: ""
          mediatype: ""
        image: registry.redhat.io/redhat/redhat-operator-index:v4.15
        priority: -100
        publisher: Red Hat
        sourceType: grpc
        updateStrategy:
          registryPoll:
            interval: 10m
      status:
        connectionState:
          address: redhat-operators.openshift-marketplace.svc:50051
          lastConnect: "2024-03-27T06:35:36Z"
          lastObservedState: READY
        latestImageRegistryPoll: "2024-03-27T10:23:16Z"
        registryService:
          createdAt: "2024-03-20T15:56:03Z"
          port: "50051"
          protocol: grpc
          serviceName: redhat-operators
          serviceNamespace: openshift-marketplace

      Additional info:

      I also checked `currentPodsWithCorrectImageAndSpec`, but the hash never changes because the pod spec always stays the same.

      time="2024-03-26T03:22:01Z" level=info msg="of 1 pods matching label selector, 1 have the correct images and matching hash" correctHash=true correctImages=true current-pod.name=redhat-operators-mpvzm current-pod.namespace=openshift-marketplace
      time="2024-03-26T03:27:01Z" level=info msg="of 1 pods matching label selector, 1 have the correct images and matching hash" catalogsource.name=redhat-operators catalogsource.namespace=openshift-marketplace correctHash=true correctImages=true current-pod.name=redhat-operators-mpvzm current-pod.namespace=openshift-marketplace id=xW0cW
      time="2024-03-26T03:27:01Z" level=info msg="of 1 pods matching label selector, 1 have the correct images and matching hash" catalogsource.name=redhat-operators catalogsource.namespace=openshift-marketplace correctHash=true correctImages=true current-pod.name=redhat-operators-mpvzm current-pod.namespace=openshift-marketplace id=xW0cW
      time="2024-03-26T03:27:02Z" level=info msg="of 1 pods matching label selector, 1 have the correct images and matching hash" catalogsource.name=redhat-operators catalogsource.namespace=openshift-marketplace correctHash=true correctImages=true current-pod.name=redhat-operators-mpvzm current-pod.namespace=openshift-marketplace id=vq5VA
      time="2024-03-26T03:27:03Z" level=info msg="of 1 pods matching label selector, 1 have the correct images and matching hash" catalogsource.name=redhat-operators catalogsource.namespace=openshift-marketplace correctHash=true correctImages=true current-pod.name=redhat-operators-mpvzm current-pod.namespace=openshift-marketplace id=vq5VA
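
Why the hash check keeps reporting `correctHash=true` can be illustrated with a small sketch. It assumes the hash is derived only from the pod spec, which references the index image by its floating tag (`v4.15`). The `PodSpec` type, the `specHash` helper, and the use of SHA-256 are illustrative stand-ins, not OLM's actual hashing code.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// PodSpec is a hypothetical, reduced pod spec: only the image references
// as they are written in the spec (by tag or digest), nothing from the status.
type PodSpec struct {
	IndexImage  string // image of the extract-content init container
	ServerImage string // image of the registry-server container
}

// specHash is a stand-in for a spec hash: any deterministic hash over the
// spec fields behaves the same way for the purpose of this illustration.
func specHash(spec PodSpec) string {
	sum := sha256.Sum256([]byte(spec.IndexImage + "\n" + spec.ServerImage))
	return hex.EncodeToString(sum[:])
}

func main() {
	running := PodSpec{
		IndexImage:  "registry.redhat.io/redhat/redhat-operator-index:v4.15",
		ServerImage: "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:965fe452763fd402ca8d8b4a3fdb13587673c8037f215c0ffcd76b6c4c24635e",
	}

	// A new index is published under the same v4.15 tag. The desired spec is
	// rebuilt from the CatalogSource, which still names the tag, so it is
	// identical to the spec of the running pod.
	desired := running

	fmt.Println("hash unchanged:", specHash(running) == specHash(desired))
}
```

Under that assumption, the spec comparison cannot detect a new index image behind the same tag; only the resolved image ID in the pod status reveals the change.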

              jlanford@redhat.com Joe Lanford
              rhn-support-jiazha Jian Zhang
              Jian Zhang Jian Zhang