OpenShift Bugs / OCPBUGS-31438

Default catalog source pod never gets updates

    • Bug
    • Resolution: Unresolved
    • Critical
    • 4.16.0
    • 4.15.z
    • OLM
    • None
    • Critical
    • No
    • Quality OLM Sprint 251
    • 1
    • Approved
    • False

      Description of problem:

      The default catalog source pods never get updated; users have to recreate them manually to pick up updates. Here is the must-gather log for debugging: https://drive.google.com/file/d/16_tFq5QuJyc_n8xkDFyK83TdTkrsVFQe/view?usp=drive_link 

      I went through the code and found that the `updateStrategy` handling depends on the `ImageID`; see:

      https://github.com/openshift/operator-framework-olm/blob/master/staging/operator-lifecycle-manager/pkg/controller/registry/reconciler/grpc.go#L527-L534 

      import (
          "github.com/sirupsen/logrus"
          corev1 "k8s.io/api/core/v1"
      )

      // imageID returns the ImageID of the primary catalog source container or an empty string if the image ID isn't available yet.
      // Note: the pod must be running and the container in a ready status to return a valid ImageID.
      func imageID(pod *corev1.Pod) string {
          if len(pod.Status.ContainerStatuses) < 1 {
              logrus.WithField("CatalogSource", pod.GetName()).Warn("pod status unknown")
              return ""
          }
          return pod.Status.ContainerStatuses[0].ImageID
      }

      However, for the default catalog source pods, `pod.Status.ContainerStatuses[0].ImageID` never changes, because that container (`registry-server`) runs the `opm` image, not the index image:

      jiazha-mac:~ jiazha$ oc get pods redhat-operators-mpvzm -o=jsonpath={.status.containerStatuses} |jq
      [
        {
          "containerID": "cri-o://115bd207312c7c8c36b63bfd251c085a701c58df2a48a1232711e15d7595675d",
          "image": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:965fe452763fd402ca8d8b4a3fdb13587673c8037f215c0ffcd76b6c4c24635e",
          "imageID": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:965fe452763fd402ca8d8b4a3fdb13587673c8037f215c0ffcd76b6c4c24635e",
          "lastState": {},
          "name": "registry-server",
          "ready": true,
          "restartCount": 1,
          "started": true,
          "state": {
            "running": {
              "startedAt": "2024-03-26T04:21:41Z"
            }
          }
        }
      ] 

      For these default catalog sources, imageID() should instead return the image ID of the index image, which is pulled by the `extract-content` init container:

      jiazha-mac:~ jiazha$ oc get pods redhat-operators-mpvzm -o=jsonpath={.status.initContainerStatuses[1]} |jq
      {
        "containerID": "cri-o://4cd6e1f45e23aadc27b8152126eb2761a37da61c4845017a06bb6f2203659f5c",
        "image": "registry.redhat.io/redhat/redhat-operator-index:v4.15",
        "imageID": "registry.redhat.io/redhat/redhat-operator-index@sha256:19010760d38e1a898867262698e22674d99687139ab47173e2b4665e588635e1",
        "lastState": {},
        "name": "extract-content",
        "ready": true,
        "restartCount": 1,
        "started": false,
        "state": {
          "terminated": {
            "containerID": "cri-o://4cd6e1f45e23aadc27b8152126eb2761a37da61c4845017a06bb6f2203659f5c",
            "exitCode": 0,
            "finishedAt": "2024-03-26T04:21:39Z",
            "reason": "Completed",
            "startedAt": "2024-03-26T04:21:27Z"
          }
        }
      } 
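
A minimal sketch of the suggested behavior, assuming (from a reading of the linked reconciler, not verified here) that a registry poll decides whether to replace the serving pod by comparing imageID() of the serving pod with that of a newly created pod for the same index. The Pod, PodStatus, and ContainerStatus types are hypothetical trimmed stand-ins for the corev1 types so the example compiles on its own; indexAwareImageID, changed, and looking up the init container by the name extract-content (instead of by position [1]) are illustrative choices, not OLM code.

```go
package main

// Hypothetical, trimmed stand-ins for the corev1 types used by imageID().
// Field names mirror corev1 so the logic maps over directly.
type ContainerStatus struct {
	Name    string
	Image   string
	ImageID string
}

type PodStatus struct {
	InitContainerStatuses []ContainerStatus
	ContainerStatuses     []ContainerStatus
}

type Pod struct {
	Status PodStatus
}

// Init container name taken from the redhat-operators pod output above.
const extractContentContainer = "extract-content"

// currentImageID mirrors the quoted imageID(): it only inspects the first
// regular container, which for default catalogs runs the opm image.
func currentImageID(pod *Pod) string {
	if len(pod.Status.ContainerStatuses) < 1 {
		return ""
	}
	return pod.Status.ContainerStatuses[0].ImageID
}

// indexAwareImageID (hypothetical) prefers the image ID of the
// extract-content init container, which pulls the index image, and falls
// back to the first regular container when that init container is absent.
func indexAwareImageID(pod *Pod) string {
	for _, s := range pod.Status.InitContainerStatuses {
		if s.Name == extractContentContainer && s.ImageID != "" {
			return s.ImageID
		}
	}
	return currentImageID(pod)
}

// changed reports whether the serving pod and a freshly polled pod differ
// according to the given image ID function.
func changed(id func(*Pod) string, serving, polled *Pod) bool {
	return id(serving) != id(polled)
}
```

With two pods that share the same opm digest but were pulled from different index digests, changed(currentImageID, ...) reports false while changed(indexAwareImageID, ...) reports true, which is the difference this report describes. Matching by name rather than by position keeps the lookup working if the order of init containers ever changes.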

      Version-Release number of selected component (if applicable):

          4.15.2

      How reproducible:

          always

      Steps to Reproduce:

          1. Install an OCP 4.16.0 cluster.
          2. Wait for the redhat-operators catalog source to be updated.

      Actual results:

      The redhat-operators catalog source never gets updated.

      Expected results:

      These default catalog sources should be updated according to their `updateStrategy`:

      jiazha-mac:~ jiazha$ oc get catalogsource redhat-operators -o yaml
      apiVersion: operators.coreos.com/v1alpha1
      kind: CatalogSource
      metadata:
        annotations:
          operatorframework.io/managed-by: marketplace-operator
          target.workload.openshift.io/management: '{"effect": "PreferredDuringScheduling"}'
        creationTimestamp: "2024-03-20T15:48:59Z"
        generation: 1
        name: redhat-operators
        namespace: openshift-marketplace
        resourceVersion: "12217605"
        uid: cc0fc420-c9d8-4c7d-997e-f0893b4c497f
      spec:
        displayName: Red Hat Operators
        grpcPodConfig:
          extractContent:
            cacheDir: /tmp/cache
            catalogDir: /configs
          memoryTarget: 30Mi
          nodeSelector:
            kubernetes.io/os: linux
            node-role.kubernetes.io/master: ""
          priorityClassName: system-cluster-critical
          securityContextConfig: restricted
          tolerations:
          - effect: NoSchedule
            key: node-role.kubernetes.io/master
            operator: Exists
          - effect: NoExecute
            key: node.kubernetes.io/unreachable
            operator: Exists
            tolerationSeconds: 120
          - effect: NoExecute
            key: node.kubernetes.io/not-ready
            operator: Exists
            tolerationSeconds: 120
        icon:
          base64data: ""
          mediatype: ""
        image: registry.redhat.io/redhat/redhat-operator-index:v4.15
        priority: -100
        publisher: Red Hat
        sourceType: grpc
        updateStrategy:
          registryPoll:
            interval: 10m
      status:
        connectionState:
          address: redhat-operators.openshift-marketplace.svc:50051
          lastConnect: "2024-03-27T06:35:36Z"
          lastObservedState: READY
        latestImageRegistryPoll: "2024-03-27T10:23:16Z"
        registryService:
          createdAt: "2024-03-20T15:56:03Z"
          port: "50051"
          protocol: grpc
          serviceName: redhat-operators
          serviceNamespace: openshift-marketplace

      Additional info:

      I also checked `currentPodsWithCorrectImageAndSpec`, but the hash never changes because the pod spec is always the same:

      time="2024-03-26T03:22:01Z" level=info msg="of 1 pods matching label selector, 1 have the correct images and matching hash" correctHash=true correctImages=true current-pod.name=redhat-operators-mpvzm current-pod.namespace=openshift-marketplace
      time="2024-03-26T03:27:01Z" level=info msg="of 1 pods matching label selector, 1 have the correct images and matching hash" catalogsource.name=redhat-operators catalogsource.namespace=openshift-marketplace correctHash=true correctImages=true current-pod.name=redhat-operators-mpvzm current-pod.namespace=openshift-marketplace id=xW0cW
      time="2024-03-26T03:27:01Z" level=info msg="of 1 pods matching label selector, 1 have the correct images and matching hash" catalogsource.name=redhat-operators catalogsource.namespace=openshift-marketplace correctHash=true correctImages=true current-pod.name=redhat-operators-mpvzm current-pod.namespace=openshift-marketplace id=xW0cW
      time="2024-03-26T03:27:02Z" level=info msg="of 1 pods matching label selector, 1 have the correct images and matching hash" catalogsource.name=redhat-operators catalogsource.namespace=openshift-marketplace correctHash=true correctImages=true current-pod.name=redhat-operators-mpvzm current-pod.namespace=openshift-marketplace id=vq5VA
      time="2024-03-26T03:27:03Z" level=info msg="of 1 pods matching label selector, 1 have the correct images and matching hash" catalogsource.name=redhat-operators catalogsource.namespace=openshift-marketplace correctHash=true correctImages=true current-pod.name=redhat-operators-mpvzm current-pod.namespace=openshift-marketplace id=vq5VA
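
Why the hash stays put can be illustrated with a hypothetical, heavily trimmed spec: the catalog references the index by tag (registry.redhat.io/redhat/redhat-operator-index:v4.15), so when a new index is pushed to that tag, the rendered spec is identical and any hash over it matches. podSpec and specHash are illustrative only; hashing JSON with SHA-256 is a stand-in and not necessarily how OLM computes its hash.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
)

// podSpec is a hypothetical stand-in for the image references that feed a
// pod-spec hash.
type podSpec struct {
	InitImages []string `json:"initImages"`
	Images     []string `json:"images"`
}

// specHash hashes the JSON encoding of the spec with SHA-256. Identical
// specs always produce identical hashes.
func specHash(s podSpec) string {
	b, err := json.Marshal(s)
	if err != nil {
		panic(err) // cannot fail for a struct of string slices
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}
```

By contrast, a spec that pinned the index by digest would hash differently after each update, which is one reason the runtime imageID of the index container looks like the more useful signal here.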

            jlanford@redhat.com Joe Lanford
            rhn-support-jiazha Jian Zhang
            Jian Zhang
            Votes: 0
            Watchers: 13
