Description of problem:
The default catalog source pods never get updated; users have to recreate them manually to pick up new catalog content. Here is the must-gather log for debugging: https://drive.google.com/file/d/16_tFq5QuJyc_n8xkDFyK83TdTkrsVFQe/view?usp=drive_link
I went through the code and found that the `updateStrategy` handling depends on the `ImageID`, see:
// imageID returns the ImageID of the primary catalog source container or an empty string if the image ID isn't available yet.
// Note: the pod must be running and the container in a ready status to return a valid ImageID.
func imageID(pod *corev1.Pod) string {
	if len(pod.Status.ContainerStatuses) < 1 {
		logrus.WithField("CatalogSource", pod.GetName()).Warn("pod status unknown")
		return ""
	}
	return pod.Status.ContainerStatuses[0].ImageID
}
But for the default catalog source pods, `pod.Status.ContainerStatuses[0].ImageID` never changes, because that container runs the `opm` image, not the index image.
jiazha-mac:~ jiazha$ oc get pods redhat-operators-mpvzm -o=jsonpath={.status.containerStatuses} |jq
[
  {
    "containerID": "cri-o://115bd207312c7c8c36b63bfd251c085a701c58df2a48a1232711e15d7595675d",
    "image": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:965fe452763fd402ca8d8b4a3fdb13587673c8037f215c0ffcd76b6c4c24635e",
    "imageID": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:965fe452763fd402ca8d8b4a3fdb13587673c8037f215c0ffcd76b6c4c24635e",
    "lastState": {},
    "name": "registry-server",
    "ready": true,
    "restartCount": 1,
    "started": true,
    "state": {
      "running": {
        "startedAt": "2024-03-26T04:21:41Z"
      }
    }
  }
]
For those default catalog sources, the `imageID()` function should instead return the index image ID, which is reported by the `extract-content` init container:
jiazha-mac:~ jiazha$ oc get pods redhat-operators-mpvzm -o=jsonpath={.status.initContainerStatuses[1]} |jq
{
  "containerID": "cri-o://4cd6e1f45e23aadc27b8152126eb2761a37da61c4845017a06bb6f2203659f5c",
  "image": "registry.redhat.io/redhat/redhat-operator-index:v4.15",
  "imageID": "registry.redhat.io/redhat/redhat-operator-index@sha256:19010760d38e1a898867262698e22674d99687139ab47173e2b4665e588635e1",
  "lastState": {},
  "name": "extract-content",
  "ready": true,
  "restartCount": 1,
  "started": false,
  "state": {
    "terminated": {
      "containerID": "cri-o://4cd6e1f45e23aadc27b8152126eb2761a37da61c4845017a06bb6f2203659f5c",
      "exitCode": 0,
      "finishedAt": "2024-03-26T04:21:39Z",
      "reason": "Completed",
      "startedAt": "2024-03-26T04:21:27Z"
    }
  }
}
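To illustrate the suggestion, here is a minimal sketch of an image ID lookup that prefers the `extract-content` init container and falls back to the first regular container. It is not the actual fix. The `Pod`, `PodStatus`, and `ContainerStatus` types are simplified local stand-ins for the `corev1` types so the example compiles on its own, and `imageChanged` and `newPod` are hypothetical helpers. The digests are made up.

```go
package main

import "fmt"

// ContainerStatus, PodStatus and Pod are simplified stand-ins for the
// corev1 types (assumption); only the fields used below are modelled.
type ContainerStatus struct {
	Name    string
	ImageID string
}

type PodStatus struct {
	InitContainerStatuses []ContainerStatus
	ContainerStatuses     []ContainerStatus
}

type Pod struct {
	Name   string
	Status PodStatus
}

// extractContentName is the init container name shown in the output above.
const extractContentName = "extract-content"

// imageID returns the image ID of the extract-content init container when
// the pod has one (this is the index image), otherwise the image ID of the
// first regular container, otherwise an empty string.
func imageID(pod *Pod) string {
	for _, s := range pod.Status.InitContainerStatuses {
		if s.Name == extractContentName {
			return s.ImageID
		}
	}
	if len(pod.Status.ContainerStatuses) < 1 {
		return ""
	}
	return pod.Status.ContainerStatuses[0].ImageID
}

// imageChanged is a hypothetical helper: it reports whether two pods
// resolved to different, non-empty image IDs.
func imageChanged(serving, update *Pod) bool {
	a, b := imageID(serving), imageID(update)
	return a != "" && b != "" && a != b
}

// newPod builds a pod shaped like a default catalog source pod: the
// registry-server container always runs the same opm image, while the
// extract-content init container carries the index image digest.
func newPod(name, indexImageID string) *Pod {
	return &Pod{
		Name: name,
		Status: PodStatus{
			InitContainerStatuses: []ContainerStatus{
				{Name: "extract-util", ImageID: "example.invalid/opm@sha256:same"},
				{Name: extractContentName, ImageID: indexImageID},
			},
			ContainerStatuses: []ContainerStatus{
				{Name: "registry-server", ImageID: "example.invalid/opm@sha256:same"},
			},
		},
	}
}

func main() {
	serving := newPod("serving", "example.invalid/index@sha256:old")
	update := newPod("update", "example.invalid/index@sha256:new")

	// The registry-server image IDs are identical, so comparing them would
	// never detect an update; the init container image IDs differ.
	fmt.Println(imageChanged(serving, update)) // prints true
}
```

Looking the init container up by name rather than by index is a design choice of this sketch; it avoids depending on the order of the init containers.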
Version-Release number of selected component (if applicable):
4.15.2
How reproducible:
always
Steps to Reproduce:
1. Install OCP 4.16.0.
2. Wait for the redhat-operators catalog source to update.
Actual results:
The redhat-operators catalog source is never updated.
Expected results:
These default catalog sources should be updated according to their `updateStrategy`.
jiazha-mac:~ jiazha$ oc get catalogsource redhat-operators -o yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  annotations:
    operatorframework.io/managed-by: marketplace-operator
    target.workload.openshift.io/management: '{"effect": "PreferredDuringScheduling"}'
  creationTimestamp: "2024-03-20T15:48:59Z"
  generation: 1
  name: redhat-operators
  namespace: openshift-marketplace
  resourceVersion: "12217605"
  uid: cc0fc420-c9d8-4c7d-997e-f0893b4c497f
spec:
  displayName: Red Hat Operators
  grpcPodConfig:
    extractContent:
      cacheDir: /tmp/cache
      catalogDir: /configs
    memoryTarget: 30Mi
    nodeSelector:
      kubernetes.io/os: linux
      node-role.kubernetes.io/master: ""
    priorityClassName: system-cluster-critical
    securityContextConfig: restricted
    tolerations:
    - effect: NoSchedule
      key: node-role.kubernetes.io/master
      operator: Exists
    - effect: NoExecute
      key: node.kubernetes.io/unreachable
      operator: Exists
      tolerationSeconds: 120
    - effect: NoExecute
      key: node.kubernetes.io/not-ready
      operator: Exists
      tolerationSeconds: 120
  icon:
    base64data: ""
    mediatype: ""
  image: registry.redhat.io/redhat/redhat-operator-index:v4.15
  priority: -100
  publisher: Red Hat
  sourceType: grpc
  updateStrategy:
    registryPoll:
      interval: 10m
status:
  connectionState:
    address: redhat-operators.openshift-marketplace.svc:50051
    lastConnect: "2024-03-27T06:35:36Z"
    lastObservedState: READY
  latestImageRegistryPoll: "2024-03-27T10:23:16Z"
  registryService:
    createdAt: "2024-03-20T15:56:03Z"
    port: "50051"
    protocol: grpc
    serviceName: redhat-operators
    serviceNamespace: openshift-marketplace
Additional info:
I also checked `currentPodsWithCorrectImageAndSpec`, but the hash never changes because the pod spec is always the same.
time="2024-03-26T03:22:01Z" level=info msg="of 1 pods matching label selector, 1 have the correct images and matching hash" correctHash=true correctImages=true current-pod.name=redhat-operators-mpvzm current-pod.namespace=openshift-marketplace
time="2024-03-26T03:27:01Z" level=info msg="of 1 pods matching label selector, 1 have the correct images and matching hash" catalogsource.name=redhat-operators catalogsource.namespace=openshift-marketplace correctHash=true correctImages=true current-pod.name=redhat-operators-mpvzm current-pod.namespace=openshift-marketplace id=xW0cW
time="2024-03-26T03:27:01Z" level=info msg="of 1 pods matching label selector, 1 have the correct images and matching hash" catalogsource.name=redhat-operators catalogsource.namespace=openshift-marketplace correctHash=true correctImages=true current-pod.name=redhat-operators-mpvzm current-pod.namespace=openshift-marketplace id=xW0cW
time="2024-03-26T03:27:02Z" level=info msg="of 1 pods matching label selector, 1 have the correct images and matching hash" catalogsource.name=redhat-operators catalogsource.namespace=openshift-marketplace correctHash=true correctImages=true current-pod.name=redhat-operators-mpvzm current-pod.namespace=openshift-marketplace id=vq5VA
time="2024-03-26T03:27:03Z" level=info msg="of 1 pods matching label selector, 1 have the correct images and matching hash" catalogsource.name=redhat-operators catalogsource.namespace=openshift-marketplace correctHash=true correctImages=true current-pod.name=redhat-operators-mpvzm current-pod.namespace=openshift-marketplace id=vq5VA
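The unchanged hash can be illustrated with a small sketch. It assumes the hash is derived only from what is written in the pod spec, so a tag such as `:v4.15` that is republished with new content produces the same hash. The `podSpec` type and `specHash` function are hypothetical, and FNV is used only for illustration; this is not a claim about which hash function the operator uses.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// podSpec is a hypothetical, heavily reduced stand-in for a pod spec.
type podSpec struct {
	IndexImage string // image reference as written in the CatalogSource
	OpmImage   string // image that runs the registry-server container
}

// specHash hashes only the text of the spec. The digest that the tag
// currently resolves to is not part of the input.
func specHash(s podSpec) uint32 {
	h := fnv.New32a()
	fmt.Fprintf(h, "%s|%s", s.IndexImage, s.OpmImage)
	return h.Sum32()
}

func main() {
	before := podSpec{
		IndexImage: "registry.redhat.io/redhat/redhat-operator-index:v4.15",
		OpmImage:   "example.invalid/opm@sha256:same", // made-up reference
	}
	// The tag is republished with new content, but the spec text is identical.
	after := before

	fmt.Println(specHash(before) == specHash(after)) // prints true
}
```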
- blocks: OCPBUGS-31651 [release-4.15]: Default catalog source pod never get updates (Closed)
- is caused by: OPRUN-3130 Allow Consuming OPM From The Payload, Not The Index Image (To Do)
- is cloned by: OCPBUGS-31651 [release-4.15]: Default catalog source pod never get updates (Closed)
- links to: RHEA-2024:0041 OpenShift Container Platform 4.16.z bug fix update