Bug | Resolution: Duplicate | Major | 4.21.0 | Quality / Stability / Reliability | Important | Approved
Description of problem:
Catalog sources fail to start when the HostedCluster uses spec.olmCatalogPlacement: guest. This affects both the default catalog sources (e.g. certified-operators) and custom sources. It is a regression introduced by https://github.com/openshift/operator-framework-olm/pull/1129
Version-Release number of selected component (if applicable):
4.21 (nightly builds since Oct 25)
How reproducible:
Always
Steps to Reproduce:
1. Start a hosted cluster using:
hypershift create cluster aws ...
--olm-catalog-placement=Guest
--release-image=<at 4.21 nightly since Oct 25>
2. Check the CatalogSources in the openshift-marketplace namespace of the guest cluster, and the package manifests.
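A hedged sketch of step 2: with KUBECONFIG pointed at the guest cluster, the commented `oc` commands below surface the failure shown under "Actual results". `check_state` is a trivial helper name of my own that greps the observed connection state out of a CatalogSource dump, so the YAML captured in this report can be fed to it directly for illustration.

```shell
# Helper (hypothetical name): extract the gRPC connection state from a
# CatalogSource YAML dump.
check_state() { grep -o 'lastObservedState: .*' ; }

# With cluster access you would run:
#   oc get catalogsource -n openshift-marketplace -o yaml | check_state
#   oc get packagemanifests -n openshift-marketplace

# A healthy source reports "lastObservedState: READY"; the broken one
# captured in this report looks like this:
check_state <<'EOF'
status:
  connectionState:
    lastObservedState: TRANSIENT_FAILURE
EOF
```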
Actual results:
Catalog Source in guest cluster:
- apiVersion: operators.coreos.com/v1alpha1
  kind: CatalogSource
  metadata:
    annotations:
      target.workload.openshift.io/management: '{"effect": "PreferredDuringScheduling"}'
    creationTimestamp: "2025-11-04T08:13:07Z"
    generation: 1
    labels:
      hypershift.openshift.io/managed: "true"
    name: redhat-operators
    namespace: openshift-marketplace
    resourceVersion: "35089"
    uid: 9795a184-7a8e-4198-bb6a-057038116b7b
  spec:
    displayName: Red Hat Operators
    grpcPodConfig:
      securityContextConfig: restricted
    icon:
      base64data: ""
      mediatype: ""
    image: registry.redhat.io/redhat/redhat-operator-index:v4.20
    priority: -100
    publisher: Red Hat
    sourceType: grpc
    updateStrategy:
      registryPoll:
        interval: 10m
  status:
    connectionState:
      address: redhat-operators.openshift-marketplace.svc:50051
      lastConnect: "2025-11-04T09:34:33Z"
      lastObservedState: TRANSIENT_FAILURE
    latestImageRegistryPoll: "2025-11-04T09:45:56Z"
    registryService:
      createdAt: "2025-11-04T08:32:13Z"
      port: "50051"
      protocol: grpc
      serviceName: redhat-operators
      serviceNamespace: openshift-marketplace
Many pods in the openshift-marketplace namespace of the guest cluster are started and terminated in quick succession. It takes several minutes for the namespace to settle down to just 4 running pods.
ᐅ oc klock pods
NAME                        READY   STATUS              RESTARTS   AGE
certified-operators-4bz7x   0/1     Terminating         0          43s
certified-operators-5pvds   0/1     Terminating         0          36s
certified-operators-bggc8   0/1     Terminating         0          28s
certified-operators-c6tkb   0/1     Terminating         0          54s
certified-operators-gwwq7   0/1     ContainerCreating   0          1s
certified-operators-hkm4c   0/1     Terminating         0          20s
certified-operators-qrgh8   0/1     ContainerCreating   0          54s
certified-operators-tg9qt   0/1     Terminating         0          49s
certified-operators-vr2bp   0/1     Terminating         0          11s
community-operators-4wv48   0/1     Terminating         0          41s
community-operators-5nfb6   0/1     Terminating         0          27s
community-operators-ck76l   0/1     Terminating         0          47s
community-operators-h85jp   0/1     Terminating         0          54s
community-operators-hq7pq   0/1     Terminating         0          35s
community-operators-lhkr8   0/1     Terminating         0          19s
community-operators-nk8mq   0/1     ContainerCreating   0          54s
community-operators-wff2v   0/1     Terminating         0          9s
redhat-marketplace-294l9    0/1     Terminating         0          46s
redhat-marketplace-5298j    0/1     Terminating         0          52s
redhat-marketplace-7thcg    0/1     Error               0          16s
redhat-marketplace-c2shh    0/1     Terminating         0          24s
redhat-marketplace-fv5jw    0/1     Terminating         0          40s
redhat-marketplace-gnrkf    0/1     Running             0          6s
redhat-marketplace-qwmdx    0/1     ContainerCreating   0          52s
redhat-marketplace-sv9lj    0/1     Terminating         0          32s
redhat-operators-4dqgf      0/1     Terminating         0          45s
redhat-operators-5vtxt      0/1     Terminating         0          38s
redhat-operators-bkpx6      0/1     Terminating         0          14s
redhat-operators-h2kx7      0/1     ContainerCreating   0          51s
redhat-operators-hq8rg      0/1     ContainerCreating   0          5s
redhat-operators-ptgmf      0/1     Terminating         0          51s
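A minimal sketch for summarizing the churn above by pod STATUS. Here it is fed a small sample of the captured output; with access to the guest cluster you would pipe `oc get pods -n openshift-marketplace --no-headers` into it instead. The `summarize` helper name is mine, not from the report.

```shell
# Count pods per STATUS column (field 3 of `oc get pods` output).
summarize() { awk '{count[$3]++} END {for (s in count) printf "%s=%d\n", s, count[s]}' ; }

# Sample of the captured output; one line per pod.
summarize <<'EOF'
certified-operators-gwwq7 0/1 ContainerCreating 0 1s
redhat-marketplace-7thcg 0/1 Error 0 16s
redhat-marketplace-gnrkf 0/1 Running 0 6s
redhat-operators-ptgmf 0/1 Terminating 0 51s
EOF
```

This prints one `STATUS=count` line per state, which makes the Terminating/ContainerCreating churn easy to watch over time.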
catalog-operator Pod log in the management cluster repeats this:
time="2025-10-31T13:31:20Z" level=info msg="evaluating current pod" catalogsource.name=redhat-marketplace catalogsource.namespace=openshift-marketplace correctHash=true correctImages=true current-pod.name=redhat-marketplace-hgkpc current-pod.namespace=openshift-marketplace id=SFOdj
time="2025-10-31T13:31:20Z" level=info msg="of 1 pods matching label selector, 1 have the correct images and matching hash" catalogsource.name=redhat-marketplace catalogsource.namespace=openshift-marketplace correctHash=true correctImages=true current-pod.name=redhat-marketplace-hgkpc current-pod.namespace=openshift-marketplace id=SFOdj
time="2025-10-31T13:31:20Z" level=error msg="error ensuring registry server: could not ensure update pod" catalogsource.name=redhat-marketplace catalogsource.namespace=openshift-marketplace error="catalog polling: redhat-marketplace not ready for update: update pod redhat-marketplace-pw7qp has not yet reported ready" id=SFOdj
time="2025-10-31T13:31:20Z" level=error msg="error ensuring registry server: ensure update pod error is not of type UpdateNotReadyErr" catalogsource.name=redhat-marketplace catalogsource.namespace=openshift-marketplace error="catalog polling: redhat-marketplace not ready for update: update pod redhat-marketplace-pw7qp has not yet reported ready" id=SFOdj
time="2025-10-31T13:31:20Z" level=info msg="requeueing registry server for catalog update check: update pod not yet ready" catalogsource.name=redhat-marketplace catalogsource.namespace=openshift-marketplace id=SFOdj
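The catalog-operator entries are logrus key=value records, so the recurring error can be isolated by grepping the `error="..."` field. Here the sketch is fed one captured line; with access to the management cluster you would pipe the catalog-operator pod log in instead. `errors_only` is a helper name of my own.

```shell
# Extract and de-duplicate the error="..." fields from logrus-formatted lines.
errors_only() { grep -o 'error="[^"]*"' | sort -u ; }

errors_only <<'EOF'
time="2025-10-31T13:31:20Z" level=error msg="error ensuring registry server: could not ensure update pod" catalogsource.name=redhat-marketplace catalogsource.namespace=openshift-marketplace error="catalog polling: redhat-marketplace not ready for update: update pod redhat-marketplace-pw7qp has not yet reported ready" id=SFOdj
EOF
```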
There are no package manifests available in the guest cluster.
Expected results:
Catalog sources become READY and package manifests are available.
Additional info:
I have identified https://github.com/openshift/operator-framework-olm/pull/1129 as the source of the problem.
The nightly OCP build from Oct 24 still works; the nightly from Oct 25 does not.
I also created a custom OCP build based on the Oct 25 nightly, swapping in custom operator-lifecycle-manager and operator-registry images built from commit https://github.com/openshift/operator-framework-olm/commit/6e79ccc19197da354249f4753449fad3037b1c9e (which predates the merge of pull/1129). That build works fine (catalog sources become READY).
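A sketch of the bisection build described above: `oc adm release new` can rebuild a release payload with individual component images swapped out. All image references below are placeholders rather than the ones actually used; only the shape of the command is the point, so it is printed rather than executed.

```shell
# Compose (but do not run) a release-rebuild command that replaces the two OLM
# component images with ones built from the pre-PR-1129 commit. Every <...>
# value is a placeholder.
cmd="oc adm release new \
  --from-release=<4.21 nightly from Oct 25> \
  --to-image=<scratch-registry>/release:4.21-pre-pr1129 \
  operator-lifecycle-manager=<image built from commit 6e79ccc> \
  operator-registry=<image built from commit 6e79ccc>"
echo "$cmd"
```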