Type: Bug
Resolution: Duplicate
Priority: Major
Fix Version/s: None
Affects Version/s: 4.21.0
Component/s: HyperShift
Labels:
None

Activity Type:
Quality / Stability / Reliability
Blocked:
False
Blocked Reason:

None
Story Points:
None
Severity:
Important
Regression:
Yes

Target Backport Versions:
None
Target Version:
None
Release Blocker:
Approved
Sprint:
None

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

PX Impact Score:

Release Note Status:
None
Release Note Type:
None
Release Note Text:
None

Escape Reason:
None
Escape Impact:
None
Corrective Measures:
None
SDLC stage when should've been found:
None

Description of problem:

Catalog sources fail to start when HostedCluster uses spec.olmCatalogPlacement: guest. This is true for both default catalog sources (e.g. certified-operators) and custom sources.

This is a regression introduced by https://github.com/openshift/operator-framework-olm/pull/1129

Version-Release number of selected component (if applicable):

    4.21 (since Oct 25)

How reproducible:

    Always

Steps to Reproduce:

    1. Start a hosted cluster using:
         hypershift create cluster aws ... \
         --olm-catalog-placement=Guest \
         --release-image=<a 4.21 nightly from Oct 25 or later>
    2. Check Catalog Sources in openshift-marketplace NS in guest cluster and package manifests.
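Step 2 can be sketched as a pair of small shell helpers. This is only an illustration: the function names `list_catalog_states` and `not_ready_catalogs` are hypothetical, and the sketch assumes `oc` is already logged in to the guest cluster.

```shell
# List every CatalogSource with its last observed gRPC connection state,
# one "name<TAB>state" pair per line.
# Assumption: `oc` points at the guest cluster.
list_catalog_states() {
  oc get catalogsource -n openshift-marketplace \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.connectionState.lastObservedState}{"\n"}{end}'
}

# Read "name<TAB>state" lines from stdin and print the names of the
# catalogs whose state is anything other than READY.
not_ready_catalogs() {
  awk -F '\t' '$2 != "READY" { print $1 }'
}

# Example (needs cluster access):
#   list_catalog_states | not_ready_catalogs
#   oc get packagemanifests -n openshift-marketplace
```

With the regression present, the first pipeline would be expected to list the affected catalogs and the second command to return no package manifests.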

Actual results:

Catalog Source in guest cluster:

- apiVersion: operators.coreos.com/v1alpha1
  kind: CatalogSource
  metadata:
    annotations:
      target.workload.openshift.io/management: '{"effect": "PreferredDuringScheduling"}'
    creationTimestamp: "2025-11-04T08:13:07Z"
    generation: 1
    labels:
      hypershift.openshift.io/managed: "true"
    name: redhat-operators
    namespace: openshift-marketplace
    resourceVersion: "35089"
    uid: 9795a184-7a8e-4198-bb6a-057038116b7b
  spec:
    displayName: Red Hat Operators
    grpcPodConfig:
      securityContextConfig: restricted
    icon:
      base64data: ""
      mediatype: ""
    image: registry.redhat.io/redhat/redhat-operator-index:v4.20
    priority: -100
    publisher: Red Hat
    sourceType: grpc
    updateStrategy:
      registryPoll:
        interval: 10m
  status:
    connectionState:
      address: redhat-operators.openshift-marketplace.svc:50051
      lastConnect: "2025-11-04T09:34:33Z"
      lastObservedState: TRANSIENT_FAILURE
    latestImageRegistryPoll: "2025-11-04T09:45:56Z"
    registryService:
      createdAt: "2025-11-04T08:32:13Z"
      port: "50051"
      protocol: grpc
      serviceName: redhat-operators
      serviceNamespace: openshift-marketplace

Many pods in the openshift-marketplace namespace of the guest cluster are started and terminated in quick succession. It takes several minutes to settle down to just 4 running pods.

 ᐅ oc klock pods
NAME                        READY   STATUS              RESTARTS   AGE
certified-operators-4bz7x   0/1     Terminating         0          43s
certified-operators-5pvds   0/1     Terminating         0          36s
certified-operators-bggc8   0/1     Terminating         0          28s
certified-operators-c6tkb   0/1     Terminating         0          54s
certified-operators-gwwq7   0/1     ContainerCreating   0          1s
certified-operators-hkm4c   0/1     Terminating         0          20s
certified-operators-qrgh8   0/1     ContainerCreating   0          54s
certified-operators-tg9qt   0/1     Terminating         0          49s
certified-operators-vr2bp   0/1     Terminating         0          11s
community-operators-4wv48   0/1     Terminating         0          41s
community-operators-5nfb6   0/1     Terminating         0          27s
community-operators-ck76l   0/1     Terminating         0          47s
community-operators-h85jp   0/1     Terminating         0          54s
community-operators-hq7pq   0/1     Terminating         0          35s
community-operators-lhkr8   0/1     Terminating         0          19s
community-operators-nk8mq   0/1     ContainerCreating   0          54s
community-operators-wff2v   0/1     Terminating         0          9s
redhat-marketplace-294l9    0/1     Terminating         0          46s
redhat-marketplace-5298j    0/1     Terminating         0          52s
redhat-marketplace-7thcg    0/1     Error               0          16s
redhat-marketplace-c2shh    0/1     Terminating         0          24s
redhat-marketplace-fv5jw    0/1     Terminating         0          40s
redhat-marketplace-gnrkf    0/1     Running             0          6s
redhat-marketplace-qwmdx    0/1     ContainerCreating   0          52s
redhat-marketplace-sv9lj    0/1     Terminating         0          32s
redhat-operators-4dqgf      0/1     Terminating         0          45s
redhat-operators-5vtxt      0/1     Terminating         0          38s
redhat-operators-bkpx6      0/1     Terminating         0          14s
redhat-operators-h2kx7      0/1     ContainerCreating   0          51s
redhat-operators-hq8rg      0/1     ContainerCreating   0          5s
redhat-operators-ptgmf      0/1     Terminating         0          51s
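To make the churn easier to quantify, the pod listing can be aggregated per catalog and status. The helper below is a minimal sketch; `summarize_pod_churn` is a hypothetical name, and it assumes the default `oc get pods` column layout and that the catalog name is the pod name without its final generated suffix.

```shell
# Summarize `oc get pods` output as "<catalog> <status> <count>" lines.
# Assumptions: default columns (NAME READY STATUS RESTARTS AGE) with a
# header row, and pod names of the form "<catalog>-<suffix>".
summarize_pod_churn() {
  awk 'NR > 1 && NF >= 3 {
         name = $1
         sub(/-[^-]+$/, "", name)      # drop the generated pod suffix
         count[name " " $3]++          # key: "<catalog> <status>"
       }
       END { for (k in count) print k, count[k] }' | sort
}

# Example (needs access to the guest cluster):
#   oc get pods -n openshift-marketplace | summarize_pod_churn
```

Running it repeatedly while the catalogs are reconciling would show how many pods per catalog are in Terminating versus ContainerCreating at each moment.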

The catalog-operator pod log in the management cluster repeats the following:

time="2025-10-31T13:31:20Z" level=info msg="evaluating current pod" catalogsource.name=redhat-marketplace catalogsource.namespace=openshift-marketplace correctHash=true correctImages=true current-pod.name=redhat-marketplace-hgkpc current-pod.namespace=openshift-marketplace id=SFOdj

time="2025-10-31T13:31:20Z" level=info msg="of 1 pods matching label selector, 1 have the correct images and matching hash" catalogsource.name=redhat-marketplace catalogsource.namespace=openshift-marketplace correctHash=true correctImages=true current-pod.name=redhat-marketplace-hgkpc current-pod.namespace=openshift-marketplace id=SFOdj

time="2025-10-31T13:31:20Z" level=error msg="error ensuring registry server: could not ensure update pod" catalogsource.name=redhat-marketplace catalogsource.namespace=openshift-marketplace error="catalog polling: redhat-marketplace not ready for update: update pod redhat-marketplace-pw7qp has not yet reported ready" id=SFOdj

time="2025-10-31T13:31:20Z" level=error msg="error ensuring registry server: ensure update pod error is not of type UpdateNotReadyErr" catalogsource.name=redhat-marketplace catalogsource.namespace=openshift-marketplace error="catalog polling: redhat-marketplace not ready for update: update pod redhat-marketplace-pw7qp has not yet reported ready" id=SFOdj

time="2025-10-31T13:31:20Z" level=info msg="requeueing registry server for catalog update check: update pod not yet ready" catalogsource.name=redhat-marketplace catalogsource.namespace=openshift-marketplace id=SFOdj
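Since the same entries repeat for every catalog, it can help to count the error-level lines per CatalogSource. The following is a rough sketch that relies only on the `level=` and `catalogsource.name=` fields visible in the log lines above; `count_errors_by_catalog` is a hypothetical helper name, and the way the log is fetched in the example comment is an assumption.

```shell
# Read catalog-operator log lines from stdin and print
# "<catalogsource name> <number of error-level entries>" per catalog.
count_errors_by_catalog() {
  grep 'level=error' \
    | sed -n 's/.*catalogsource\.name=\([^ ]*\).*/\1/p' \
    | sort | uniq -c | awk '{ print $2, $1 }'
}

# Example (assumption: the log is read from the catalog-operator pod in
# the hosted control plane namespace on the management cluster):
#   oc logs <catalog-operator pod> -n <hosted control plane namespace> \
#     | count_errors_by_catalog
```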

There are no package manifests available in the guest cluster.

Expected results:

    Catalog Sources READY, package manifests available.

Additional info:

    I have identified https://github.com/openshift/operator-framework-olm/pull/1129 as the source of the problem.

The nightly OCP build from Oct 24 still works; the nightly from Oct 25 does not.

I also created a custom OCP build based on the Oct 25 nightly, replacing the operator-lifecycle-manager and operator-registry images with custom images built from commit https://github.com/openshift/operator-framework-olm/commit/6e79ccc19197da354249f4753449fad3037b1c9e (this commit precedes the merge of pull/1129). That build works fine (Catalog sources READY).

Assignee: Unassigned

Reporter: Martin Gencur

Need Info From: None

Contributors: None

QA Contact: Martin Gencur

Doc Contact: None

Votes: 0

Watchers: 3

Created: 2025/11/04 11:18 AM

Updated: 2025/11/06 7:32 AM

Resolved: 2025/11/06 7:32 AM
