Type: Bug
Resolution: Unresolved
Priority: Major
Fix Version/s: None
Affects Version/s: 4.16
Component/s: OLM
Labels:
- olmv0
- triaged

Activity Type:
Quality / Stability / Reliability
Blocked:
False
Blocked Reason:

None
Story Points:
None
Severity:
Important
Regression:
None

Target Backport Versions:
None
Target Version:
None
Release Blocker:
Rejected
Sprint:
None

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

PX Impact Score:

Release Note Status:
None
Release Note Type:
None
Release Note Text:
None

Escape Reason:
Escape Impact:
Corrective Measures:
SDLC stage when should've been found:
None

Description of problem:

The certified-operators catalog pod in the openshift-marketplace namespace restarts frequently because its startup probe fails:


2h19m       Normal    Created          pod/certified-operators-tkh2b   Created container registry-server
2h19m       Normal    Started          pod/certified-operators-tkh2b   Started container registry-server
2h18m       Warning   Unhealthy        pod/certified-operators-tkh2b   Startup probe failed: timeout: failed to connect service ":50051" within 1s
2h18m       Normal    Killing          pod/certified-operators-tkh2b   Stopping container registry-server
2h18m       Warning   Unhealthy        pod/certified-operators-tkh2b   Readiness probe errored: rpc error: code = Unknown desc = command error: cannot register an exec PID: container is stopping, stdout: , stderr: , exit code -1

    livenessProbe:
      exec:
        command:
        - grpc_health_probe
        - -addr=:50051
      failureThreshold: 3
      initialDelaySeconds: 10
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 5
    name: registry-server
    ports:
    - containerPort: 50051
      name: grpc
      protocol: TCP
    readinessProbe:
      exec:
        command:
        - grpc_health_probe
        - -addr=:50051
      failureThreshold: 3
      initialDelaySeconds: 5
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 5
---
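
The events above report a startup probe failure, while the pasted spec shows only the liveness and readiness probes. The "within 1s" in the event message appears to match the default connect timeout of grpc_health_probe rather than the Kubernetes timeoutSeconds field. As a minimal sketch of the fields that govern startup tolerance, assuming the startup probe uses the same exec command as the other two probes, a stanza could look like the following. All values are illustrative assumptions; they are not taken from the affected cluster and are not a tested fix.

```yaml
# Hypothetical sketch only: not the shipped registry-server spec.
startupProbe:
  exec:
    command:
    - grpc_health_probe
    - -addr=:50051
    # Assumption: widen the probe's own connect timeout (default 1s).
    - -connect-timeout=5s
  # Assumption: allow up to 15 failed attempts before the container is killed.
  failureThreshold: 15
  periodSeconds: 10
  successThreshold: 1
  # Kubernetes-side limit on how long the exec command may run.
  timeoutSeconds: 5
```

With these example numbers the container would get roughly failureThreshold × periodSeconds = 150 seconds to start serving on port 50051 before being restarted.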

Alert:

Labels:

alertname = KubePodCrashLooping
container = registry-server
endpoint = https-main
job = kube-state-metrics
managed_cluster = e74350bb-dc29-4716-860d-51269c4ef5d0
namespace = openshift-marketplace
openshift_io_alert_source = platform
pod = community-operators-j2g4v
prometheus = openshift-monitoring/k8s
reason = CrashLoopBackOff
service = kube-state-metrics
severity = warning
uid = 8d236469-bec3-4423-9662-d0bf7fc14826

Version-Release number of selected component (if applicable):

How reproducible:

    Always

Steps to Reproduce:

    1.
    2.
    3.

Actual results:

    The certified-operators pod fails repeatedly, which triggers the KubePodCrashLooping alert.

Expected results:

    The certified-operators pod should not fail, and the alert should not fire.

Additional info:

The redhat-marketplace pod is also restarting due to probe failures.

is related to

OCPBUGS-69441 10m catalog sync interval contributes to unbounded etcd growth

Closed

OCPBUGS-73873 10m catalog sync interval contributes to unbounded etcd growth

Closed

OCPBUGS-73875 10m catalog sync interval contributes to unbounded etcd growth

Closed

OCPBUGS-73876 10m catalog sync interval contributes to unbounded etcd growth

Closed

OCPBUGS-73877 10m catalog sync interval contributes to unbounded etcd growth

Closed

Assignee: Catherine Chan-Tse

Reporter: Shivam shinde

QA Contact: Xia Zhao

Need Info From: None

Votes: 0

Watchers: 10

Created: 2025/06/05 10:29 AM

Updated: 2026/03/02 3:10 PM
