Type: Bug
Resolution: Unresolved
Affects Version: 4.17.0
Description of problem:
While running HCPs on Azure we have observed recurring CPU spikes every few minutes from the catalog operator pods, reaching up to 200% (max) from a single pod. When running multiple HCPs, the pods spiked all at once at random, which caused CPU starvation on the worker and affected the kubelet and containerd (AKS) daemons.
Version-Release number of selected component (if applicable):
4.17.0-rc.0
How reproducible:
Always
Steps to Reproduce:
1. Create HCPs on Azure (it may also happen on AWS)
2. Watch catalog operator pod CPU usage for a few hours
3. Use this PromQL query (one way to run it is sketched below):
   sum(irate(container_cpu_usage_seconds_total{container!="POD",name!="",namespace=~"clusters.*",instance=~".*user.*",pod=~".*catalog.*"}[$interval])) by (pod) * 100
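A minimal sketch of how the query above could be run against the management cluster's Prometheus HTTP API; the openshift-monitoring namespace, the prometheus-k8s service, and the 5m window substituted for $interval are assumptions and will differ on an AKS management cluster:

    # Port-forward to a Prometheus instance (namespace/service are assumptions).
    oc -n openshift-monitoring port-forward svc/prometheus-k8s 9090:9090 &

    # Query per-pod catalog CPU usage as a percentage of one core.
    curl -sG 'http://localhost:9090/api/v1/query' \
      --data-urlencode 'query=sum(irate(container_cpu_usage_seconds_total{container!="POD",name!="",namespace=~"clusters.*",instance=~".*user.*",pod=~".*catalog.*"}[5m])) by (pod) * 100'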
Actual results:
CPU spikes occur randomly every few minutes. Per the documentation the default sync interval is 12h, but the spikes are happening far more frequently: https://olm.operatorframework.io/docs/advanced-tasks/configuring-olm/#changing-the-package-server-sync-interval
Expected results:
A sync every 12h, as described in the documentation, and correspondingly reduced CPU usage: https://olm.operatorframework.io/docs/advanced-tasks/configuring-olm/#changing-the-package-server-sync-interval
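One way to inspect the currently configured interval on the catalogs, assuming the relevant knob is the CatalogSource registry poll interval (the linked doc may refer to a different setting); namespaces and catalog names will vary per hosted control plane:

    # List the registry poll interval for every CatalogSource on the cluster.
    oc get catalogsource -A -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,INTERVAL:.spec.updateStrategy.registryPoll.interval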
Additional info:
This was supposed to be fixed by https://issues.redhat.com/browse/OCPBUGS-17950, but it is still happening in the latest release.
Usage screenshot - here
Is duplicated by: OCPBUGS-36421 redhat-operators pod experiencing unusually high CPU utilization (New)