Data Foundation Bugs / DFBUGS-938

ceph-csi-controller-manager pods OOMKilled


    • Type: Bug
    • Resolution: Done-Errata
    • Priority: Critical
    • odf-4.18
    • odf-4.17
    • ceph-csi-operator
    • None
    • False
    • None
    • False
    • Committed
    • ?
    • ?
    • 4.18.0-102
    • Committed
    • Cause: On installing ODF, ceph-csi-controller-manager tries to cache all ConfigMaps in the cluster
      Consequence: the ceph-csi-controller-manager pod gets OOMKilled
      Fix: the cache is scoped to only the namespace where the ceph-csi-controller-manager pod is running (see the sketch after this field list)
      Result: stable memory usage by the pod, which is no longer OOMKilled
    • Bug Fix
    • Proposed
    • None
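
      The fix described in the doc text above amounts to restricting the operator's informer cache to its own namespace. Below is a minimal controller-runtime sketch of that idea, assuming a controller-runtime v0.16+ cache.Options API; the POD_NAMESPACE variable and the openshift-storage fallback are illustrative assumptions, not taken from the ceph-csi-operator source.

      package main

      import (
          "os"

          corev1 "k8s.io/api/core/v1"
          ctrl "sigs.k8s.io/controller-runtime"
          "sigs.k8s.io/controller-runtime/pkg/cache"
          "sigs.k8s.io/controller-runtime/pkg/client"
      )

      func main() {
          // Assumption: the pod's namespace is exposed via the downward API.
          ns := os.Getenv("POD_NAMESPACE")
          if ns == "" {
              ns = "openshift-storage" // illustrative fallback only
          }

          // Without any scoping, the manager's cache would watch ConfigMaps
          // cluster-wide, so memory grows with the number of ConfigMaps in the
          // cluster. Scoping the ConfigMap informer to the operator's own
          // namespace keeps memory usage bounded.
          mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
              Cache: cache.Options{
                  ByObject: map[client.Object]cache.ByObject{
                      &corev1.ConfigMap{}: {
                          Namespaces: map[string]cache.Config{ns: {}},
                      },
                  },
              },
          })
          if err != nil {
              panic(err)
          }

          // Controllers would be registered here before starting the manager.
          if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
              panic(err)
          }
      }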

      Description of problem:

      The ceph-csi-controller-manager pod keeps getting OOMKilled.
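
      For reference, "OOMKilled" is the termination reason the kubelet records on the container status. A small client-go sketch like the one below (the namespace and pod-name filter are assumptions) can list the affected pods and their restart counts.

      package main

      import (
          "context"
          "fmt"
          "strings"

          metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
          "k8s.io/client-go/kubernetes"
          ctrl "sigs.k8s.io/controller-runtime"
      )

      func main() {
          // Assumed namespace; ODF typically runs in openshift-storage.
          const ns = "openshift-storage"

          clientset := kubernetes.NewForConfigOrDie(ctrl.GetConfigOrDie())

          pods, err := clientset.CoreV1().Pods(ns).List(context.TODO(), metav1.ListOptions{})
          if err != nil {
              panic(err)
          }
          for _, pod := range pods.Items {
              if !strings.Contains(pod.Name, "ceph-csi-controller-manager") {
                  continue
              }
              for _, cs := range pod.Status.ContainerStatuses {
                  // A container killed for exceeding its memory limit reports
                  // "OOMKilled" as its last termination reason.
                  if t := cs.LastTerminationState.Terminated; t != nil && t.Reason == "OOMKilled" {
                      fmt.Printf("%s/%s: last termination OOMKilled, %d restarts\n",
                          pod.Name, cs.Name, cs.RestartCount)
                  }
              }
          }
      }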

       

      The OCP platform infrastructure and deployment type

      The cluster is bare metal, installed with the Assisted Installer, and has been upgraded since 4.11.

       

      The ODF deployment type (Internal, External, Internal-Attached (LSO), Multicluster, DR, Provider, etc):

      Internal ODF using 3 worker nodes and NVMe devices

       

      The version of all relevant components (OCP, ODF, RHCS, ACM whichever is applicable):

      cephcsi-operator.v4.17.0-rhodf
      
      

       

      Does this issue impact your ability to continue to work with the product?

      I don't know 

       

      Is there any workaround available to the best of your knowledge?

      We tried to bump the limits for the pod, based on https://access.redhat.com/solutions/7002548, but even at 30x the original values it still fails.
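
      Purely as an illustration of what bumping the limits looks like with client-go (the KCS article above describes the supported procedure, which is not reproduced here), a sketch such as the following raises the memory limit on a Deployment's containers; the Deployment name, namespace, and the 4Gi value are assumptions, and such edits may be reverted by OLM or the operator.

      package main

      import (
          "context"

          corev1 "k8s.io/api/core/v1"
          "k8s.io/apimachinery/pkg/api/resource"
          metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
          "k8s.io/client-go/kubernetes"
          ctrl "sigs.k8s.io/controller-runtime"
      )

      func main() {
          // Assumed names; adjust to the actual Deployment and namespace.
          const ns = "openshift-storage"
          const deployName = "ceph-csi-controller-manager"

          clientset := kubernetes.NewForConfigOrDie(ctrl.GetConfigOrDie())
          ctx := context.TODO()

          deploy, err := clientset.AppsV1().Deployments(ns).Get(ctx, deployName, metav1.GetOptions{})
          if err != nil {
              panic(err)
          }

          // Raise the memory limit on every container in the pod template.
          for i := range deploy.Spec.Template.Spec.Containers {
              c := &deploy.Spec.Template.Spec.Containers[i]
              if c.Resources.Limits == nil {
                  c.Resources.Limits = corev1.ResourceList{}
              }
              c.Resources.Limits[corev1.ResourceMemory] = resource.MustParse("4Gi")
          }

          if _, err := clientset.AppsV1().Deployments(ns).Update(ctx, deploy, metav1.UpdateOptions{}); err != nil {
              panic(err)
          }
      }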

       

      Can this issue be reproduced? If so, please provide the hit rate

      We don't have another cluster to verify it. We were hit by this on our production cluster. 

       

       

      Can this issue be reproduced from the UI?

      If this is a regression, please provide more details to justify this:

      Steps to Reproduce:

      1.

      2.

      3.

      The exact date and time when the issue was observed, including timezone details:

       

      Actual results:

       

       

      Expected results:

       

      Logs collected and log location:

       

      Additional info:

       

              Leela Gangavarapu
              Rabin Yasharzadehe
              Oded Viner
              Votes: 0
              Watchers: 24

                Created:
                Updated:
                Resolved: