
[Backport to 4.18.z][GSS]ocs-metrics-exporter OOMKills


    • Bug
    • Resolution: Done
    • Major
    • odf-4.18.12
    • odf-4.18
    • ceph-monitoring
    • Customer is unable to upgrade their clusters as a result of this bug.
    • Committed
    • x86_64
    • 4.18.12-1.konflux
    • Committed
    • Important
    • Proposed

      Description of problem - Provide a detailed description of the issue encountered, including logs/command-output snippets and screenshots if the issue is observed in the UI:

      Customer states that after upgrading their clusters from ODF 4.17.7 to 4.18.6, the ocs-metrics-exporter pod continuously crashes due to OOMKills.

      Cluster #1:
      OCP Cluster ID: 0e7d9d45-756a-44dd-b971-b3e1262b66e6
      Ceph Cluster ID: d9e2be69-3519-45dc-9c6e-c0577afcfdbb
      OCP Version: 4.18.15
      ODF Version: 4.18.6
      
      $ oc get pod ocs-metrics-exporter-6f945875f-mpgcz -o json | jq -c '.status.containerStatuses[] | {name: .name, restarts: .restartCount, exitCode: .lastState.terminated.exitCode, reason: .lastState.terminated.reason}'
      {"name":"kube-rbac-proxy-main","restarts":109,"exitCode":137,"reason":"OOMKilled"}
      {"name":"kube-rbac-proxy-self","restarts":23,"exitCode":137,"reason":"OOMKilled"}
      {"name":"ocs-metrics-exporter","restarts":0,"exitCode":null,"reason":null}

       

      Cluster #2:
      OCP Cluster ID: 4daaa5fd-b76f-45eb-bb35-930f7abd41ef
      Ceph Cluster ID: d0a89bd3-b57e-4b44-91d0-412a63cf0492
      OCP Version: 4.18.15
      ODF Version: 4.18.6
      
      $ oc get pod ocs-metrics-exporter-6f945875f-6nc4l -o json | jq -c '.status.containerStatuses[] | {name: .name, restarts: .restartCount, exitCode: .lastState.terminated.exitCode, reason: .lastState.terminated.reason}'
      {"name":"kube-rbac-proxy-main","restarts":2,"exitCode":137,"reason":"OOMKilled"}
      {"name":"kube-rbac-proxy-self","restarts":7,"exitCode":137,"reason":"OOMKilled"}
      {"name":"ocs-metrics-exporter"," 


      I have attempted to increase resources for this deployment, but the changes did not persist. The ocs-metrics-exporter pod is unstable in both clusters.
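
      For reference, this is the kind of direct change that was attempted (a minimal sketch only, assuming the default openshift-storage namespace; the 256Mi limit is a placeholder, not the value actually applied):

      $ oc -n openshift-storage get deploy ocs-metrics-exporter -o jsonpath='{range .spec.template.spec.containers[*]}{.name}{": "}{.resources}{"\n"}{end}'
      $ oc -n openshift-storage set resources deploy/ocs-metrics-exporter -c kube-rbac-proxy-main --limits=memory=256Mi
      # Per the report above, edits made directly to the deployment did not persist and were reverted.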

       

      The OCP platform infrastructure and deployment type (AWS, Bare Metal, VMware, etc. Please clarify if it is platform agnostic deployment), (IPI/UPI):

      VMware

      The ODF deployment type (Internal, External, Internal-Attached (LSO), Multicluster, DR, Provider, etc):

      Internal-Attached (LSO)

      Does this issue impact your ability to continue to work with the product?

      Yes. The customer states: "Lost metrics and visibility for storage metrics."

      Is there any workaround available to the best of your knowledge?

      No. I attempted to set the do-not-reconcile annotation on the deployment and increase the resources, but these changes were reverted. Attempting to modify it via the storagecluster CR was unsuccessful as well.
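
      For reference, this is the kind of StorageCluster override that was attempted (a minimal sketch; the storagecluster name ocs-storagecluster, the openshift-storage namespace, the ocs-metrics-exporter resource key, and the memory values are assumptions for illustration, and per the report this did not take effect):

      $ oc -n openshift-storage patch storagecluster ocs-storagecluster --type merge -p '{"spec":{"resources":{"ocs-metrics-exporter":{"requests":{"memory":"128Mi"},"limits":{"memory":"256Mi"}}}}}'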

       

      Can this issue be reproduced? If so, please provide the hit rate

      Customer has this issue occurring in two separate clusters.

      Can this issue be reproduced from the UI?

      N/A

      Actual results:

      Unable to allocate more resources to the ocs-metrics-exporter deployment; the kube-rbac-proxy containers continue to be OOMKilled.

      Expected results:

      The ocs-metrics-exporter pod should not be OOMKilled, or it should be possible to increase the resources allocated to this deployment via the storagecluster CR.

      Logs collected and log location:

      Case #04188878

              dkamboj@redhat.com Divyansh Kamboj
              rhn-support-rlaberin Ryan Laberinto
              Thotakura Chaitanya