Bug
Resolution: Unresolved
Critical
odf-4.16
None
Description of problem (please be as detailed as possible and provide log
snippets):
The following storage-related processes currently share the same priority as the virt-launcher pods, making them susceptible to eviction or termination during pod evacuation or Out Of Memory (OOM) events:
csi-addons-controller-manager
noobaa-operator
ocs-metrics-exporter
ocs-operator
odf-console
odf-operator-controller
rook-ceph-crashcollector
rook-ceph-exporter
rook-ceph-operator
rook-ceph-tools
ux-backend-server
Given the important role some of these processes play in the storage system, it is worth considering elevating their priority class.
Elevating their priority would improve the stability and robustness of the storage system under stress, ensure continued operation during critical scenarios, and facilitate debugging and information gathering in the event of a crash.
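As a sketch of the suggested change, a dedicated PriorityClass could be created and referenced from the pod templates of the affected workloads. The class name and value below are illustrative assumptions, not actual ODF defaults:

```yaml
# Hypothetical PriorityClass for ODF storage control-plane and monitoring pods.
# Name and value are assumptions for illustration, not shipped ODF settings.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: odf-storage-critical
value: 1000000          # assumed value; must exceed the priority of virt-launcher pods
globalDefault: false
description: "Keeps ODF operator and monitoring pods schedulable during eviction/OOM pressure"
```

A Deployment would then opt in by setting `priorityClassName: odf-storage-critical` in its pod spec, so the scheduler evicts lower-priority pods (e.g. VM workloads) first under node pressure.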
Version of all relevant components (if applicable):
All
Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Loss of these pods in a stress scenario may impact the ability to gather storage information or monitor the system effectively.
Is there any workaround available to the best of your knowledge?
Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
1
Is this issue reproducible?
yes
Can this issue be reproduced from the UI?
yes
If this is a regression, please provide more details to justify this:
Steps to Reproduce:
1. Install ODF 4.15
2. Stress the system to the point of node evacuation with a high number of VMs
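To verify which priority class each of the listed pods currently runs with, the following (the `openshift-storage` namespace is an assumption; adjust for your deployment) can be used before and after stressing the system:

```shell
# Show each storage pod's priority class and resolved numeric priority.
# Namespace is assumed; pods with an empty CLASS run at the default priority (0).
oc get pods -n openshift-storage \
  -o custom-columns=NAME:.metadata.name,CLASS:.spec.priorityClassName,PRIORITY:.spec.priority
```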
Actual results:
These pods are killed by OOM or evacuated
Expected results:
All important storage pods, including those needed for debugging and monitoring, should not be evacuated or killed before all VMs are.
Additional info: