Feature Request
Resolution: Unresolved
Normal
Product / Portfolio Work
1. Proposed title of this feature request
Better Warnings and Safeguards for Over-Utilization of ODF Storage with OpenShift Virtualization
2. What is the nature and description of the request?
In the customer's (CU) environment, which primarily runs OpenShift Virtualization, they have encountered critical issues when ODF storage utilization exceeds 80%. At that point, the cluster becomes unresponsive and most operations halt, including critical diagnostics such as must-gather.
While this issue has been previously reported and addressed reactively, their current workaround is to over-provision their clusters with additional flash-backed worker nodes to create a buffer. This is both resource-inefficient and costly.
The default 80% utilization alert is useful for container-based workloads, but is insufficient for VM-heavy environments, where VMs tend to consume larger and more persistent storage blocks over time. They currently have no proactive warnings that take into account the impact of adding new VMs with storage-backed PVCs.
3. Why does the customer need this? (List the business requirements here)
Today we have to overbuild our clusters to prevent this issue, which results in approximately 30% more resources than would be needed if safeguards were in place.
4. List any affected packages or components.
ODF 4.18.7
Virt 4.18.11, 4.19
Additional Details from the CU:
They propose the following improvements to help prevent ODF-related outages in virtualized environments:
1. Pre-scheduling Warnings for VM Additions:
When provisioning a new VM, evaluate whether its storage footprint (PVC size) could push the cluster’s ODF utilization beyond a safe threshold.
Warn users if the cumulative VM storage usage could create a risk of reaching critical utilization levels.
2. Intelligent Capacity Awareness:
Take into account existing VM PVC usage and expected future write patterns (e.g., thick vs. thin provisioning).
Take into account current data distribution across worker nodes.
3. Predictive Overcommitment Guardrails:
Introduce a predictive model or threshold system that simulates “worst-case” storage growth for VMs (e.g., if all disks hit 100% utilization), and warns users well before that scenario materializes.
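The three proposals above can be illustrated with a minimal sketch. This is not an existing ODF or OpenShift Virtualization API: the names `VmDisk`, `projected_utilization`, `worst_case_utilization`, and `check_new_vm` are hypothetical, the 75% warning threshold is an arbitrary value chosen to sit below the 80% point described in this request, and the replica count of 3 is an assumption about how the storage pool is configured. The sketch only shows the arithmetic such a check might perform before a VM is scheduled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VmDisk:
    """Hypothetical record for one VM disk backed by a PVC."""
    name: str
    provisioned_gib: float  # size requested by the PVC
    used_gib: float         # data written so far (thin provisioning)


def projected_utilization(raw_capacity_gib: float, raw_used_gib: float,
                          new_pvc_gib: float, replica_count: int = 3) -> float:
    """Raw utilization if the new PVC were written in full.

    replica_count=3 is an assumption; each logical GiB is counted
    once per replica against raw capacity.
    """
    return (raw_used_gib + new_pvc_gib * replica_count) / raw_capacity_gib


def worst_case_utilization(raw_capacity_gib: float, disks: List[VmDisk],
                           replica_count: int = 3) -> float:
    """Raw utilization if every listed disk reached 100% of its size."""
    total_provisioned = sum(d.provisioned_gib for d in disks)
    return total_provisioned * replica_count / raw_capacity_gib


def check_new_vm(raw_capacity_gib: float, raw_used_gib: float,
                 existing_disks: List[VmDisk], new_pvc_gib: float,
                 warn_threshold: float = 0.75,
                 replica_count: int = 3) -> List[str]:
    """Return warnings for a proposed VM disk; an empty list means no risk found."""
    warnings = []

    # Pre-scheduling check: current usage plus the new disk, fully written.
    projected = projected_utilization(
        raw_capacity_gib, raw_used_gib, new_pvc_gib, replica_count)
    if projected >= warn_threshold:
        warnings.append(
            f"Projected utilization {projected:.0%} reaches the "
            f"{warn_threshold:.0%} warning threshold.")

    # Worst-case check: all VM disks, including the new one, completely full.
    all_disks = list(existing_disks) + [VmDisk("new-vm-disk", new_pvc_gib, 0.0)]
    worst = worst_case_utilization(raw_capacity_gib, all_disks, replica_count)
    if worst >= warn_threshold:
        warnings.append(
            f"Worst-case utilization {worst:.0%} reaches the "
            f"{warn_threshold:.0%} warning threshold.")

    return warnings
```

A real implementation would also need the data-distribution input from item 2 (per-node fill levels), which this sketch leaves out because it treats the cluster as a single pool.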
Business Impact
To prevent storage-induced outages in their virtualized workloads, they are forced to overbuild their clusters by approximately 30%. This includes additional:
Worker nodes with SSD/flash storage
Infrastructure capacity, just to maintain stability
These measures inflate operational costs and reduce cluster efficiency. A more intelligent warning and enforcement mechanism would allow them to:
Optimize cluster sizing
Improve reliability
Avoid catastrophic failures tied to storage overconsumption
Please reference SF Ticket: 04210788