OpenShift Request For Enhancement: RFE-7955

Better Warnings and Safeguards for Over-Utilization of ODF Storage with OpenShift Virtualization

    • Feature Request
    • Resolution: Unresolved
    • Normal
    • Product / Portfolio Work

      1. Proposed title of this feature request
      Better Warnings and Safeguards for Over-Utilization of ODF Storage with OpenShift Virtualization

      2. What is the nature and description of the request?
      In the customer’s (CU) environment, which primarily runs OpenShift Virtualization, they have encountered critical issues when ODF storage utilization exceeds 80%. At that point, the cluster becomes unresponsive and most operations halt, including critical diagnostics such as must-gather.

      While this issue has been previously reported and addressed reactively, their current workaround is to over-provision their clusters with additional flash-backed worker nodes to create a buffer. This is both resource-inefficient and costly.

      The default 80% utilization alert is useful for container-based workloads, but is insufficient for VM-heavy environments, where VMs tend to consume larger and more persistent storage blocks over time. They currently have no proactive warnings that take into account the impact of adding new VMs with storage-backed PVCs.

      3. Why does the customer need this? (List the business requirements here)
      Today we have to overbuild our clusters to prevent this issue, which results in approximately 30% more resources than we would need if safeguards were in place.

      4. List any affected packages or components.
      ODF 4.18.7
      Virt 4.18.11, 4.19

      Additional details from the customer:

      They propose the following improvements to help prevent ODF-related outages in virtualized environments:

      1. Pre-scheduling Warnings for VM Additions:

      When provisioning a new VM, evaluate whether its storage footprint (PVC size) could push the cluster’s ODF utilization beyond a safe threshold.

      Warn users if the cumulative VM storage usage could create a risk of reaching critical utilization levels.

      2. Intelligent Capacity Awareness:

      Take the following into account when evaluating capacity:

      Existing VM PVC usage and future write patterns (e.g., thick vs. thin provisioning).

      Current data distribution across worker nodes.

      3. Predictive Overcommitment Guardrails:

      Introduce a predictive model or threshold system that simulates “worst-case” storage growth for VMs (e.g., if all disks hit 100% utilization), and warns users well before that scenario materializes.
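
      The checks proposed in items 1 and 3 can be illustrated with a minimal sketch. This is not an existing ODF or OpenShift Virtualization API: the names ClusterStorage, projected_utilization, and check_new_vm are hypothetical, the capacity figures are assumed to come from whatever monitoring source is available, and the 80% threshold is taken from the utilization level cited in this request.

```python
from dataclasses import dataclass

# Assumed threshold: the request cites 80% as the level where problems begin.
CRITICAL_UTILIZATION = 0.80


@dataclass
class ClusterStorage:
    """Hypothetical snapshot of cluster storage, all values in GiB."""
    usable_capacity_gib: float  # usable capacity after replication (assumed input)
    used_gib: float             # data actually written today
    provisioned_gib: float      # sum of all existing VM PVC sizes


def projected_utilization(storage, new_pvc_gib, worst_case=False):
    """Return utilization as a fraction after adding a PVC of new_pvc_gib.

    worst_case=False: data written today plus the full size of the new PVC.
    worst_case=True:  every PVC, existing and new, filled to 100%.
    """
    if worst_case:
        demand = storage.provisioned_gib + new_pvc_gib
    else:
        demand = storage.used_gib + new_pvc_gib
    return demand / storage.usable_capacity_gib


def check_new_vm(storage, new_pvc_gib, threshold=CRITICAL_UTILIZATION):
    """Return a list of warning strings; the list is empty if no risk is found."""
    warnings = []
    likely = projected_utilization(storage, new_pvc_gib)
    worst = projected_utilization(storage, new_pvc_gib, worst_case=True)
    if likely >= threshold:
        warnings.append(
            f"new PVC would push utilization to {likely:.0%} "
            f"(threshold {threshold:.0%})"
        )
    elif worst >= threshold:
        warnings.append(
            f"worst-case growth of all VM disks would reach {worst:.0%} "
            f"(threshold {threshold:.0%})"
        )
    return warnings


if __name__ == "__main__":
    # Illustrative numbers only, not customer data.
    cluster = ClusterStorage(usable_capacity_gib=1000, used_gib=500, provisioned_gib=700)
    for size in (50, 150, 350):
        print(size, check_new_vm(cluster, size))
```

      The sketch separates the immediate check (item 1) from the worst-case check (item 3) so that a request can be allowed with a warning when only the worst-case projection crosses the threshold. It does not model data distribution across worker nodes (item 2).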

      Business Impact:

      To prevent storage-induced outages in their virtualized workloads, they are forced to overbuild their clusters by approximately 30%. This includes additional:

      Worker nodes with SSD/flash storage

      Infrastructure capacity, just to maintain stability

      These measures inflate operational costs and reduce cluster efficiency. A more intelligent warning and enforcement mechanism would allow them to:

      Optimize cluster sizing

      Improve reliability

      Avoid catastrophic failures tied to storage overconsumption

      Please reference SF Ticket: 04210788

              rh_pelauter@redhat.com Peter Lauterbach
              dacarpen@redhat.com Darren Carpenter
              Votes: 2
              Watchers: 3