Feature Request
Resolution: Unresolved
openshift-4.21
Product / Portfolio Work
1. Proposed title of this feature request
Expose cgroup v2 io.max parameters to enable IOPS and BPS QoS limits for pods utilizing local storage
2. What is the nature and description of the request?
We are requesting a supported mechanism in OpenShift to configure I/O Quality of Service (QoS), specifically IOPS and BPS (bytes per second) limits, for pods and containers backed by local storage.
Based on the investigation so far:
- The underlying Linux kernel supports I/O throttling via cgroup v2 io.max parameters.
- The container runtimes, including CRI-O and containerd, already support applying these io.max limits.
- However, the Kubernetes layer currently lacks a mechanism to pass these configurations down to the CRI. The upstream Kubernetes enhancement issue (#3008) has been stalled for over four years.
- Furthermore, TopoLVM maintainers have indicated that QoS enforcement belongs at the CRI layer, not the CSI layer.
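For illustration, the kernel documentation linked under References describes each io.max entry as a line keyed by the device's MAJ:MIN numbers, followed by rbps, wbps, riops and wiops values, where "max" means no limit. The following is a minimal sketch of building such a line. The helper name format_io_max is hypothetical, and writing the result into a pod's cgroup is left to the runtime.

```python
from typing import Optional


def format_io_max(major: int, minor: int,
                  rbps: Optional[int] = None, wbps: Optional[int] = None,
                  riops: Optional[int] = None, wiops: Optional[int] = None) -> str:
    """Build one io.max line for a block device (hypothetical helper).

    A limit left as None is emitted as "max", i.e. unlimited.
    """
    limits = {"rbps": rbps, "wbps": wbps, "riops": riops, "wiops": wiops}
    parts = []
    for key, value in limits.items():
        if value is not None and value <= 0:
            raise ValueError(f"{key} must be a positive integer, got {value}")
        parts.append(f"{key}={'max' if value is None else value}")
    return f"{major}:{minor} " + " ".join(parts)


# Limit device 8:16 to 2 MiB/s of reads and 120 write IOPS.
line = format_io_max(8, 16, rbps=2 * 1024 * 1024, wiops=120)
print(line)
```

A runtime would write a line of this shape to the io.max file of the container's cgroup. Which block device backs a given local volume has to be resolved separately.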
Since the lower-level components (Kernel and CRI-O) are already capable, customers need an OpenShift-native method (e.g., via Pod annotations, a specific Operator, or by driving the upstream Kubernetes KEP forward) to expose this functionality to users.
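To make the Pod annotation option more concrete, the sketch below shows how a per-pod limit could be declared and validated. Everything in it is an assumption made for illustration: the annotation key io-qos.example.com/io-max, the comma-separated value format, and the function name are hypothetical and do not correspond to any existing OpenShift, Kubernetes, or CRI-O API.

```python
from typing import Dict, Optional

# Hypothetical annotation key; not an existing OpenShift or CRI-O annotation.
HYPOTHETICAL_ANNOTATION = "io-qos.example.com/io-max"
# Limit names as used by the cgroup v2 io.max file.
ALLOWED_KEYS = ("rbps", "wbps", "riops", "wiops")


def parse_io_limits(annotations: Dict[str, str]) -> Dict[str, Optional[int]]:
    """Parse a value such as "rbps=10485760,wiops=500" into a dict.

    A value of "max" is returned as None (no limit). Unknown keys,
    duplicates and non-positive numbers raise ValueError.
    """
    raw = annotations.get(HYPOTHETICAL_ANNOTATION, "")
    limits: Dict[str, Optional[int]] = {}
    for item in (part.strip() for part in raw.split(",")):
        if not item:
            continue
        key, sep, value = item.partition("=")
        if not sep or key not in ALLOWED_KEYS:
            raise ValueError(f"unsupported io limit entry: {item!r}")
        if key in limits:
            raise ValueError(f"duplicate io limit: {key}")
        if value == "max":
            limits[key] = None
        elif value.isdigit() and int(value) > 0:
            limits[key] = int(value)
        else:
            raise ValueError(f"invalid value for {key}: {value!r}")
    return limits


pod_annotations = {HYPOTHETICAL_ANNOTATION: "rbps=10485760, wiops=500"}
print(parse_io_limits(pod_annotations))
```

Validating the value at admission time, before it reaches the runtime, would let a malformed limit be rejected with a clear error instead of being silently ignored on the node.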
References:
- Kernel cgroup v2: https://docs.kernel.org/admin-guide/cgroup-v2.html#io
- CRI-O PR: https://github.com/cri-o/cri-o/pull/4873
- containerd PR: https://github.com/containerd/containerd/pull/5490
- Upstream Kubernetes Issue (#3008): https://github.com/kubernetes/enhancements/issues/3008
- TopoLVM Discussion: https://github.com/topolvm/topolvm/discussions/1135
3. Why does the customer need this? (List the business requirements here)
Many customers want to migrate their multi-tenant databases (such as PostgreSQL) from VMware to OpenShift, but they cannot do so because OpenShift lacks storage QoS.
Customers have two strict requirements:
- Performance: They must use local storage so that databases get the I/O performance they need.
- Noisy Neighbor Prevention: They must limit IOPS and BPS for each database pod. If one pod consumes all the disk I/O, it will slow down or break other databases on the same node. QoS is required to meet SLAs.
Without local storage QoS, it is impossible to run multi-tenant databases safely on OpenShift. This missing feature is a major blocker for VMware-to-OpenShift migrations.
4. List any affected packages or components.
- Kubernetes / Kubelet
- CRI-O
- Local Storage Operator (LSO) / LVM Storage Operator