Feature Request
Resolution: Unresolved
Product / Portfolio Work
Proposed title of this feature request
Single RWX PV/PVC for a VM and Its Virtual Disks
What is the nature and description of the request?
We would like to request the capability in OpenShift Virtualization to place multiple virtual disks for a single Virtual Machine (VM) on a single ReadWriteMany (RWX) Persistent Volume (PV) / Persistent Volume Claim (PVC). We propose starting with NFS as the primary storage type: it poses the fewest development challenges, is widely available in enterprise environments, and would require the least engineering work. Given that many primary storage vendors are moving toward unified arrays, this feature would benefit a broad set of customers.
By consolidating all virtual disks of a single VM onto one PV/PVC, Red Hat can leverage existing storage capabilities rather than having to build new data management functionality in-house. This approach would:
- Leverage Storage Vendor Data Services: Allow storage vendors to provide snapshots, cloning, and other data services with the granularity customers need.
- Create Natural Consistency Groups: Achieve consistency across disks without requiring new consistency group features in OpenShift Virtualization.
- Simplify Cloning and Provisioning: Facilitate rapid creation of VM templates and clones.
- Reduce QA Overhead: Minimize the need for additional test scenarios and integration complexities in Red Hat’s QA process.
- Streamline Live Migration: Placing multiple disks on a single NFS share could simplify VM live migration workflows.
- Avoid Building an SDS Layer: Eliminate the need for Red Hat to develop an internal software-defined storage stack, reducing both development risk and data-corruption liability. There are reasons SDS technologies haven’t taken over the storage market even though they’ve been around for decades.
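To ground the request: today a KubeVirt/OpenShift Virtualization VM references one PVC per virtual disk. A minimal sketch of the current per-disk model (VM, disk, and claim names are hypothetical):

```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: demo-vm
spec:
  runStrategy: Always
  template:
    spec:
      domain:
        devices:
          disks:
            - name: rootdisk
              disk:
                bus: virtio
            - name: datadisk
              disk:
                bus: virtio
      volumes:
        # Today: one PVC (and one backing PV) per virtual disk.
        - name: rootdisk
          persistentVolumeClaim:
            claimName: demo-vm-root
        - name: datadisk
          persistentVolumeClaim:
            claimName: demo-vm-data
```

Under this request, both disks would instead live as disk images on a single RWX NFS-backed PVC (for example, one claim holding both image files), making the PVC-to-VM mapping one-to-one.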
Comparison to Other Approaches
Red Hat is exploring solutions such as KubeSAN (or other methods) that can create a single large datastore for VMs. However, these alternatives come with additional complexity, because Red Hat would become responsible for implementing and maintaining data services (cloning, snapshots, fault recovery, encryption, and more) at the VM or disk level. This approach can lead to:
- Increased Upgrade Complexity and Risk: Cluster upgrades would have to follow a very specific order to maintain compatibility among firmware, operating systems, storage vendors, and OpenShift versions, increasing the risk of data corruption and operational downtime.
- Higher Liability for Administrators: More complex infrastructure layers create more points of failure. Server and OpenShift administrators would face greater risks of data loss or corruption, as responsibilities become blurred across multiple abstraction layers.
- Reduced Storage Performance: Additional abstraction layers can degrade performance, reducing the overall benefit of moving from platforms like VMware.
- Complicated Failure Recovery: Multiple layers for data handling introduce challenges when recovering from hardware or software failures. Storage vendors may defer responsibility to Red Hat and vice versa, leaving administrators in difficult positions if data corruption occurs.
- Increased Development and QA Requirements: If Red Hat needs to maintain these data services independently, engineering and QA overhead increases significantly. This detracts from the core goal of innovating on OpenShift platform features and server-side application management.
Requested Outcome
By allowing multiple virtual disks of a single VM to reside on one RWX NFS PV/PVC:
- Storage Flexibility: Customers can use native storage vendor capabilities (snapshots, clones, encryption, etc.) while maintaining disk-level or VM-level granularity.
- Operational Simplicity: Live migration, cloning, and provisioning are streamlined.
- Reduced Liability and Risk: Red Hat avoids engineering and maintaining a complex SDS-like solution, thus reducing potential data corruption risks.
We believe this approach offers the best engineering compromise to meet customer needs while minimizing complexity for both Red Hat and its clients.
Why does the customer need this? (List the business requirements here)
As VMware customers look for alternatives, many are asking why KubeVirt makes storage management so complicated. While OpenShift Virtualization offers a VVOL-like model, most customers weren't using VVOLs and instead used the large shared datastore model. The initial ask isn't the single large datastore model, but a compromise in which all the virtual disks of a single VM can be placed on a single PV/PVC. Allowing multiple virtual disks per VM to reside on a single PV/PVC significantly reduces management overhead, cuts the number of objects in both Kubernetes and the underlying storage system, and lowers the associated CPU, memory, and licensing costs. It accelerates provisioning, cloning, replication, and recovery tasks while making migrations, backups, and policy enforcement more straightforward. Ultimately, consolidating virtual disks into fewer volumes helps scale an environment more efficiently and cost-effectively, all while simplifying day-to-day operations. Here are some, but not all, of the points I'd like to cover. Understand that in an enterprise all of these things are happening at once, so there isn't a single smoking gun; it is more like death by a thousand papercuts.
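The object-count claim can be made concrete with back-of-the-envelope arithmetic. All figures below are illustrative assumptions, not measurements:

```python
# Illustrative comparison of Kubernetes object counts; every number here
# is an assumption chosen for the example, not a measured value.
vms = 1000            # VMs in the cluster
disks_per_vm = 3      # virtual disks per VM
snaps_per_disk = 2    # retained snapshots per disk

# Rough per-volume object cost: a PVC plus its bound PV, and a
# VolumeSnapshot plus VolumeSnapshotContent per retained snapshot.
def k8s_objects(volumes, snapshots):
    return volumes * 2 + snapshots * 2

# Today: one PVC per virtual disk.
per_disk = k8s_objects(vms * disks_per_vm, vms * disks_per_vm * snaps_per_disk)

# Requested model: one RWX PVC per VM; snapshots taken on the per-VM volume.
per_vm = k8s_objects(vms, vms * snaps_per_disk)

print(per_disk)  # 18000 objects tracked in etcd
print(per_vm)    # 6000 objects
```

Even with these modest assumptions, consolidating to one volume per VM cuts the etcd object count by two thirds; the gap widens as disks-per-VM grows.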
- LUN Limits
- Volume Limits: Most storage vendors impose maximum LUN or volume limits. If each virtual disk is backed by its own volume, these limits can be hit quickly in environments with hundreds or thousands of VMs. Once the limit is reached, organizations must re-architect their storage design or purchase larger or additional storage arrays.
- Cost & Growth Constraints: Storage expansion may require new shelves, expansions, or licenses if you hit vendor-imposed volume limits. Consolidating multiple disks onto a single volume extends the life of existing hardware and licenses.
- Block Map Tracking Overhead
- Snapshot/Clone Metadata: Each volume needs its own metadata to track changed blocks for snapshot, clone, replication, or mirroring operations. As volume count grows, metadata-tracking overhead grows with it, requiring additional CPU and memory on the storage controllers.
- Performance Penalty: High metadata overhead can increase latencies for I/O, snapshot creation, and replication tasks, as the system has to manage and update thousands of block maps instead of fewer larger ones.
- Replication Relationships and Monitoring
- Complexity at Scale: Each volume may need its own replication relationship or be added to replication group policies. Managing thousands of individual replication relationships quickly becomes operationally painful.
- RPO/RTO Challenges: Tracking replication statuses, reporting on them, and ensuring consistent Recovery Point Objectives (RPO) and Recovery Time Objectives (RTO) is more difficult with numerous small volumes. One volume per VM can act as a natural consistency group.
- Clones and Snapshot Block Tracking
- Clone Storms: Thin clones and snapshots each maintain block tracking. Having hundreds or thousands of small volumes significantly multiplies that overhead.
- Resource Sprawl: As clones or snapshots grow, you end up with more “dangling” volumes or partial clones that need manual cleanup or extra automation.
- K8s Objects (PVCs, Snapshots, Clones)
- API and etcd Load: Each volume or snapshot is a separate Kubernetes object (PVC, VolumeSnapshot, VolumeSnapshotContent, etc.). This can inflate the number of objects stored in etcd, slowing down queries, increasing backup times, and raising complexity in general.
- Operational Overhead: More objects mean more chances for orphaned resources, harder audits, and more complicated RBAC or policy definitions.
- Strain on K8s Rate Limiter and API Calls
- Provisioning Storms: Provisioning or deleting volumes, snapshots, and clones triggers calls to the Kubernetes API and the CSI driver. Each request consumes cluster API capacity. A large number of small volumes can cause frequent rate limiting, slowing down overall cluster operations.
- Metrics Collection: Kubernetes queries object states frequently. More volumes means more calls and more burden on both the cluster and the storage subsystem to serve metrics.
- etcd Objects
- etcd Scalability: etcd is the “brain” of your cluster. The more objects etcd must track, the more disk I/O, CPU, and memory you consume on the control plane. Large expansions can require bigger or more highly available etcd clusters.
- Performance and Backup Implications: etcd backups grow in size, and restoration times increase with an excessive number of objects.
- Recovery from Failures (Volume Imports)
- Disaster Recovery Complexity: In a catastrophic Kubernetes failure, if you must re-import volumes as PVCs, it’s far easier to handle a single or a handful of volumes per VM instead of dozens. This can be the difference between a feasible DR plan and an unmanageable one.
- Fewer Steps to “Rerun”: Each volume import is a separate operation. Reducing the volume count drastically shortens recovery workflows.
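Re-importing a surviving NFS volume after a cluster rebuild can be sketched as a statically provisioned PV pre-bound to a fresh PVC; server, path, and all names below are hypothetical. With one volume per VM, this is one such import per VM rather than one per disk:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: demo-vm-pv
spec:
  capacity:
    storage: 200Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: nfs.example.com
    path: /exports/vms/demo-vm
  claimRef:                 # pre-bind to the PVC below
    namespace: vms
    name: demo-vm-disks
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-vm-disks
  namespace: vms
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""      # static binding, skip dynamic provisioning
  resources:
    requests:
      storage: 200Gi
```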
- Natural Consistency Groups
- Better Data Consistency: Having all virtual disks for a VM on a single PVC lends itself to natural consistency. You don’t need elaborate orchestration to ensure all disks are snapshotted or cloned in lockstep.
- Simpler Replication: Consistency groups are essential in maintaining transaction consistency for applications that need multi-volume consistency. With a single volume, replication tasks are simpler.
- Golden Images (Thin Clones)
- Single Operation: Golden image creation on a single volume means you snapshot or clone once per VM. If each disk is on its own volume, you have multiple parallel operations—more overhead, more points of failure, more to track.
- Lifecycle Management: Rolling updates, ephemeral testing, and dev/test cycles become easier if the entire VM is one snapshot or clone operation.
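With all of a VM's disks on one PVC, a golden image becomes a single snapshot plus a single snapshot-backed clone using the standard CSI snapshot API. A sketch, with hypothetical class, claim, and namespace names:

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: demo-vm-golden
  namespace: vms
spec:
  volumeSnapshotClassName: csi-nfs-snapclass   # hypothetical snapshot class
  source:
    persistentVolumeClaimName: demo-vm-disks   # the single per-VM PVC
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-vm-clone
  namespace: vms
spec:
  dataSource:               # clone the whole VM in one operation
    name: demo-vm-golden
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 200Gi
```

In the per-disk model, the same golden image would require one VolumeSnapshot and one clone PVC per disk, all of which must be orchestrated and tracked together.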
- VDI Provisioning Challenges
- Explosive Volume Growth: VDI often involves hundreds or thousands of desktops. If each desktop’s OS disk and user disk are separate volumes, your LUN count grows unmanageable.
- Faster Rollouts: Deploying hundreds of VDI VMs is much simpler when you’re dealing with fewer underlying objects. Linked clones, golden images, and quick refreshes become more efficient.
- Provisioning Time for Volumes (Red Hat has entire internal teams wrestling with this problem)
- API Bottlenecks: Many arrays can only process a certain number of volume creation or deletion tasks concurrently. Multiple volumes per VM multiply these tasks drastically, lengthening provisioning cycles.
- Less Infrastructure “Churn”: Fewer calls made means less overhead on the CSI driver, the orchestrator, and the storage array.
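The impact of per-volume provisioning calls on rollout time can be sketched with simple arithmetic; the throughput and counts below are illustrative assumptions:

```python
# Rough provisioning-time estimate under a fixed, assumed volume-create
# throughput shared by the API server, CSI driver, and array.
vms = 500
disks_per_vm = 4
creates_per_second = 2.0   # assumed sustained create throughput

def provisioning_minutes(volumes, rate):
    """Minutes to serially provision `volumes` at `rate` creates/sec."""
    return volumes / rate / 60

# Today: one create call per virtual disk.
print(provisioning_minutes(vms * disks_per_vm, creates_per_second))  # ~16.7 min

# Requested model: one create call per VM.
print(provisioning_minutes(vms, creates_per_second))                 # ~4.2 min
```

The absolute numbers are invented, but the ratio is structural: provisioning work scales with volume count, so one volume per VM divides it by the disks-per-VM factor.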
- Metrics and Reporting
- Data Collection Overload: Monitoring systems (Prometheus, Grafana, etc.) must pull metrics for every volume. More volumes = more queries, data to store, and CPU overhead for metrics pipelines.
- Simplified Dashboards: Fewer, larger volumes make it easier to drill down and manage usage at the VM level.
- Software-Defined Replications
- Coordinated Replication: Software-based replication solutions (e.g., replicating entire PVCs) are simpler when each VM is a single volume.
- Policy Application: Policies for encryption, replication intervals, or snapshots are easier to define at the VM-level vs. applying them individually to multiple volumes.
- Migration Complexity
- Cross-Cluster or Cross-Cloud Migrations: Bulk transferring many small volumes is more error-prone. A single volume per VM is simpler to move via standard or vendor-specific replication/migration tools.
- Reduced Downtime: Migrations can often be faster when fewer objects are transferred, minimizing risk and total downtime.
- Context Switching (Performance)
- Kernel and Hypervisor Overhead: Each block device can add overhead within the hypervisor or kernel, especially when it comes to scheduling I/O. Consolidating disks reduces context switching to fewer devices.
- Cache and Queue Efficiency: Local caches, queues, and scheduling are more efficient when dealing with fewer block devices.
- Queueing per Volume (Resource Utilization)
- Storage Controller Queues: Arrays often maintain queue depths per volume. Large numbers of small volumes can saturate management overhead, requiring more powerful (and expensive) controllers to handle the load.
- Load Balancing: It’s simpler to manage and balance I/O when the controller sees a smaller set of volumes with predictable I/O patterns.
- Larger Controllers, Bigger Servers, More Monitoring
- Scaling Out Storage: Buying higher-class or additional controllers just to manage the overhead from too many LUNs is an unnecessary expense.
- Infrastructure Bloat: More volumes lead to more hardware or resources to manage them—CPU, RAM, monitoring nodes, etc.
- More Objects => Harder etcd Replication and Encryption
- Increased Attack Surface: More objects in etcd leads to bigger encryption footprints and potentially more complexity in key management.
- Replication Throughput: If you run multiple control planes or DR sites for your cluster, replicating enormous etcd databases slows down cluster operations.
- Backups
- Many Volumes to Track: Backup solutions that integrate with Kubernetes or array snapshots will have to handle far more objects. This can significantly lengthen backup windows.
- Storage and Network I/O: Dumping thousands of volumes simultaneously can saturate both on-array snapshots and network backups.
- Automations
- Complex Scripting: The more volumes you have, the more you must track in automation, from naming conventions to cleanup routines.
- CI/CD Pipeline Impact: Automated environments that spin up ephemeral VMs for testing suffer from prolonged creation and teardown times with numerous volumes.
Additional Points / Angles to Consider
- Simplified Application-Level Recovery: When each VM is consolidated into a single PVC, application-level recovery is more straightforward because there's a single point of restore. You don't need to worry about out-of-sync volumes for the same VM.
- Cost Implications: Many storage vendors license features (snapshots, replication, management tools) based on the number of volumes or capacity. Reducing the total volume count can translate into license and support cost savings.
- Deduplication and Compression: Dedup and compression engines can sometimes be more efficient with a smaller number of larger volumes than with a massive number of small volumes (depending on the vendor's implementation).
- Operational Simplicity for Tiering: If you use tiered storage (e.g., SSD for hot data, HDD for cold data), having fewer volumes simplifies data movement policies and usage tracking at the VM level.
- Future-Proofing: As container platforms and Kubernetes evolve, we can expect more advanced features for volume management, backups, and replication. Having fewer, larger volumes sets you up for simpler adoption of new features (e.g., CSI enhancements) without rewriting automation to manage thousands of LUNs.
- Security Policy Management: Whether you apply encryption keys, access controls, or network-level policies, these are often applied per volume or per storage object. Reducing the number of volumes simplifies enforcement of security and compliance rules.
- Cross-Team Collaboration: Storage administrators, Kubernetes administrators, virtualization engineers, and security teams all prefer a simpler environment. Collaboration friction goes down when there's less complexity in mapping volumes to VMs.
List any affected packages or components.
OpenShift Virtualization