Type: Bug
Resolution: Unresolved
Priority: Normal
Affects Version: CNV v4.17.4
Category: Quality / Stability / Reliability
Sprint: CNV Storage Sprint 284
Severity: Important
Description of problem:
I have a disk shared between two VMs:
# oc get vm rhel9-cluster-1 -o json | jq '.spec.template.spec.volumes[2]'
{
  "dataVolume": {
    "name": "dv-rhel9-cluster-1-fuchsia-marlin-53"    <==
  },
  "name": "shared-disk"
}

# oc get vm rhel9-cluster-2 -o json | jq '.spec.template.spec.volumes[2]'
{
  "name": "shared-disk",
  "persistentVolumeClaim": {
    "claimName": "dv-rhel9-cluster-1-fuchsia-marlin-53"    <===
  }
}
I tried taking a snapshot of the VM "rhel9-cluster-1". This only froze the I/O of that VM, so the other VM, rhel9-cluster-2, can still issue I/O to the shared disk while the snapshot of the first VM is being taken, which can make the snapshot inconsistent.
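For reference, the snapshot was created with a VirtualMachineSnapshot object along the following lines (a minimal sketch; the object name is illustrative):

apiVersion: snapshot.kubevirt.io/v1beta1
kind: VirtualMachineSnapshot
metadata:
  name: snap-rhel9-cluster-1    # illustrative name
  namespace: nijin-cnv
spec:
  source:
    apiGroup: kubevirt.io
    kind: VirtualMachine
    name: rhel9-cluster-1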
If I now restore the snapshot taken for the VM rhel9-cluster-1, a new PVC is created from the snapshot and the VM spec is pointed at the restored PVC. Both VMs end up pointing to two different PVCs:
$ oc get vm rhel9-cluster-1 -o json | jq '.spec.template.spec.volumes[2]'
{
  "dataVolume": {
    "name": "restore-9dc33d1b-ee74-4e66-a167-d96f444cad5b-shared-disk"
  },
  "name": "shared-disk"
}

$ oc get vm rhel9-cluster-2 -o json | jq '.spec.template.spec.volumes[2]'
{
  "name": "shared-disk",
  "persistentVolumeClaim": {
    "claimName": "dv-rhel9-cluster-1-fuchsia-marlin-53"
  }
}
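For completeness, the restore above was triggered with a VirtualMachineRestore object roughly like this sketch (names are illustrative and match the snapshot sketch above):

apiVersion: snapshot.kubevirt.io/v1beta1
kind: VirtualMachineRestore
metadata:
  name: restore-rhel9-cluster-1    # illustrative name
  namespace: nijin-cnv
spec:
  target:
    apiGroup: kubevirt.io
    kind: VirtualMachine
    name: rhel9-cluster-1
  virtualMachineSnapshotName: snap-rhel9-cluster-1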
After this, rhel9-cluster-2 fails to start because the PVC "dv-rhel9-cluster-1-fuchsia-marlin-53" has been deleted:
$ oc get vmi rhel9-cluster-2 -o json | jq '.status.conditions[2]'
{
  "lastProbeTime": null,
  "lastTransitionTime": "2025-02-07T10:54:17Z",
  "message": "PVC nijin-cnv/dv-rhel9-cluster-1-fuchsia-marlin-53 does not exist, waiting for it to appear",
  "reason": "FailedPvcNotFound",
  "status": "False",
  "type": "Synchronized"
}
We have to manually correct the spec of the VM rhel9-cluster-2 so that it points back to a valid PVC.
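One possible manual fix is a JSON patch like the one below (a sketch; it assumes the restored PVC carries the same name as the restored DataVolume shown above, and that the shared disk is the third entry in the volumes list):

$ oc patch vm rhel9-cluster-2 -n nijin-cnv --type=json \
  -p '[{"op": "replace", "path": "/spec/template/spec/volumes/2/persistentVolumeClaim/claimName", "value": "restore-9dc33d1b-ee74-4e66-a167-d96f444cad5b-shared-disk"}]'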
Version-Release number of selected component (if applicable):
OpenShift Virtualization 4.17.4
How reproducible:
100%
Steps to Reproduce:
1. Create two VMs with a shared disk between them (see the spec fragment below).
2. Take a snapshot of one of the VMs, restore it, and observe the behaviour described above.
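A sketch of the relevant part of each VM spec used for the shared disk (it assumes KubeVirt's shareable disk flag and an RWX PVC; the bus and names are illustrative):

spec:
  template:
    spec:
      domain:
        devices:
          disks:
          - name: shared-disk
            disk:
              bus: scsi
            shareable: true    # allows attaching the same volume to multiple running VMs
      volumes:
      - name: shared-disk
        persistentVolumeClaim:
          claimName: dv-rhel9-cluster-1-fuchsia-marlin-53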
Actual results:
Taking a snapshot of a VM that uses a shared disk only freezes I/O on that VM, so the snapshot can be inconsistent, and restoring it leaves the other VM pointing at a PVC that no longer exists.
Expected results:
It is not clear whether snapshots are supported for shared disks. If they are not supported, shared disks should be excluded from the snapshot operation and the limitation should be documented.
Additional info: