- Bug
- Resolution: Unresolved
- Critical
- CNV v4.15.7
- CNV v4.99.0.rhel9-1748, CNV v4.15.8.rhel9-52
- Storage Core Sprint 263
Description of problem:
The issue is in the vddk datasource. It appears to be caused by QueryChangedDiskAreas returning at most 2000 changed blocks per query. The test below demonstrates this.

On a 16 GiB disk, I created a snapshot and then used dd to write 1 KiB of data at 2 MiB intervals:

for i in $(seq 2048 2048 20480000); do dd if=/dev/urandom bs=1k seek=$i of=/dev/sdb count=1; done

With these writes, QueryChangedDiskAreas should report changes at offsets 2097152, 4194304, 6291456, 8388608, ..., 17177772032.

I followed https://www.veeam.com/kb2092 to call QueryChangedDiskAreas manually from the VMware MOB. With a start offset of 0, the query returned at most 2000 changed blocks, so the result stopped at offset 4194304000 and the rest of the changes were missing. With a start offset of 4194304000, it returned the next 2000 changed blocks, and so on.

IIUC, CDI only queries from offset 0 (https://github.com/kubevirt/containerized-data-importer/blob/448fe9d566964f0fbff4fa2f4f0f4904e5331838/pkg/importer/vddk-datasource_amd64.go#L861), so if more than 2000 block areas have changed, the changes beyond the first 2000 are lost.

I ran the same test with fio while MTV was doing the backup, again querying QueryChangedDiskAreas manually through the VMware MOB. Whenever there is corruption, more than 2000 block areas have changed. Comparing a dump of sdb from the VM with the downloaded image using cmp shows that the differing offsets are the ones not reported by QueryChangedDiskAreas with offset 0.
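The truncation described above can be illustrated with a small simulation. This is only a sketch, not CDI's code or the vSphere API: `make_fake_query` and `query_all` are hypothetical helpers, the 2000-extent cap is taken from the observation in this report, and the fake reply is assumed to report the range it covered as a start offset plus a length so the caller knows where to resume.

```python
from typing import Callable, List, Tuple

Extent = Tuple[int, int]            # (start offset, length) in bytes
Reply = Tuple[int, int, List[Extent]]  # (covered start, covered length, extents)

MAX_EXTENTS_PER_QUERY = 2000        # cap observed in this report (assumption)
MIB = 1024 * 1024
DISK_SIZE = 16 * 1024 * MIB         # 16 GiB test disk


def make_fake_query(changed: List[Extent], disk_size: int) -> Callable[[int], Reply]:
    """Build a stand-in for a changed-areas query (hypothetical, not the real API)."""
    def query(start_offset: int) -> Reply:
        remaining = [e for e in changed if e[0] >= start_offset]
        page = remaining[:MAX_EXTENTS_PER_QUERY]
        if len(remaining) > MAX_EXTENTS_PER_QUERY:
            # Truncated reply: it only covers up to the end of the last extent.
            covered_end = page[-1][0] + page[-1][1]
        else:
            # Everything left fit into one reply, so it covers the rest of the disk.
            covered_end = disk_size
        return start_offset, covered_end - start_offset, page
    return query


def query_all(query: Callable[[int], Reply], disk_size: int) -> List[Extent]:
    """Repeat the query, resuming where the previous reply ended."""
    extents: List[Extent] = []
    offset = 0
    while offset < disk_size:
        start, length, page = query(offset)
        if length <= 0:
            raise RuntimeError("query made no progress")
        extents.extend(page)
        offset = start + length
    return extents


# 1 KiB written every 2 MiB, as in the dd loop above.
changed_blocks = [(off, 1024) for off in range(2 * MIB, DISK_SIZE, 2 * MIB)]
fake_query = make_fake_query(changed_blocks, DISK_SIZE)

single = fake_query(0)[2]                   # what a single query from offset 0 sees
complete = query_all(fake_query, DISK_SIZE)  # what a resuming loop sees
print(len(single), len(complete))
```

With the write pattern from this report, the single query sees 2000 extents while the loop collects all 8191, which matches the gap between what was reported from offset 0 and what actually changed.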
Version-Release number of selected component (if applicable):
4.15.7
How reproducible:
Always, if a snapshot contains more than 2000 changed blocks.
Steps to Reproduce:
1. Import from VMware.
2. Change more than 2000 blocks in a snapshot (with high load).
3. Observe that the pulled snapshot is incomplete (i.e., leading to disk corruption as described in MTV-1679).
Actual results:
Corrupted disk image
Expected results:
Consistent disk image
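One way to check for a consistent image, in the spirit of the cmp comparison mentioned in the description, is to compare the source dump and the imported image chunk by chunk and list the offsets that differ. The sketch below is illustrative only: `differing_offsets` is a hypothetical helper, and the 2 MiB chunk size is an assumption chosen to match the write interval used in the test.

```python
from typing import List

CHUNK = 2 * 1024 * 1024  # 2 MiB, matching the dd write interval (assumption)


def differing_offsets(path_a: str, path_b: str, chunk: int = CHUNK) -> List[int]:
    """Return chunk-aligned byte offsets at which the two files differ."""
    diffs: List[int] = []
    offset = 0
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while True:
            block_a = a.read(chunk)
            block_b = b.read(chunk)
            if not block_a and not block_b:
                break  # both files are exhausted
            if block_a != block_b:
                diffs.append(offset)
            offset += chunk
    return diffs
```

An empty result means the two images match. Any reported offsets can then be checked against the changed areas returned by QueryChangedDiskAreas.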
Additional info:
causes
- MTV-1679 XFS filesystem corruption after warm migration of VM from VMware (too many changed blocks) (Closed)
is cloned by
- CNV-51716 [4.17] Incomplete transfer of change blocks leads to data corruption (vddk datasource) (ON_QA)
- CNV-51717 [4.16] Incomplete transfer of change blocks leads to data corruption (vddk datasource) (ON_QA)
links to
- RHEA-2024:142753 OpenShift Virtualization 4.15.8 Images