
Type: Bug
Resolution: Done-Errata
Priority: Critical
Fix Version/s: CNV v4.15.8
Affects Version/s: CNV v4.15.7
Component/s: Storage Platform
Labels:
None

Activity Type:
Incidents & Support
Story Points:
3
Blocked:
False
Blocked Reason:
None
Ready:
False
Component Fix Version(s):
CNV v4.18.0.rhel9-398, CNV v4.15.8.rhel9-52
Market:

Sprint:
Storage Core Sprint 263

Regression:
None

SFDC Cases Links:
SFDC Cases Open:
SFDC Cases Counter:

Description of problem:

The issue is in the vddk datasource.

The root cause appears to be that QueryChangedDiskAreas returns at most 2000 changed blocks per query. The test below illustrates this.
On a 16 GiB disk, I created a snapshot and then used dd to write 1 KiB of data at 2 MiB intervals:

for i in $(seq 2048 2048 20480000); do dd if=/dev/urandom bs=1k seek=$i of=/dev/sdb count=1;done

With this write pattern, QueryChangedDiskAreas should report changes at offsets 2097152, 4194304, 6291456, 8388608, ..., 17177772032.
I followed https://www.veeam.com/kb2092 to call QueryChangedDiskAreas manually from the VMware MOB. With a start offset of 0, a single query returned at most 2000 changed blocks, so the result stopped at offset 4194304000 and the remaining changes were missing. Querying again with offset 4194304000 returns the next 2000 changed blocks, and so on.
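The arithmetic behind these offsets can be checked with a short sketch. It assumes that each 1 KiB write shows up as its own changed area, that writes whose offset falls past the end of the 16 GiB disk do not land, and that the per-query cap is exactly 2000 areas as observed above. The variable names are illustrative only.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

DISK_SIZE = 16 * GIB
MAX_AREAS_PER_QUERY = 2000  # cap observed in this report, not a documented constant

# dd with bs=1k seeks in 1 KiB units, so seek=2048 is byte offset 2 MiB.
# Mirrors `seq 2048 2048 20480000`, keeping only offsets that fit on the disk.
write_offsets = [
    seek * KIB
    for seek in range(2048, 20480000 + 1, 2048)
    if seek * KIB < DISK_SIZE
]

first_offset = write_offsets[0]
last_offset = write_offsets[-1]

# Offset of the last area that a single capped query starting at 0 can return.
last_in_first_query = write_offsets[MAX_AREAS_PER_QUERY - 1]

# Areas never seen if only the first query is issued (assumption: one area per write).
missed = len(write_offsets) - MAX_AREAS_PER_QUERY

print(len(write_offsets), first_offset, last_offset, last_in_first_query, missed)
```

Under these assumptions the first query ends at the 2000th write, which is the 4194304000 offset seen in the MOB output, and every write after it goes unreported unless further queries are made.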
IIUC, CDI queries only from offset 0, see https://github.com/kubevirt/containerized-data-importer/blob/448fe9d566964f0fbff4fa2f4f0f4904e5331838/pkg/importer/vddk-datasource_amd64.go#L861 . When more than 2000 block areas have changed, the changes beyond the first 2000 are lost.
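A minimal sketch of the paging loop that would avoid this loss is shown below. It is a simulation, not CDI's code and not the vSphere API: `ChangeInfo`, `fake_query` and `query_all_changed_areas` are hypothetical names, and the fake query simply caps each reply at 2000 areas. It also assumes that a truncated reply describes the disk only up to the end of the last area returned, so the next query can start at the reply's start offset plus its length.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Extent = Tuple[int, int]  # (start offset, length) in bytes


@dataclass
class ChangeInfo:
    """Hypothetical stand-in for the result of one changed-areas query."""
    start_offset: int
    length: int
    changed_areas: List[Extent]


def fake_query(all_areas: List[Extent], disk_size: int,
               max_areas: int = 2000) -> Callable[[int], ChangeInfo]:
    """Build a simulated query that returns at most max_areas extents per call."""
    def query(start_offset: int) -> ChangeInfo:
        pending = [a for a in all_areas if a[0] >= start_offset]
        page = pending[:max_areas]
        if len(pending) > max_areas:
            # Truncated reply: covers the disk only up to the last extent returned.
            end = page[-1][0] + page[-1][1]
        else:
            end = disk_size
        return ChangeInfo(start_offset, end - start_offset, page)
    return query


def query_all_changed_areas(query: Callable[[int], ChangeInfo],
                            disk_size: int) -> List[Extent]:
    """Repeat the query, advancing the start offset, until the disk is covered."""
    areas: List[Extent] = []
    offset = 0
    while offset < disk_size:
        info = query(offset)
        areas.extend(info.changed_areas)
        next_offset = info.start_offset + info.length
        if next_offset <= offset:
            raise RuntimeError("query made no progress")
        offset = next_offset
    return areas


# Demo: one 64 KiB area every 2 MiB on a 16 GiB disk (the area size is an assumption).
MIB = 1024 * 1024
demo_disk = 16 * 1024 * MIB
demo_areas = [(i * 2 * MIB, 64 * 1024) for i in range(1, 8192)]
demo_query = fake_query(demo_areas, demo_disk)
print(len(demo_query(0).changed_areas), len(query_all_changed_areas(demo_query, demo_disk)))
```

The no-progress check is a defensive choice so that a reply with zero length cannot turn the loop into an endless one.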
I repeated the test with fio while MTV was doing the backup, and called QueryChangedDiskAreas manually through the VMware MOB. Whenever there is corruption, there are more than 2000 changed block areas. Comparing a dump of sdb from the VM against the downloaded image with cmp, the offsets that differ are the ones not reported by QueryChangedDiskAreas with offset 0.
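The cross-check described here can be sketched as follows: take the byte offsets where the two images differ and list those that no reported area covers. The function name and the sample numbers are made up for illustration, and the sketch assumes the reported areas do not overlap.

```python
from bisect import bisect_right
from typing import Iterable, List, Tuple


def uncovered_offsets(diff_offsets: Iterable[int],
                      extents: Iterable[Tuple[int, int]]) -> List[int]:
    """Return the differing byte offsets not covered by any (start, length) extent."""
    ordered = sorted(extents)
    starts = [start for start, _ in ordered]
    missing = []
    for off in diff_offsets:
        # Find the last extent that starts at or before this offset.
        i = bisect_right(starts, off) - 1
        if i < 0 or off >= ordered[i][0] + ordered[i][1]:
            missing.append(off)
    return missing


# Illustrative input only. Note that cmp numbers bytes from 1, so its output
# needs 1 subtracted before it is compared with 0-based disk offsets.
reported = [(0, 10), (100, 50)]
differing = [5, 10, 120, 150, 200]
print(uncovered_offsets(differing, reported))
```

If the list is non-empty for the areas returned by the offset-0 query, those offsets are candidates for changes that were never copied.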

Version-Release number of selected component (if applicable):

4.15.7

How reproducible:

Always, when more than 2000 blocks have changed.

Steps to Reproduce:

1. Import from VMware
2. Change more than 2000 blocks in a snapshot (with high load)
3. Observe that the pulled snapshot is incomplete (i.e. leading to disk corruption as described in MTV-1679)

Actual results: