Migration Toolkit for Virtualization · MTV-3241

Investigate and Optimize Disk Integrity Verification for Large-Scale VM Migrations


    • Type: Feature Request
    • Resolution: Won't Do
    • Priority: Undefined
    • Future Sustainability

      An analysis was conducted on a 100GB test file to compare the performance of several disk integrity verification methods. The following experiments were run in a containerized environment, testing with 1, 2, and 4 CPU cores:

      • Full Scan (blksum): A multi-threaded, optimized C application using the blake3 hash.
      • Full Scan (Parallel CRC32): A custom Python implementation using the multiprocessing library.
      • Statistical Sampling: A probabilistic method reading ~3.6% of the disk to detect >=1MB errors with 99.99% confidence.
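      The ~3.6% figure for the sampling method follows from the standard geometric bound on missing an error region with independent random reads. A minimal sketch of that calculation (the 4 KiB block size and function name are illustrative assumptions, not taken from the actual experiment):

      ```python
      import math

      def sample_fraction(disk_size, min_error, block_size, confidence):
          """Fraction of the disk to read so that a corrupt region of at
          least `min_error` bytes is hit with probability `confidence`,
          assuming uniformly random, independent block samples."""
          # Probability that one random block lands inside the error region.
          p_hit = min_error / disk_size
          # Require (1 - p_hit)^n <= 1 - confidence, i.e. the chance that
          # all n samples miss the region is below the allowed miss rate.
          n = math.ceil(math.log(1 - confidence) / math.log(1 - p_hit))
          return n * block_size / disk_size

      frac = sample_fraction(
          disk_size=100 * 2**30,  # 100 GiB test file
          min_error=1 * 2**20,    # detect corruption spanning >= 1 MiB
          block_size=4096,        # 4 KiB sample reads (assumed)
          confidence=0.9999,      # 99.99% detection confidence
      )
      print(f"{frac:.1%}")  # → 3.6%
      ```

      Note that the required fraction grows as the minimum error size shrinks, which is why sampling cannot cheaply detect small corruptions.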

       

      Key Findings:

      • I/O Bound Performance: The primary bottleneck for full-scan methods is the disk's read speed. Both blksum and the parallel Python CRC32 script performed similarly, indicating they are both efficient enough to be limited by disk I/O rather than CPU processing.
      • Optimal CPU Scaling: Performance peaked at 2 CPU cores. Adding more cores (e.g., 4) did not decrease the runtime and, in some cases, slightly increased it due to overhead. This confirms the process is I/O-bound, as 2 cores are sufficient to process data as fast as the disk can supply it.
      • Statistical Sampling Inefficiency: The random sampling method was significantly slower than a full sequential scan. This is due to the performance penalty of random I/O (disk seek time) outweighing the benefit of reading less data.
      • Confidence Levels:
        • Full scan methods (blksum, CRC32) provide 100% confidence for detecting accidental corruption.
        • Statistical sampling provides 99.99% confidence but only for errors of a specified minimum size (e.g., 1MB).

      Conclusion: For ensuring complete data integrity, a parallel, full-scan checksum is the most reliable method. The blksum tool is a robust, pre-built solution that scales well up to the I/O limit. Our custom parallel CRC32 script successfully replicated this performance, confirming the disk as the bottleneck.
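      A minimal sketch of how such a parallel full-scan CRC32 could be structured with multiprocessing (the 64 MiB work unit, the per-chunk split, and the composite final CRC are illustrative assumptions, not the actual script):

      ```python
      import os
      import zlib
      from multiprocessing import Pool

      CHUNK = 64 * 2**20  # 64 MiB per work unit (assumed)

      def crc_of_chunk(args):
          path, offset, length = args
          # Each worker opens the file independently and reads its slice
          # sequentially, so the disk sees large contiguous reads.
          crc = 0
          with open(path, "rb", buffering=0) as f:
              f.seek(offset)
              remaining = length
              while remaining:
                  data = f.read(min(remaining, 2**20))
                  if not data:
                      break
                  crc = zlib.crc32(data, crc)
                  remaining -= len(data)
          return crc

      def parallel_crc32(path, workers=2):
          size = os.path.getsize(path)
          tasks = [(path, off, min(CHUNK, size - off))
                   for off in range(0, size, CHUNK)]
          with Pool(workers) as pool:
              chunk_crcs = pool.map(crc_of_chunk, tasks)
          # Fold the per-chunk CRCs into one composite value; any corrupted
          # chunk changes its own CRC and therefore the final result.
          final = 0
          for c in chunk_crcs:
              final = zlib.crc32(c.to_bytes(4, "big"), final)
          return final
      ```

      With `workers=2` this mirrors the observed scaling peak: two readers saturate the disk, so additional processes only add overhead.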


      Verifying the integrity of large disks (100GB+) during VM migrations using a full checksum is a time-consuming process that can significantly extend maintenance windows. We need to identify the fastest and most efficient method to ensure data integrity with a high degree of confidence.

       

      Full results:
      https://docs.google.com/document/d/1alCNC5wRhZrWVNR_6fAS1Spe6yU0206Io3YElFNPMIU/edit?tab=t.0#heading=h.rpo38b11pwlo 

              rh-ee-aweinsto Amit Weinstock
              rgolan1@redhat.com Roy Golan
              Votes: 0
              Watchers: 3