-
Bug
-
Resolution: Done
-
Critical
In a customer case we deployed monitoring at the kernel block layer to detect IO delays from the storage, which are known to cause "MEMORY MANAGEMENT" and "CRITICAL PROCESS DIED" BSODs.
Shortly after, we got some strange results: BSODs were still occurring, but not always with a matching IO delay (at the kernel block layer at least).
We already know that this kind of BSOD can happen when the storage is slow, but it appears it can also happen without slow storage.
I'm now able to reliably reproduce the behaviour, without any storage delay.
Scenario:
- Windows 2022, single virtio-blk disk.
- Latest virtio-win (1.9.50)
- Local NVMe Storage, block mode
- 10 vCPUs, 600MiB RAM (to force a lot of swap in/out)
- Only happens on AIO=native and virtio-blk. If I switch to AIO=threads or virtio-scsi it can run for hours.
- Disk timeout reduced to 20s (HKLM\System\CurrentControlSet\Services\viostor\Parameters\IoTimeoutValue)
- Windows pagefile set to fixed 8GiB
- qemu-kvm-core-9.1.0-15.el9_6.9.x86_64
Steps:
1. Fire up the Windows VM
2. Get postgres 18.1 here https://www.enterprisedb.com/downloads/postgres-postgresql-downloads
3. Install it (set up the postgres user password, as it will be used in steps 4-5)
4. Open console and initialize it
cd C:\Program Files\PostgreSQL\18\bin
pgbench.exe -i -s 300 postgres -U postgres
5. Run the benchmark (repeat only this step to reproduce again)
cd C:\Program Files\PostgreSQL\18\bin
pgbench.exe -c 60 -j 200 -T 3600 postgres -U postgres
6. The BSOD usually occurs within 10-60s (with NVMe storage)
7. Optional - Open Edge and play a YouTube video (useful if on slower storage)
Outputs:
- It's the CRC BSOD
- The qemu histogram (BlockBackend AFAIK) doesn't show any latency at that time
# ./block_histogram.sh virt-launcher-windows-2022-postgres-mbxsf 120
Enabling histogram on /machine/peripheral/ua-rootdisk/virtio-backend...
Monitoring for 120 seconds...
Collecting statistics...
Device: /machine/peripheral/ua-rootdisk/virtio-backend
Latency Range   READ      WRITE     FLUSH
<10ms           779380    487391    31950
10-100ms        56        22        3
100ms-1s        0         0         0
1-10s           0         0         0
10-60s          0         0         0
>60s            0         0         0
Disabling histogram on /machine/peripheral/ua-rootdisk/virtio-backend...
Raw JSON output saved to /tmp/blockstats.json
So it seems the host OS and storage are out of the picture here. Is there something going wrong between qemu and the guest OS?
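For anyone who wants to re-check the numbers, here is a minimal Python sketch of how a per-device latency histogram could be collapsed into the ranges shown above. The internals of block_histogram.sh and the layout of /tmp/blockstats.json are not shown in this report, so this only assumes the shape QEMU's QMP `query-blockstats` uses for a latency histogram: a `boundaries` list in nanoseconds and a `bins` list with one more element than `boundaries`. The function names and the `RANGES` table are hypothetical helpers for illustration, not part of the script.

```python
NS_PER_MS = 1_000_000

# Ranges from the table above, as (label, lower bound in nanoseconds).
RANGES = [
    ("<10ms", 0),
    ("10-100ms", 10 * NS_PER_MS),
    ("100ms-1s", 100 * NS_PER_MS),
    ("1-10s", 1_000 * NS_PER_MS),
    ("10-60s", 10_000 * NS_PER_MS),
    (">60s", 60_000 * NS_PER_MS),
]


def _lower_bounds(hist):
    """Return the lower bound (ns) of every bin, validating the shape."""
    boundaries = list(hist["boundaries"])
    if len(hist["bins"]) != len(boundaries) + 1:
        raise ValueError("bins must have exactly one more element than boundaries")
    # Bin 0 covers [0, boundaries[0]); the last bin is open-ended.
    return [0] + boundaries


def bucket_histogram(hist):
    """Sum histogram bins into the table's ranges.

    Each bin is assigned by its lower bound, which is exact only when the
    histogram boundaries include every range edge used in RANGES.
    """
    totals = {label: 0 for label, _ in RANGES}
    for lower, count in zip(_lower_bounds(hist), hist["bins"]):
        # Pick the last range whose start is at or below the bin's lower bound.
        label = [lab for lab, start in RANGES if start <= lower][-1]
        totals[label] += count
    return totals


def slow_ops(hist, threshold_ns):
    """Count requests in bins that start at or above threshold_ns.

    A bin that straddles the threshold is not counted.
    """
    return sum(
        count
        for lower, count in zip(_lower_bounds(hist), hist["bins"])
        if lower >= threshold_ns
    )


if __name__ == "__main__":
    # READ column from the output above, re-expressed as boundaries/bins.
    read_hist = {
        "boundaries": [start for _, start in RANGES[1:]],
        "bins": [779380, 56, 0, 0, 0, 0],
    }
    print(bucket_histogram(read_hist))
    print("reads at or above 100ms:", slow_ops(read_hist, 100 * NS_PER_MS))
```

With the READ column from the run above, no request lands in a bin starting at 100ms or later, which is what the conclusion about the host side relies on. Note that this only covers latency as seen by QEMU's block layer; any time a request spends in the virtqueue or in the guest driver before or after that is not visible here.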
- clones
-
RHEL-141510 "MEMORY MANAGEMENT" BSOD with no storage delay and only on virtio-blk
-
- Release Pending
-