Story
Resolution: Unresolved
Goal
When a VM is configured with a SCSI passthrough device (scsi-block) that is backed by a dm-multipath block device, a path failure should not result in an I/O error. Instead, failover should happen and the request should be completed using another path.
Acceptance criteria
Given a VM with a scsi-block device backed by dm-multipath, put it under I/O load and then make a path fail (e.g. by stopping the server or cutting the connection to it). The guest shouldn't observe I/O errors and the VM shouldn't be stopped for I/O errors either as long as at least one working path is left.
Background
SCSI passthrough is a feature that customers like to use together with multipath, expecting to get working failover functionality. We currently support SCSI passthrough in RHEL virt only with support exceptions, but it is available in OSV without such restrictions. See RHEL-29270 for the details.
scsi-block works by sending SG_IO ioctls to the block device that backs it. For dm-multipath, the ioctl data is completely opaque: it only forwards the ioctl to the path it selected, so it doesn't even see errors communicated on the SCSI level. This means that on a path error, the path is not failed and the error result is returned to userspace instead. QEMU can inspect the SG_IO result, but it currently doesn't have a way to retry the request on a different path.
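To illustrate what "inspecting the SG_IO result" could look like in userspace, here is a minimal sketch in C. It only looks at the host_status field of the sg_io_hdr_t that an SG_IO ioctl fills in. The function name and the choice of which host status values count as a path error are assumptions made for this example, not QEMU's actual classification. The numeric values mirror the Linux SCSI midlayer's DID_* codes and are redefined locally because userspace headers do not reliably export them.

```c
#include <stdbool.h>
#include <scsi/sg.h>

/* Host byte values as used by the Linux SCSI midlayer (DID_* codes).
 * Redefined here under local names; treat the selection as an assumption. */
enum {
    HOST_DID_NO_CONNECT          = 0x01,
    HOST_DID_TRANSPORT_DISRUPTED = 0x0e,
    HOST_DID_TRANSPORT_FAILFAST  = 0x0f,
};

/* Hypothetical helper: returns true when a completed SG_IO request looks
 * like it failed because of the path rather than because of the device.
 * The caller is expected to have run ioctl(fd, SG_IO, hdr) successfully
 * first, so that the status fields in *hdr are valid. */
static bool sg_result_suggests_path_error(const sg_io_hdr_t *hdr)
{
    switch (hdr->host_status) {
    case HOST_DID_NO_CONNECT:
    case HOST_DID_TRANSPORT_DISRUPTED:
    case HOST_DID_TRANSPORT_FAILFAST:
        return true;
    default:
        /* Includes SCSI-level errors such as CHECK CONDITION, which are
         * reported by the device and would recur on any other path. */
        return false;
    }
}
```

A SCSI-level error reported by the device itself (for example status 0x02, CHECK CONDITION, with a zero host status) is deliberately not treated as a path error here, since retrying on another path would not help.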
In order to solve this problem, we need to find a way to communicate from QEMU to the kernel that there is a bad path that needs to be failed. Once this is done, QEMU must retry the failed request.
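The intended control flow (fail the bad path, then retry) can be sketched as a small C simulation. Everything in it is hypothetical: the passthrough_ops hooks, the req_result enum and the sim_mpath toy device are names invented for this example, and the actual mechanism for telling the kernel about a bad path is exactly what this story still has to define.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result classification for one passthrough request. */
typedef enum { REQ_OK, REQ_PATH_ERROR, REQ_OTHER_ERROR } req_result;

/* Hypothetical hooks; neither is an existing QEMU or kernel interface. */
struct passthrough_ops {
    /* Issue the request once (in QEMU this would be the SG_IO ioctl). */
    req_result (*submit)(void *opaque);
    /* Tell the kernel to fail the path that was just used. Returns false
     * when no usable path is left. */
    bool (*fail_current_path)(void *opaque);
};

/* Submit a request; on a path error, fail the path and retry, up to
 * max_retries times. Other errors are returned to the caller unchanged. */
static req_result submit_with_failover(const struct passthrough_ops *ops,
                                       void *opaque, unsigned max_retries)
{
    req_result r = ops->submit(opaque);

    for (unsigned i = 0; r == REQ_PATH_ERROR && i < max_retries; i++) {
        if (!ops->fail_current_path(opaque)) {
            break;              /* no working path left: report the error */
        }
        r = ops->submit(opaque);
    }
    return r;
}

/* Toy stand-in for a multipath device, used only to exercise the loop. */
struct sim_mpath {
    bool path_ok[4];
    size_t npaths;
    size_t current;
};

static req_result sim_submit(void *opaque)
{
    struct sim_mpath *m = opaque;
    return m->path_ok[m->current] ? REQ_OK : REQ_PATH_ERROR;
}

static bool sim_fail_current_path(void *opaque)
{
    struct sim_mpath *m = opaque;
    if (m->current + 1 >= m->npaths) {
        return false;           /* that was the last path */
    }
    m->current++;               /* path selection moves to the next path */
    return true;
}

static const struct passthrough_ops sim_ops = {
    sim_submit, sim_fail_current_path
};
```

With a simulated device whose first two paths are down and whose third is up, submit_with_failover(&sim_ops, &dev, 8) ends on the third path and returns REQ_OK. If every path is down, the path error is returned once no path is left, which corresponds to the case where the guest would legitimately see an I/O error.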