- Bug
- Resolution: Done
- Critical
- CNV v4.18.0
- None
- Quality / Stability / Reliability
- 13
- False
- False
- None
- CNV Storage 270
- Critical
- None
Description of problem:
I configured two Windows Server 2019 VMs for WSFC. The cluster VMs are running on two separate nodes. The shared disk is an iSCSI LUN, and multipath is enabled on the OCP nodes.
The disks are passed with "reservation: true". When I run the cluster validation from Windows, it passes the "List Disks" test but fails at "Validate SCSI-3 Persistent Reservation" with the following error:
Failure issuing call to Persistent Reservation RESERVE on Test Disk 0 from node WIN-5E7SHBGUBJP.mywincluster.com when that node has successfully registered. It is expected to succeed. The request could not be performed because of an I/O device error.
.
Test Disk 0 does not provide Persistent Reservations support for the mechanisms used by failover clusters. Some storage devices require specific firmware versions or settings to function properly with failover clusters. Please contact your storage administrator or storage vendor to check the configuration of the storage to allow it to function properly with failover clusters.
I can add the disk to the cluster, and it comes online on one of the nodes. However, if I drain the owner node, the disk goes offline with the following error:
Cluster resource 'Cluster Disk 1' of type 'Physical Disk' in clustered role 'Available Storage' failed. The error code was '0xaa' ('The requested resource is in use.').
I straced the qemu-pr process on the node and saw the following error while it issued the new reservation:
132910 08:37:03.867058 ioctl(15</dev/sdb<block 8:16>>, SG_IO, {interface_id='S', dxfer_direction=SG_DXFER_TO_DEV, cmd_len=10, cmdp="\x5f\x01\x05\x00\x00\x00\x00\x00\x18\x00", mx_sb_len=160, iovec_count=0, dxfer_len=24, timeout=2000, flags=0, dxferp="\x6f\x2b\x80\x34\x74\x66\x73\x4d\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00", status=0x18, masked_status=0xc, msg_status=0, sb_len_wr=0, sbp="", host_status=0, driver_status=0, resid=0, duration=19, info=SG_INFO_CHECK}) = 0
3684797 08:37:03.938222 write(2<pipe:[483278317]>, "mpathb: configured reservation key doesn't match: 0x0\n", 54) = 54
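The SG_IO call in the strace output above can be decoded by hand. The following is a minimal sketch, not part of the original report, that interprets the CDB, the parameter data and the status byte using the PERSISTENT RESERVE OUT layout from the SCSI Primary Commands specification. The byte strings are copied from the strace line; the function name `decode_pr_out` and the lookup tables are illustrative.

```python
# Byte strings copied verbatim from the strace output above.
CDB = bytes.fromhex("5f 01 05 00 00 00 00 00 18 00")
PARAM = bytes.fromhex("6f 2b 80 34 74 66 73 4d" + " 00" * 16)
STATUS = 0x18

# Lookup tables per the SPC PERSISTENT RESERVE OUT definition (subset).
SERVICE_ACTIONS = {
    0x00: "REGISTER",
    0x01: "RESERVE",
    0x02: "RELEASE",
    0x03: "CLEAR",
    0x04: "PREEMPT",
    0x05: "PREEMPT AND ABORT",
    0x06: "REGISTER AND IGNORE EXISTING KEY",
}
PR_TYPES = {
    0x1: "Write Exclusive",
    0x3: "Exclusive Access",
    0x5: "Write Exclusive - Registrants Only",
    0x6: "Exclusive Access - Registrants Only",
    0x7: "Write Exclusive - All Registrants",
    0x8: "Exclusive Access - All Registrants",
}
SCSI_STATUS = {
    0x00: "GOOD",
    0x02: "CHECK CONDITION",
    0x08: "BUSY",
    0x18: "RESERVATION CONFLICT",
}


def decode_pr_out(cdb: bytes, param: bytes, status: int) -> dict:
    """Decode a 10-byte PERSISTENT RESERVE OUT CDB and its parameter list."""
    if len(cdb) != 10 or cdb[0] != 0x5F:
        raise ValueError("not a PERSISTENT RESERVE OUT CDB")
    return {
        # Byte 1, bits 4..0: service action.
        "service_action": SERVICE_ACTIONS.get(cdb[1] & 0x1F, "unknown"),
        # Byte 2: scope in the high nibble, reservation type in the low nibble.
        "scope": cdb[2] >> 4,
        "type": PR_TYPES.get(cdb[2] & 0x0F, "unknown"),
        # Bytes 5..8: parameter list length, big-endian.
        "param_list_len": int.from_bytes(cdb[5:9], "big"),
        # Parameter bytes 0..7: reservation key, big-endian.
        "reservation_key": hex(int.from_bytes(param[0:8], "big")),
        # Parameter bytes 8..15: service action reservation key.
        "sa_reservation_key": hex(int.from_bytes(param[8:16], "big")),
        "status": SCSI_STATUS.get(status, "unknown"),
    }


if __name__ == "__main__":
    for field, value in decode_pr_out(CDB, PARAM, STATUS).items():
        print(f"{field}: {value}")
```

Read this way, the call is a RESERVE of type "Write Exclusive - Registrants Only" carrying the key 0x6f2b80347466734d, and it completes with status RESERVATION CONFLICT, while the message on the next strace line reports the configured key as 0x0.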
The "configured reservation key doesn't match" message comes from the mpathpersist API, and it reports the key as "0x0". Also note that the qemu-pr connection to multipathd is failing:
3684797 08:37:08.176570 connect(15<UNIX-STREAM:[489686834]>, {sa_family=AF_UNIX, sun_path=@"/org/kernel/linux/storage/multipathd"}, 39) = -1 ECONNREFUSED (Connection refused)
Since qemu-pr runs in the virt-handler pod, it currently has no way to communicate with multipathd on the node. It therefore cannot send the key to multipathd, and the key cannot be saved. Is this why the error shows the key as "0x0"? I am not sure mpathpersist will work without the multipathd daemon.
If I remove the disk from multipath, all the validation tests pass and I can move disk ownership between the nodes without any problem.
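The failing connect() in the strace output targets an abstract Unix socket (the leading "@" in sun_path). As a minimal sketch, and not something taken from the original report, the probe below checks whether anything is listening on that name in the network namespace where the script runs, for example inside the virt-handler pod versus directly on the node. The function name `abstract_socket_reachable` is hypothetical; the socket name is copied from the strace line.

```python
import socket

# Socket name copied from the strace output above (without the leading "@").
MULTIPATHD_SOCKET = "/org/kernel/linux/storage/multipathd"


def abstract_socket_reachable(name: str, timeout: float = 2.0) -> bool:
    """Return True if a listener accepts connections on the abstract socket."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        try:
            # On Linux, a leading NUL byte selects the abstract namespace.
            sock.connect("\0" + name)
        except (ConnectionRefusedError, FileNotFoundError):
            # ECONNREFUSED is what the strace output above shows.
            return False
        return True


if __name__ == "__main__":
    print("multipathd reachable:", abstract_socket_reachable(MULTIPATHD_SOCKET))
```

The probe only reports reachability from the caller's point of view. Because abstract Unix sockets are scoped to a network namespace on Linux, running it in both places is one way to compare what the pod and the node can see.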
Version-Release number of selected component (if applicable):
OpenShift Virtualization 4.17.3
How reproducible:
100%
Steps to Reproduce:
1. Create an iSCSI PV and PVC.
2. Enable multipath on the OCP nodes where the VMs are running:
# mpathconf --enable
# systemctl restart multipathd
3. Since I only have a single path, I also had to set "find_multipaths no". Before the test, confirm that the iSCSI device has been added to multipath.
4. Create an AD server and join two Windows Server 2019 VMs to it.
5. Pass the disk to both the VMs:
- lun:
    bus: scsi
    reservation: true
  name: disk-chocolate-crane-84
  shareable: true
6. Create the WSFC cluster and run the validation tests from it. The "Validate SCSI-3 Persistent Reservation" test fails.
Actual results:
Windows Server Failover Cluster (WSFC) validation is not working with multipath LUNs
Expected results:
Most production environments have multiple paths to their SAN LUNs, so WSFC should work with multipath.
Additional info:
- clones
  - CNV-55651 Windows Server Failover Cluster (WSFC) validation is not working with multipath LUNs (Closed)
- split to
  - CNV-55937 DOC - Windows Server Failover Cluster (WSFC) validation is not working with multipath LUNs (Closed)
  - RHEL-85749 [RCA] Windows Server Failover Cluster (WSFC) validation is not working with multipath LUNs (Closed)
- links to
  - RHEA-2025:147435 OpenShift Virtualization 4.17.8 Images
- mentioned on