RHEL-96508

Get failed result on "Validate Simultaneous Failover" in WSFC testing


    • No
    • None
    • rhel-storage-dm
    • None
    • False
    • False
    • None
    • None
    • None
    • None
    • Unspecified
    • Unspecified
    • Unspecified
    • None

      What were you trying to do that didn't work?

      This issue is similar to RHEL-95490 (Get failed result on "Validate Disk Arbitration" in WSFC testing).

      It is a different error, but it may have the same cause.

      Pass through the host FC multipath device to two Windows guests.

      Both Windows guests log in to the Windows domain.

      The host switches the active path device to simulate failover on the host side.

      Run the Failover Cluster Manager validation (WSFC) test multiple times.

      Occasionally the test report shows a failed result for "Validate Simultaneous Failover".

      (No error occurs if the active path does not change during the test.)

      What is the impact of this issue to you?
      Please provide the package NVR for which the bug is seen:
      Red Hat Enterprise Linux release 9.4 (Plow)
      5.14.0-427.68.1.rhel89485.el9_4.x86_64
      device-mapper-1.02.197-2.el9.x86_64
      device-mapper-multipath-0.8.7-27.el9_4.2.rhel94533.x86_64
      qemu-kvm-8.2.0-11.el9_4.12.rhel_65852_v2.x86_64
      seabios-bin-1.16.3-2.el9.noarch
      edk2-ovmf-20231122-6.el9.noarch

      Guest: Windows 2019

      How reproducible is this bug?:
      10%
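
      A 10% rate explains why the validation has to be repeated many times. As a rough sketch, assuming each validation run fails independently with the same probability (an assumption, not something measured in this report), the chance of seeing at least one failure in n runs is 1 - (1 - p)^n. The helper name `p_at_least_one` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: probability of at least one failure in n runs,
# assuming independent runs with per-run failure rate p.
#   $1 = p (for example 0.10), $2 = n (number of validation runs)
p_at_least_one() {
    awk -v p="$1" -v n="$2" 'BEGIN { printf "%.3f\n", 1 - (1 - p) ^ n }'
}
```

      Under that assumption, `p_at_least_one 0.10 10` prints 0.651, so ten runs reproduce the failure only about two times out of three, and `p_at_least_one 0.10 29` prints 0.953.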

      Steps to reproduce

      ENV:
      multipath -l
      mpatha (360050768128001da580000000000000b) dm-3 IBM,2145
      size=300G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
      |-+- policy='service-time 0' prio=0 status=enabled
      | `- 10:0:3:0 sde 8:64 failed faulty offline
      |-+- policy='service-time 0' prio=0 status=enabled
      | `- 11:0:3:0 sdi 8:128 failed faulty offline
      |-+- policy='service-time 0' prio=0 status=enabled
      | `- 10:0:2:0 sdc 8:32 active undef running
      `-+- policy='service-time 0' prio=0 status=enabled
        `- 11:0:1:0 sdg 8:96 active undef running
      41:00.0 Fibre Channel: Emulex Corporation LPe15000/LPe16000 Series 8Gb/16Gb Fibre Channel Adapter (rev 30)
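
      When correlating a failed validation run with a path switch, it helps to record which path devices are usable at that moment. The following is a minimal sketch that reads `multipath -l`-style text and lists the devices in a given dm state. The function names `paths_in_state` and `sample_output` are hypothetical, and the parsing assumes the path-line layout shown above (H:C:T:L, device, major:minor, dm state).

```shell
#!/usr/bin/env bash
# Hypothetical helper: list path devices of a multipath map by dm state.
# Reads `multipath -l`-style text on stdin; $1 is the dm state to match
# ("active" or "failed").
paths_in_state() {
    awk -v want="$1" '
        {
            # Locate the H:C:T:L field. The device name is the next field
            # and the dm state is three fields after the H:C:T:L field.
            for (i = 1; i <= NF - 3; i++) {
                if ($i ~ /^[0-9]+:[0-9]+:[0-9]+:[0-9]+$/) {
                    if ($(i + 3) == want) print $(i + 1)
                    break
                }
            }
        }'
}

# Sample input copied from this report, used here only for illustration.
sample_output() {
    cat <<'EOF'
mpatha (360050768128001da580000000000000b) dm-3 IBM,2145
size=300G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=0 status=enabled
| `- 10:0:3:0 sde 8:64 failed faulty offline
|-+- policy='service-time 0' prio=0 status=enabled
| `- 11:0:3:0 sdi 8:128 failed faulty offline
|-+- policy='service-time 0' prio=0 status=enabled
| `- 10:0:2:0 sdc 8:32 active undef running
`-+- policy='service-time 0' prio=0 status=enabled
  `- 11:0:1:0 sdg 8:96 active undef running
EOF
}
```

      On the host this could be used as `multipath -l mpatha | paths_in_state active`; with the sample above it lists sdc and sdg.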

      host multipath.conf:
      defaults {
          user_friendly_names yes
          find_multipaths yes
          enable_foreign "^$"
          reservation_key file
          #no_path_retry "queue"
      }

      overrides {
          path_grouping_policy failover
      }

      ======================================================
      1. Boot Windows guest with the pass-through multipath device

      /usr/libexec/qemu-kvm \
       -name node1 \
       -machine q35 \
       -nodefaults \
       -device VGA,bus=pcie.0,addr=0x1 \
       -device pvpanic,ioport=0x505,id=idZcGD6F \
       -device pcie-root-port,id=pcie-root-port-2,slot=2,chassis=2,addr=0x2,bus=pcie.0 \
       -device qemu-xhci,id=usb1,bus=pcie-root-port-2,addr=0x0 \
       -device pcie-root-port,id=pcie-root-port-3,slot=3,chassis=3,addr=0x3,bus=pcie.0 \
       -device pcie-root-port,id=pcie-root-port-4,slot=4,chassis=4,addr=0x4,bus=pcie.0 \
       -device pcie-root-port,id=pcie-root-port-5,slot=5,chassis=5,addr=0x5,bus=pcie.0 \
       -device pcie-root-port,id=pcie-root-port-6,slot=6,chassis=6,addr=0x6,bus=pcie.0 \
       -device virtio-scsi-pci,id=scsi0,bus=pcie-root-port-3,addr=0x0 \
       -device virtio-scsi-pci,id=scsi1,bus=pcie-root-port-4,addr=0x0,max_sectors=512 \
       -object pr-manager-helper,id=helper0,path=/var/run/qemu-pr-helper.sock \
       -blockdev driver=file,node-name=file_disk,cache.direct=off,cache.no-flush=on,filename=/home/msfo/ms-node1.qcow2 \
       -blockdev driver=qcow2,node-name=protocol_disk,file=file_disk \
       -device scsi-hd,drive=protocol_disk,bus=scsi0.0,serial=node1,id=os_disk,bootindex=1 \
       -blockdev driver=host_device,filename=/dev/mapper/mpatha,cache.direct=on,node-name=drive_disk,pr-manager=helper0 \
       -blockdev driver=raw,node-name=host_disk,file=drive_disk \
       -device scsi-block,bus=scsi1.0,drive=host_disk,id=scsi0-0-0-0,bootindex=2,werror=stop,rerror=stop \
       -device virtio-net-pci,mac=9a:95:96:97:98:91,id=idKSMZST,netdev=idWCSiU5,bus=pcie-root-port-6,addr=0x0 \
       -netdev tap,id=idWCSiU5,script=/etc/qemu-ifup,vhost=on \
       -m 12G \
       -cpu host,vmx,+kvm_pv_unhalt \
       -smp 8,maxcpus=8,cores=4,threads=1,sockets=2 \
       -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
       -vnc :7 \
       -rtc base=localtime,clock=host,driftfix=slew \
       -boot order=cdn,once=c,menu=off,strict=off \
       -enable-kvm \
       -monitor stdio

      2. Initialize the disk in the guest if needed.

      3. Switch the active path on the host side. The script changes the active path; in most cases the path is changed in 30 seconds.
      ./multipath-switch.sh -m mpatha -c2

      4. Run the Failover Cluster Manager validation (WSFC) test multiple times (more than 10 times).
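
      The attached multipath-switch.sh is not reproduced in this report. As an illustration of the kind of path switch step 3 performs, here is a hypothetical sketch (not the attached script) that toggles the SCSI device state through sysfs, which is consistent with the "offline" paths in the multipath output. The function names and the `DRY_RUN` variable are assumptions made for this example.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; NOT the attached multipath-switch.sh.
# Moves I/O from one path device to another by changing the SCSI device
# state in sysfs. With DRY_RUN=1 (the default) the writes are only printed;
# set DRY_RUN=0 to perform them (requires root).
DRY_RUN="${DRY_RUN:-1}"

# $1 = path device (for example sdc), $2 = "running" or "offline"
set_path_state() {
    local dev="$1" state="$2"
    if [ "$DRY_RUN" = "1" ]; then
        echo "echo $state > /sys/block/$dev/device/state"
    else
        echo "$state" > "/sys/block/$dev/device/state"
    fi
}

# $1 = currently active path device, $2 = path device to switch to
switch_active_path() {
    local from="$1" to="$2"
    set_path_state "$to" running     # bring the target path back online
    set_path_state "$from" offline   # then take the current path offline
}
```

      For example, `switch_active_path sdc sdg` prints the two sysfs writes in dry-run mode. After a real switch, multipathd still has to reinstate the path that was brought online, so there may be a short window in which I/O is queued by `queue_if_no_path`.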

      Expected results
      No errors or warnings in the testing

      Actual results
      Occasionally the test report shows a failed result for "Validate Simultaneous Failover".

        1. multipath-switch.sh
          10 kB
          qing wang
        2. Validate_Simultaneous_Failover.htm
          48 kB
          qing wang
        3. Validate_Simultaneous_Failover_2.htm
          48 kB
          qing wang

              rhn-engineering-bmarzins Benjamin Marzinski
              qingwangrh qing wang
              Benjamin Marzinski Benjamin Marzinski
              Lin Li Lin Li