RHEL-70283

Adding scsi device to a cluster results in a restart of cluster resources


    • Bug
    • Resolution: Unresolved
    • Major
    • None
    • rhel-8.10, rhel-9.4, rhel-9.5, rhel-9.6, rhel-10.0
    • pacemaker
    • None
    • Yes
    • Low
    • 1
    • rhel-ha
    • 2
    • False
    • False
    • None
    • HA-PCMK Sprint #3: 2025-10-13
    • None
    • None
    • None

      What were you trying to do that didn't work?

      Adding or removing devices for fence_scsi and fence_mpath restarts other cluster resources, which should not happen; see https://bugzilla.redhat.com/show_bug.cgi?id=1872376 and https://bugzilla.redhat.com/show_bug.cgi?id=2177996.

      Please provide the package NVR for which the bug is seen:

      Found in pacemaker-2.1.9-1.el9 (RHEL9.6), but the issue goes back to pacemaker-2.1.7-1.el9.x86_64 (RHEL9.4). It is not present in pacemaker-2.1.6-10.1.el9_3.x86_64 (RHEL9.3).

      How reproducible is this bug?:

      always

      Steps to reproduce

      Have a cluster with shared devices, fence_scsi, and some other resources. Check the time of the start operation for the resources, then update a device for fence_scsi and check the start operation again.

      [root@virt-535 ~]# ls -lr /dev/disk/by-id/ | grep -m 3 "sda\|sdb\|sdc"
      lrwxrwxrwx. 1 root root  9 Dec  4 10:45 wwn-0x600140583126160982e4b92b7a8035fd -> ../../sdb
      lrwxrwxrwx. 1 root root  9 Dec  4 10:45 wwn-0x6001405388343a967d54aada774c632f -> ../../sda
      lrwxrwxrwx. 1 root root  9 Dec  4 10:45 wwn-0x6001405027be43412bf4c2989c61174e -> ../../sdc
      [root@virt-535 ~]# pcs stonith create scsi-fencing fence_scsi devices="/dev/disk/by-id/wwn-0x600140583126160982e4b92b7a8035fd" pcmk_host_check="static-list" pcmk_host_list="virt-535 virt-536" pcmk_reboot_action="off" meta provides="unfencing"
      [root@virt-535 ~]# pcs resource create r1 ocf:heartbeat:Dummy
      [root@virt-535 ~]# crm_resource --list-all-operations --resource r1 | grep start
      r1      (ocf:heartbeat:Dummy):   Started: r1_start_0 (node=virt-536, call=24, rc=0, last-rc-change='Wed Dec  4 15:14:12 2024', exec=19ms): complete

      > start time is at 15:14:12.

      [root@virt-535 ~]# pcs stonith update-scsi-devices scsi-fencing add /dev/disk/by-id/wwn-0x6001405388343a967d54aada774c632f
      [root@virt-535 ~]# crm_resource --list-all-operations --resource r1 | grep start
      r1      (ocf:heartbeat:Dummy):   Started: r1_start_0 (node=virt-536, call=28, rc=0, last-rc-change='Wed Dec  4 15:15:22 2024', exec=17ms): complete

      > new start time is 15:15:22.
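The before/after check above can be scripted. The following is a minimal sketch, not a tested reproducer: the helper names `start_time` and `restarted_by` are hypothetical, and it assumes the `crm_resource` output format shown in the transcript. It also does not wait for the cluster to settle after the change, so a short delay may be needed on a real cluster.

```shell
#!/bin/sh
# Sketch: detect a restart by comparing last-rc-change of the start
# operation before and after running a command.

# Read `crm_resource --list-all-operations` output on stdin and print
# the last-rc-change timestamp of the start operation.
start_time() {
    grep '_start_0' | sed -n "s/.*last-rc-change='\([^']*\)'.*/\1/p"
}

# Hypothetical wrapper: $1 is the resource to watch, the remaining
# arguments are the command to run between the two checks.
# Returns 0 if the start time changed, 1 if not, 2 if the command failed.
restarted_by() {
    rsc=$1; shift
    before=$(crm_resource --list-all-operations --resource "$rsc" | start_time)
    "$@" || return 2
    after=$(crm_resource --list-all-operations --resource "$rsc" | start_time)
    if [ "$before" != "$after" ]; then
        echo "$rsc restarted: '$before' -> '$after'"
        return 0
    fi
    echo "$rsc not restarted"
    return 1
}

# Example invocation on a cluster node (not executed here):
#   restarted_by r1 pcs stonith update-scsi-devices scsi-fencing add \
#       /dev/disk/by-id/wwn-0x6001405388343a967d54aada774c632f
```

With the bug present, the wrapper would be expected to report r1 as restarted; on an unaffected version it should report no restart.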

      Expected results

      Other cluster resources are not restarted when editing scsi devices for fence_scsi & fence_mpath.

      Actual results

      The resources are restarted after editing scsi devices.

      Attachment

      Four CIB files: two from RHEL9.4, taken before and after updating the devices for fence_scsi, and two from RHEL9.3 for comparison (also before and after the device update), where the problem was not present.

      An investigation by mlisik@redhat.com suggests that the problem is related to a change in the value of

      <nvpair id="status-1-.node-unfenced" name="#node-unfenced"
      value="1733331940"/>

      This can be seen in the diff between RHEL9.4-before.xml and RHEL9.4-after.xml.
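To check this attribute without reading the full diff, a small script can pull the `#node-unfenced` values out of two CIB files and compare them. This is a rough sketch using only `tr`, `grep` and `sed`. The function names are hypothetical, and it assumes each attribute is stored as a single `nvpair` element as in the snippet above.

```shell
#!/bin/sh
# Sketch: compare the #node-unfenced values in two CIB XML files.

# Print the value of every #node-unfenced nvpair in the file $1,
# one per line.
node_unfenced_values() {
    # Join lines first so an nvpair wrapped across lines still matches.
    tr '\n' ' ' < "$1" \
        | grep -o '<nvpair[^>]*name="#node-unfenced"[^>]*>' \
        | sed -n 's/.*value="\([^"]*\)".*/\1/p'
}

# Return 0 if the #node-unfenced values differ between $1 and $2.
node_unfenced_changed() {
    before=$(node_unfenced_values "$1")
    after=$(node_unfenced_values "$2")
    [ "$before" != "$after" ]
}

# Example invocation with the attached files (not executed here):
#   node_unfenced_changed RHEL9.4-before.xml RHEL9.4-after.xml \
#       && echo "#node-unfenced changed"
```

Running the same comparison on the RHEL9.3 pair would show whether the value stays stable on the unaffected version.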

        1. RHEL9.4-before.xml
          11 kB
        2. RHEL9.3-before.xml
          11 kB
        3. RHEL9.3-after.xml
          12 kB
        4. RHEL9.4-after.xml
          12 kB
        5. scheduler.log
          685 kB

              rhn-engineering-kwenning Klaus Wenninger
              mmazoure Michal Mazourek
              Cluster QE Cluster QE