RHEL-126232

post '--nolocking --lockopt force' usage when sanlock has been "stopped", sanlock may not be able to be started again w/o a reboot


    • Bug
    • Resolution: Unresolved
    • rhel-9.8
    • sanlock
    • rhel-storage-lvm

      After the following test scenario set up the environment to test and then verify RHEL-117154|RHEL-108163, the system was left in a state where sanlock could not be started again. Is there a way to clean up from this without a reboot? (A possible non-reboot recovery sketch follows the post-scenario output below.)

      kernel-5.14.0-625.el9    BUILT: Wed Oct 15 11:32:28 AM CEST 2025
      lvm2-2.03.33-1.el9    BUILT: Tue Sep 30 02:15:40 PM CEST 2025
      lvm2-libs-2.03.33-1.el9    BUILT: Tue Sep 30 02:15:40 PM CEST 2025
      lvm2-lockd-2.03.33-1.el9    BUILT: Tue Sep 30 02:15:40 PM CEST 2025
      sanlock-4.1.0-1.el9    BUILT: Thu Oct  9 02:00:39 PM CEST 2025
      sanlock-lib-4.1.0-1.el9    BUILT: Thu Oct  9 02:00:39 PM CEST 2025
       
       
      # Scenario that set up this state: 
       
      SCENARIO - force_remove_shared_vdo_vg_wo_global_lock_wo_daemons_running:  Test the new force lockopt remove option when no global lock exists (RHEL-117154|RHEL-108163) 
      Present shared storage view and enable locking on other nodes
      Setting use_lvmlockd to enable
      Setting lvmlocal.conf host_id to 990
      (virt-495.cluster-qe.lab.eng.brq.redhat.com): systemctl start sanlock
      (virt-495.cluster-qe.lab.eng.brq.redhat.com): systemctl start lvmlockd
      adding entry to the devices file for /dev/sda on virt-495.cluster-qe.lab.eng.brq.redhat.com
      lvmdevices -y --config devices/scan_lvs=1  --adddev /dev/sda
      creating PV on virt-495.cluster-qe.lab.eng.brq.redhat.com using device /dev/sda
      pvcreate --yes -ff  --nolock /dev/sda
        Physical volume "/dev/sda" successfully created.
      adding entry to the devices file for /dev/sdb on virt-495.cluster-qe.lab.eng.brq.redhat.com
      lvmdevices -y --config devices/scan_lvs=1  --adddev /dev/sdb
      creating PV on virt-495.cluster-qe.lab.eng.brq.redhat.com using device /dev/sdb
      pvcreate --yes -ff  --nolock /dev/sdb
        Physical volume "/dev/sdb" successfully created.
      adding entry to the devices file for /dev/sdc on virt-495.cluster-qe.lab.eng.brq.redhat.com
      lvmdevices -y --config devices/scan_lvs=1  --adddev /dev/sdc
      creating PV on virt-495.cluster-qe.lab.eng.brq.redhat.com using device /dev/sdc
      pvcreate --yes -ff  --nolock /dev/sdc
        Physical volume "/dev/sdc" successfully created.
      adding entry to the devices file for /dev/sdd on virt-495.cluster-qe.lab.eng.brq.redhat.com
      lvmdevices -y --config devices/scan_lvs=1  --adddev /dev/sdd
      creating PV on virt-495.cluster-qe.lab.eng.brq.redhat.com using device /dev/sdd
      pvcreate --yes -ff  --nolock /dev/sdd
        Physical volume "/dev/sdd" successfully created.
      adding entry to the devices file for /dev/sde on virt-495.cluster-qe.lab.eng.brq.redhat.com
      lvmdevices -y --config devices/scan_lvs=1  --adddev /dev/sde
      creating PV on virt-495.cluster-qe.lab.eng.brq.redhat.com using device /dev/sde
      pvcreate --yes -ff  --nolock /dev/sde
        Physical volume "/dev/sde" successfully created.
      adding entry to the devices file for /dev/sdf on virt-495.cluster-qe.lab.eng.brq.redhat.com
      lvmdevices -y --config devices/scan_lvs=1  --adddev /dev/sdf
      creating PV on virt-495.cluster-qe.lab.eng.brq.redhat.com using device /dev/sdf
      pvcreate --yes -ff  --nolock /dev/sdf
        Physical volume "/dev/sdf" successfully created.
      creating VG on virt-495.cluster-qe.lab.eng.brq.redhat.com using PV(s) /dev/sda
      vgcreate --shared   vdo_sanity_global /dev/sda
        Enabling sanlock global lock
        Logical volume "lvmlock" created.
        Volume group "vdo_sanity_global" successfully created
        VG vdo_sanity_global starting sanlock lockspace
        Starting locking.  Waiting until locks are ready...
      creating VG on virt-495.cluster-qe.lab.eng.brq.redhat.com using PV(s) /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf
      vgcreate --shared   vdo_sanity_force_remove /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf
        Logical volume "lvmlock" created.
        Volume group "vdo_sanity_force_remove" successfully created
        VG vdo_sanity_force_remove starting sanlock lockspace
        Starting locking.  Waiting until locks are ready...
      lvcreate --yes --type vdo -n vdo_lv -aey -L 25G vdo_sanity_force_remove -V 25G  
      Wiping vdo signature on /dev/vdo_sanity_force_remove/vpool0.
          The VDO volume can address 22 GB in 11 data slabs, each 2 GB.
          It can grow to address at most 16 TB of physical storage in 8192 slabs.
          If a larger maximum size might be needed, use bigger slabs.
        Logical volume "vdo_lv" created.
       
      deactivating LV vdo_lv on virt-495.cluster-qe.lab.eng.brq.redhat.com
      lvchange --yes -an  vdo_sanity_force_remove/vdo_lv
      (virt-495.cluster-qe.lab.eng.brq.redhat.com): systemctl stop sanlock
      (virt-495.cluster-qe.lab.eng.brq.redhat.com): systemctl stop lvmlockd
      WARNING: lvmlockd process is not running.
        Reading without shared global lock.
        Reading VG vdo_sanity_force_remove without a lock.
        Reading VG vdo_sanity_global without a lock.
       
      vgremove vdo_sanity_force_remove
      vgremove -ff vdo_sanity_force_remove
      vgremove --nolocking --yes vdo_sanity_force_remove
      vgremove --lockopt force vdo_sanity_force_remove
      Check for new override flag when no global lock exists (RHEL-117154|RHEL-108163)
      vgremove --nolocking --lockopt force --yes vdo_sanity_force_remove
      Volume group "vdo_sanity_force_remove" successfully removed
       
      vgremove --nolocking --lockopt force --yes vdo_sanity_global
      Volume group "vdo_sanity_global" successfully removed
       
      Setting use_lvmlockd to disable
      Disabling lvmlocal.conf use of host_id
      removing pv /dev/sda on virt-495.cluster-qe.lab.eng.brq.redhat.com
        Labels on physical volume "/dev/sda" successfully wiped.
      removing entry from the devices file for /dev/sda on virt-495.cluster-qe.lab.eng.brq.redhat.com
      lvmdevices -y --config devices/scan_lvs=1  --deldev /dev/sda
      removing pv /dev/sdb on virt-495.cluster-qe.lab.eng.brq.redhat.com
        Labels on physical volume "/dev/sdb" successfully wiped.
      removing entry from the devices file for /dev/sdb on virt-495.cluster-qe.lab.eng.brq.redhat.com
      lvmdevices -y --config devices/scan_lvs=1  --deldev /dev/sdb
      removing pv /dev/sdc on virt-495.cluster-qe.lab.eng.brq.redhat.com
        Labels on physical volume "/dev/sdc" successfully wiped.
      removing entry from the devices file for /dev/sdc on virt-495.cluster-qe.lab.eng.brq.redhat.com
      lvmdevices -y --config devices/scan_lvs=1  --deldev /dev/sdc
      removing pv /dev/sdd on virt-495.cluster-qe.lab.eng.brq.redhat.com
        Labels on physical volume "/dev/sdd" successfully wiped.
      removing entry from the devices file for /dev/sdd on virt-495.cluster-qe.lab.eng.brq.redhat.com
      lvmdevices -y --config devices/scan_lvs=1  --deldev /dev/sdd
      removing pv /dev/sde on virt-495.cluster-qe.lab.eng.brq.redhat.com
        Labels on physical volume "/dev/sde" successfully wiped.
      removing entry from the devices file for /dev/sde on virt-495.cluster-qe.lab.eng.brq.redhat.com
      lvmdevices -y --config devices/scan_lvs=1  --deldev /dev/sde
      removing pv /dev/sdf on virt-495.cluster-qe.lab.eng.brq.redhat.com
        Labels on physical volume "/dev/sdf" successfully wiped.
      removing entry from the devices file for /dev/sdf on virt-495.cluster-qe.lab.eng.brq.redhat.com
      lvmdevices -y --config devices/scan_lvs=1  --deldev /dev/sdf
       
      Searching for alignment inconsistency warnings in /var/log/messages
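
      For reference, a condensed version of the command sequence above that produced this state (device names, VG names, and LV sizes are taken from the log; this is only a sketch of the flow, not the test harness itself):

      # Assumes lvm.conf global/use_lvmlockd=1 and lvmlocal.conf local/host_id=990,
      # as set by the harness above, and that /dev/sd[a-f] are the shared devices.
      systemctl start sanlock lvmlockd
      for dev in /dev/sd{a..f}; do
          lvmdevices -y --config devices/scan_lvs=1 --adddev $dev
          pvcreate --yes -ff --nolock $dev
      done
      vgcreate --shared vdo_sanity_global /dev/sda
      vgcreate --shared vdo_sanity_force_remove /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf
      lvcreate --yes --type vdo -n vdo_lv -aey -L 25G vdo_sanity_force_remove -V 25G
      lvchange --yes -an vdo_sanity_force_remove/vdo_lv
      # Stop the daemons while the sanlock lockspaces are still joined,
      # then force-remove the VGs without locking (RHEL-117154|RHEL-108163).
      systemctl stop sanlock
      systemctl stop lvmlockd
      vgremove --nolocking --lockopt force --yes vdo_sanity_force_remove
      vgremove --nolocking --lockopt force --yes vdo_sanity_global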
      
      
      # POST Scenario state
      
      [root@virt-495 ~]# systemctl status sanlock
      × sanlock.service - Shared Storage Lease Manager
           Loaded: loaded (/usr/lib/systemd/system/sanlock.service; disabled; preset: disabled)
           Active: failed (Result: timeout) since Tue 2025-11-04 18:12:45 CET; 7min ago
         Duration: 1min 38.258s
             Docs: man:sanlock(8)
          Process: 1824 ExecStart=/usr/sbin/sanlock daemon (code=exited, status=0/SUCCESS)
         Main PID: 1828
            Tasks: 5 (limit: 24974)
           Memory: 27.0M (peak: 31.3M)
              CPU: 887ms
           CGroup: /system.slice/sanlock.service
                    └─1828 /usr/sbin/sanlock daemon
       
      Nov 04 18:08:07 virt-495.cluster-qe.lab.eng.brq.redhat.com systemd[1]: Starting Shared Storage Lease Manager...
      Nov 04 18:08:07 virt-495.cluster-qe.lab.eng.brq.redhat.com systemd[1]: Started Shared Storage Lease Manager.
      Nov 04 18:08:07 virt-495.cluster-qe.lab.eng.brq.redhat.com sanlock[1828]: sanlock daemon started 4.1.0 host 7401488b-d1f8-4a72-bc64-7e5a54730b9a.virt-495.cl (virt-495.cluster-qe.lab.eng.brq.redhat.com)
      Nov 04 18:09:45 virt-495.cluster-qe.lab.eng.brq.redhat.com systemd[1]: Stopping Shared Storage Lease Manager...
      Nov 04 18:09:45 virt-495.cluster-qe.lab.eng.brq.redhat.com sanlock[1828]: 2025-11-04 18:09:45 780 [1828]: helper pid 1829 term signal 15
      Nov 04 18:11:15 virt-495.cluster-qe.lab.eng.brq.redhat.com systemd[1]: sanlock.service: State 'stop-sigterm' timed out. Skipping SIGKILL.
      Nov 04 18:12:45 virt-495.cluster-qe.lab.eng.brq.redhat.com systemd[1]: sanlock.service: State 'final-sigterm' timed out. Skipping SIGKILL. Entering failed mode.
      Nov 04 18:12:45 virt-495.cluster-qe.lab.eng.brq.redhat.com systemd[1]: sanlock.service: Failed with result 'timeout'.
      Nov 04 18:12:45 virt-495.cluster-qe.lab.eng.brq.redhat.com systemd[1]: sanlock.service: Unit process 1828 (sanlock) remains running after unit stopped.
      Nov 04 18:12:45 virt-495.cluster-qe.lab.eng.brq.redhat.com systemd[1]: Stopped Shared Storage Lease Manager.
      [root@virt-495 ~]# systemctl stop sanlock
      [root@virt-495 ~]# echo $?
      0
      [root@virt-495 ~]# systemctl start sanlock
      Job for sanlock.service failed because of unavailable resources or another system error.
      See "systemctl status sanlock.service" and "journalctl -xeu sanlock.service" for details.
      [root@virt-495 ~]# echo $?
      1
      [root@virt-495 ~]# sanlock gets -h 1
      gets error -111
      [root@virt-495 ~]# dmsetup ls
      rhel_virt--495-root     (253:0)
      rhel_virt--495-swap     (253:1)
      [root@virt-495 ~]# sanlock status
      [root@virt-495 ~]# systemctl status sanlock.service
      × sanlock.service - Shared Storage Lease Manager
           Loaded: loaded (/usr/lib/systemd/system/sanlock.service; disabled; preset: disabled)
           Active: failed (Result: resources) since Tue 2025-11-04 18:21:00 CET; 2min 21s ago
         Duration: 1min 38.258s
             Docs: man:sanlock(8)
              CPU: 897ms
       
      Nov 04 18:20:59 virt-495.cluster-qe.lab.eng.brq.redhat.com systemd[1]: sanlock.service: Will not start SendSIGKILL=no service of type KillMode=control-group or mixed while processes exist
      Nov 04 18:20:59 virt-495.cluster-qe.lab.eng.brq.redhat.com systemd[1]: sanlock.service: Failed to run 'start' task: Device or resource busy
      Nov 04 18:20:59 virt-495.cluster-qe.lab.eng.brq.redhat.com systemd[1]: Starting Shared Storage Lease Manager...
      Nov 04 18:21:00 virt-495.cluster-qe.lab.eng.brq.redhat.com systemd[1]: sanlock.service: Failed with result 'resources'.
      Nov 04 18:21:00 virt-495.cluster-qe.lab.eng.brq.redhat.com systemd[1]: Failed to start Shared Storage Lease Manager.
       
      [root@virt-495 ~]# journalctl -xeu sanlock.service
      ░░ A start job for unit sanlock.service has finished successfully.
      ░░ 
      ░░ The job identifier is 1588.
      Nov 04 18:08:07 virt-495.cluster-qe.lab.eng.brq.redhat.com sanlock[1828]: sanlock daemon started 4.1.0 host 7401488b-d1f8-4a72-bc64-7e5a54730b9a.virt-495.cl (virt-495.cluster-qe.lab.eng.brq.redhat.com)
      Nov 04 18:09:45 virt-495.cluster-qe.lab.eng.brq.redhat.com systemd[1]: Stopping Shared Storage Lease Manager...
      ░░ Subject: A stop job for unit sanlock.service has begun execution
      ░░ Defined-By: systemd
      ░░ Support: https://access.redhat.com/support
      ░░ 
      ░░ A stop job for unit sanlock.service has begun execution.
      ░░ 
      ░░ The job identifier is 4692.
      Nov 04 18:09:45 virt-495.cluster-qe.lab.eng.brq.redhat.com sanlock[1828]: 2025-11-04 18:09:45 780 [1828]: helper pid 1829 term signal 15
      Nov 04 18:11:15 virt-495.cluster-qe.lab.eng.brq.redhat.com systemd[1]: sanlock.service: State 'stop-sigterm' timed out. Skipping SIGKILL.
      Nov 04 18:12:45 virt-495.cluster-qe.lab.eng.brq.redhat.com systemd[1]: sanlock.service: State 'final-sigterm' timed out. Skipping SIGKILL. Entering failed mode.
      Nov 04 18:12:45 virt-495.cluster-qe.lab.eng.brq.redhat.com systemd[1]: sanlock.service: Failed with result 'timeout'.
      ░░ Subject: Unit failed
      ░░ Defined-By: systemd
      ░░ Support: https://access.redhat.com/support
      ░░ 
      ░░ The unit sanlock.service has entered the 'failed' state with result 'timeout'.
      Nov 04 18:12:45 virt-495.cluster-qe.lab.eng.brq.redhat.com systemd[1]: sanlock.service: Unit process 1828 (sanlock) remains running after unit stopped.
      Nov 04 18:12:45 virt-495.cluster-qe.lab.eng.brq.redhat.com systemd[1]: Stopped Shared Storage Lease Manager.
      ░░ Subject: A stop job for unit sanlock.service has finished
      ░░ Defined-By: systemd
      ░░ Support: https://access.redhat.com/support
      ░░ 
      ░░ A stop job for unit sanlock.service has finished.
      ░░ 
      ░░ The job identifier is 4692 and the job result is done.
      Nov 04 18:20:59 virt-495.cluster-qe.lab.eng.brq.redhat.com systemd[1]: sanlock.service: Will not start SendSIGKILL=no service of type KillMode=control-group or mixed while processes exist
      Nov 04 18:20:59 virt-495.cluster-qe.lab.eng.brq.redhat.com systemd[1]: sanlock.service: Failed to run 'start' task: Device or resource busy
      Nov 04 18:20:59 virt-495.cluster-qe.lab.eng.brq.redhat.com systemd[1]: Starting Shared Storage Lease Manager...
      ░░ Subject: A start job for unit sanlock.service has begun execution
      ░░ Defined-By: systemd
      ░░ Support: https://access.redhat.com/support
      ░░ 
      ░░ A start job for unit sanlock.service has begun execution.
      ░░ 
      ░░ The job identifier is 7169.
      Nov 04 18:21:00 virt-495.cluster-qe.lab.eng.brq.redhat.com systemd[1]: sanlock.service: Failed with result 'resources'.
      ░░ Subject: Unit failed
      ░░ Defined-By: systemd
      ░░ Support: https://access.redhat.com/support
      ░░ 
      ░░ The unit sanlock.service has entered the 'failed' state with result 'resources'.
      Nov 04 18:21:00 virt-495.cluster-qe.lab.eng.brq.redhat.com systemd[1]: Failed to start Shared Storage Lease Manager.
      ░░ Subject: A start job for unit sanlock.service has failed
      ░░ Defined-By: systemd
      ░░ Support: https://access.redhat.com/support
      ░░ 
      ░░ A start job for unit sanlock.service has finished with a failure.
      ░░ 
      ░░ The job identifier is 7169 and the job result is failed.
      [root@virt-495 ~]# ps -elf | grep 7169
      0 S root        3847    1607  0  80   0 -  1604 pipe_r 18:25 pts/0    00:00:00 grep --color=auto 7169
      [root@virt-495 ~]# ps -elf | grep 4692
      0 S root        3875    1607  0  80   0 -  1604 pipe_r 18:28 pts/0    00:00:00 grep --color=auto 4692
      [root@virt-495 ~]# ps -elf | grep sanlock
      0 S root        3879    1607  0  80   0 -  1604 pipe_r 18:28 pts/0    00:00:00 grep --color=auto sanlock
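      
      A possible way to inspect and try to recover the wedged unit without a reboot (sketch only; these are standard systemd/sanlock commands, but it has not been verified here that they clear this particular state; the cgroup path assumes the cgroup v2 layout used in RHEL 9):
      
      # Check whether systemd still sees leftover processes in the unit's cgroup
      # ("Will not start ... while processes exist" suggests it does, even though
      # ps above shows no sanlock process).
      systemctl show sanlock.service -p MainPID,ControlGroup,KillMode,SendSIGKILL
      systemd-cgls -u sanlock.service
      cat /sys/fs/cgroup/system.slice/sanlock.service/cgroup.procs
      
      # If the cgroup really is empty, clearing the unit's failed state may be
      # enough to let it start again:
      systemctl reset-failed sanlock.service
      systemctl start sanlock.service
      
      # If a sanlock daemon were still running and holding lockspaces, asking it
      # to shut down directly would be the other option:
      #   sanlock client shutdown -f 1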
      

              teigland@redhat.com David Teigland
              cmarthal@redhat.com Corey Marthaler
              David Teigland
              Cluster QE