RHEL-8335

lvconvert -m 0 will always take rimage_0 even if it is out-of-sync

    • lvm2-2.03.28-1.el9
    • None
    • Important
    • rhel-sst-logical-storage
    • ssg_filesystems_storage_and_HA
    • 12
    • 15
    • 1
    • Already in release version of RHEL-9.5
    • QE ack, Dev ack
    • False
    • None
    • None
    • None
    • None

      +++ This bug was initially created as a clone of Bug #2133979 +++

      Description of problem: lvconvert -m 0 will always take rimage_0 even if it is out-of-sync.

      This is a rhel8.6 version of https://bugzilla.redhat.com/show_bug.cgi?id=2133978

      In the output below, /dev/mapper/mpathb(1) is flagged as a raid (I)mage out-of-sync, but it is still used to build the linear volume.

      [root@localhost ~]# lvs -ao +devices
      LV              VG  Attr       LSize   Pool Origin Data% Meta% Move Log Cpy%Sync Convert Devices
      ..
      lvr1            tvg rwi-aor-r- 500.00m                                  100.00           lvr1_rimage_0(0),lvr1_rimage_1(0)
      [lvr1_rimage_0] tvg Iwi-aor-r- 500.00m                                                   /dev/mapper/mpathb(1)
      [lvr1_rimage_1] tvg iwi-aor--- 500.00m                                                   /dev/sdg(1)
      [lvr1_rmeta_0]  tvg ewi-aor-r- 4.00m                                                     /dev/mapper/mpathb(0)
      [lvr1_rmeta_1]  tvg ewi-aor--- 4.00m                                                     /dev/sdg(0)

      [root@localhost ~]# lvconvert -m 0 tvg/lvr1
      Are you sure you want to convert raid1 LV tvg/lvr1 to type linear losing all resilience? [y/n]: y
      Logical volume tvg/lvr1 successfully converted.

      [root@localhost ~]# lvs -ao +devices
      LV   VG  Attr     LSize   Pool Origin Data% Meta% Move Log Cpy%Sync Convert Devices
      ..
      lvr1 tvg wi-ao--- 500.00m                                                   /dev/mapper/mpathb(1)

      Version-Release number of selected component (if applicable):
      Reported on lvm2-2.02.187-6.el7
      But happens in all releases.

      How reproducible:
      100%

      Steps to Reproduce:
      1. vgcreate tvg /dev/sdd /dev/sdg
      2. lvcreate --type raid1 --mirrors 1 --name lvr1 -l 100%FREE tvg
      3. mkfs.xfs /dev/tvg/lvr1
      4. mount /dev/tvg/lvr1 /mnt
      lvs -ao +devices
      LV              VG  Attr       LSize    Pool Origin Data% Meta% Move Log Cpy%Sync Convert Devices
      lvr1            tvg rwi-aor--- 1016.00m                                  100.00           lvr1_rimage_0(0),lvr1_rimage_1(0)
      [lvr1_rimage_0] tvg iwi-aor--- 1016.00m                                                   /dev/sdd(1)
      [lvr1_rimage_1] tvg iwi-aor--- 1016.00m                                                   /dev/sdg(1)
      [lvr1_rmeta_0]  tvg ewi-aor--- 4.00m                                                      /dev/sdd(0)
      [lvr1_rmeta_1]  tvg ewi-aor--- 4.00m                                                      /dev/sdg(0)
      5. echo 1 > /sys/block/sdd/device/delete
      6. dd if=/dev/zero of=/mnt/tf1 bs=1M count=100 oflag=direct
      7. echo "0 0 2" > /sys/class/scsi_host/host0/scan

      lvr1            tvg rwi-aor-r- 1016.00m                                  100.00           lvr1_rimage_0(0),lvr1_rimage_1(0)
      [lvr1_rimage_0] tvg Iwi-aor-r- 1016.00m                                                   /dev/sda(1)
      [lvr1_rimage_1] tvg iwi-aor--- 1016.00m                                                   /dev/sdg(1)
      [lvr1_rmeta_0]  tvg ewi-aor-r- 4.00m                                                      /dev/sda(0)
      [lvr1_rmeta_1]  tvg ewi-aor--- 4.00m                                                      /dev/sdg(0)

      8. lvconvert -m 0 tvg/lvr1

      lvr1 tvg wi-ao--- 1016.00m /dev/sda(1)

      Actual results:

      The out-of-sync raid rimage is used to build a linear volume.

      Expected results:

      The command fails or the in-sync rimage is used.

      Additional info:

      If I run lvconvert against a resyncing raid1 volume, it fails. The same should happen in this case, or the in-sync rimage should be used.

      The out-of-sync rimage being used caused a customer to lose 2 hours of prod data. Had the command failed, they could have recovered the raid1 before converting!
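
The check requested here can be approximated outside LVM until the tool itself refuses. The following is a minimal sketch, not LVM's own behavior: it assumes the first lv_attr character is `I` for an out-of-sync image, as in the listings in this report, and the helper names `out_of_sync_images` and `safe_to_drop_mirrors` are hypothetical.

```shell
#!/bin/sh
# Input on stdin: lines of "<lv_name> <lv_attr>", for example from
#   lvs -a --noheadings -o lv_name,lv_attr tvg
# Assumption: an lv_attr starting with 'I' marks an out-of-sync raid image.

# out_of_sync_images (hypothetical helper): print the names of all
# out-of-sync images found on stdin, one per line, without brackets.
out_of_sync_images() {
    awk '$2 ~ /^I/ { gsub(/\[|\]/, "", $1); print $1 }'
}

# safe_to_drop_mirrors (hypothetical helper): succeed only when the
# listing on stdin contains no out-of-sync image.
safe_to_drop_mirrors() {
    bad=$(out_of_sync_images | tr '\n' ' ')
    if [ -n "$bad" ]; then
        echo "refusing: out-of-sync image(s): $bad" >&2
        return 1
    fi
    return 0
}
```

One possible use is `lvs -a --noheadings -o lv_name,lv_attr tvg | safe_to_drop_mirrors && lvconvert -m 0 tvg/lvr1`. The listing covers every LV in the VG, so the guard is stricter than necessary when the VG holds other raid LVs.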

      — Additional comment from Maria on 2023-05-03 13:25:51 UTC —

      Hello Team

      Any updates on this BZ?

      Thank you

      — Additional comment from Heinz Mauelshagen on 2023-05-05 18:55:03 UTC —

      An out-of-sync RAID leg results from a device failure.

      If the failure is temporary and the device comes back, run "lvchange --refresh $RaidLV" to cause it to be resynchronized.
      Once that's finished, run "lvconvert -m0 $RaidLV" again to downgrade the raid1 to linear.

      Commit d7e922480e04ecfb7c4d8b2d42533699ddef5c34
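
The refresh-then-convert sequence described in this comment could be scripted roughly as follows. This is a sketch under stated assumptions, not a tested procedure. The listings in this report show Cpy%Sync at 100.00 while an image is still flagged out-of-sync, so the sketch also checks the ninth lv_attr character of the top-level LV and assumes it returns to `-` once the resync completes. The helper names and the `LVS`, `LVCHANGE`, `LVCONVERT`, `MAX_TRIES` and `POLL_SECONDS` variables are hypothetical, and the `100.00` comparison assumes a locale that prints a decimal point.

```shell
#!/bin/sh
# Commands are overridable so the flow can be dry-run with stubs;
# the defaults are the real LVM tools.
LVS=${LVS:-lvs}
LVCHANGE=${LVCHANGE:-lvchange}
LVCONVERT=${LVCONVERT:-lvconvert}

# lv_ready (hypothetical helper): succeed when the LV reports 100.00 sync
# and no health flag in the ninth lv_attr character.
lv_ready() {
    out=$($LVS --noheadings -o lv_attr,sync_percent "$1")
    set -- $out                      # $1 = lv_attr, $2 = sync_percent
    health=$(printf '%s' "$1" | cut -c9)
    [ "$health" = "-" ] && [ "$2" = "100.00" ]
}

# refresh_then_convert (hypothetical helper): refresh the raid LV, poll
# until it is ready, then drop the mirror. Gives up after MAX_TRIES polls.
refresh_then_convert() {
    lv=$1
    tries=${MAX_TRIES:-60}
    $LVCHANGE --refresh "$lv" || return 1
    while ! lv_ready "$lv"; do
        tries=$((tries - 1))
        if [ "$tries" -le 0 ]; then
            echo "timed out waiting for $lv to resync" >&2
            return 1
        fi
        sleep "${POLL_SECONDS:-5}"
    done
    $LVCONVERT -m 0 "$lv"            # still prompts for confirmation
}
```

For the volume in this report the call would be `refresh_then_convert tvg/lvr1`. If the device does not come back, the loop times out and the conversion is never attempted.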

      — Additional comment from Lance Digby on 2023-05-08 02:23:50 UTC —

      Heinz,

      The issue is that lvconvert, when run against an out-of-sync raid1, takes the rimage_0 disk even if it is the out-of-sync leg, with no warning and no request for confirmation!

      This is a bug: if only one of the two legs is in sync, lvconvert should use the in-sync leg!
      Any code that can use the out-of-sync leg should fail, or write a warning and require a force argument, which is the standard for most of the other LVM commands!

              mcsontos@redhat.com Marian Csontos
              rhn-support-ldigby Lance Digby
              Heinz Mauelshagen Heinz Mauelshagen
              Cluster QE Cluster QE