RHEL-8334

multisegment RAID1, allocator uses one disk for both legs

    • lvm2-2.03.14-14.el8
    • Important
    • rhel-sst-logical-storage
    • ssg_filesystems_storage_and_HA
    • QE ack, Dev ack
    • Bug Fix
      Cause (the user action or circumstances that trigger the bug):
      Creating a RAID LV with lvcreate, specifying the size with `-l` and `%FREE`, could result in a RAID LV without a redundancy guarantee (two or more legs placed on a single PV).

      Consequence (what the user experience is when the bug occurs):
      Losing a single device may cause data loss, even though RAID should provide more resilience.

      Fix (what has changed to fix the bug; do not include overly technical details):
      When allocating a RAID LV, redundancy is now checked, and the command fails when the constraints cannot be met.

      Result (what happens now that the patch is applied):
      lvcreate may fail where it previously succeeded. This is not a regression but an important fix: in the cases where it previously succeeded, the resulting RAID LV would not have been redundant.

      This is a bug fix which may cause lvcreate/lvextend to fail where they previously succeeded. This additional limitation was introduced to prevent data loss, because the previous behaviour had the same effect as using --alloc anywhere and resulted in a RAID LV without redundancy.
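
      A minimal sketch of the changed behaviour, reusing the reproducer layout from the description below (three PVs, two of them partly occupied); the exact error message is not quoted here:

      vgcreate vg /dev/sd[abc]
      lvcreate -n t1 -L 4G vg /dev/sda
      lvcreate -n t2 -L 4G vg /dev/sdb
      # Previously this could place parts of both legs on /dev/sdc;
      # with the fix the command fails instead of creating a non-redundant RAID1:
      lvcreate -n r1 -m 1 -l 100%FREE vg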

      +++ This bug was initially created as a clone of Bug #1518121 +++

      Description of problem:
      When creating a RAID1 LV with 2 legs spanning multiple disks, the allocator uses one of the disks for both legs.

      Version-Release number of selected component (if applicable):
      2.02.176

      Affected versions: el7.2, el7.5; others not checked.

      How reproducible:
      100%

      Steps to Reproduce:

      • Have three 8 GB disks: /dev/sd[abc]

      vgcreate vg /dev/sd[abc]
      lvcreate -n t1 -L 4G vg /dev/sda
      lvcreate -n t2 -L 4G vg /dev/sdb
      lvcreate -n r1 -m 1 -L 6G vg
      lvs -aoname,devices

      Actual results:

      1. lvs -aoname,devices
        LV Devices
        r1 r1_rimage_0(0),r1_rimage_1(0)
        [r1_rimage_0] /dev/sdc(1) <---- sdc is used for both _rimage_0...
        [r1_rimage_0] /dev/sdb(1024)
        [r1_rimage_1] /dev/sda(1025)
        [r1_rimage_1] /dev/sdc(1023) <---- ...as well as for _rimage_1
        [r1_rmeta_0] /dev/sdc(0)
        [r1_rmeta_1] /dev/sda(1024)
        t1 /dev/sda(0)
        t2 /dev/sdb(0)

      Expected results:
      Each RAID1 leg (rimage/rmeta pair) is allocated on a different PV, so no single disk backs both legs.

      Additional info:
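
      Not in the original report, but a quick way to spot the collocation programmatically; a sketch assuming the VG and LV names from the reproducer (vg, r1):

      lvs -a --noheadings -o lv_name,devices vg | awk '
          $1 ~ /^\[r1_rimage_/ {
              leg = $1
              n = split($2, devs, ",")
              for (i = 1; i <= n; i++) {
                  pv = devs[i]
                  sub(/\(.*/, "", pv)          # keep only the PV path, drop the "(extent)" suffix
                  if (!((pv, leg) in seen)) {  # count each PV at most once per leg
                      seen[pv, leg] = 1
                      count[pv]++
                  }
              }
          }
          END {
              for (pv in count)
                  if (count[pv] > 1)
                      print "WARNING: " pv " backs " count[pv] " legs of vg/r1"
          }'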

      — Additional comment from Marian Csontos on 2017-11-28 12:53:58 UTC —

      In case of a device failure, such an LV cannot be repaired:

      # lvconvert --repair vg/r1
        WARNING: Disabling lvmetad cache for repair command.
        WARNING: Not using lvmetad because of repair.
        /dev/vg/r1: read failed after 0 of 4096 at 6442385408: Input/output error
        /dev/vg/r1: read failed after 0 of 4096 at 6442442752: Input/output error
        Couldn't find device with uuid UFK7K0-nGPE-76Rq-F5WC-xGig-UzXP-MnFuDz.
        Attempt to replace failed RAID images (requires full device resync)? [y/n]: y
        Unable to replace all PVs from vg/r1 at once.
        Failed to replace faulty devices in vg/r1.

      TODO: Test with devices in sync.

      Workaround: `lvcreate -n r1 -m 1 -L 6G vg /dev/sd[cab]`

      — Additional comment from Marian Csontos on 2017-11-28 14:23:30 UTC —

      Waited for sync. Even at 100% in sync, the LV cannot be repaired. Also, only the first 4 GB can be read; reading from the second segment fails.

      Read from first segment:

      # dd if=/dev/vg/r1 of=/dev/null skip=4000 count=1 bs=1M
        1+0 records in
        1+0 records out
        1048576 bytes (1.0 MB) copied, 0.00195199 s, 537 MB/s

      Read from second segment fails:

      # dd if=/dev/vg/r1 of=/dev/null skip=5000 count=1 bs=1M
        dd: error reading ‘/dev/vg/r1’: Input/output error
        0+0 records in
        0+0 records out
        0 bytes (0 B) copied, 0.00201181 s, 0.0 kB/s

      Surprisingly, the RAID1 device status reports 'DA' (one failed leg, one alive):

      # dmsetup status
        vg-r1_rmeta_1: 0 8192 linear
        vg-r1_rimage_1: 0 8372224 linear
        vg-r1_rimage_1: 8372224 4210688 linear
        vg-t2: 0 8388608 linear
        vg-r1_rmeta_0: 0 8192 linear
        vg-r1_rimage_0: 0 8372224 linear
        vg-r1_rimage_0: 8372224 4210688 linear
        vg-t1: 0 8388608 linear
        vg_stacker_OTzj-root: 0 12582912 linear
        vg-r1: 0 12582912 raid raid1 2 DA 12582912/12582912 idle 0 0 -

      And for reference only the segments:

      # lvs --segments -aolv_name,pe_ranges,le_ranges
        WARNING: Not using lvmetad because a repair command was run.
        /dev/vg/r1: read failed after 0 of 4096 at 6442385408: Input/output error
        /dev/vg/r1: read failed after 0 of 4096 at 6442442752: Input/output error
        Couldn't find device with uuid jDRXQI-jGSW-BAOG-LB3h-aLhd-8fPb-kHbPZf.
        LV            PE Ranges                               LE Ranges
        r1            r1_rimage_0:0-1535 r1_rimage_1:0-1535   [r1_rimage_0]:0-1535,[r1_rimage_1]:0-1535
        [r1_rimage_0] [unknown]:1-1022                        [unknown]:1-1022
        [r1_rimage_0] /dev/sdb:1024-1537                      /dev/sdb:1024-1537
        [r1_rimage_1] /dev/sda:1025-2046                      /dev/sda:1025-2046
        [r1_rimage_1] [unknown]:1023-1536                     [unknown]:1023-1536
        [r1_rmeta_0]  [unknown]:0-0                           [unknown]:0-0
        [r1_rmeta_1]  /dev/sda:1024-1024                      /dev/sda:1024-1024

      — Additional comment from Steve D on 2018-10-18 12:12:26 UTC —

      I've just been bitten by this for the second time, though I swore I specified PVs manually. 2.02.176 (-4.1ubuntu3) on Ubuntu 18.04.

      I also hit a whole load of scrub errors last night. The data stored in the filesystem seems fine; it looks like when I extended the RAID1 LV in question, the extensions didn't get synced. Still investigating that one - it may have been triggered by me trying to work around this bug.

      Any thoughts / progress?
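
      Not from the original thread, but for reference, a sketch of how the resync/scrub state of a RAID LV can be inspected and a scrub pass started (assuming the LV in question is vg/r1):

      lvs -a -o name,sync_percent,raid_sync_action,raid_mismatch_count vg
      lvchange --syncaction check vg/r1   # read-only scrub; "repair" would also correct mismatches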

      — Additional comment from Heinz Mauelshagen on 2019-09-05 11:29:03 UTC —

      This behaves as if allocation policy 'anywhere' had been applied.
      Reworking the allocator gains importance!

      Creating with the 'cling' allocation policy avoids the problem, given the existing t[12] LVs allocated on sd[ab] and sdc free:

      # lvcreate -y -nr1 -l251 -m1 vg
        Logical volume "r1" created.
      # lvs --noh -aoname,segperanges vg
        r1            r1_rimage_0:0-250 r1_rimage_1:0-250
        [r1_rimage_0] /dev/sdc:1-126    <---
        [r1_rimage_0] /dev/sdb:128-252
        [r1_rimage_1] /dev/sda:129-254
        [r1_rimage_1] /dev/sdc:127-251  <--- bogus collocation with rimage_0
        [r1_rmeta_0]  /dev/sdc:0-0
        [r1_rmeta_1]  /dev/sda:128-128
        t1            /dev/sda:0-127
        t2            /dev/sdb:0-127
      # lvremove -y vg/r1
        Logical volume "r1" successfully removed
      # lvcreate -y --alloc cling -nr1 -l100%FREE -m1 vg   # could alternatively give the extent count, same result
        Logical volume "r1" created.
      # lvs --noh -aoname,segperanges vg
        r1            r1_rimage_0:0-125 r1_rimage_1:0-125
        [r1_rimage_0] /dev/sdc:1-126
        [r1_rimage_1] /dev/sda:129-254
        [r1_rmeta_0]  /dev/sdc:0-0
        [r1_rmeta_1]  /dev/sda:128-128
        t1            /dev/sda:0-127
        t2            /dev/sdb:0-127
      # lvextend -l+1 vg/r1
        Extending 2 mirror images.
        Insufficient suitable allocatable extents for logical volume r1: 2 more required

      Now the 'anywhere' allocation policy is needed to make use of the existing space, and the PVs have to be listed as sdc, sdb in that order (sdb, sdc does not work) to avoid collocation:

      # lvextend -l+1 vg/r1
        Extending 2 mirror images.
        Insufficient suitable allocatable extents for logical volume r1: 2 more required
      # lvextend -y --alloc anywhere -l+1 vg/r1
        Extending 2 mirror images.
        Size of logical volume vg/r1 changed from 504.00 MiB (126 extents) to 508.00 MiB (127 extents).
        Logical volume vg/r1 successfully resized.
      # lvs --noh -aoname,segperanges vg
        r1            r1_rimage_0:0-126 r1_rimage_1:0-126
        [r1_rimage_0] /dev/sdc:1-126    <---
        [r1_rimage_0] /dev/sdb:128-128
        [r1_rimage_1] /dev/sda:129-254
        [r1_rimage_1] /dev/sdc:127-127  <--- Bogus collocation again
        [r1_rmeta_0]  /dev/sdc:0-0
        [r1_rmeta_1]  /dev/sda:128-128
        t1            /dev/sda:0-127

      In this case, reducing the raid1 in size, or pvmove'ing the collocated extents off to another unrelated PV, are the options (a pvmove sketch follows the session below).

      # lvreduce -fy -l-1 vg/r1
        WARNING: Reducing active logical volume to 504.00 MiB.
        THIS MAY DESTROY YOUR DATA (filesystem etc.)
        Size of logical volume vg/r1 changed from 508.00 MiB (127 extents) to 504.00 MiB (126 extents).
        Logical volume vg/r1 successfully resized.
      # lvextend -y -l+1 --alloc anywhere vg/r1 /dev/sdc /dev/sdb
        Extending 2 mirror images.
        Size of logical volume vg/r1 changed from 504.00 MiB (126 extents) to 508.00 MiB (127 extents).
        Logical volume vg/r1 successfully resized.
      # lvs --noh -aoname,segperanges vg
        r1            r1_rimage_0:0-126 r1_rimage_1:0-126
        [r1_rimage_0] /dev/sdc:1-127
        [r1_rimage_1] /dev/sda:129-254
        [r1_rimage_1] /dev/sdb:128-128
        [r1_rmeta_0]  /dev/sdc:0-0
        [r1_rmeta_1]  /dev/sda:128-128
        t1            /dev/sda:0-127
        t2            /dev/sdb:0-127
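
      The pvmove alternative mentioned above might look like the following sketch; /dev/sdd stands in for a hypothetical extra, unrelated PV that is not part of the setup above:

      vgextend vg /dev/sdd
      pvmove /dev/sdc:127-127 /dev/sdd   # move the collocated r1_rimage_1 extent off sdc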

      — Additional comment from Heinz Mauelshagen on 2023-05-10 16:36:36 UTC —

      Fixed in commit 05c2b10c5d0a99993430ffbcef684a099ba810ad
