RHEL-8287

"Data alignment must not exceed device size." failure when attempting to stack PVs on small virt volumes backed by much larger storage

    • Low
    • rhel-sst-logical-storage
    • ssg_filesystems_storage_and_HA
    • It might be relatively hard to decide on the best fallback values and get them through every corner case.

      +++ This bug was initially created as a clone of Bug #1713820 +++

      Description of problem:
      The backing storage devices that make up this pool volume are 2T PVs. If I create 200M+ virt volumes and stack PVs on those, it works fine; however, virt volumes smaller than 200M fail with this error. This also happens on rhel7.7.

      [root@hayes-01 ~]# lvs -a -o +devices
      LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert Devices
      POOL snapper_thinp twi-aot--- <8.64t 0.01 12.21 POOL_tdata(0)
      [POOL_tdata] snapper_thinp Twi-ao---- <8.64t /dev/sde1(0)
      [POOL_tdata] snapper_thinp Twi-ao---- <8.64t /dev/sdi1(0)
      [POOL_tdata] snapper_thinp Twi-ao---- <8.64t /dev/sdf1(0)
      [POOL_tdata] snapper_thinp Twi-ao---- <8.64t /dev/sdj1(0)
      [POOL_tdata] snapper_thinp Twi-ao---- <8.64t /dev/sdg1(2)
      [POOL_tmeta] snapper_thinp ewi-ao---- 4.00m /dev/sdg1(1)
      PV1 snapper_thinp Vwi-a-t--- 220.00m POOL 50.00
      PV2 snapper_thinp Vwi-a-t--- 456.00m POOL 25.00
      PV3 snapper_thinp Vwi-a-t--- 248.00m POOL 50.00
      PV4 snapper_thinp Vwi-a-t--- 80.00m POOL 0.00
      [lvol0_pmspare] snapper_thinp ewi------- 4.00m /dev/sdg1(0)
      origin snapper_thinp Vwi-a-t--- 1.00g POOL 0.00
      other1 snapper_thinp Vwi-a-t--- 1.00g POOL 0.00
      other2 snapper_thinp Vwi-a-t--- 1.00g POOL 0.00
      other3 snapper_thinp Vwi-a-t--- 1.00g POOL 0.00
      other4 snapper_thinp Vwi-a-t--- 1.00g POOL 0.00
      other5 snapper_thinp Vwi-a-t--- 1.00g POOL 0.00

      [root@hayes-01 ~]# pvscan --config devices/scan_lvs=1
      PV /dev/sdg1 VG snapper_thinp lvm2 [<1.82 TiB / 465.62 GiB free]
      PV /dev/sde1 VG snapper_thinp lvm2 [<1.82 TiB / 0 free]
      PV /dev/sdi1 VG snapper_thinp lvm2 [<1.82 TiB / 0 free]
      PV /dev/sdf1 VG snapper_thinp lvm2 [<1.82 TiB / 0 free]
      PV /dev/sdj1 VG snapper_thinp lvm2 [<1.82 TiB / 0 free]
      PV /dev/snapper_thinp/PV1 lvm2 [220.00 MiB]
      PV /dev/snapper_thinp/PV2 lvm2 [456.00 MiB]
      PV /dev/snapper_thinp/PV3 lvm2 [248.00 MiB]
      Total: 8 [<9.10 TiB] / in use: 5 [9.09 TiB] / in no VG: 3 [924.00 MiB]

      [root@hayes-01 ~]# pvcreate --config devices/scan_lvs=1 /dev/snapper_thinp/PV4
      /dev/snapper_thinp/PV4: Data alignment must not exceed device size.
      Format-specific initialisation of physical volume /dev/snapper_thinp/PV4 failed.
      Failed to setup physical volume "/dev/snapper_thinp/PV4".

      Version-Release number of selected component (if applicable):
      4.18.0-80.el8.x86_64

      kernel-4.18.0-80.el8 BUILT: Wed Mar 13 07:47:44 CDT 2019
      lvm2-2.03.02-6.el8 BUILT: Fri Feb 22 04:47:54 CST 2019
      lvm2-libs-2.03.02-6.el8 BUILT: Fri Feb 22 04:47:54 CST 2019
      lvm2-dbusd-2.03.02-6.el8 BUILT: Fri Feb 22 04:50:28 CST 2019
      device-mapper-1.02.155-6.el8 BUILT: Fri Feb 22 04:47:54 CST 2019
      device-mapper-libs-1.02.155-6.el8 BUILT: Fri Feb 22 04:47:54 CST 2019
      device-mapper-event-1.02.155-6.el8 BUILT: Fri Feb 22 04:47:54 CST 2019
      device-mapper-event-libs-1.02.155-6.el8 BUILT: Fri Feb 22 04:47:54 CST 2019
      device-mapper-persistent-data-0.7.6-1.el8 BUILT: Sun Aug 12 04:21:55 CDT 2018

      How reproducible:
      Every time

      — Additional comment from Corey Marthaler on 2019-05-25 00:14:12 UTC —

      — Additional comment from David Teigland on 2019-09-24 18:01:30 UTC —

      The kernel is reporting a strange value for optimal_io_size:

      #device/dev-type.c:904 Device /dev/snapper_thinp/PV4: queue/minimum_io_size is 262144 bytes.
      #device/dev-type.c:904 Device /dev/snapper_thinp/PV4: queue/optimal_io_size is 144965632 bytes.

      When I run this I see a more ordinary looking value:

      12:58:48.775710 pvcreate[26231] device/dev-type.c:979 Device /dev/foo/thin1: queue/minimum_io_size is 65536 bytes.
      12:58:48.775765 pvcreate[26231] device/dev-type.c:979 Device /dev/foo/thin1: queue/optimal_io_size is 65536 bytes.

      Mike, do you recall any recent dm-thin patches related to this?

      — Additional comment from Mike Snitzer on 2019-09-25 14:23:51 UTC —

      (In reply to David Teigland from comment #2)
      > The kernel is reporting a strange value for optimal_io_size:
      >
      > #device/dev-type.c:904 Device /dev/snapper_thinp/PV4:
      > queue/optimal_io_size is 144965632 bytes.
      ...
      > Mike, do you recall any recent dm-thin patches related to this?

      No, but that doesn't mean there isn't something recent (or not). Something has to explain this...

      Certainly weird.

      — Additional comment from David Teigland on 2019-09-25 14:31:00 UTC —

      Corey, could you cat /sys/block/dm-<minor>/queue/optimal_io_size which corresponds to /dev/snapper_thinp/PV4? Also, could you cat the same value for each of the PVs in that VG (sdg1-sdj1)? This should confirm if it's a kernel issue or userspace.
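
      One way to map each LV name to its dm-<minor> node (a quick sketch, assuming the usual lvm2/dmsetup reporting fields) is:

      lvs --noheadings -o lv_name,lv_kernel_major,lv_kernel_minor snapper_thinp
      dmsetup info -c --noheadings -o name,major,minor

      Either listing shows which dm-N under /sys/block corresponds to each LV; the plain PVs (sdg1 etc.) can be read directly from /sys/block/sdX/queue/optimal_io_size.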

      — Additional comment from Corey Marthaler on 2019-10-22 16:01:12 UTC —

      kernel-4.18.0-147.8.el8 BUILT: Thu Oct 17 19:20:05 CDT 2019
      lvm2-2.03.05-5.el8 BUILT: Thu Sep 26 01:40:57 CDT 2019
      lvm2-libs-2.03.05-5.el8 BUILT: Thu Sep 26 01:40:57 CDT 2019
      lvm2-dbusd-2.03.05-5.el8 BUILT: Thu Sep 26 01:43:33 CDT 2019
      device-mapper-1.02.163-5.el8 BUILT: Thu Sep 26 01:40:57 CDT 2019
      device-mapper-libs-1.02.163-5.el8 BUILT: Thu Sep 26 01:40:57 CDT 2019
      device-mapper-event-1.02.163-5.el8 BUILT: Thu Sep 26 01:40:57 CDT 2019
      device-mapper-event-libs-1.02.163-5.el8 BUILT: Thu Sep 26 01:40:57 CDT 2019
      device-mapper-persistent-data-0.8.5-2.el8 BUILT: Wed Jun 5 10:28:04 CDT 2019

      [root@hayes-01 ~]# lvs -a -o +devices
      LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert Devices
      POOL snapper_thinp twi-aot--- <8.64t 0.01 12.30 POOL_tdata(0)
      [POOL_tdata] snapper_thinp Twi-ao---- <8.64t /dev/sdg1(0)
      [POOL_tdata] snapper_thinp Twi-ao---- <8.64t /dev/sdh1(0)
      [POOL_tdata] snapper_thinp Twi-ao---- <8.64t /dev/sdi1(0)
      [POOL_tdata] snapper_thinp Twi-ao---- <8.64t /dev/sdf1(0)
      [POOL_tdata] snapper_thinp Twi-ao---- <8.64t /dev/sdk1(2)
      [POOL_tmeta] snapper_thinp ewi-ao---- 4.00m /dev/sdk1(1)
      PV1 snapper_thinp Vwi-a-t--- 284.00m POOL 33.33
      PV2 snapper_thinp Vwi-a-t--- 344.00m POOL 33.33
      PV3 snapper_thinp Vwi-a-t--- 188.00m POOL 50.00
      PV4 snapper_thinp Vwi-a-t--- 308.00m POOL 33.33
      PV5 snapper_thinp Vwi-a-t--- 324.00m POOL 33.33
      PV6 snapper_thinp Vwi-a-t--- 76.00m POOL 0.00
      [lvol0_pmspare] snapper_thinp ewi------- 4.00m /dev/sdk1(0)
      origin snapper_thinp Vwi-a-t--- 1.00g POOL 0.00
      other1 snapper_thinp Vwi-a-t--- 1.00g POOL 0.00
      other2 snapper_thinp Vwi-a-t--- 1.00g POOL 0.00
      other3 snapper_thinp Vwi-a-t--- 1.00g POOL 0.00
      other4 snapper_thinp Vwi-a-t--- 1.00g POOL 0.00
      other5 snapper_thinp Vwi-a-t--- 1.00g POOL 0.00
      [root@hayes-01 ~]# pvcreate --config devices/scan_lvs=1 /dev/snapper_thinp/PV6
      /dev/snapper_thinp/PV6: Data alignment must not exceed device size.
      Format-specific initialisation of physical volume /dev/snapper_thinp/PV6 failed.
      Failed to setup physical volume "/dev/snapper_thinp/PV6".

      [root@hayes-01 ~]# dmsetup ls
      snapper_thinp-PV4 (253:13)
      snapper_thinp-origin (253:4)
      snapper_thinp-PV3 (253:12)
      snapper_thinp-PV2 (253:11)
      snapper_thinp-PV1 (253:10)
      snapper_thinp-POOL (253:3)
      snapper_thinp-other5 (253:9)
      snapper_thinp-other4 (253:8)
      snapper_thinp-other3 (253:7)
      snapper_thinp-POOL-tpool (253:2)
      snapper_thinp-POOL_tdata (253:1)
      snapper_thinp-other2 (253:6)
      snapper_thinp-POOL_tmeta (253:0)
      snapper_thinp-other1 (253:5)
      snapper_thinp-PV6 (253:15)
      snapper_thinp-PV5 (253:14)

      # /dev/snapper_thinp/PV6 (Failed)
        [root@hayes-01 ~]# cat /sys/block/dm-15/queue/optimal_io_size
        145752064
      # /dev/snapper_thinp/PV6 (Passed)
        [root@hayes-01 ~]# cat /sys/block/dm-14/queue/optimal_io_size
        145752064
      # actual PVs in snapper_thinp
        [root@hayes-01 ~]# cat /sys/block/sdg/queue/optimal_io_size
        0
        [root@hayes-01 ~]# cat /sys/block/sdh/queue/optimal_io_size
        0
        [root@hayes-01 ~]# cat /sys/block/sdi/queue/optimal_io_size
        0
        [root@hayes-01 ~]# cat /sys/block/sdf/queue/optimal_io_size
        0
        [root@hayes-01 ~]# cat /sys/block/sdk/queue/optimal_io_size
        0

      — Additional comment from Mike Snitzer on 2019-10-22 16:22:57 UTC —

      (In reply to Corey Marthaler from comment #5)
      > # /dev/snapper_thinp/PV6 (Failed)
      > [root@hayes-01 ~]# cat /sys/block/dm-15/queue/optimal_io_size
      > 145752064
      >
      > # /dev/snapper_thinp/PV6 (Passed)
      > [root@hayes-01 ~]# cat /sys/block/dm-14/queue/optimal_io_size
      > 145752064

      You meant PV5 (dm-14) Passed.

      > # actual PVs in snapper_thinp
      > [optimal_io_size is 0 for sdg, sdh, sdi, sdf and sdk]

      Can you provide the DM table (dmsetup table) output for PV5 and PV6? Just want to make sure we know all the devices they are layering upon.

      DM thinp will establish an optimal_io_size that matches the thin-pool chunksize. So really what 145752064 implies is you've used a really large thin-pool block size right?

      Think the 145752064 is in bytes, so 139MB thin-pool chunksize? Seems weird...
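
      (As a quick check: 145752064 / 512 = 284672 sectors, which is exactly 139 MiB - and 284672 is indeed the chunk size that later shows up in the thin-pool line of the dmsetup table.)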

      How did you create the thin-pool?

      — Additional comment from David Teigland on 2019-10-22 16:31:06 UTC —

      PV6 is 76MB, and the optimal_io_size is 139MB, so the device is smaller than the device's optimal_io_size. (Probably not a very realistic config outside of testing.)

      By default, lvm aligns data according to optimal_io_size, which won't work with those values, so I think the pvcreate failure is reasonable. You could disable pvcreate's alignment logic for this unusual config with devices/data_alignment_detection=0. That will likely allow the pvcreate to work.
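
      A minimal sketch of that workaround (untested here), combining the override with the scan_lvs setting already being used:

      pvcreate --config 'devices { scan_lvs=1 data_alignment_detection=0 }' /dev/snapper_thinp/PV6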

      — Additional comment from Corey Marthaler on 2019-10-22 16:44:20 UTC —

      Creation steps:

      vgcreate snapper_thinp /dev/sdk1 /dev/sdg1 /dev/sdh1 /dev/sdi1 /dev/sdf1
      lvcreate --thinpool POOL -l95%FREE --zero n --poolmetadatasize 4M snapper_thinp

      lvcreate --virtualsize 1G -T snapper_thinp/POOL -n origin
      lvcreate --virtualsize 1G -T snapper_thinp/POOL -n other1
      lvcreate --virtualsize 1G -T snapper_thinp/POOL -n other2
      lvcreate --virtualsize 1G -T snapper_thinp/POOL -n other3
      lvcreate -V 1G -T snapper_thinp/POOL -n other4
      lvcreate -V 1G -T snapper_thinp/POOL -n other5

      lvcreate -V 284M -T snapper_thinp/POOL -n PV1
      pvcreate --config devices/scan_lvs=1 /dev/snapper_thinp/PV1
      lvcreate -V 344M -T snapper_thinp/POOL -n PV2
      pvcreate --config devices/scan_lvs=1 /dev/snapper_thinp/PV2
      lvcreate -V 188M -T snapper_thinp/POOL -n PV3
      pvcreate --config devices/scan_lvs=1 /dev/snapper_thinp/PV3
      lvcreate -V 305M -T snapper_thinp/POOL -n PV4
      pvcreate --config devices/scan_lvs=1 /dev/snapper_thinp/PV4
      lvcreate -V 322M -T snapper_thinp/POOL -n PV5
      pvcreate --config devices/scan_lvs=1 /dev/snapper_thinp/PV5
      lvcreate -V 74M -T snapper_thinp/POOL -n PV6
      pvcreate --config devices/scan_lvs=1 /dev/snapper_thinp/PV6
      /dev/snapper_thinp/PV6: Data alignment must not exceed device size.
      Format-specific initialisation of physical volume /dev/snapper_thinp/PV6 failed.
      Failed to setup physical volume "/dev/snapper_thinp/PV6".

      [root@hayes-01 ~]# dmsetup table
      snapper_thinp-PV4: 0 630784 thin 253:2 10
      snapper_thinp-origin: 0 2097152 thin 253:2 1
      snapper_thinp-PV3: 0 385024 thin 253:2 9
      snapper_thinp-PV2: 0 704512 thin 253:2 8
      snapper_thinp-PV1: 0 581632 thin 253:2 7
      snapper_thinp-POOL: 0 18553085952 linear 253:2 0
      snapper_thinp-other5: 0 2097152 thin 253:2 6
      snapper_thinp-other4: 0 2097152 thin 253:2 5
      snapper_thinp-other3: 0 2097152 thin 253:2 4
      snapper_thinp-POOL-tpool: 0 18553085952 thin-pool 253:0 253:1 284672 0 1 skip_block_zeroing
      snapper_thinp-POOL_tdata: 0 3905937408 linear 8:97 2048
      snapper_thinp-POOL_tdata: 3905937408 3905937408 linear 8:113 2048
      snapper_thinp-POOL_tdata: 7811874816 3905937408 linear 8:129 2048
      snapper_thinp-POOL_tdata: 11717812224 3905937408 linear 8:81 2048
      snapper_thinp-POOL_tdata: 15623749632 2929336320 linear 8:161 18432
      snapper_thinp-other2: 0 2097152 thin 253:2 3
      snapper_thinp-POOL_tmeta: 0 8192 linear 8:161 10240
      snapper_thinp-other1: 0 2097152 thin 253:2 2
      snapper_thinp-PV6: 0 155648 thin 253:2 12
      snapper_thinp-PV5: 0 663552 thin 253:2 11

      — Additional comment from Mike Snitzer on 2019-10-22 17:37:57 UTC —

      (In reply to Corey Marthaler from comment #8)
      > Creation steps:
      >
      > vgcreate snapper_thinp /dev/sdk1 /dev/sdg1 /dev/sdh1 /dev/sdi1 /dev/sdf1
      > lvcreate --thinpool POOL -l95%FREE --zero n --poolmetadatasize 4M
      > snapper_thinp
      >
      ...
      > lvcreate -V 322M -T snapper_thinp/POOL -n PV5
      > pvcreate --config devices/scan_lvs=1 /dev/snapper_thinp/PV5
      > lvcreate -V 74M -T snapper_thinp/POOL -n PV6
      > pvcreate --config devices/scan_lvs=1 /dev/snapper_thinp/PV6
      > /dev/snapper_thinp/PV6: Data alignment must not exceed device size.
      > Format-specific initialisation of physical volume /dev/snapper_thinp/PV6
      > failed.
      > Failed to setup physical volume "/dev/snapper_thinp/PV6".
      >
      >
      > [root@hayes-01 ~]# dmsetup table
      ...
      > snapper_thinp-POOL-tpool: 0 18553085952 thin-pool 253:0 253:1 284672 0 1
      > skip_block_zeroing

      OK, so the underlying LV (PV6) is only 74M but the lvm2 chosen thin-pool blocksize of 139M (284672 * 512 = 145752064 = optimal_io_size) is larger. Hence the error.

      This command is really the one that should've failed: lvcreate -V 74M -T snapper_thinp/POOL -n PV6
      Unless others can see why it makes sense for a logical address space to not even be able to allocate a single block from the underlying thin-pool?
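
      (For reference, the requested 74M gets rounded up to the VG extent size - presumably the default 4MiB - so the thin LV ends up as 76 MiB = 155648 sectors in the dmsetup table above, still well under one 284672-sector pool chunk.)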

      — Additional comment from Zdenek Kabelac on 2019-10-22 20:10:21 UTC —

      When the chunk size is not specified and huge thin-pool 'dataLV' sizes are used, lvm2 tries to conform to the requirement that the metadata should not exceed 128MiB.

      In this case --poolmetadatasize seems to be even further restricted, to 4MiB, during thin-pool creation.

      To meet that size limitation, lvm2 needs to 'scale up' the chunk size - that's likely how the 139M chunk size gets created.

      If the user wants a smaller chunk size, he can always set his preferred default in lvm.conf, or simply pass '-c' during thin-pool creation - the metadata size will grow accordingly.
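
      A rough sketch of both options (the 512k value is purely illustrative, not taken from this report) - an explicit chunk size at creation time:

      lvcreate --thinpool POOL -l95%FREE --zero n --chunksize 512k snapper_thinp

      or a persistent default in lvm.conf:

      allocation {
          # thin-pool chunk size in KiB
          thin_pool_chunk_size = 512
      }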

      The question is how 'reasonable' it is to expose an optimal_io_size matching a chunk size of huge dimensions, when it doesn't seem to have any influence on performance once we are probably beyond 512KiB anyway?

      Thin-pool chunk size can go up to 2GiB, so reporting that as 'optimal' maybe doesn't look like the best approach?

      — Additional comment from Mike Snitzer on 2019-10-23 13:46:42 UTC —

      (In reply to Zdenek Kabelac from comment #10)
      > The question is how 'reasonable' it is to expose an optimal_io_size matching
      > a chunk size of huge dimensions, when it doesn't seem to have any influence
      > on performance once we are probably beyond 512KiB anyway?
      >
      > Thin-pool chunk size can go up to 2GiB, so reporting that as 'optimal' maybe
      > doesn't look like the best approach?

      Your concern about setting such a large optimal_io_size is valid. I'll think about it. We really only have the single optimal_io_size hint to convey anything useful in terms of block limits. While it may seem foolish to say "2GB is optimal", it does convey that "this thinp volume's granularity of allocation is 2GB". shrug

      BUT, that is really a tangential concern. IMHO the larger problem is we're allowing the creation of a thin LV whose logical address space is smaller than a single thin-pool chunk.
      I suppose thinp will accommodate it, by simply allocating a block from the pool and only partially using it, but in general I'm missing why we want to support this edge case.
      Why not require the logical size to be at least as large as a single thin-pool block? And preferably a multiple of the thin-pool blocksize.
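
      (With the numbers from this report - a 139 MiB chunk and presumably 4 MiB extents - the first requirement would put the minimum thin LV size at 140 MiB, and since 139 and 4 are coprime, a size that is both extent-aligned and a whole multiple of the blocksize would only land on every 556 MiB.)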

      — Additional comment from Zdenek Kabelac on 2019-10-23 13:59:23 UTC —

      With regards to the size of a thinLV, we are ATM more or less limited to sizes expressible as a multiple of the extent_size, which is stricter than the block/chunk_size of a thin-pool.

      Personally I'd like to see some way to completely 'free' the relation between a virtually sized LV (like a thinLV) and a logically sized LV - ATM we have those two 32-bit numbers, where the number of extents times the size of an extent gives the size of the LV.

      From the user's POV, the size of an LV should probably not 'force' users to create LVs that might not fit their need - i.e. if I create an LV bigger than I want, I may cause an automatic resize - so there can be cases where at least the existing 'extent-size' based allocation might be wanted. But maybe a good enough service would be to prompt the user if he really wants to make an LV smaller than a single chunk of the thin-pool, losing the rest of the chunk. I.e. a user may provision space by some metric, and even if the 'pool' underneath uses bigger chunks, it might be important to provide e.g. per-MiB size-increment granularity...

      As for optimal_io_size - there is also probably an impact from 'zeroing' - without zeroing I'd think the optimal_io_size could likely be significantly smaller, more closely matching the _tdata geometry?

      — Additional comment from Mike Snitzer on 2019-10-23 14:40:56 UTC —

      (In reply to Zdenek Kabelac from comment #12)
      > With regards to the size of thinLV - we are ATM more or less limited to the
      > sizes expressible as a multiple of extent_size - which is more strict then
      > block/chunk_size of a thin-pool.
      >
      > Personally I'd like to see some way how to actually completely 'free'
      > relation between Virtually sized LV (like thinLV is) and Logically sized LV
      > - ATM we have those two 32bit numbers where number of extent times size of
      > an extent gives the size of LV.

      The thin-pool blocksize should be a factor of the extent size (or vice-versa).

      > From users POV - the size of LV should not be probably 'enforcing' users to
      > create LVs them might not 'fit' the need - i.e. if I create LV bigger then
      > I want - i.e. I may cause automatic resize - so there can be cases where at
      > least existing 'extent-size' based allocation might be wanted. But maybe
      > good enough service would be to give 'prompt' to user if he really wants to
      > make an LV smaller then a single chunk of thin-pool and losing rest of
      > chunk. i.e. User may provision space by some metric - and even if he has a
      > 'pool' underneath using bigger chunks - it might be important to provide
      > i.e. per MiB size increments granularity...

      Not seeing why we need to accommodate such an inefficient use of the underlying storage. If you take this to the logical conclusion: you're wasting space because it is completely inaccessible to the user.

      > As for optimal-io size - there is also probably impact from 'zeroing' -
      > without zeroing I'd think the optimal-io size can likely by significantly
      > smaller somehow more closely matching _tdata geometry ?

      I'm not aware of any practical use for tracking optimal_io_size other than what XFS does. It respects the hint when laying out its allocation groups (AGs). So minimum_io_size and optimal_io_size can convey RAID striping (they reflect chunk size and stripe size respectively)... giving upper layers the insight that data layout should be on a thinp blocksize boundary is useful in this context.
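
      For what it's worth, the hints that upper layers see can be read back with standard util-linux tools, e.g.:

      lsblk -t /dev/snapper_thinp/PV6
      blockdev --getiomin --getioopt /dev/snapper_thinp/PV6

      lsblk -t prints the MIN-IO and OPT-IO columns; blockdev prints the two values in bytes.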

      — Additional comment from Zdenek Kabelac on 2019-10-23 14:59:50 UTC —

      (In reply to Mike Snitzer from comment #13)
      > (In reply to Zdenek Kabelac from comment #12)
      > > Personally I'd like to see some way how to actually completely 'free'
      > > relation between Virtually sized LV (like thinLV is) and Logically sized LV
      > > - ATM we have those two 32bit numbers where number of extent times size of
      > > an extent gives the size of LV.
      >
      > The thin-pool blocksize should be a factor of the extent size (or
      > vice-versa).

      We have users using quite huge 'extent_size' values (i.e. even 4GiB), as these were the 'smart advice' turned up by Google searches.

      On the other hand, chunk-size granularity is a 64KiB multiple.

      So joining these two together will always lead to some 'corner' cases where some space will simply be wasted.

      >
      > > From users POV - the size of LV should not be probably 'enforcing' users to
      > > create LVs them might not 'fit' the need - i.e. if I create LV bigger then
      > > I want - i.e. I may cause automatic resize - so there can be cases where at
      > > least existing 'extent-size' based allocation might be wanted. But maybe
      > > good enough service would be to give 'prompt' to user if he really wants to
      > > make an LV smaller then a single chunk of thin-pool and losing rest of
      > > chunk. i.e. User may provision space by some metric - and even if he has a
      > > 'pool' underneath using bigger chunks - it might be important to provide
      > > i.e. per MiB size increments granularity...
      >
      > Not seeing why we need to accommodate such an inefficient use of the
      > underlying storage. If you take this to the logical conclusion: you're
      > wasting space because it is completely inaccessible to the user.

      The main case I had in mind is: a user wants to provide a device with a precise size X, as might be required e.g. to match some particular 'image' size downloaded from the net.

      You may later use the 'hidden/lost' space via lvextend - but I'd probably not exclude the usage of 'smaller' LVs, as there might be requirements to provide e.g. a 100MB LV even if the thin-pool is using 512MiB chunks.

      Telling users the LV is wasting 312MiB in a thin-pool is IMHO reasonably good info, and the user may decide whether that's good or bad for him.

      (Of course, using such huge chunk sizes is probably a corner case on its own....)

      > > As for optimal-io size - there is also probably impact from 'zeroing' -
      > > without zeroing I'd think the optimal-io size can likely by significantly
      > > smaller somehow more closely matching _tdata geometry ?
      >
      > I'm not aware of any practical use for tracking optimal_io_size other than
      > what XFS does. It respects the hint when laying out its allocation groups
      > (AGs). So minimum_io_size and optimal_io_size can convey raid stripping
      > (they reflect chunksize and stripesize respectively)... giving upper layers
      > the insight that data layout should be on a thinp blocksize boundary is
      > useful in this context.

      My 'interpretation' of optimal_io_size here would be: if I go with this size, I'm optimally using the bandwidth of the provided storage - but a 1GiB optimal_io_size simply doesn't look like it will bring any extra benefit over using e.g. 1MiB (with zeroing disabled).

      Though I'm not really sure how 'widespread' usage of optimal_io_size is...

      — Additional comment from Mike Snitzer on 2019-10-23 18:12:39 UTC —

      (In reply to Zdenek Kabelac from comment #14)
      > > The thin-pool blocksize should be a factor of the extent size (or
      > > vice-versa).
      >
      > We have users using quite huge 'extent_size' values (i.e. even 4GiB) ...
      >
      > So joining these two together will always lead to some 'corner' cases where
      > some space will simply be wasted.

      Not if those two variables are sized with awareness of each other - which responsible users do; irresponsible users will rely on lvm2 having sane defaults.

      > > > From the user's POV, the size of an LV should probably not 'force' users
      > > > to create LVs that might not fit their need ...
      > >
      > > Not seeing why we need to accommodate such an inefficient use of the
      > > underlying storage.
      >
      > The main case I had in mind is: a user wants to provide a device with a
      > precise size X, as might be required e.g. to match some particular 'image'
      > size downloaded from the net.
      >
      > Telling users the LV is wasting 312MiB in a thin-pool is IMHO reasonably
      > good info, and the user may decide whether that's good or bad for him.

      Fair enough - if you think there's utility in it, that's fine. I suppose having lvcreate warn would suffice.

      > > I'm not aware of any practical use for tracking optimal_io_size other than
      > > what XFS does. It respects the hint when laying out its allocation groups
      > > (AGs). ...
      >
      > My 'interpretation' of optimal_io_size here would be: if I go with this
      > size, I'm optimally using the bandwidth of the provided storage - but a 1GiB
      > optimal_io_size simply doesn't look like it will bring any extra benefit
      > over using e.g. 1MiB (with zeroing disabled).
      >
      > Though I'm not really sure how 'widespread' usage of optimal_io_size is...

      optimal_io_size isn't purely about performance of an arbitrary single IO, it also serves as a useful indicator that being aligned on that boundary will yield better results.

      — Additional comment from RHEL Program Management on 2020-09-30 09:50:34 UTC —

      pm_ack is no longer used for this product. The flag has been reset.

      See https://issues.redhat.com/browse/PTT-1821 for additional details or contact lmiksik@redhat.com if you have any questions.

      — Additional comment from RHEL Program Management on 2021-01-02 07:52:26 UTC —

      30-day auto-close warning: This bz has been open for an extended time without being approved for a release (i.e. without a release+ or zstream+ flag). Please consider prioritizing the work appropriately to get it approved for a release, or close the bz. Otherwise, if it is still open on the "Stale date", it will close automatically (CLOSED WONTFIX).

      — Additional comment from RHEL Program Management on 2021-02-01 07:41:02 UTC —

      After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

      — Additional comment from Pavel Najman on 2021-09-07 11:50:15 UTC —

      Setting the new sst value

              zkabelac@redhat.com Zdenek Kabelac
              cmarthal@redhat.com Corey Marthaler
              Cluster QE