RHEL-6984

Cannot recover a host because the disk layout recreation script fails

    • Bug
    • Resolution: Done-Errata
    • Undefined
    • rhel-9.4
    • rhel-9.1.0
    • rear
    • rear-2.6-20.el9
    • None
    • Important
    • ZStream
    • rhel-sst-cs-system-management
    • ssg_core_services
    • 14
    • 23
    • 8
    • False
    • Yes
    • None
    • Approved Blocker
    • Bug Fix
      .ReaR recovery no longer fails on systems with a small thin pool metadata size

      Previously, ReaR did not save the size of the pool metadata volume when saving a layout of an LVM volume group with a thin pool. During recovery, ReaR recreated the pool with the default size even if the system used a non-default pool metadata size.

      As a consequence, when the original pool metadata size was smaller than the default size and no free space was available in the volume group, the layout recreation during system recovery failed with a message in the log similar to these examples:

      ----
      Insufficient free space: 230210 extents needed, but only 230026 available
      ----
      or
      ----
      Volume group "vg" has insufficient free space (16219 extents): 16226 required.
      ----

      With this update, the recovered system has a metadata volume with the same size as the original system. As a result, the recovery of a system with a small thin pool metadata size and no extra free space in the volume group finishes successfully.
    • Done
    • None

      Description of problem:

      Cannot recover a host from the backup ISO. On attempting to recover, the message "disk layout recreation script failed" appears.

      From /var/log/rear/rear-controller-0.log:

      2022-12-20 13:19:59.132236169 Creating LVM volume 'vg/lv_thinpool'; Warning: some properties may not be preserved...
      +++ Print 'Creating LVM volume '\''vg/lv_thinpool'\''; Warning: some properties may not be preserved...'
      +++ lvm lvcreate -y --chunksize 65536b --type thin-pool -L 68056776704b --thinpool lv_thinpool vg
      Thin pool volume with chunk size 64.00 KiB can address at most <15.88 TiB of data.
      Volume group "vg" has insufficient free space (16219 extents): 16226 required.

      ....

      +++ LogPrint 'Creating filesystem of type xfs with mount point / on /dev/mapper/vg-lv_root.'
      +++ Log 'Creating filesystem of type xfs with mount point / on /dev/mapper/vg-lv_root.'
      +++ echo '2022-12-20 13:38:29.466548728 Creating filesystem of type xfs with mount point / on /dev/mapper/vg-lv_root.'
      2022-12-20 13:38:29.466548728 Creating filesystem of type xfs with mount point / on /dev/mapper/vg-lv_root.
      +++ Print 'Creating filesystem of type xfs with mount point / on /dev/mapper/vg-lv_root.'
      +++ wipefs --all --force /dev/mapper/vg-lv_root
      +++ mkfs.xfs -f -m uuid=1cf3d69c-7dfe-40ab-b6a7-e6110912489e -i size=512 -d agcount=28 -s size=512 -i attr=2 -i projid32bit=1 -m crc=1 -m finobt=1 -b size=4096 -i maxpct=25 -d sunit=128 -d swidth=128 -l version=2 -l sunit=128 -l lazy-count=1 -n size=4096 -n version=2 -r extsize=4096 /dev/mapper/vg-lv_root
      mkfs.xfs: xfs_mkfs.c:2703: validate_datadev: Assertion `cfg->dblocks' failed.
      /var/lib/rear/layout/diskrestore.sh: line 323: 4142 Aborted (core dumped) mkfs.xfs -f -m uuid=1cf3d69c-7dfe-40ab-b6a7-e6110912489e -i size=512 -d agcount=28 -s size=512 -i attr=2 -i projid32bit=1 -m crc=1 -m finobt=1 -b size=4096 -i maxpct=25 -d sunit=128 -d swidth=128 -l version=2 -l sunit=128 -l lazy-count=1 -n size=4096 -n version=2 -r extsize=4096 /dev/mapper/vg-lv_root 1>&2
      +++ mkfs.xfs -f -i size=512 -d agcount=28 -s size=512 -i attr=2 -i projid32bit=1 -m crc=1 -m finobt=1 -b size=4096 -i maxpct=25 -d sunit=128 -d swidth=128 -l version=2 -l sunit=128 -l lazy-count=1 -n size=4096 -n version=2 -r extsize=4096 /dev/mapper/vg-lv_root
      mkfs.xfs: xfs_mkfs.c:2703: validate_datadev: Assertion `cfg->dblocks' failed.
      /var/lib/rear/layout/diskrestore.sh: line 323: 4144 Aborted (core dumped) mkfs.xfs -f -i size=512 -d agcount=28 -s size=512 -i attr=2 -i projid32bit=1 -m crc=1 -m finobt=1 -b size=4096 -i maxpct=25 -d sunit=128 -d swidth=128 -l version=2 -l sunit=128 -l lazy-count=1 -n size=4096 -n version=2 -r extsize=4096 /dev/mapper/vg-lv_root 1>&2
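      For illustration only (this command is not in the log): the generated diskrestore.sh recreates the thin pool without a --poolmetadatasize option, so lvcreate falls back to its default metadata size and runs out of extents. Passing the original, smaller size explicitly, e.g. the 8 MiB shown for vg-lv_thinpool_tmeta in the lsblk output below, is essentially what a fixed ReaR needs to do when it saves and replays the metadata size:

      # Hedged sketch: the same lvcreate as in the log, plus an explicit
      # --poolmetadatasize matching the original 8 MiB tmeta volume.
      lvm lvcreate -y --chunksize 65536b --type thin-pool -L 68056776704b \
          --poolmetadatasize 8m --thinpool lv_thinpool vg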

      Version-Release number of selected component (if applicable):
      Relax-and-Recover 2.6 / 2020-06-17
      Red Hat Enterprise Linux release 9.1 (Plow)
      The host is a KVM virtual machine with UEFI; the <os> section of its libvirt domain XML is:
      <os>
      <type arch='x86_64' machine='pc-q35-rhel7.6.0'>hvm</type>
      <loader readonly='yes' secure='yes' type='pflash'>/usr/share/OVMF/OVMF_CODE.secboot.fd</loader>
      <nvram>/var/lib/libvirt/qemu/nvram/controller-0_VARS.fd</nvram>
      <boot dev='hd'/>
      </os>

      How reproducible:
      100%

      Steps to Reproduce:
      1. Back up a host
      2. Try to recover the host from the backup

      Actual results:
      Recovery fails, complaining that the disk layout recreation script failed.

      Expected results:
      Recovery completes successfully.

      Additional info:
      local.conf:
      export TMPDIR="${TMPDIR-/var/tmp}"
      ISO_DEFAULT="automatic"
      OUTPUT=ISO
      BACKUP=NETFS
      BACKUP_PROG_COMPRESS_OPTIONS=( --gzip)
      BACKUP_PROG_COMPRESS_SUFFIX=".gz"
      OUTPUT_URL=nfs://192.168.24.1/ctl_plane_backups
      ISO_PREFIX=$HOSTNAME-202212201022
      BACKUP_URL=nfs://192.168.24.1/ctl_plane_backups
      BACKUP_PROG_CRYPT_ENABLED=False
      BACKUP_PROG_OPTIONS+=( --anchored --xattrs-include='.' --xattrs )
      BACKUP_PROG_EXCLUDE=( '/data/' '/tmp/' '/ctl_plane_backups/*' )
      EXCLUDE_RECREATE+=( "/dev/cinder-volumes" )
      USING_UEFI_BOOTLOADER=1
      LOGFILE="$LOG_DIR/rear-$HOSTNAME-202212201022.log"
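      Not part of the original local.conf, but relevant to the mkfs.xfs failure: as noted in the comments below, the saved xfs_info geometry (the -d agcount=400 option) is what triggers the assertion, and overriding it so that mkfs.xfs chooses its own geometry was used for testing. A hedged sketch of that override:

      # Hedged workaround sketch (see the MKFS_XFS_OPTIONS discussion in the
      # comments): let mkfs.xfs pick its own geometry instead of replaying the
      # options recorded from the original filesystem.
      MKFS_XFS_OPTIONS=" "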

      [cloud-admin@controller-0 ~]$ lsblk -ipo NAME,KNAME,PKNAME,TRAN,TYPE,FSTYPE,LABEL,SIZE,MOUNTPOINT
      NAME KNAME PKNAME TRAN TYPE FSTYPE LABEL SIZE MOUNTPOINT
      /dev/loop0 /dev/loop0 loop LVM2_member 20.1G
      /dev/vda /dev/vda disk 64G
      |-/dev/vda1 /dev/vda1 /dev/vda part vfat MKFS_ESP 16M /boot/efi
      |-/dev/vda2 /dev/vda2 /dev/vda part 8M
      |-/dev/vda3 /dev/vda3 /dev/vda part ext4 mkfs_boot 500M /boot
      |-/dev/vda4 /dev/vda4 /dev/vda part LVM2_member 5G
      | |-/dev/mapper/vg-lv_thinpool_tmeta /dev/dm-0 /dev/vda4 lvm 8M
      | | `-/dev/mapper/vg-lv_thinpool-tpool /dev/dm-2 /dev/dm-0 lvm 63.4G
      | |   |-/dev/mapper/vg-lv_thinpool /dev/dm-3 /dev/dm-2 lvm 63.4G
      | |   |-/dev/mapper/vg-lv_root /dev/dm-4 /dev/dm-2 lvm xfs img-rootfs 10.5G /
      | |   |-/dev/mapper/vg-lv_tmp /dev/dm-5 /dev/dm-2 lvm xfs fs_tmp 1.2G /tmp
      | |   |-/dev/mapper/vg-lv_var /dev/dm-6 /dev/dm-2 lvm xfs fs_var 37G /var
      | |   |-/dev/mapper/vg-lv_log /dev/dm-7 /dev/dm-2 lvm xfs fs_log 3G /var/log
      | |   |-/dev/mapper/vg-lv_audit /dev/dm-8 /dev/dm-2 lvm xfs fs_audit 1.1G /var/log/audit
      | |   |-/dev/mapper/vg-lv_home /dev/dm-9 /dev/dm-2 lvm xfs fs_home 1.2G /home
      | |   `-/dev/mapper/vg-lv_srv /dev/dm-10 /dev/dm-2 lvm xfs fs_srv 9.4G /srv
      | `-/dev/mapper/vg-lv_thinpool_tdata /dev/dm-1 /dev/vda4 lvm 63.4G
      |   `-/dev/mapper/vg-lv_thinpool-tpool /dev/dm-2 /dev/dm-1 lvm 63.4G
      |     |-/dev/mapper/vg-lv_thinpool /dev/dm-3 /dev/dm-2 lvm 63.4G
      |     |-/dev/mapper/vg-lv_root /dev/dm-4 /dev/dm-2 lvm xfs img-rootfs 10.5G /
      |     |-/dev/mapper/vg-lv_tmp /dev/dm-5 /dev/dm-2 lvm xfs fs_tmp 1.2G /tmp
      |     |-/dev/mapper/vg-lv_var /dev/dm-6 /dev/dm-2 lvm xfs fs_var 37G /var
      |     |-/dev/mapper/vg-lv_log /dev/dm-7 /dev/dm-2 lvm xfs fs_log 3G /var/log
      |     |-/dev/mapper/vg-lv_audit /dev/dm-8 /dev/dm-2 lvm xfs fs_audit 1.1G /var/log/audit
      |     |-/dev/mapper/vg-lv_home /dev/dm-9 /dev/dm-2 lvm xfs fs_home 1.2G /home
      |     `-/dev/mapper/vg-lv_srv /dev/dm-10 /dev/dm-2 lvm xfs fs_srv 9.4G /srv
      |-/dev/vda5 /dev/vda5 /dev/vda part iso9660 config-2 65M
      `-/dev/vda6 /dev/vda6 /dev/vda part LVM2_member 58.5G
        `-/dev/mapper/vg-lv_thinpool_tdata /dev/dm-1 /dev/vda6 lvm 63.4G
          `-/dev/mapper/vg-lv_thinpool-tpool /dev/dm-2 /dev/dm-1 lvm 63.4G
            |-/dev/mapper/vg-lv_thinpool /dev/dm-3 /dev/dm-2 lvm 63.4G
            |-/dev/mapper/vg-lv_root /dev/dm-4 /dev/dm-2 lvm xfs img-rootfs 10.5G /
            |-/dev/mapper/vg-lv_tmp /dev/dm-5 /dev/dm-2 lvm xfs fs_tmp 1.2G /tmp
            |-/dev/mapper/vg-lv_var /dev/dm-6 /dev/dm-2 lvm xfs fs_var 37G /var
            |-/dev/mapper/vg-lv_log /dev/dm-7 /dev/dm-2 lvm xfs fs_log 3G /var/log
            |-/dev/mapper/vg-lv_audit /dev/dm-8 /dev/dm-2 lvm xfs fs_audit 1.1G /var/log/audit
            |-/dev/mapper/vg-lv_home /dev/dm-9 /dev/dm-2 lvm xfs fs_home 1.2G /home
            `-/dev/mapper/vg-lv_srv /dev/dm-10 /dev/dm-2 lvm xfs fs_srv 9.4G /srv
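      A hedged check (not part of the original report): the 8M vg-lv_thinpool_tmeta entry above indicates a non-default, very small pool metadata volume, which can be confirmed directly with LVM:

      # Show the thin pool and its metadata LV size for VG "vg".
      lvs -a -o lv_name,lv_size,lv_metadata_size vg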

      The issue was found during OpenStack control plane node backup and recovery; link to the procedure:
      https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/17.0/html/backing_up_and_restoring_the_undercloud_and_control_plane_nodes/assembly_backing-up-the-control-plane-nodes_br-undercloud-ctlplane#proc_creating-a-backup-of-the-control-plane-nodes_backup-ctlplane

      mkdir /tmp/backup-recover-temp/
      cp ./overcloud-deploy/overcloud/config-download/overcloud/tripleo-ansible-inventory.yaml /tmp/backup-recover-temp/tripleo-inventory.yaml

      source /home/stack/stackrc
      openstack overcloud backup --inventory /tmp/backup-recover-temp/tripleo-inventory.yaml --setup-nfs --extra-vars '{"tripleo_backup_and_restore_server": 192.168.24.1, "nfs_server_group_name": Undercloud}'

      openstack overcloud backup --inventory /tmp/backup-recover-temp/tripleo-inventory.yaml --setup-rear --extra-vars '{"tripleo_backup_and_restore_server": 192.168.24.1}'

      openstack overcloud backup --inventory /tmp/backup-recover-temp/tripleo-inventory.yaml
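      For reference, a plain-ReaR equivalent of the reproduction steps above (a hedged sketch using standard ReaR commands, not the exact OSP tooling):

      # With the local.conf shown above in place under /etc/rear/:
      rear -v mkbackup    # create the rescue ISO and the NETFS backup on the NFS share
      # Boot the node from the generated ISO, then in the rescue system:
      rear -v recover     # fails during disk layout recreation (diskrestore.sh)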

            [RHEL-6984] Cannot recover a host because the disk layout recreation script fails

            Errata Tool added a comment -

            Since the problem described in this issue should be resolved in a recent advisory, it has been closed.

            For information on the advisory (rear bug fix update), and where to find the updated files, follow the link below.

            If the solution does not work for you, open a new bug report.
            https://access.redhat.com/errata/RHBA-2024:2283


            Mugdha Soni added a comment -

            Hi rhn-support-pcahyna,

            Hope you are doing well!

            Please provide information in the Release Note Text field if any documentation efforts are required here.

            See How to report a release note for reference.

            Thanks,
            Mugdha


            Pavel Cahyna added a comment -

            Upstream fix now merged: https://github.com/rear/rear/pull/3061

            Pavel Cahyna added a comment -

            Hello romansaf,

            Testing with MKFS_XFS_OPTIONS=" " also applies to RHEL-10478, as it is exactly this option that triggers the problem. I don't know how to make the issue visible to you.

            I think I figured out how to make RHEL-10478 accessible to you. Can you see it now, please?

            I can run an existing OSP17.1 job that does network backend conversion and restores it back. It uses backup&restore by ReaR. IIUC I should just run it with bigger pools, about 1.5 or 2 TB.

            Yes, please try it when you can - I suspect that you will see a similar issue as in https://bugzilla.redhat.com/show_bug.cgi?id=2232632#c14 despite using OSP compose RHOS-17.1-RHEL-9-20230907.n.1 or later.


            pm-rhel added a comment -

            Issue migration from Bugzilla to Jira is in process at this time. This will be the last message in Jira copied from the Bugzilla bug.


            Roman Safronov added a comment -

            BTW, I found an existing bug for handling a very small thin pool metadata volume size: https://bugzilla.redhat.com/show_bug.cgi?id=2149586


            Pavel Cahyna added a comment -

            The manual page states that agcount and agsize are mutually exclusive. I tried to use your value of agsize and let the command deduce agcount:

            mkfs.xfs -f -m uuid=23ce7347-fce3-48b4-9854-60a6db155b16 -i size=512 -d agsize=6144b -s size=512 -i attr=2 -i projid32bit=1 -m crc=1 -m finobt=1 -b size=4096 -i maxpct=25 -d sunit=128 -d swidth=128 -l version=2 -l sunit=128 -l lazy-count=1 -n size=4096 -n version=2 -r extsize=4096 /dev/mapper/rhel_kvm08-guest09-lv_srv

            This passes. The resulting filesystem has these parameters:

            meta-data=/dev/mapper/rhel_kvm-08-guest09-lv_srv isize=512 agcount=399, agsize=6144 blks
            = sectsz=512 attr=2, projid32bit=1
            = crc=1 finobt=1, sparse=1, rmapbt=0
            = reflink=1 bigtime=1 inobtcount=1
            data = bsize=4096 blocks=2451456, imaxpct=25
            = sunit=16 swidth=16 blks
            naming =version 2 bsize=4096 ascii-ci=0, ftype=1
            log =internal log bsize=4096 blocks=2560, version=2
            = sectsz=512 sunit=16 blks, lazy-count=1
            realtime =none extsz=4096 blocks=0, rtextents=0

            Note that it is using agcount=399. I noticed that the size of the log section is different. I tried to match it by adding "-l size=1872b":

            mkfs.xfs -f -m uuid=23ce7347-fce3-48b4-9854-60a6db155b16 -i size=512 -d agsize=6144b -s size=512 -i attr=2 -i projid32bit=1 -m crc=1 -m finobt=1 -b size=4096 -i maxpct=25 -d sunit=128 -d swidth=128 -l version=2 -l sunit=128 -l lazy-count=1 -l size=1872b -n size=4096 -n version=2 -r extsize=4096 /dev/mapper/rhel_kvm08-guest09-lv_srv

            The result has:

            meta-data=/dev/mapper/rhel_kvm-08-guest09-lv_srv isize=512 agcount=399, agsize=6144 blks
            = sectsz=512 attr=2, projid32bit=1
            = crc=1 finobt=1, sparse=1, rmapbt=0
            = reflink=1 bigtime=1 inobtcount=1
            data = bsize=4096 blocks=2451456, imaxpct=25
            = sunit=16 swidth=16 blks
            naming =version 2 bsize=4096 ascii-ci=0, ftype=1
            log =internal log bsize=4096 blocks=1872, version=2
            = sectsz=512 sunit=16 blks, lazy-count=1
            realtime =none extsz=4096 blocks=0, rtextents=0

            So, still a bit different, and agcount=399. My conclusion is that it is not feasible to match 100% of the parameters of the original file system in the recreated file system. I am not sure why; maybe your image was created using a different version of mkfs.xfs. And for some reason, forcing it to match agcount triggers an assertion, while matching agsize works better.

            This is in some sense analogous to the LVM problem that we discussed first. The combination of parameters deduced from the original layout does not work 100% when creating the new layout.

            Are your images going to be used by customers, or are they produced only for internal use?


            Roman Safronov added a comment -

            Contents of /var/lib/rear/layout/xfs/vg-lv_srv.xfs:

            meta-data=/dev/mapper/vg-lv_srv isize=512 agcount=400, agsize=6144 blks
            = sectsz=512 attr=2, projid32bit=1
            = crc=1 finobt=1, sparse=1, rmapbt=0
            = reflink=1 bigtime=1 inobtcount=1
            data = bsize=4096 blocks=2453504, imaxpct=25
            = sunit=16 swidth=16 blks
            naming =version 2 bsize=4096 ascii-ci=0, ftype=1
            log =internal log bsize=4096 blocks=1872, version=2
            = sectsz=512 sunit=16 blks, lazy-count=1
            realtime =none extsz=4096 blocks=0, rtextents=0

            Regarding the source of agcount=400, I am not sure; I am just using an environment deployed by CI. IIUC, OpenStack nodes are provisioned using the overcloud-hardened-uefi-full.raw image, which has a pre-defined disk layout.

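            As a hedged cross-check of the numbers above (illustrative only, not part of the comments): the block counts quoted in the layout file and in the recreated filesystem explain why the original reports agcount=400 while the recreated one gets 399:

            # Original data section: blocks=2453504 with agsize=6144 blks
            echo $(( 2453504 / 6144 ))   # 399 full allocation groups ...
            echo $(( 2453504 % 6144 ))   # ... plus a 2048-block partial AG, hence agcount=400
            # Recreated LV is slightly smaller: blocks=2451456
            echo $(( 2451456 / 6144 ))   # exactly 399, hence agcount=399 and no partial AG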

            Pavel Cahyna added a comment -

            This seems to be an unrelated problem. I tried the problematic command

            mkfs.xfs -f -m uuid=23ce7347-fce3-48b4-9854-60a6db155b16 -i size=512 -d agcount=400 -s size=512 -i attr=2 -i projid32bit=1 -m crc=1 -m finobt=1 -b size=4096 -i maxpct=25 -d sunit=128 -d swidth=128 -l version=2 -l sunit=128 -l lazy-count=1 -n size=4096 -n version=2 -r extsize=4096 /dev/mapper/rhel_kvm08-guest09-lv_srv

            and it dumps core for me as well, so it is easily reproducible.

            According to the assertion, it seems that it does not like "-d agcount=400". Indeed, when I change "-d agcount=400" to "-d agcount=40", the command passes.

            Now, the question is how agcount=400 got there. Can you please provide the content of /var/lib/rear/layout/xfs/vg-lv_srv.xfs ? I suppose it will also have "agcount=400". Assuming this is the case, the question is, how did it get there? The file seems to be merely the output of "xfs_info /srv". If that's the case, how could /srv have been created with agcount=400 if mkfs.xfs rejects this value? Has the VM in question been upgraded from an earlier version of RHEL? I am thinking that maybe the filesystem was created with an older version of mkfs.xfs that allowed this and the assertion was added to the code later.

            This could also explain the thin pool problem, because a similar question arises: how could the thin pool have been created with such a small metadata volume, when the default is a larger metadata volume? Maybe it was created when the default was different, and new LVM with the same parameters now creates a different layout? Or have you provided the (small) metadata volume size manually when creating the volume for the first time?


            Roman Safronov added a comment -

            I tried what was specified in comment 19; the command passed, but the diskrestore.sh script still fails. See the attached rear-controller-0.log_20221222.


              rhn-support-pcahyna Pavel Cahyna
              romansaf Roman Safronov
              Pavel Cahyna
              Jakub Haruda
              Mugdha Soni