RHEL-76917

[RHEL9] Reinstalling a system with software raid fails when creating the /boot/efi raid: "Device or resource busy"


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Normal
    • rhel-9.7
    • rhel-9.4, rhel-9.5
    • python-blivet
    • python-blivet-3.6.0-27.el9
    • No
    • Moderate
    • rhel-storage-management
    • ssg_platform_storage
    • 15
    • 17
    • 3
    • False
    • False
    • Red Hat Enterprise Linux

      What were you trying to do that didn't work?

      When a system is configured with software RAID, everything needs to be wiped before reinstalling. This can be done using a %pre script:

      %pre --log=/tmp/ks-pre.log --erroronfail
      echo "stopping LVM and raids"
      vgchange -a n
      mdadm --stop --scan

      echo "wiping disks"
      wipefs -af /dev/sda
      wipefs -af /dev/sdb
      %end

      Unfortunately, wiping the disks is not sufficient, because the boot partition (/boot/efi on UEFI, /boot on BIOS) has its RAID metadata created at the end of the partition (metadata=1.0).
      Due to a race occurring when creating the software RAID for that partition, the following error shows up:

      Configuring storage
      Creating disklabel on /dev/sdb
      Creating mdmember on /dev/sdb1
      Creating mdmember on /dev/sdb3
      Creating mdmember on /dev/sdb2
      Creating disklabel on /dev/sda
      Creating mdmember on /dev/sda1
      Creating xfs on /dev/md/boot
      Creating mdmember on /dev/sda3
      Creating lvmpv on /dev/md/os_pv
      Creating xfs on /dev/mapper/rhel-root
      Creating swap on /dev/mapper/rhel-swap
      Creating mdmember on /dev/sda2
      
      An unknown error has occured, look at the /tmp/anaconda-tb* file(s) for more details
      
      Traceback (most recent call first):
        File "/usr/lib/python3.9/site-packages/dasbus/client/handler.py", line 497, in _handle_method_error
          raise exception from None
        File "/usr/lib/python3.9/site-packages/dasbus/client/handler.py", line 477, in _get_method_reply
          return self._handle_method_error(error)
        File "/usr/lib/python3.9/site-packages/dasbus/client/handler.py", line 444, in _call_method
          return self._get_method_reply(
        File "/usr/lib64/python3.9/site-packages/pyanaconda/modules/common/task/__init__.py", line 47, in sync_run_task
          task_proxy.Finish()
        File "/usr/lib64/python3.9/site-packages/pyanaconda/installation_tasks.py", line 527, in run_task
          sync_run_task(self._task_proxy)
        File "/usr/lib64/python3.9/site-packages/pyanaconda/installation_tasks.py", line 496, in start 
          self.run_task()
        File "/usr/lib64/python3.9/site-packages/pyanaconda/installation_tasks.py", line 311, in start 
          item.start()
        File "/usr/lib64/python3.9/site-packages/pyanaconda/installation_tasks.py", line 311, in start 
          item.start()
        File "/usr/lib64/python3.9/site-packages/pyanaconda/installation_tasks.py", line 311, in start 
          item.start()
        File "/usr/lib64/python3.9/site-packages/pyanaconda/installation.py", line 399, in run_installation
          queue.start()
        File "/usr/lib64/python3.9/threading.py", line 917, in run
          self._target(*self._args, **self._kwargs)
        File "/usr/lib64/python3.9/site-packages/pyanaconda/threading.py", line 275, in run
          threading.Thread.run(self)
      dasbus.error.DBusError: Process reported exit code 1: mdadm: super1.x cannot open /dev/sdb2: Device or resource busy
      mdadm: /dev/sdb2 is not suitable for this array.
      mdadm: create aborted
      

      The only solution I found was to dd if=/dev/zero the disks instead of just wiping the signatures, but this operation can take a very long time on large disks.

      It would be great to find another solution; any idea would be appreciated.
      Maybe the wipe should be done from Anaconda itself, automatically, at the time the disks are rebuilt.
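
      One possible middle ground (untested here, and only a sketch based on the metadata=1.0 layout described above): since the stale RAID superblock sits at the end of each member partition, the %pre script could clear the superblocks on the individual partitions before wiping the whole disks, instead of zeroing the entire devices. The disk and partition names below are just the ones from this setup:

      %pre --log=/tmp/ks-pre.log --erroronfail
      echo "stopping LVM and raids"
      vgchange -a n
      mdadm --stop --scan

      echo "clearing stale md superblocks on member partitions"
      # metadata=1.0 superblocks live at the end of each partition,
      # so a wipefs of the whole disk does not touch them
      for part in /dev/sda[0-9]* /dev/sdb[0-9]*; do
          [ -b "$part" ] || continue
          mdadm --zero-superblock "$part" 2>/dev/null || true
          wipefs -af "$part" || true
      done

      echo "wiping disks"
      wipefs -af /dev/sda
      wipefs -af /dev/sdb
      %end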

      What is the impact of this issue to you?

      The system cannot be reinstalled automatically.

      Please provide the package NVR for which the bug is seen:

      anaconda-34.25.4.9

      How reproducible is this bug?:

      Always

      Steps to reproduce

      1. Install a system using the kickstart attached (I used 2 x 20GB SCSI disks, 8 vCPU and 4GB of memory)
      2. Reinstall the system using the kickstart

      Expected results

      No failure to reinstall

      Actual results

      Error message

      Additional information

      I'm not completely sure about the race, but I believe it's due to the udevadm settle run just before creating the array:

      INFO:program:Running... udevadm settle --timeout=300
      DEBUG:program:Return code: 0
      DEBUG:blivet:                PartitionDevice.setup: sda2 ; orig: False ; status: True ; controllable: True ;
      DEBUG:blivet:                MDRaidMember.create: device: /dev/sda2 ; type: mdmember ; status: False ;
      INFO:program:Running... udevadm settle --timeout=300
      DEBUG:program:Return code: 0
      DEBUG:blivet:                PartitionDevice.update_sysfs_path: sda2 ; status: True ;
      DEBUG:blivet:sda2 sysfs_path set to /sys/devices/pci0000:00/0000:00:02.2/0000:03:00.0/virtio2/host0/target0:0:0/0:0:0:2/block/sda/sda2
      INFO:blivet:executing action: [174] create device mdarray efiboot (id 170)
      DEBUG:blivet:                MDRaidArrayDevice.create: efiboot ; status: False ;
      DEBUG:blivet:                    MDRaidArrayDevice.setup_parents: name: efiboot ; orig: False ;
      DEBUG:blivet:                      PartitionDevice.setup: sda2 ; orig: False ; status: True ; controllable: True ;
      DEBUG:blivet:                      MDRaidMember.setup: device: /dev/sda2 ; type: mdmember ; status: False ;
      DEBUG:blivet:                      PartitionDevice.setup: sdb2 ; orig: False ; status: True ; controllable: True ;
      DEBUG:blivet:                      MDRaidMember.setup: device: /dev/sdb2 ; type: mdmember ; status: False ;
      DEBUG:blivet:                  MDRaidArrayDevice._create: efiboot ; status: False ;
      DEBUG:blivet:non-existent RAID raid1 size == 510 MiB
      INFO:program:Running [59] mdadm --create /dev/md/efiboot --run --level=raid1 --raid-devices=2 --metadata=1.0 --bitmap=internal /dev/sda2 /dev/sdb2 ...
      INFO:program:stdout[59]: 
      INFO:program:stderr[59]: mdadm: super1.x cannot open /dev/sdb2: Device or resource busy
      mdadm: /dev/sdb2 is not suitable for this array.
      mdadm: create aborted 
      

      Notably, the failure always occurs on the 2nd member of the RAID, which tends to confirm that the 2nd member gets auto-assembled as part of the old array.
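
      If that is the case, the stale superblock and any auto-assembled array should be visible from a shell during installation. A minimal check (the device names are just the ones from this report, and /dev/md127 is only an example of an auto-assembled array name) might look like:

      # look for a leftover metadata=1.0 superblock on the 2nd member
      mdadm --examine /dev/sdb2

      # see whether udev has already (partially) assembled an array from it
      cat /proc/mdstat

      # if an array shows up, stop it and clear the superblock before retrying
      mdadm --stop /dev/md127
      mdadm --zero-superblock /dev/sdb2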

              vtrefny@redhat.com Vojtěch Trefný
              rhn-support-rmetrich Renaud Métrich
              Vojtěch Trefný
              Release Test Team
              Votes: 0
              Watchers: 6
