RHEL-36688

boom: fix cache reference counting bug

    • Bug
    • Resolution: Won't Do
    • Major
    • None
    • rhel-8.8.0, rhel-8.9.0, rhel-8.10
    • boom-boot
    • None
    • None
    • Important
    • Patch
    • 7a134cf302a2a3998144a222a98453d8ea16d3a4
    • rhel-sst-logical-storage
    • ssg_filesystems_storage_and_HA
    • 2
    • False
    • None
    • Red Hat Enterprise Linux
    • None
    • None
    • None
    • All
    • None

      What were you trying to do that didn't work?

      Upgrading from RHEL8 to RHEL9 using leapp and boom can lead to a situation where boom incorrectly deletes the backup boot images used by the snapshot boot entry, making the snapshot unbootable.

      This happens due to an interaction between the RHEL9 kexec-tools package update and boom-managed boot entries that were created with the boom create --backup option to cache boot images.

      By default the kdumpctl command, run from the kexec-tools posttrans scriptlet, modifies all BLS *.conf files in /boot/loader/entries to include the RHEL9 crashkernel= parameter:

       posttrans scriptlet (using /bin/sh):
      # Try to reset kernel crashkernel value to new default value or set up
      # crasherkernel value for new install
      #
      # Note
      #  1. Skip ostree systems as they are not supported.
      #  2. For Fedora 36 and RHEL9, "[ $1 == 1 ]" in posttrans scriptlet means both install and upgrade;
      #     For Fedora > 36, "[ $1 == 1 ]" only means install and "[ $1 == 2 ]" means upgrade
      if [ ! -f /run/ostree-booted ] && [ $1 == 1 -o $1 == 2 ]; then
        kdumpctl _reset-crashkernel-after-update
        :
      fi
      

       

      This changes the boot_id of the entry, so boom treats it as a foreign (read-only) entry. That in turn triggers a bug in boom-1.6.0 and earlier: foreign entries are ignored when reference counting the cached images, so the images end up with a reference count of zero and are removed by automatic cleanup:

      [root@localhost ~]# boom list -VV 
      DEBUG - reading boom configuration from '/boot/boom/boom.conf'
      DEBUG - Found global.boot_path
      DEBUG - Found global.boom_path
      DEBUG - Found legacy.enable
      DEBUG - Found legacy.sync
      ...
      BootID  Version                      Name                     RootDevice                             Options                                                                                                                                                                                                  MachineID 
      1e1a9b4 4.18.0-513.5.1.el8_9.x86_64  Red Hat Enterprise Linux /dev/mapper/rhel-root                  root=/dev/mapper/rhel-root ro resume=/dev/mapper/rhel-swap rd.lvm.lv=rhel/root rd.lvm.lv=rhel/swap rhgb quiet $tuned_params crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M                                 b1609d18cd
      4ea37b9 4.18.0-513.24.1.el8_9.x86_64 Red Hat Enterprise Linux /dev/mapper/rhel-root                  root=/dev/mapper/rhel-root ro resume=/dev/mapper/rhel-swap rd.lvm.lv=rhel/root rd.lvm.lv=rhel/swap rhgb quiet $tuned_params crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M                                 b1609d18cd
      e22dd61 5.14.0-362.24.1.el9_3.x86_64 Red Hat Enterprise Linux /dev/mapper/rhel-root                  root=/dev/mapper/rhel-root ro resume=/dev/mapper/rhel-swap rd.lvm.lv=rhel/root rd.lvm.lv=rhel/swap rhgb quiet rd.plymouth=0 plymouth.enable=0 crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M $tuned_params b1609d18cd
      6e810cc 4.18.0-513.24.1.el8_9.x86_64 Red Hat Enterprise Linux /dev/rhel/root_snapshot_before_changes root=/dev/rhel/root_snapshot_before_changes ro rd.lvm.lv=rhel/root_snapshot_before_changes                                                                                                               b1609d18cd
      DEBUG - Loading cache entries from '/boot/boom/cache/cacheindex.json'
      DEBUG - Loaded 2 cache paths and 2 images
      INFO - Removed 2 unused cache entries
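
Why the boot_id changes can be illustrated with a minimal sketch. It assumes the boot_id is a SHA-1 digest over the entry's fields. The function name sketch_boot_id and the serialisation format are hypothetical, so the digests it prints will not match the boot_id values shown in this report. The point is only that any change to the options line produces a different identifier:

```python
import hashlib

def sketch_boot_id(entry: dict) -> str:
    """Hypothetical stand-in for a content-derived boot_id (not boom's code)."""
    # Serialise the keys in a fixed order so the digest is deterministic.
    keys = ("title", "machine-id", "version", "linux", "initrd", "options")
    text = "\n".join(f"{key} {entry[key]}" for key in keys)
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

original = {
    "title": "Test",
    "machine-id": "b1609d18cd704009b4e3f4142ec64eba",
    "version": "4.18.0-513.24.1.el8_9.x86_64",
    "linux": "/vmlinuz-4.18.0-513.24.1.el8_9.x86_64.boom0",
    "initrd": "/initramfs-4.18.0-513.24.1.el8_9.x86_64.img.boom0",
    "options": "root=/dev/rhel/root ro rd.lvm.lv=rhel/root",
}

# Simulate kdumpctl appending crashkernel= outside of boom's control.
modified = dict(
    original,
    options=original["options"] + " crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M",
)

id_before = sketch_boot_id(original)
id_after = sketch_boot_id(modified)

# The two identifiers differ, so the file on disk no longer matches the
# identifier the entry was created with.
print(id_before[:7], id_after[:7], id_before != id_after)
```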
      

      The boot images used by the snapshot boot entry have now been deleted:

      [root@localhost ~]# boom show 6e810cc
      Boot Entry (boot_id=6e810cc)
        title Root LV snapshot before changes
        machine-id b1609d18cd704009b4e3f4142ec64eba
        version 4.18.0-513.24.1.el8_9.x86_64
        linux /vmlinuz-4.18.0-513.24.1.el8_9.x86_64.boom0
        initrd /initramfs-4.18.0-513.24.1.el8_9.x86_64.img.boom0
        options root=/dev/rhel/root_snapshot_before_changes ro rd.lvm.lv=rhel/root_snapshot_before_changes
      [root@localhost ~]# ls /boot/vmlinuz-4.18.0-513.24.1.el8_9.x86_64.boom0 /boot/initramfs-4.18.0-513.24.1.el8_9.x86_64.img.boom0
      ls: cannot access '/boot/vmlinuz-4.18.0-513.24.1.el8_9.x86_64.boom0': No such file or directory
      ls: cannot access '/boot/initramfs-4.18.0-513.24.1.el8_9.x86_64.img.boom0': No such file or directory
      

      Please provide the package NVR for which bug is seen:

      boom-boot-1.3-2.el8.noarch

      How reproducible:

      100%

      Steps to reproduce

      1. Create a boot entry using boom create --backup:

      [root@localhost ~]# boom create --backup --title "Test" --root-lv rhel/root
      Created entry with boot_id 049424b:
        title Test
        machine-id b1609d18cd704009b4e3f4142ec64eba
        version 4.18.0-513.24.1.el8_9.x86_64
        linux /vmlinuz-4.18.0-513.24.1.el8_9.x86_64.boom0
        initrd /initramfs-4.18.0-513.24.1.el8_9.x86_64.img.boom0
        options root=/dev/rhel/root ro rd.lvm.lv=rhel/root 

      2. Modify the boot entry file, e.g. by adding a crashkernel= parameter:

      [root@localhost ~]# cat /boot/loader/entries/b1609d18cd704009b4e3f4142ec64eba-049424b-4.18.0-513.24.1.el8_9.x86_64.conf
      #OsIdentifier: 43747d3888b663d2bc88efd35d0813159a84d291
      title Test
      machine-id b1609d18cd704009b4e3f4142ec64eba
      version 4.18.0-513.24.1.el8_9.x86_64
      linux /vmlinuz-4.18.0-513.24.1.el8_9.x86_64.boom0
      initrd /initramfs-4.18.0-513.24.1.el8_9.x86_64.img.boom0
      options root=/dev/rhel/root ro rd.lvm.lv=rhel/root crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M
       

      3. Run boom list:

      [root@localhost ~]# boom list
      BootID  Version                      Name                     RootDevice                            
      1e1a9b4 4.18.0-513.5.1.el8_9.x86_64  Red Hat Enterprise Linux /dev/mapper/rhel-root                 
      4ea37b9 4.18.0-513.24.1.el8_9.x86_64 Red Hat Enterprise Linux /dev/mapper/rhel-root                 
      e22dd61 5.14.0-362.24.1.el9_3.x86_64 Red Hat Enterprise Linux /dev/mapper/rhel-root                 
      235556a 4.18.0-513.24.1.el8_9.x86_64 Red Hat Enterprise Linux /dev/rhel/root     

      (note the boot_id has changed from 049424b to 235556a)

      4. Check for backup boot images in /boot:

      [root@localhost ~]# ls /boot/vmlinuz-4.18.0-513.24.1.el8_9.x86_64.boom0 /boot/initramfs-4.18.0-513.24.1.el8_9.x86_64.img.boom0
       

      Expected results

      [root@localhost ~]# ls /boot/vmlinuz-4.18.0-513.24.1.el8_9.x86_64.boom0 /boot/initramfs-4.18.0-513.24.1.el8_9.x86_64.img.boom0
      /boot/initramfs-4.18.0-513.24.1.el8_9.x86_64.img.boom0  /boot/vmlinuz-4.18.0-513.24.1.el8_9.x86_64.boom0
       

      Actual results

      ls: cannot access '/boot/vmlinuz-4.18.0-513.24.1.el8_9.x86_64.boom0': No such file or directory
      ls: cannot access '/boot/initramfs-4.18.0-513.24.1.el8_9.x86_64.img.boom0': No such file or directory 
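
The manual ls check can be generalised to every boot entry. The following is a minimal sketch, not part of boom: the helper name missing_images is hypothetical, and it assumes the linux and initrd paths in each BLS entry are relative to the /boot file system, as in the entries shown above:

```python
from pathlib import Path

def missing_images(entries_dir: str, boot_dir: str) -> dict:
    """Map each BLS entry file name to the images it references that do not exist."""
    missing = {}
    for conf in sorted(Path(entries_dir).glob("*.conf")):
        for line in conf.read_text().splitlines():
            fields = line.split(None, 1)
            if len(fields) == 2 and fields[0] in ("linux", "initrd"):
                # An initrd line may list several images separated by whitespace.
                for image in fields[1].split():
                    # Entry paths start with "/", so strip it before joining
                    # with the boot directory.
                    if not (Path(boot_dir) / image.lstrip("/")).exists():
                        missing.setdefault(conf.name, []).append(image)
    return missing
```

On an affected system, calling missing_images("/boot/loader/entries", "/boot") would be expected to list the .boom0 images for the modified entry. An empty result means every referenced image is still present.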

       

      Upstream status

      This is fixed in commit 7a134cf:

      commit 7a134cf302a2a3998144a222a98453d8ea16d3a4
      Author: Bryn M. Reeves <bmr@redhat.com>
      Date:   Wed May 8 18:43:52 2024 +0100
      
          Revert "boom.cache: ignore foreign boot entries when reference counting"
          
          This reverts commit f9704eea7b973863ea5a2bf6ad13cd37abce64f6.
          
          The reason for ignoring foreign boot entries when determining reference
          counts no longer exists (a spurious warning when creating entries with
          --backup).
          
          This commit causes problems if a boom-managed boot entry is modified
          outside of boom's control (e.g. by kdumctl/grubby appending a modified
          crashkernel argument): since boom sees the entry as foreign the images
          used by it end up with a reference count of zero and are automatically
          removed.
          
          Revert the above commit to prevent this behaviour.
          
          Signed-off-by: Bryn M. Reeves <bmr@redhat.com>
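
The effect described in the commit message can be sketched as follows. This is an illustrative model of reference counting, not boom's implementation: Entry, count_refs and unused are hypothetical names, and the ignore_foreign flag stands in for the behaviour before and after the revert:

```python
from collections import Counter
from typing import Iterable, List, NamedTuple

class Entry(NamedTuple):
    boot_id: str
    linux: str
    initrd: str
    foreign: bool  # True when the entry was modified outside of boom

def count_refs(entries: Iterable[Entry], ignore_foreign: bool) -> Counter:
    """Count how many boot entries reference each image path."""
    refs = Counter()
    for entry in entries:
        if ignore_foreign and entry.foreign:
            # Behaviour being reverted: foreign entries hold no references.
            continue
        refs[entry.linux] += 1
        refs[entry.initrd] += 1
    return refs

def unused(cached: Iterable[str], entries: List[Entry], ignore_foreign: bool) -> List[str]:
    """Return cached image paths with a reference count of zero."""
    refs = count_refs(entries, ignore_foreign)
    return sorted(path for path in cached if refs[path] == 0)

cached = [
    "/vmlinuz-4.18.0-513.24.1.el8_9.x86_64.boom0",
    "/initramfs-4.18.0-513.24.1.el8_9.x86_64.img.boom0",
]
# The entry from the reproducer after its options line was edited.
entries = [Entry("235556a", cached[0], cached[1], foreign=True)]

# Ignoring foreign entries: both images look unused and would be cleaned up.
print(unused(cached, entries, ignore_foreign=True))
# Counting foreign entries: nothing is unused, so nothing is removed.
print(unused(cached, entries, ignore_foreign=False))
```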
       

      Impact on RHEL9

      boom-boot-1.5 and later on RHEL9 automatically initialise the options template from the current /proc/cmdline:

      commit 66c2cc0dbc250fc8abb8d0a999a98da3d9e9b076
      Author: Bryn M. Reeves <bmr@redhat.com>
      Date:   Mon Apr 17 16:32:40 2023 +0100
      
          Implement automatic --os-options templates from /proc/cmdline
          
          If no --os-options is given when creating a profile with --from-host
          attempt to automatically generate an options template from the running
          system's /proc/cmdline.
          
          Resolves: #14
          
          Signed-off-by: Bryn M. Reeves <bmr@redhat.com>
      
      $ git describe --contains 66c2cc0dbc250fc8abb8d0a999a98da3d9e9b076
      1.5~3
      

      This substantially reduces the impact of this bug on RHEL9 systems, since boom-managed boot entries will already include the default crashkernel= setting applied by kdumpctl:

      [root@localhost ~]# boom profile create --from-host
      Created profile with os_id d5bded8:
        OS ID: "d5bded84c4f37fc29568e83ea2cb2b1dfc3d5789",
        Name: "Red Hat Enterprise Linux", Short name: "rhel",
        Version: "9.4 (Plow)", Version ID: "9.4",
        Kernel pattern: "/vmlinuz-%{version}", Initramfs pattern: "/initramfs-%{version}.img",
        Root options (LVM2): "rd.lvm.lv=%{lvm_root_lv}",
        Root options (BTRFS): "rootflags=%{btrfs_subvolume}",
        Options: "root=%{root_device} ro %{root_opts} crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M resume=/dev/mapper/rhel-swap rd.lvm.lv=rhel/swap rhgb quiet",
        Title: "%{os_name} %{os_version_id} (%{version})",
        Optional keys: "grub_users grub_arg grub_class id", UTS release pattern: "el9"
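
How an options template of this shape could be derived from a kernel command line is sketched below. The filtering rules are an assumption inferred from the profile output above and are not taken from boom's source. The function name sketch_options_template and the BOOT_IMAGE value in the sample command line are hypothetical:

```python
def sketch_options_template(cmdline: str, root_lv: str) -> str:
    """Build an options template from a kernel command line (illustrative only)."""
    kept = []
    for word in cmdline.split():
        # Drop words that the template expresses through %{...} keys instead.
        if word.startswith(("BOOT_IMAGE=", "root=")) or word in ("ro", "rw"):
            continue
        if word == f"rd.lvm.lv={root_lv}":
            continue
        kept.append(word)
    return " ".join(["root=%{root_device}", "ro", "%{root_opts}"] + kept)

# Sample command line; the BOOT_IMAGE value is made up for the example.
cmdline = (
    "BOOT_IMAGE=(hd0,msdos1)/vmlinuz-example "
    "root=/dev/mapper/rhel-root ro crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M "
    "resume=/dev/mapper/rhel-swap rd.lvm.lv=rhel/root rd.lvm.lv=rhel/swap rhgb quiet"
)
template = sketch_options_template(cmdline, "rhel/root")
print(template)
```

Because crashkernel= is already part of the running system's command line, it is carried into the template, and entries created from the profile start out with the same value that kdumpctl would otherwise append later.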
      

              lvm-team lvm-team
              rhn-support-bmr Bryn Reeves
              lvm-team lvm-team
              Cluster QE Cluster QE