  1. RHEL
  2. RHEL-56796

kernel-install corrupts custom boot entries via grub2 scripts (rhel9 - 10 upgrades)

    • Bug
    • Resolution: Duplicate
    • Major
    • None
    • rhel-10.0.beta, rhel-10.0
    • grub2
    • None
    • Yes
    • Important
    • rhel-sst-desktop-firmware-bootloaders
    • ssg_display
    • 3
    • False

      What were you trying to do that didn't work?

      I was upgrading a RHEL9.5 system to RHEL10 Beta. The system has a custom boot entry that boots into a snapshot of the prior system state. Following the upgrade, the options key of every BLS entry in /boot/loader/entries is reset to the default derived from /etc/default/grub. This breaks booting into the snapshot environment.

      This affects the documented procedure for carrying out major upgrades with snapshots, as well as known customer workflows using backups/copies of the root filesystem:

      https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/9/html/managing_storage_devices/managing-system-upgrades-with-snapshots_managing-storage-devices

      RHEL-49868

      Please provide the package NVR for which bug is seen:

      grub2-common-2.06-125.el10.noarch

      How reproducible:

      100%

      Steps to reproduce

      1.  On a system installed to LVM take a snapshot of the root file system and create a custom boot entry that uses the snapshot as the root device.
      2. Start an in place upgrade from RHEL9.5 to RHEL10 Beta using leapp: https://docs.google.com/document/d/1-zPaLjNdJ3BSWiSTDFzShtWcR2jwmqBp7VkcTAZhcHw/edit
      3. The upgrade will fail due to a hang in os-prober, which blocks the leapp DNF transaction from completing when a snapshot of the root fs exists (RHEL-56629)
      4. Reboot and attempt to boot into the snapshot boot entry to recover the system

      Expected results

      Custom boot entries are not modified: snapshot boot entry boots using the configured snapshot device as the root device allowing recovery of the pre-upgrade system.

      Actual results

      The root= option has been reset to the root logical volume, causing the system to attempt to boot into the broken RHEL10 Beta installation which fails.
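One way to confirm the symptom is to list the root= argument of each BLS snippet and check whether the snapshot entry still points at the snapshot device. The following is a minimal sketch, not part of grub2: the helper name list_bls_roots, the scratch directory, and the device names rhel-root and rhel-root_snap are all hypothetical. With no argument the helper reads /boot/loader/entries.

```shell
#!/bin/sh
# Sketch: print "<entry>: <root= argument>" for each BLS snippet in a directory.
# list_bls_roots is a hypothetical helper, not something shipped by grub2.
list_bls_roots() {
    dir=${1:-/boot/loader/entries}
    for conf in "$dir"/*.conf; do
        [ -f "$conf" ] || continue
        # Split the options line into words and keep the first root= word.
        root=$(grep '^options' "$conf" | tr ' ' '\n' | grep '^root=' | head -n 1)
        echo "$(basename "$conf"): ${root:-no root= argument}"
    done
}

# Demo on scratch files with assumed device names (healthy, pre-upgrade state).
entries=$(mktemp -d)
printf 'title RHEL\noptions root=/dev/mapper/rhel-root ro\n' > "$entries/rhel.conf"
printf 'title Snapshot\noptions root=/dev/mapper/rhel-root_snap ro\n' > "$entries/snapshot.conf"
report=$(list_bls_roots "$entries")
echo "$report"
rm -rf "$entries"
```

If every entry reports the same root= value after the upgrade, the custom entry has been rewritten.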

       

      Additional information

      This happens due to the following code in /usr/lib/kernel/install.d/20-grub.install (grub2-common):

              if [[ "x${GRUB_ENABLE_BLSCFG}" = "xtrue" ]] || [[ ! -f /sbin/new-kernel-pkg ]]; then
                  if [[ -f /etc/kernel/cmdline ]]; then
                      if [[ /etc/kernel/cmdline -ot /etc/default/grub ]]; then
                          # user modified /etc/default/grub manually; sync
                          grub2-mkconfig -o /etc/grub2.cfg
                      fi
       

      The file /etc/default/grub is provided by grub2-tools: when the package is updated during the upgrade transaction this updates the file timestamp, causing the -ot comparison with /etc/kernel/cmdline to evaluate as true (even though the file has not been modified).
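The timestamp effect can be reproduced without touching the real files. The sketch below uses scratch files standing in for /etc/kernel/cmdline and /etc/default/grub, with fixed, made-up timestamps and file contents. It shows that bumping the modification time alone, with identical content, is enough to flip the -ot test.

```shell
#!/bin/sh
# Sketch: a timestamp-only change flips the -ot comparison.
# Scratch files stand in for the real paths; contents and times are made up.
workdir=$(mktemp -d)
cmdline="$workdir/cmdline"        # stands in for /etc/kernel/cmdline
defaults="$workdir/default-grub"  # stands in for /etc/default/grub

echo 'GRUB_CMDLINE_LINUX="rhgb quiet"' > "$defaults"
echo 'root=/dev/mapper/rhel-root ro rhgb quiet' > "$cmdline"

# Initial state: cmdline is slightly newer than the defaults file.
touch -t 202401010000 "$defaults"
touch -t 202401010001 "$cmdline"
before_sum=$(cksum < "$defaults")
if [ "$cmdline" -ot "$defaults" ]; then before=older; else before=not-older; fi

# Simulate the package update: same content, newer modification time.
touch -t 202406010000 "$defaults"
after_sum=$(cksum < "$defaults")
if [ "$cmdline" -ot "$defaults" ]; then after=older; else after=not-older; fi

if [ "$before_sum" = "$after_sum" ]; then unchanged=yes; else unchanged=no; fi
echo "defaults content unchanged: $unchanged"
echo "before update: cmdline is $before"
echo "after update:  cmdline is $after"
rm -rf "$workdir"
```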

      grub2-mkconfig then runs /etc/grub.d/10_linux which will re-write the options key for all BLS snippets in /boot/loader/entries unless --no-grubenv-update is given:

      update_bls_cmdline()
      {
          local cmdline="root=${LINUX_ROOT_DEVICE} ro ${GRUB_CMDLINE_LINUX} ${GRUB_CMDLINE_LINUX_DEFAULT}"
          local -a files=($(get_sorted_bls))

          if [ -w /etc/kernel ] &&
                 [[ ! -f /etc/kernel/cmdline ||
                        /etc/kernel/cmdline -ot /etc/default/grub ]]; then
              # anaconda has the correct information to create this during install;
              # afterward, grubby will take care of syncing on updates.  If the user
              # has modified /etc/default/grub, try to cope.
              echo "$cmdline" > /etc/kernel/cmdline
          fi

          for bls in "${files[@]}"; do
              local options="${cmdline}"
              if [ -z "${bls##*debug*}" ]; then
                  options="${options} ${GRUB_CMDLINE_LINUX_DEBUG}"
              fi
              options="$(echo "${options}" | sed -e 's/\//\\\//g')"
              sed -i -e "s/^options.*/options ${options}/" "${blsdir}/${bls}.conf"
          done
      }
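The effect of the final loop on a custom entry can be replayed on scratch files. This is a simplified sketch, not the real 10_linux: it reuses the same two sed expressions, assumes GNU sed for -i, and the entry names, device names, and kernel arguments are invented for illustration.

```shell
#!/bin/sh
# Sketch: replay the options-rewrite loop on scratch BLS snippets.
blsdir=$(mktemp -d)   # stands in for /boot/loader/entries

# Hypothetical entries: one default, one custom snapshot entry.
printf 'title RHEL\noptions root=/dev/mapper/rhel-root ro rhgb quiet\n' > "$blsdir/default.conf"
printf 'title Snapshot\noptions root=/dev/mapper/rhel-root_snap ro rhgb quiet\n' > "$blsdir/snapshot.conf"

# Command line as it would be derived from /etc/default/grub (assumed values).
cmdline="root=/dev/mapper/rhel-root ro rhgb quiet"

# Same substitution as update_bls_cmdline: escape slashes, then replace
# the whole options line in every snippet, custom or not.
options=$(echo "$cmdline" | sed -e 's/\//\\\//g')
for conf in "$blsdir"/*.conf; do
    sed -i -e "s/^options.*/options ${options}/" "$conf"
done

snapshot_options=$(grep '^options' "$blsdir/snapshot.conf")
echo "$snapshot_options"   # the snapshot entry now carries the default root=
rm -rf "$blsdir"
```

Because the substitution matches any line starting with "options", nothing in the loop distinguishes a hand-written entry from one generated by kernel-install.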
       

      The grub2-mkconfig behaviour was reported as a bug against Fedora 37:

      https://bugzilla.redhat.com/show_bug.cgi?id=2120845

       

      But this was closed EOL with no changes - at the time it was not such a problem as the only component that ran grub2-mkconfig automatically was Anaconda during a new installation (which also deletes all files under /boot/loader/entries anyway).

       

      The code in 20-grub.install was added in dist-git commit fc76aed in 2022 (Fedora 36) but the side effects went unnoticed at the time:

       

      commit fc76aed5333f56dd05400521a35b944a5df52ebc
      Author: Robbie Harwood <rharwood@redhat.com>
      Date:   Wed Aug 17 15:03:25 2022 +0000

          Fix duplicated args and cope with /etc/default/grub modification

          Signed-off-by: Robbie Harwood <rharwood@redhat.com>

      There's no bug attached to the Fedora commit and I can't find any mailing list discussion or other rationale for this change. It's not clear what it was intending to fix.

       

      The overall sequence of events in the RHEL10 upgrade case is:

      dnf updates grub2-tools, which updates the /etc/default/grub timestamp
      dnf runs the kernel-core %posttrans scriptlet
        scriptlet runs "kernel-install add" for the RHEL10 kernel
          kernel-install runs /usr/lib/kernel/install.d/20-grub.install
            20-grub.install sees that /etc/default/grub is newer than /etc/kernel/cmdline and runs grub2-mkconfig
              grub2-mkconfig runs /etc/grub.d/10_linux, which rewrites /boot/loader/entries/*.conf
              grub2-mkconfig runs os-prober
                os-prober then hangs when run by grub2-mkconfig and blocks the entire upgrade process

       

      Workaround

      The problem can be worked around by updating the /etc/kernel/cmdline timestamp to some time in the future before beginning the upgrade, e.g.:

      # touch --date=tomorrow /etc/kernel/cmdline 

      This prevents 20-grub.install from running grub2-mkconfig, avoiding the hang in os-prober and preventing the custom boot entry from being overwritten.
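A pre-upgrade check along these lines could show whether the timestamp test would fire, and whether the workaround has taken effect. It is a sketch under assumptions: mkconfig_would_run is a hypothetical helper that mirrors only the two file tests quoted above (it does not model the GRUB_ENABLE_BLSCFG or new-kernel-pkg condition), and the demo uses scratch files with fixed timestamps rather than touch --date=tomorrow.

```shell
#!/bin/sh
# Sketch: report whether the timestamp test in 20-grub.install would fire.
# mkconfig_would_run is a hypothetical helper; arguments default to the real paths.
mkconfig_would_run() {
    cmdline=${1:-/etc/kernel/cmdline}
    defaults=${2:-/etc/default/grub}
    [ -f "$cmdline" ] && [ "$cmdline" -ot "$defaults" ]
}

# Demo on scratch files with made-up contents and timestamps.
demo=$(mktemp -d)
echo 'root=/dev/mapper/rhel-root ro' > "$demo/cmdline"
echo 'GRUB_CMDLINE_LINUX=""' > "$demo/grub"
touch -t 202401010000 "$demo/cmdline"
touch -t 202406010000 "$demo/grub"      # simulates the package update

if mkconfig_would_run "$demo/cmdline" "$demo/grub"; then before=yes; else before=no; fi

# Equivalent of the workaround: make cmdline newer than the defaults file.
touch -t 202501010000 "$demo/cmdline"
if mkconfig_would_run "$demo/cmdline" "$demo/grub"; then after=yes; else after=no; fi

echo "before workaround: grub2-mkconfig would run: $before"
echo "after workaround:  grub2-mkconfig would run: $after"
rm -rf "$demo"
```

On a real system the helper would be called with no arguments, before starting the upgrade and again after applying the workaround.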

              bootloader-eng-team bootloader -eng-team
              rhn-support-bmr Bryn Reeves
              bootloader -eng-team bootloader -eng-team
              Release Test Team Release Test Team