Bug
Resolution: Duplicate
Major
rhel-10.0.beta, rhel-10.0
Important
rhel-sst-desktop-firmware-bootloaders
ssg_display
What were you trying to do that didn't work?
Upgrading a RHEL9.5 system to RHEL10 Beta with a custom boot entry that boots into a snapshot of the prior system state. Following the upgrade, the options key of every BLS entry in /boot/loader/entries is reset to the default from /etc/default/grub. This breaks booting into the snapshot environment.
This affects the documented procedure for carrying out major upgrades with snapshots, as well as known customer workflows using backups/copies of the root filesystem:
Please provide the package NVR for which bug is seen:
grub2-common-2.06-125.el10.noarch
How reproducible:
100%
Steps to reproduce
- On a system installed to LVM take a snapshot of the root file system and create a custom boot entry that uses the snapshot as the root device.
- Start an in-place upgrade from RHEL9.5 to RHEL10 Beta using leapp: https://docs.google.com/document/d/1-zPaLjNdJ3BSWiSTDFzShtWcR2jwmqBp7VkcTAZhcHw/edit
- The upgrade will fail due to a hang in os-prober, which blocks the leapp DNF transaction from completing when a snapshot of the root fs exists (RHEL-56629)
- Reboot and attempt to boot into the snapshot boot entry to recover the system
Expected results
Custom boot entries are not modified: snapshot boot entry boots using the configured snapshot device as the root device allowing recovery of the pre-upgrade system.
Actual results
The root= option has been reset to the root logical volume, so the system attempts to boot into the broken RHEL10 Beta installation, which fails.
Additional information
This happens due to the following code in /usr/lib/kernel/install.d/20-grub.install (grub2-common):
    if [[ "x${GRUB_ENABLE_BLSCFG}" = "xtrue" ]] || [[ ! -f /sbin/new-kernel-pkg ]]; then
        if [[ -f /etc/kernel/cmdline ]]; then
            if [[ /etc/kernel/cmdline -ot /etc/default/grub ]]; then
                # user modified /etc/default/grub manually; sync
                grub2-mkconfig -o /etc/grub2.cfg
            fi
The file /etc/default/grub is provided by grub2-tools: when the package is updated during the upgrade transaction this updates the file timestamp, causing the -ot comparison with /etc/kernel/cmdline to evaluate as true (even though the file has not been modified).
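The timestamp effect described above can be reproduced in isolation. The following is a minimal sketch, assuming scratch files in a temporary directory stand in for /etc/default/grub and /etc/kernel/cmdline (the file names and contents are made up for illustration). It uses the single-bracket form of the -ot test, which behaves the same way as the [[ ]] form in the excerpt for this comparison, and pins the timestamps with touch -t so the result does not depend on timing.

```shell
# Sketch only: scratch files, not the real /etc/default/grub or /etc/kernel/cmdline.
workdir="$(mktemp -d)"
default_grub="$workdir/default-grub"
cmdline="$workdir/kernel-cmdline"

printf 'GRUB_ENABLE_BLSCFG=true\n' > "$default_grub"
printf 'root=/dev/mapper/vg-root ro\n' > "$cmdline"

# Initial state: cmdline is newer than default-grub.
touch -t 202401010000 "$default_grub"
touch -t 202401020000 "$cmdline"

before_sum="$(cksum < "$default_grub")"
if [ "$cmdline" -ot "$default_grub" ]; then before=older; else before=not-older; fi

# Simulated package update: identical bytes are written again and the
# modification time moves forward (pinned here to keep the demo deterministic).
printf 'GRUB_ENABLE_BLSCFG=true\n' > "$default_grub"
touch -t 202401030000 "$default_grub"

after_sum="$(cksum < "$default_grub")"
if [ "$cmdline" -ot "$default_grub" ]; then after=older; else after=not-older; fi

if [ "$before_sum" = "$after_sum" ]; then changed=no; else changed=yes; fi
echo "content changed: $changed"
echo "before update: cmdline is $before"
echo "after update: cmdline is $after"

rm -rf "$workdir"
```

The checksum is identical before and after, yet the -ot test flips from false to true, which is the condition that makes 20-grub.install call grub2-mkconfig.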
grub2-mkconfig then runs /etc/grub.d/10_linux, which rewrites the options key for all BLS snippets in /boot/loader/entries unless --no-grubenv-update is given:
    update_bls_cmdline()
    {
        local cmdline="root=${LINUX_ROOT_DEVICE} ro ${GRUB_CMDLINE_LINUX} ${GRUB_CMDLINE_LINUX_DEFAULT}"
        local -a files=($(get_sorted_bls))

        if [ -w /etc/kernel ] && [[ ! -f /etc/kernel/cmdline || /etc/kernel/cmdline -ot /etc/default/grub ]]; then
            # anaconda has the correct information to create this during install;
            # afterward, grubby will take care of syncing on updates. If the user
            # has modified /etc/default/grub, try to cope.
            echo "$cmdline" > /etc/kernel/cmdline
        fi

        for bls in "${files[@]}"; do
            local options="${cmdline}"
            if [ -z "${bls##*debug*}" ]; then
                options="${options} ${GRUB_CMDLINE_LINUX_DEBUG}"
            fi
            options="$(echo "${options}" | sed -e 's/\//\\\//g')"
            sed -i -e "s/^options.*/options ${options}/" "${blsdir}/${bls}.conf"
        done
    }
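To illustrate why a custom entry does not survive this loop, here is a minimal sketch that applies the same two sed expressions to two made-up BLS snippets in a temporary directory. The entry names, titles, and device paths are hypothetical, and the cmdline value is hard-coded rather than built from the GRUB_* variables.

```shell
# Sketch only: a scratch directory stands in for /boot/loader/entries.
blsdir="$(mktemp -d)"

cat > "$blsdir/default.conf" <<'EOF'
title Default entry
options root=/dev/mapper/rhel-root ro
EOF

cat > "$blsdir/snapshot.conf" <<'EOF'
title Snapshot entry
options root=/dev/mapper/rhel-root_snap ro
EOF

# Hypothetical default command line, standing in for the value computed
# from LINUX_ROOT_DEVICE and the GRUB_CMDLINE_* variables.
cmdline="root=/dev/mapper/rhel-root ro"

# Same escaping and substitution as in update_bls_cmdline().
options="$(echo "${cmdline}" | sed -e 's/\//\\\//g')"
for conf in "$blsdir"/*.conf; do
    sed -i -e "s/^options.*/options ${options}/" "$conf"
done

snapshot_title="$(grep '^title' "$blsdir/snapshot.conf")"
snapshot_options="$(grep '^options' "$blsdir/snapshot.conf")"
echo "$snapshot_title"
echo "$snapshot_options"   # prints: options root=/dev/mapper/rhel-root ro

rm -rf "$blsdir"
```

The substitution matches any line beginning with options, so the snapshot entry's root= device is replaced along with everything else on that line; nothing in the loop distinguishes entries that were customised from those that were generated.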
The grub2-mkconfig behaviour was reported as a bug against Fedora 37:
https://bugzilla.redhat.com/show_bug.cgi?id=2120845
But this was closed EOL with no changes. At the time it was less of a problem, as the only component that ran grub2-mkconfig automatically was Anaconda during a new installation (which also deletes all files under /boot/loader/entries anyway).
The code in 20-grub.install was added in dist-git commit fc76aed in 2022 (Fedora 36) but the side effects went unnoticed at the time:
    commit fc76aed5333f56dd05400521a35b944a5df52ebc
    Author: Robbie Harwood <rharwood@redhat.com>
    Date:   Wed Aug 17 15:03:25 2022 +0000

        Fix duplicated args and cope with /etc/default/grub modification

        Signed-off-by: Robbie Harwood <rharwood@redhat.com>
There's no bug attached to the Fedora commit and I can't find any mailing list discussion or other rationale for this change. It's not clear what it was intending to fix.
The overall sequence of events in the RHEL10 upgrade case is:
- dnf updates grub2-tools which updates the /etc/default/grub timestamp
- dnf runs kernel-core %posttrans scriptlet
- scriptlet runs "kernel-install add" for the RHEL10 kernel
- kernel-install runs /usr/lib/kernel/install.d/20-grub.install
- 20-grub.install sees that /etc/default/grub is newer than /etc/kernel/cmdline and runs grub2-mkconfig
- grub2-mkconfig runs /etc/grub.d/10_linux which rewrites /boot/loader/entries/*.conf
- grub2-mkconfig runs os-prober
- os-prober then hangs when run by grub2-mkconfig and blocks the entire upgrade process
Workaround
The problem can be worked around by updating the /etc/kernel/cmdline timestamp to some time in the future before beginning the upgrade, e.g.:
# touch --date=tomorrow /etc/kernel/cmdline
This prevents 20-grub.install from running grub2-mkconfig, avoiding the hang in os-prober and preventing the custom boot entry from being overwritten.
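Whether the workaround has taken effect can be checked before starting the upgrade. The sketch below assumes a hypothetical helper, mkconfig_would_run, that mirrors only the timestamp test from 20-grub.install (it does not check GRUB_ENABLE_BLSCFG or /sbin/new-kernel-pkg), and exercises it against scratch files. It relies on the GNU touch --date option used in the workaround above.

```shell
# Hypothetical helper: succeeds when the timestamp condition that leads
# 20-grub.install to call grub2-mkconfig would hold for the given files.
mkconfig_would_run() {
    [ -f "$1" ] && [ "$1" -ot "$2" ]
}

# Scratch files standing in for /etc/kernel/cmdline and /etc/default/grub.
workdir="$(mktemp -d)"
cmdline="$workdir/cmdline"
default_grub="$workdir/grub"

touch -t 202401010000 "$cmdline"   # old cmdline
touch "$default_grub"              # "just rewritten by the package update"

if mkconfig_would_run "$cmdline" "$default_grub"; then before=yes; else before=no; fi

# Apply the workaround to the scratch copy.
touch --date=tomorrow "$cmdline"

if mkconfig_would_run "$cmdline" "$default_grub"; then after=yes; else after=no; fi

echo "grub2-mkconfig would run before workaround: $before"
echo "grub2-mkconfig would run after workaround: $after"

rm -rf "$workdir"
```

On a real system the same helper could be called as mkconfig_would_run /etc/kernel/cmdline /etc/default/grub. Note that the check only reflects the timestamps at the moment it runs; the package update during the transaction will move the /etc/default/grub timestamp to the time of the update, which is why the workaround pushes the cmdline timestamp into the future.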