OpenShift Bugs / OCPBUGS-54594

RHEL 8 aleph aarch64 RHCOS will fail to boot 4.19 (RHEL 9.6) without bootloader update

    • Type: Bug
    • Resolution: Done-Errata
    • Priority: Major
    • 4.19
    • RHCOS
    • Proposed
    • Integration & Delivery - 269, CoreOS West - 270
    • 2
    • Proposed
    • Bug Fix
      Cause: The GRUB bootloader is not yet automatically updated on RHCOS nodes.

      Consequence: When aarch64 nodes originally installed with RHEL 8-based RHCOS (OCP 4.11/4.12) update to RHEL 9.6 (OCP 4.19), GRUB cannot load the kernel, because the new kernel file uses a format that older GRUB versions do not support.

      Fix: Force a bootloader update on boot for 4.18 nodes.

      Result: The GRUB bootloader is updated in 4.18 before the update to 4.19, thus making sure that the node will boot after the update.

      I have a blocker for 4.19 (RHEL 9.6): aarch64 nodes originally installed with RHEL 8 (i.e. 4.11/4.12) will fail to boot after upgrading. The format of the aarch64 kernel file changed in RHEL 9.6, and the old bootloader from the initial install (which we don't update automatically) can't handle it.

      We dealt with this upstream in Fedora CoreOS back when the change landed in the upstream kernel, so fortunately we know how to handle it (or at least we know how we handled it for FCOS).

      This came to light when I was brought into a thread where @Jeff Young had questions about the new format of the kernel file. It jogged my memory, so I asked him to run a test, which confirmed the bug: upgraded systems won't boot.

      The resulting GRUB prompt just looks like:

      error: ../../grub-core/loader/arm64/linux.c:58:invalid magic number.
      error: ../../grub-core/loader/arm64/linux.c:278:you need to load the kernel first.

      Press any key to continue...
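The first error comes from GRUB's arm64 Linux loader rejecting the kernel file's header. As a hedged illustration, the Python sketch below performs the same kind of check: it looks for the arm64 Image magic 0x644d5241 ("ARM\x64") at byte offset 56, as defined by the Linux arm64 boot protocol. It assumes (not confirmed in this report) that the "invalid magic number" error is this header check failing on the RHEL 9.6 kernel file, and that GRUB transparently decompresses a gzip-compressed kernel before checking. The helper name `has_arm64_image_magic` is hypothetical.

```python
"""Sketch: check whether a kernel file carries the arm64 Linux Image header magic."""
import gzip
import struct
import sys

# Linux arm64 boot protocol: the Image header stores a 32-bit little-endian
# magic value 0x644d5241 ("ARM\x64") at byte offset 56.
ARM64_MAGIC = 0x644D5241
MAGIC_OFFSET = 56


def has_arm64_image_magic(data: bytes) -> bool:
    """Return True if `data` (optionally gzip-compressed) has the arm64 magic."""
    # Assumption: GRUB gunzips a gzip-compressed kernel before reading the
    # header, so mirror that here (gzip files start with 0x1f 0x8b).
    if data[:2] == b"\x1f\x8b":
        data = gzip.decompress(data)
    if len(data) < MAGIC_OFFSET + 4:
        return False
    (magic,) = struct.unpack_from("<I", data, MAGIC_OFFSET)
    return magic == ARM64_MAGIC


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: pass the path of a kernel file as the only argument.
    with open(sys.argv[1], "rb") as f:
        found = has_arm64_image_magic(f.read())
    print("arm64 Image magic found" if found else "no arm64 Image magic at offset 56")
```

A file that fails this check is one the old loader would reject with the same error; a newer GRUB that understands the new format is needed to boot it.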

      Upstream in Fedora CoreOS, we fixed affected systems automatically by running a bootloader update as part of the release before the problematic one, and only then moving them to the release where they would otherwise have had trouble.
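Why the update has to happen one release early can be shown with a toy Python model. Everything in it is hypothetical: the `Node` and `Release` types and their flags are invented for illustration and are not RHCOS, bootupd, or Machine Config Operator code. The constraint it encodes is that GRUB runs before the new OS, so the bootloader must already have been updated by a release the node booted earlier.

```python
"""Toy model (hypothetical) of why the bootloader must be updated a release early."""
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical flag: True once GRUB has been updated to a build that can
    # load the new aarch64 kernel format.
    grub_handles_new_format: bool = False


@dataclass(frozen=True)
class Release:
    name: str
    new_kernel_format: bool           # kernel needs the newer GRUB to load
    updates_bootloader_on_boot: bool  # release forces a bootloader update


def upgrade(node: Node, release: Release) -> bool:
    """Reboot `node` into `release`; return True if it boots."""
    # The currently installed GRUB must load the new kernel, so an old
    # bootloader cannot boot a release with the new kernel format.
    if release.new_kernel_format and not node.grub_handles_new_format:
        return False
    # Once booted, the release may update the bootloader for the next upgrade.
    if release.updates_bootloader_on_boot:
        node.grub_handles_new_format = True
    return True


def boots_through(path: list[Release]) -> bool:
    """Start from a node with its original bootloader; follow the upgrade path."""
    node = Node()
    return all(upgrade(node, r) for r in path)


old_418 = Release("4.18", new_kernel_format=False, updates_bootloader_on_boot=False)
fixed_418 = Release("4.18", new_kernel_format=False, updates_bootloader_on_boot=True)
r419 = Release("4.19", new_kernel_format=True, updates_bootloader_on_boot=False)

print(boots_through([old_418, r419]))    # False: old GRUB meets the new kernel
print(boots_through([fixed_418, r419]))  # True: GRUB was updated while on 4.18
```

Forcing the update in 4.18 works because every node must pass through 4.18 (and reboot into it) before it can reach 4.19.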

              Assignee: Dusty Mabe
              Reporter: Dusty Mabe
              Huijing Hei