OCPBUGS-55144: RHEL 8 aleph aarch64 RHCOS will fail to boot 4.19 (RHEL 9.6) without bootloader update


    • Type: Bug
    • Resolution: Done-Errata
    • Priority: Undefined
    • 4.19
    • RHCOS
    • Quality / Stability / Reliability
    • True
    • 5
    • Proposed
    • Done
    • Bug Fix
      * Previously, the GRUB bootloader was not automatically updated on {op-system} nodes. As a result, when aarch64 nodes that were created on {op-system-base} 8 were subsequently updated to {op-system-base} 9.6, GRUB could not load the kernel, because the kernel uses a format that older GRUB versions do not support. With this release, a GRUB bootloader update is forced on nodes during updates to {product-title} 4.18 so that the issue does not occur on {product-title} {product-version}. (link:https://issues.redhat.com/browse/OCPBUGS-55144[OCPBUGS-55144])

      I have a blocker for 4.19 (RHEL 9.6): aarch64 nodes that were originally installed on RHEL 8 (i.e. 4.11/4.12) will fail to boot after upgrading. This is because the format of the aarch64 kernel file changed in RHEL 9.6, and the old bootloader from the initial install (which we don't automatically update) can't handle it.

      This is something we dealt with upstream in Fedora CoreOS back when the change happened in the upstream kernel, so fortunately we know how to deal with it (or at least we know how we dealt with it for FCOS).

      This came to light when I was brought into a thread where @Jeff Young had questions about the different format of the kernel file. It triggered my memory, so I asked him to perform a test, which proved that the bug exists and that upgraded systems won't boot.

      The resulting GRUB prompt just looks like:

      error: ../../grub-core/loader/arm64/linux.c:58:invalid magic number.
      error: ../../grub-core/loader/arm64/linux.c:278:you need to load the kernel first.

      Press any key to continue...
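The "invalid magic number" error comes from GRUB's arm64 loader checking the kernel header. As an illustration only, the Python sketch below classifies an aarch64 kernel file by its first bytes. It assumes the RHEL 9.6 kernel is packaged as an EFI zboot image (a compressed kernel wrapped in a PE stub) instead of a plain arm64 `Image`; that assumption, the zboot header layout, and the helper names `classify_arm64_kernel` and `classify_kernel_file` are hypothetical and not taken from this report. The `ARM\x64` magic at offset 0x38 follows the arm64 Linux boot protocol.

```python
# Offsets from the arm64 Linux boot protocol: the Image header carries the
# magic bytes "ARM\x64" (0x644d5241, little-endian) at offset 0x38.
ARM64_IMAGE_MAGIC = b"ARM\x64"
ARM64_MAGIC_OFFSET = 0x38

# Assumption: an EFI zboot kernel begins with a PE/DOS "MZ" stub and the
# ASCII tag "zimg" at offset 4, and does not carry the Image magic above.
ZBOOT_TAG = b"zimg"
ZBOOT_TAG_OFFSET = 4


def classify_arm64_kernel(header: bytes) -> str:
    """Classify the leading bytes of an aarch64 kernel file (hypothetical helper)."""
    if len(header) < ARM64_MAGIC_OFFSET + 4:
        return "too-short"
    if header[:2] == b"MZ" and header[ZBOOT_TAG_OFFSET:ZBOOT_TAG_OFFSET + 4] == ZBOOT_TAG:
        # Assumed to be the format an old GRUB rejects with "invalid magic number".
        return "efi-zboot"
    if header[ARM64_MAGIC_OFFSET:ARM64_MAGIC_OFFSET + 4] == ARM64_IMAGE_MAGIC:
        # Plain arm64 Image: the magic the GRUB arm64 loader checks is present.
        return "arm64-image"
    return "unknown"


def classify_kernel_file(path: str) -> str:
    """Read only the header of a kernel file and classify it."""
    with open(path, "rb") as f:
        return classify_arm64_kernel(f.read(64))


if __name__ == "__main__":
    # Synthetic 64-byte headers for illustration, not real kernel files.
    image = bytearray(64)
    image[ARM64_MAGIC_OFFSET:ARM64_MAGIC_OFFSET + 4] = ARM64_IMAGE_MAGIC
    zboot = bytearray(64)
    zboot[0:2] = b"MZ"
    zboot[ZBOOT_TAG_OFFSET:ZBOOT_TAG_OFFSET + 4] = ZBOOT_TAG
    print(classify_arm64_kernel(bytes(image)))  # arm64-image
    print(classify_arm64_kernel(bytes(zboot)))  # efi-zboot
```

`classify_kernel_file` takes the path to a kernel file; which file to inspect on an RHCOS node is left open here.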
      

      Upstream, we were able to upgrade systems automatically by running a bootloader update as part of the previous release, before moving them to the release that would otherwise fail to boot.
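The upstream approach amounts to a "barrier" release: the bootloader update has to land in a release the node boots before the first release whose kernel the old GRUB cannot load. The Python sketch below models that ordering check. The `Release` type, its two flags, and `path_is_safe` are hypothetical names for illustration, and the rule that a forced update only helps later releases (because the node first boots each new kernel with its existing GRUB) is a modeling assumption, not something stated in this report.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Release:
    """Hypothetical description of one release on an upgrade path."""
    version: str
    forces_bootloader_update: bool  # e.g. 4.18 in the fix described here
    needs_new_bootloader: bool      # e.g. 4.19, whose aarch64 kernel old GRUB rejects


def path_is_safe(path: Sequence[Release], bootloader_is_current: bool) -> bool:
    """Return True if every release in `path` can boot with the bootloader it will have."""
    updated = bootloader_is_current
    for release in path:
        # Assumption: the node boots this release's kernel with its existing
        # bootloader first, so check before applying this release's update.
        if release.needs_new_bootloader and not updated:
            return False
        if release.forces_bootloader_update:
            updated = True  # later releases see the refreshed bootloader
    return True


if __name__ == "__main__":
    r418 = Release("4.18", forces_bootloader_update=True, needs_new_bootloader=False)
    r419 = Release("4.19", forces_bootloader_update=False, needs_new_bootloader=True)
    print(path_is_safe([r418, r419], bootloader_is_current=False))  # True
    print(path_is_safe([r419], bootloader_is_current=False))        # False
```

With 4.18 marked as forcing the update and 4.19 as needing it, the path 4.18 then 4.19 passes, while a node that reaches 4.19 with its original RHEL 8 bootloader and no prior forced update does not.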

              Assignee: Unassigned
              Reporter: Dusty Mabe (rhn-gps-dmabe)
              Michael Nguyen
              Votes: 0
              Watchers: 5