RHEL-83169

Grub crashes when doing a "ls" after a "connectefi pciroot" on QEMU/KVM


    • Bug
    • Resolution: Unresolved
    • Minor
    • None
    • rhel-9.5
    • grub2
    • None
    • No
    • Low
    • rhel-bootloader
    • ssg_core_services
    • 5
    • False
    • False
    • None
    • None
    • None
    • None
    • Unspecified
    • Unspecified
    • Unspecified
    • None

      What were you trying to do that didn't work?

      Executing the following sequence crashes Grub, apparently because the caller does not handle a memory allocation failure:

      grub> set pager=0
      grub> ls
      grub> connectefi pciroot
      grub> set debug=all
      grub> ls
      [...]
      (lvm/rhel-root) kern/device.c:37: opening device lvm/rhel-root
      kern/disk.c:196: Opening `lvm/rhel-root'...
      kern/disk.c:288: Opening `lvm/rhel-root' succeeded.
      disk/efi/efidisk.c:606: reading 0x40 sectors at the sector 0x72d000 from hd0 
      kern/efi/mm.c:625: grub_get_mem_attrs(0x7c87a000, ...) -> 0x18
      kern/efi/mm.c:641: grub_get_mem_attrs(0x7ef6c000, ...) -> 0x18
      kern/efi/mm.c:662: grub_get_mem_attrs(0x7ef6c000, ...) -> 0x18
      kern/efi/mm.c:673: detected stack from 0x7ef6c000 to 0x7ef6cfff
      kern/efi/mm.c:625: grub_get_mem_attrs(0x7c87a000, ...) -> 0x18
      kern/efi/mm.c:641: grub_get_mem_attrs(0x7ef6c000, ...) -> 0x18
      kern/efi/mm.c:662: grub_get_mem_attrs(0x7ef6c000, ...) -> 0x18
      kern/efi/mm.c:673: detected stack from 0x7ef6c000 to 0x7ef6cfff
      kern/mm.c:165: Using memory for heap: start=0x1780000, end=0x78ff5000
      kern/mm.c:191: Can we extend into region above? 0x1780000 + 77875000 + 0 ?=? 0x78ff5000
      kern/mm.c:196: Yes: extending a region: (0x78ff5000 -> 0x7aff5000) -> (0x1780000 -> 0x7aff5000)
      kern/mm.c:165: Using memory for heap: start=0x7b015000, end=0x7c68f000
      kern/mm.c:191: Can we extend into region above? 0x7b015000 + 167a000 + 0 ?=? 0x1780000
      kern/mm.c:240: Can we extend into region below? 0x1780000 + 40 + 79874fc0 + 0 ?=? 0x7b015000
      kern/mm.c:274: No: considering a new region at 0x7b015000 of size 167a000
      kern/mm.c:165: Using memory for heap: start=0x100000, end=0x80b000
      kern/mm.c:191: Can we extend into region above? 0x100000 + 70b000 + 0 ?=? 0x7b015000
      kern/mm.c:240: Can we extend into region below? 0x7b015000 + 40 + 1679fc0 + 0 ?=? 0x100000
      kern/mm.c:191: Can we extend into region above? 0x100000 + 70b000 + 0 ?=? 0x1780000
      kern/mm.c:240: Can we extend into region below? 0x1780000 + 40 + 79874fc0 + 0 ?=? 0x100000
      kern/mm.c:274: No: considering a new region at 0x100000 of size 70b000
      kern/mm.c:165: Using memory for heap: start=0x7ee00000, end=0x7ef4e000
      kern/mm.c:191: Can we extend into region above? 0x7ee00000 + 14e000 + 0 ?=? 0x100000
      kern/mm.c:240: Can we extend into region below? 0x100000 + 40 + 70afc0 + 0 ?=? 0x7ee00000
      kern/mm.c:191: Can we extend into region above? 0x7ee00000 + 14e000 + 0 ?=? 0x7b015000
      kern/mm.c:240: Can we extend into region below? 0x7b015000 + 40 + 1679fc0 + 0 ?=? 0x7ee00000
      kern/mm.c:191: Can we extend into region above? 0x7ee00000 + 14e000 + 0 ?=? 0x1780000
      kern/mm.c:240: Can we extend into region below? 0x1780000 + 40 + 79874fc0 + 0 ?=? 0x7ee00000
      kern/mm.c:274: No: considering a new region at 0x7ee00000 of size 14e000
      kern/efi/mm.c:176: allocate_pages(2, 1, 0x88, 0x000000007eaed000) = 0x800000000000000e
      !!!! X64 Exception Type - 06(#UD - Invalid Opcode)  CPU Apic ID - 00000000 !!!!
      RIP  - 000000007D522C72, CS  - 0000000000000038, RFLAGS - 0000000000210297
      RAX  - 000000007C87353E, RCX - 000000007D523C18, RDX - 0000000064E2CE20
      RBX  - 000000007C8751AE, RSP - 000000007EF6C7C8, RBP - 000000007EF6C860
      RSI  - 000000007D523C18, RDI - 000000007D522C20
      R8   - 000000000072D000, R9  - 0000000000008000, R10 - 000000007CE2CE20
      R11  - 000000007EF6BF90, R12 - 0000000000000040, R13 - 000000007D42FE18
      R14  - 000000007CE00728, R15 - 000000007CE00730
      DS   - 0000000000000030, ES  - 0000000000000030, FS  - 0000000000000030
      [...]
      

      I believe the code in mm.c doesn't handle the allocation failure properly, although I haven't checked this fully yet. The faulty code seems to be the dereference of r, which is NULL because of the allocation failure that just happened, when h is populated on line 291:

      157 /* Initialize a region starting from ADDR and whose size is SIZE,
      158    to use it as free space.  */
      159 void
      160 grub_mm_init_region (void *addr, grub_size_t size)
      161 {
       :
      281   /* Allocate a region from the head.  */
      282   r = (grub_mm_region_t) ALIGN_UP ((grub_addr_t) addr, GRUB_MM_ALIGN);
      283 
      284   /* If this region is too small, ignore it.  */
      285   if (size < GRUB_MM_ALIGN + (char *) r - (char *) addr + sizeof (*r))
      286     return;
      287 
      288   size -= (char *) r - (char *) addr + sizeof (*r);
      289 
      290   h = (grub_mm_header_t) (r + 1);
      291   h->next = h;
       :
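
A minimal sketch of the kind of guard the paragraph above calls for, assuming (as the report suggests, not as verified) that a NULL address from the failed allocation reaches the region setup. The names `init_region_checked`, `mm_region`, `mm_header` and the `MM_ALIGN` value are hypothetical stand-ins for illustration; this is not the Grub code or a proposed patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MM_ALIGN 32 /* stand-in for GRUB_MM_ALIGN; the value is an assumption */

/* Simplified stand-ins for Grub's header and region structures. */
struct mm_header { struct mm_header *next; size_t size; };
struct mm_region { struct mm_header *first; size_t size; };

/* Set up a free region at ADDR. Returns 0 on success and -1 when the
   region is unusable, including when ADDR is NULL because the
   allocation that should have produced it failed. */
int init_region_checked(void *addr, size_t size)
{
  if (addr == NULL)
    return -1; /* refuse to build a header at address 0 */

  uintptr_t aligned = ((uintptr_t) addr + MM_ALIGN - 1) & ~(uintptr_t) (MM_ALIGN - 1);
  size_t skipped = aligned - (uintptr_t) addr;

  /* Same idea as the "too small, ignore it" test in the excerpt. */
  if (size < MM_ALIGN + skipped + sizeof (struct mm_region))
    return -1;

  struct mm_region *r = (struct mm_region *) aligned;
  struct mm_header *h = (struct mm_header *) (r + 1);
  h->next = h; /* single free block pointing at itself */
  h->size = size - skipped - sizeof (*r);
  r->first = h;
  r->size = h->size;
  return 0;
}

/* Small self-check: a NULL "allocation" is rejected, not dereferenced. */
void check_init_region(void)
{
  void *buf = malloc(4096);
  assert(init_region_checked(NULL, 4096) == -1);
  if (buf != NULL) {
    assert(init_region_checked(buf, 4096) == 0);
    assert(init_region_checked(buf, 8) == -1);
    free(buf);
  }
}
```

Whether the check belongs in the region setup itself or in the caller that received the failed allocation is a design choice this sketch does not settle.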
      

      However, this is not the root cause. The real issue is that running connectefi pciroot leaves Grub short of memory.
      For some reason the EFI function AllocatePages() returns GRUB_EFI_NOT_FOUND, even though the EFI memory map shows a free region of the requested size at that address:

      [...]
      conv-mem  000000007eaed000-000000007eb74fff 00000088    544KiB UC WC WT WB
      [...]
      
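
The status value 0x800000000000000e logged by allocate_pages() can be decoded by hand: on 64-bit UEFI the top bit marks an error and the low bits carry the code, and code 14 is EFI_NOT_FOUND. The helper names below (`efi_error_code`, `efi_error_name`) are hypothetical and only cover a few codes that AllocatePages() is documented to return; treat it as a sketch for reading the log, not as Grub code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* On 64-bit UEFI, the most significant bit of a status marks an error. */
#define EFI_ERROR_BIT 0x8000000000000000ULL

/* Error code without the error bit, or 0 if STATUS is not an error. */
unsigned efi_error_code(uint64_t status)
{
  if (!(status & EFI_ERROR_BIT))
    return 0;
  return (unsigned) (status & ~EFI_ERROR_BIT);
}

/* Name for a few status values relevant to AllocatePages(). */
const char *efi_error_name(uint64_t status)
{
  if (status == 0)
    return "EFI_SUCCESS";
  switch (efi_error_code(status)) {
    case 2:  return "EFI_INVALID_PARAMETER";
    case 9:  return "EFI_OUT_OF_RESOURCES";
    case 14: return "EFI_NOT_FOUND";
    default: return "other";
  }
}

/* Self-check against the value seen in the debug log. */
void check_logged_status(void)
{
  assert(efi_error_code(0x800000000000000eULL) == 14);
  assert(strcmp(efi_error_name(0x800000000000000eULL), "EFI_NOT_FOUND") == 0);
}
```

Note that the firmware did not report EFI_OUT_OF_RESOURCES (code 9), even though Grub turns any failure here into an "out of memory" error.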

      The code requests the memory with the AllocateAddress allocation type (alloctype 2, GRUB_EFI_ALLOCATE_ADDRESS), and the requested start address matches the start of that free region:

      kern/efi/mm.c:176: allocate_pages(2, 1, 0x88, 0x000000007eaed000) = 0x800000000000000e
      
      172   b = grub_efi_system_table->boot_services;
      173   status = efi_call_4 (b->allocate_pages, alloctype, memtype, pages, &ret);
      174   if (status != GRUB_EFI_SUCCESS)
      175     {
      176       grub_dprintf ("efi",
      177                     "allocate_pages(%d, %d, 0x%0lx, 0x%016lx) = 0x%016lx\n",
      178                     alloctype, memtype, pages, address, status);
      179       grub_error (GRUB_ERR_OUT_OF_MEMORY, N_("out of memory"));
      180       return NULL;
      181     }
      
       491 enum grub_efi_allocate_type
       492   {
       493     GRUB_EFI_ALLOCATE_ANY_PAGES,
       494     GRUB_EFI_ALLOCATE_MAX_ADDRESS,
       495     GRUB_EFI_ALLOCATE_ADDRESS,       <<<<<<
       496     GRUB_EFI_MAX_ALLOCATION_TYPE
       497   };
       498 typedef enum grub_efi_allocate_type grub_efi_allocate_type_t;
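
As a cross-check of the numbers above, the failing request (0x88 pages at 0x7eaed000) can be compared with the conv-mem entry from the memory map (0x7eaed000-0x7eb74fff, 544KiB). The sketch below assumes the standard 4 KiB UEFI page size; `range_end` and `request_fits` are hypothetical helper names used only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define EFI_PAGE_SIZE 4096ULL /* UEFI pages are 4 KiB */

/* Address of the last byte covered by PAGES pages starting at START. */
uint64_t range_end(uint64_t start, uint64_t pages)
{
  return start + pages * EFI_PAGE_SIZE - 1;
}

/* Nonzero when the requested range lies entirely inside the
   memory map descriptor [DESC_START, DESC_END]. */
int request_fits(uint64_t req_start, uint64_t pages,
                 uint64_t desc_start, uint64_t desc_end)
{
  return req_start >= desc_start && range_end(req_start, pages) <= desc_end;
}

/* Self-check with the values from the log and the memory map. */
void check_request(void)
{
  /* 0x88 pages = 136 * 4 KiB = 544 KiB, as shown by lsefimmap. */
  assert(0x88 * EFI_PAGE_SIZE / 1024 == 544);
  assert(range_end(0x7eaed000ULL, 0x88) == 0x7eb74fffULL);
  assert(request_fits(0x7eaed000ULL, 0x88, 0x7eaed000ULL, 0x7eb74fffULL));
}
```

The request covers the free descriptor exactly, so according to the map Grub sees, the allocation should have been possible.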
      

      Oddly, the issue only occurs when running ls first, then connectefi pciroot, then ls again.
      If the initial ls is skipped, everything works.
      This happens even though lsefimmap shows exactly the same maps in both cases (see the attached files as well).
      This suggests that the first ls allocates some memory in that region without the EFI map being updated to reflect it, which causes the next allocate_pages() to fail.
      That explanation is not fully satisfactory, though, because the initial ls doesn't show any memory allocation in the debug output (presumably because the allocation already happened while in the Grub menu).

      What is the impact of this issue to you?

      None at the moment, since I found this while investigating another issue with the connectefi command.

      Please provide the package NVR for which the bug is seen:

      grub2-efi-x64-2.06-92.el9.x86_64 (GA)
      grub2-efi-x64-2.06-93.el9_5.x86_64 (Latest)

      How reproducible is this bug?:

      Always

      Steps to reproduce

      1. Install a QEMU/KVM guest with RHEL 9.5
      2. Update the system (or not; it doesn't matter, since all Grub packages fail in the same way)
      3. At the Grub menu, drop to the prompt and type:
        grub> ls
        grub> connectefi pciroot
        grub> ls
        

      Expected results

      Listing of partitions.

      Actual results

      The LVM partition "(lvm/rhel-root)" is listed, then Grub crashes.

              bootloader-eng-team bootloader -eng-team
              rhn-support-rmetrich Renaud Métrich
              bootloader -eng-team bootloader -eng-team
              Release Test Team Release Test Team