Bug
Resolution: Unresolved
Minor
rhel-9.5
Low
rhel-bootloader
ssg_core_services
What were you trying to do that didn't work?
Running the following sequence at the Grub prompt causes a crash, apparently because the caller mishandles a memory allocation failure:
grub> set pager=0
grub> ls
grub> connectefi pciroot
grub> set debug=all
grub> ls
[...]
(lvm/rhel-root) kern/device.c:37: opening device lvm/rhel-root
kern/disk.c:196: Opening `lvm/rhel-root'...
kern/disk.c:288: Opening `lvm/rhel-root' succeeded.
disk/efi/efidisk.c:606: reading 0x40 sectors at the sector 0x72d000 from hd0
kern/efi/mm.c:625: grub_get_mem_attrs(0x7c87a000, ...) -> 0x18
kern/efi/mm.c:641: grub_get_mem_attrs(0x7ef6c000, ...) -> 0x18
kern/efi/mm.c:662: grub_get_mem_attrs(0x7ef6c000, ...) -> 0x18
kern/efi/mm.c:673: detected stack from 0x7ef6c000 to 0x7ef6cfff
kern/efi/mm.c:625: grub_get_mem_attrs(0x7c87a000, ...) -> 0x18
kern/efi/mm.c:641: grub_get_mem_attrs(0x7ef6c000, ...) -> 0x18
kern/efi/mm.c:662: grub_get_mem_attrs(0x7ef6c000, ...) -> 0x18
kern/efi/mm.c:673: detected stack from 0x7ef6c000 to 0x7ef6cfff
kern/mm.c:165: Using memory for heap: start=0x1780000, end=0x78ff5000
kern/mm.c:191: Can we extend into region above? 0x1780000 + 77875000 + 0 ?=? 0x78ff5000
kern/mm.c:196: Yes: extending a region: (0x78ff5000 -> 0x7aff5000) -> (0x1780000 -> 0x7aff5000)
kern/mm.c:165: Using memory for heap: start=0x7b015000, end=0x7c68f000
kern/mm.c:191: Can we extend into region above? 0x7b015000 + 167a000 + 0 ?=? 0x1780000
kern/mm.c:240: Can we extend into region below? 0x1780000 + 40 + 79874fc0 + 0 ?=? 0x7b015000
kern/mm.c:274: No: considering a new region at 0x7b015000 of size 167a000
kern/mm.c:165: Using memory for heap: start=0x100000, end=0x80b000
kern/mm.c:191: Can we extend into region above? 0x100000 + 70b000 + 0 ?=? 0x7b015000
kern/mm.c:240: Can we extend into region below? 0x7b015000 + 40 + 1679fc0 + 0 ?=? 0x100000
kern/mm.c:191: Can we extend into region above? 0x100000 + 70b000 + 0 ?=? 0x1780000
kern/mm.c:240: Can we extend into region below? 0x1780000 + 40 + 79874fc0 + 0 ?=? 0x100000
kern/mm.c:274: No: considering a new region at 0x100000 of size 70b000
kern/mm.c:165: Using memory for heap: start=0x7ee00000, end=0x7ef4e000
kern/mm.c:191: Can we extend into region above? 0x7ee00000 + 14e000 + 0 ?=? 0x100000
kern/mm.c:240: Can we extend into region below? 0x100000 + 40 + 70afc0 + 0 ?=? 0x7ee00000
kern/mm.c:191: Can we extend into region above? 0x7ee00000 + 14e000 + 0 ?=? 0x7b015000
kern/mm.c:240: Can we extend into region below? 0x7b015000 + 40 + 1679fc0 + 0 ?=? 0x7ee00000
kern/mm.c:191: Can we extend into region above? 0x7ee00000 + 14e000 + 0 ?=? 0x1780000
kern/mm.c:240: Can we extend into region below? 0x1780000 + 40 + 79874fc0 + 0 ?=? 0x7ee00000
kern/mm.c:274: No: considering a new region at 0x7ee00000 of size 14e000
kern/efi/mm.c:176: allocate_pages(2, 1, 0x88, 0x000000007eaed000) = 0x800000000000000e
!!!! X64 Exception Type - 06(#UD - Invalid Opcode) CPU Apic ID - 00000000 !!!!
RIP - 000000007D522C72, CS - 0000000000000038, RFLAGS - 0000000000210297
RAX - 000000007C87353E, RCX - 000000007D523C18, RDX - 0000000064E2CE20
RBX - 000000007C8751AE, RSP - 000000007EF6C7C8, RBP - 000000007EF6C860
RSI - 000000007D523C18, RDI - 000000007D522C20
R8 - 000000000072D000, R9 - 0000000000008000, R10 - 000000007CE2CE20
R11 - 000000007EF6BF90, R12 - 0000000000000040, R13 - 000000007D42FE18
R14 - 000000007CE00728, R15 - 000000007CE00730
DS - 0000000000000030, ES - 0000000000000030, FS - 0000000000000030
[...]
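For readers of the log, the "Can we extend into region above/below?" lines can be checked by hand. The sketch below is my reading of what those debug lines compare, not the actual kern/mm.c code; the helper names `extends_above` and `extends_below` and their parameter names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: a new chunk starting at addr can be merged in front
   of an existing region when addr + size + pre_size lands exactly on the
   start of that region (the "region above" check in the log). */
bool extends_above(uint64_t addr, uint64_t size, uint64_t pre_size,
                   uint64_t region)
{
    return addr + size + pre_size == region;
}

/* Hypothetical helper: a new chunk starting at addr can be appended to an
   existing region when region + header + size + post_size lands exactly on
   addr (the "region below" check in the log). */
bool extends_below(uint64_t region, uint64_t header, uint64_t size,
                   uint64_t post_size, uint64_t addr)
{
    return region + header + size + post_size == addr;
}
```

With the values from the log, `extends_above(0x1780000, 0x77875000, 0, 0x78ff5000)` holds, which matches the "Yes: extending a region" line, while `extends_below(0x1780000, 0x40, 0x79874fc0, 0, 0x7b015000)` does not (the sum is 0x7aff5000), which matches the "No: considering a new region" line.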
I believe the code in mm.c does not handle allocation failure properly, although I have not checked this fully yet. The faulty code appears to be the dereference of r, which is NULL because of the allocation failure that just happened, when populating h on line 291:
157 /* Initialize a region starting from ADDR and whose size is SIZE,
158 to use it as free space. */
159 void
160 grub_mm_init_region (void *addr, grub_size_t size)
161 {
:
281 /* Allocate a region from the head. */
282 r = (grub_mm_region_t) ALIGN_UP ((grub_addr_t) addr, GRUB_MM_ALIGN);
283
284 /* If this region is too small, ignore it. */
285 if (size < GRUB_MM_ALIGN + (char *) r - (char *) addr + sizeof (*r))
286 return;
287
288 size -= (char *) r - (char *) addr + sizeof (*r);
289
290 h = (grub_mm_header_t) (r + 1);
291 h->next = h;
:
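If the crash really comes from a NULL address reaching the region initialisation, a guard of roughly this shape in the caller would avoid it. This is only a minimal sketch under that assumption: `try_alloc_pages`, `init_region`, `add_heap_region` and `regions_initialised` are hypothetical stand-ins, not GRUB functions, and the 4096-byte page size is assumed.

```c
#include <stddef.h>

/* Counts how many regions were handed to the (mock) heap. */
int regions_initialised = 0;

/* Hypothetical allocator stand-in: returns NULL on failure. */
void *try_alloc_pages(size_t pages, int simulate_failure)
{
    static unsigned char pool[4096];

    if (simulate_failure || pages != 1)
        return NULL;
    return pool;
}

/* Hypothetical stand-in for the region initialisation step. */
void init_region(void *addr, size_t size)
{
    (void) addr;
    (void) size;
    regions_initialised++;
}

/* The caller checks the allocation result before using it, so a failed
   allocation never reaches init_region(). Returns 0 on success, -1 on
   allocation failure. */
int add_heap_region(size_t pages, int simulate_failure)
{
    void *addr = try_alloc_pages(pages, simulate_failure);

    if (addr == NULL)
        return -1;

    init_region(addr, pages * 4096);
    return 0;
}
```

Calling `add_heap_region(1, 1)` simulates the failing case and returns -1 without touching the heap, whereas `add_heap_region(1, 0)` initialises one region.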
However, this is not the root cause. The real issue is that running connectefi pciroot leaves Grub short of memory.
For some reason the EFI function AllocatePages() returns GRUB_EFI_NOT_FOUND, even though the EFI memory map shows a free region of the right size:
[...] conv-mem 000000007eaed000-000000007eb74fff 00000088 544KiB UC WC WT WB [...]
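As a sanity check on the numbers above, the status value and the region size can be decoded as follows. This is a small sketch assuming the usual UEFI conventions (error statuses have the top bit set on 64-bit platforms, EFI_NOT_FOUND is error code 14, pages are 4 KiB); the helper names are hypothetical.

```c
#include <stdint.h>

#define EFI_PAGE_SIZE 4096ULL          /* assumed 4 KiB EFI pages */
#define EFI_ERROR_BIT (1ULL << 63)     /* high bit marks an error status */

/* Non-zero when the status has the error bit set. */
int efi_status_is_error(uint64_t status)
{
    return (status & EFI_ERROR_BIT) != 0;
}

/* Error code with the error bit stripped (14 is EFI_NOT_FOUND). */
uint64_t efi_status_code(uint64_t status)
{
    return status & ~EFI_ERROR_BIT;
}

/* Size in bytes of a page count. */
uint64_t pages_to_bytes(uint64_t pages)
{
    return pages * EFI_PAGE_SIZE;
}

/* Size in bytes of an inclusive address range [first, last]. */
uint64_t range_bytes(uint64_t first, uint64_t last)
{
    return last - first + 1;
}
```

With these helpers, 0x800000000000000e decodes to error code 14, and both `pages_to_bytes(0x88)` and `range_bytes(0x7eaed000, 0x7eb74fff)` give 0x88000 bytes (544 KiB), so the request exactly covers the conv-mem entry shown in the map.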
The code requests the memory using the AllocateAddress allocation type, with the memory starting at the correct location:
kern/efi/mm.c:176: allocate_pages(2, 1, 0x88, 0x000000007eaed000) = 0x800000000000000e

172 b = grub_efi_system_table->boot_services;
173 status = efi_call_4 (b->allocate_pages, alloctype, memtype, pages, &ret);
174 if (status != GRUB_EFI_SUCCESS)
175 {
176 grub_dprintf ("efi",
177 "allocate_pages(%d, %d, 0x%0lx, 0x%016lx) = 0x%016lx\n",
178 alloctype, memtype, pages, address, status);
179 grub_error (GRUB_ERR_OUT_OF_MEMORY, N_("out of memory"));
180 return NULL;
181 }

491 enum grub_efi_allocate_type
492 {
493 GRUB_EFI_ALLOCATE_ANY_PAGES,
494 GRUB_EFI_ALLOCATE_MAX_ADDRESS,
495 GRUB_EFI_ALLOCATE_ADDRESS, <<<<<<
496 GRUB_EFI_MAX_ALLOCATION_TYPE
497 };
498 typedef enum grub_efi_allocate_type grub_efi_allocate_type_t;
Oddly, the issue only occurs when running ls first, then connectefi pciroot, then ls again.
If the initial ls is skipped, the sequence works.
This happens even though lsefimmap shows exactly the same maps in both cases (see the attached files as well).
This suggests that executing ls allocates some memory in the region without the EFI map being updated to reflect it, which causes the next allocate_pages() call to fail.
This explanation is not fully satisfactory, though, because the initial ls does not show any memory allocation (I guess because it was already done while in the Grub menu).
What is the impact of this issue to you?
None at the moment, since I found this through investigating some other issue with connectefi command.
Please provide the package NVR for which the bug is seen:
grub2-efi-x64-2.06-92.el9.x86_64 (GA)
grub2-efi-x64-2.06-93.el9_5.x86_64 (Latest)
How reproducible is this bug?:
Always
Steps to reproduce
- Install a QEMU/KVM guest with RHEL 9.5
- Update the system (or not, doesn't matter since all Grub packages fail similarly)
- At the Grub menu, get to the prompt and type
grub> ls
grub> connectefi pciroot
grub> ls
Expected results
Listing of partitions.
Actual results
The LVM partition "(lvm/rhel-root) " is listed, then a crash occurs.