RHEL-58654

[RHEL-10][Azure] kdump fails to generate vmcore when 192MB of memory is reserved

    • Bug
    • Resolution: Unresolved
    • Major
    • None
    • rhel-10.0.beta
    • clevis
    • None
    • Moderate
    • sst_security_special_projects
    • ssg_security
    • 20
    • None
    • QE ack
    • False
    • None
    • Red Hat Enterprise Linux
    • None
    • None

      After a crash is triggered, the system panics and then reboots, but no vmcore is generated.
       
       

      # kdumpctl status
      kdump: Kdump is operational
      # kdumpctl showmem
      kdump: Reserved 192MB memory for crash kernel
      # cat /proc/cmdline
      BOOT_IMAGE=(hd0,gpt2)/vmlinuz-5.14.0-390.el9.x86_64 root=UUID=f7fb0e03-b6c3-4b03-bf37-b6f0f7bc2ccd ro console=tty1 console=ttyS0 earlyprintk=ttyS0 rootdelay=300 crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M 
      # systemctl status kdump
      ● kdump.service - Crash recovery kernel arming
           Loaded: loaded (/usr/lib/systemd/system/kdump.service; enabled; preset: en>
           Active: active (exited) since Wed 2023-12-06 06:38:26 UTC; 11min ago
          Process: 1334 ExecStart=/usr/bin/kdumpctl start (code=exited, status=0/SUCC>
         Main PID: 1334 (code=exited, status=0/SUCCESS)
              CPU: 1.662s

       

      Our QE found that this is caused by installing many packages while the reserved memory is too small. That is, with the packages below installed and 192MB of reserved crash kernel memory (the default on a machine with 3.5G of total memory), kdump fails to generate a vmcore.

      tar net-tools bind-utils dracut-fips dracut-fips-aesni wget bc hyperv-daemons WALinuxAgent cloud-init cloud-utils-growpart gdisk git llvm clang elfutils-devel make NetworkManager-cloud-setup dhcp-client fio insights-client nvme-cli iperf3 cockpit-packagekit hyperv-tools numactl sshpass rng-tools redhat-cloud-client-configuration qemu-kvm tang cryptsetup clevis clevis-luks clevis-dracut WALinuxAgent-cvm kernel-devel kernel-modules-extra kernel-headers kernel-tools kernel-debug-core kernel-debug librdmacm-devel libcap-devel librdmacm librdmacm-utils libibverbs-utils libibverbs numactl-devel make gcc dpdk python3-devel dos2unix mdadm ninja-build meson ocaml ocaml-ocamlbuild openssl-devel cmake perl libcurl-devel protobuf-devel rpm-build createrepo yum-utils boost-devel hostname pciutils lsof 

      These packages are all crucial to our automated tests. Note that the failure is not caused by any single package above but by all of them installed together. We tried splitting the list into two halves; installing either half alone does not trigger the issue.

      Once configured with crashkernel=256M, the vmcore is generated correctly. RHEL 8.10 does not have this issue.
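
      For reference, the 192MB default follows from the range syntax of the crashkernel= argument shown in /proc/cmdline above. The sketch below is a simplified model of that selection, not the kernel's actual parser: it assumes each entry is "start-[end]:size" with an inclusive start and an exclusive end, and it only understands M and G suffixes. The function names to_mib and pick_crashkernel are hypothetical helpers introduced for illustration.

```shell
# Convert a size such as 192M or 4G to MiB (only M and G are handled here).
to_mib() {
    local value=$1
    case "$value" in
        *G) echo "$(( ${value%G} * 1024 ))" ;;
        *M) echo "$(( ${value%M} ))" ;;
        *)  return 1 ;;
    esac
}

# pick_crashkernel SPEC TOTAL_MIB
# Prints the reservation size of the first range that contains TOTAL_MIB.
# Returns non-zero when no range matches (nothing would be reserved).
pick_crashkernel() {
    local spec=$1 total=$2 entry range size start end
    local IFS=,
    for entry in $spec; do
        range=${entry%%:*}      # e.g. "1G-4G" or "64G-"
        size=${entry#*:}        # e.g. "192M"
        start=$(to_mib "${range%%-*}")
        end=${range#*-}         # empty for an open-ended range
        if [ -n "$end" ]; then
            end=$(to_mib "$end")
        fi
        if [ "$total" -ge "$start" ] && { [ -z "$end" ] || [ "$total" -lt "$end" ]; }; then
            echo "$size"
            return 0
        fi
    done
    return 1
}

# 3.5G of total memory is 3584 MiB, which falls in the 1G-4G range.
pick_crashkernel "1G-4G:192M,4G-64G:256M,64G-:512M" 3584   # prints 192M
```

      Under this model the test VM lands in the smallest bucket, which is consistent with the "Reserved 192MB" output of kdumpctl showmem; a machine with 4G or more would get 256M without any manual override.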

      Please provide the package NVR for which bug is seen:

      RHEL-9: 5.14.0-394.el9.x86_64

      RHEL-10: 

      kernel-6.11.0-0.rc5.22.el10.x86_64
      kexec-tools-2.0.29-1.el10.x86_64

      clevis-20-4.el10.x86_64

      How reproducible:

      100%

      Steps to reproduce

      1. echo c > /proc/sysrq-trigger 

      Expected results

      kdump should generate vmcore under /var/crash/.
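
      A check for the expected result after the reboot could look like the sketch below. It assumes the default local dump target, where kdump writes a file named vmcore into a per-crash subdirectory of /var/crash; the helper name find_vmcores is hypothetical.

```shell
# find_vmcores [DIR]
# Prints the path of every file named "vmcore" under DIR (default /var/crash).
# Prints nothing if DIR is missing or contains no dump.
find_vmcores() {
    local dir=${1:-/var/crash}
    find "$dir" -type f -name vmcore 2>/dev/null
}

if [ -n "$(find_vmcores /var/crash)" ]; then
    echo "vmcore found"
else
    echo "no vmcore found"
fi
```

      In an automated test this check would run once the system is back up after the sysrq-triggered crash.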

      Actual results

      [ 2.078406] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00007f00 
      [ 2.078408] CPU: 0 PID: 1 Comm: init Not tainted 5.14.0-394.el9.x86_64 #1 
      [ 2.078410] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008 12/07/2018 
      [ 2.078411] Call Trace: 
      [ 2.078413] <TASK> 
      [ 2.078415] dump_stack_lvl+0x34/0x48 
      [ 2.078422] panic+0xfd/0x2f7 
      [ 2.078425] do_exit.cold+0x15/0x15 
      [ 2.078427] do_group_exit+0x2d/0x90 
      [ 2.078430] __x64_sys_exit_group+0x14/0x20 
      [ 2.078432] do_syscall_64+0x5c/0x90 
      [ 2.078437] ? exc_page_fault+0x62/0x150 
      [ 2.078439] entry_SYSCALL_64_after_hwframe+0x72/0xdc 
      [ 2.078442] RIP: 0033:0x7f420a884aed 
      [ 2.078445] Code: ff ff ff ff 64 89 02 44 89 c0 c3 66 90 f3 0f 1e fa 48 8b 35 25 43 0e 00 ba e7 00 00 00 eb 07 66 0f 1f 44 00 00 f4 89 d0 0f 05 <48> 3d 00 f0 ff ff 76 f3 f7 d8 64 89 06 eb ec 0f 1f 40 00 f3 0f 1e 
      [ 2.078446] RSP: 002b:00007ffd1f85d5d8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7 
      [ 2.078449] RAX: ffffffffffffffda RBX: 00007f420a9659e0 RCX: 00007f420a884aed 
      [ 2.078450] RDX: 00000000000000e7 RSI: ffffffffffffff80 RDI: 000000000000007f 
      [ 2.078451] RBP: 000000000000007f R08: 0000000000000000 R09: 0000000000000028 
      [ 2.078453] R10: 00007ffd1f85d460 R11: 0000000000000246 R12: 00007f420a9659e0 
      [ 2.078454] R13: 00007f420a96af00 R14: 0000000000000001 R15: 00007f420a96aee8 
      [ 2.078456] </TASK>

       

            scorreia@redhat.com Sergio Correia
            litian@redhat.com Li Tian
            Sergio Correia
            SSG Security QE