RHEL-6076

[RHEL-9.3] BUG: kernel NULL pointer dereference, address: 0000000000000004


    • Bug
    • Resolution: Won't Do
    • Undefined
    • None
    • rhel-9.3.0
    • openmpi
    • None
    • None
    • rhel-net-drivers
    • ssg_networking
    • None
    • False
    • False
    • None
    • None
    • None
    • None
    • If docs needed, set a value
    • None
    • 57,005

      Description of problem:

      During RDMA OPENMPI testing, the console of the RDMA server reported the following traceback, and the server then crashed.

      [ 5619.703370] BUG: kernel NULL pointer dereference, address: 0000000000000004
      [ 5619.704044] #PF: supervisor read access in kernel mode
      [ 5619.704649] #PF: error_code(0x0000) - not-present page
      [ 5619.705273] PGD 0 P4D 0
      [ 5619.705758] Oops: 0000 [#1] PREEMPT SMP PTI
      [ 5619.706331] CPU: 4 PID: 0 Comm: swapper/4 Kdump: loaded Tainted: G I ------- --- 5.14.0-316.el9.x86_64 #1
      [ 5619.707327] Hardware name: HP ProLiant DL320e Gen8, BIOS J05 11/09/2013
      [ 5619.708129] RIP: 0010:fwevtq_handler+0x3d/0x140 [cxgb4]
      [ 5619.708889] Code: 3c a5 0f 85 9f 00 00 00 48 8b 8a e8 01 00 00 41 8b 00 0f c8 25 ff ff 01 00 2b 81 18 84 00 00 48 8b 89 28 84 00 00 48 8b 34 c1 <8b> 46 04 48 83 46 20 01 85 c0 74 1a 48 8d be 90 00 00 00 f0 48 0f
      [ 5619.711309] RSP: 0018:ffffb39f0038cd58 EFLAGS: 00010202
      [ 5619.712167] RAX: 00000000000003c2 RBX: ffff9ae1d6307ec0 RCX: ffff9ae1c1530000
      [ 5619.713548] RDX: ffff9ae1d6307ec0 RSI: 0000000000000000 RDI: ffff9ae1d6307ec0
      [ 5619.714939] RBP: 0000000000000040 R08: ffff9ae1d6a100c8 R09: a096aa9ce0080af0
      [ 5619.716375] R10: 0000000000000008 R11: ffffffff9aa060c0 R12: 0000000000000000
      [ 5619.717817] R13: ffff9ae1d6300000 R14: ffff9ae1d6a100f0 R15: ffff9ae1d6307ec0
      [ 5619.719298] FS: 0000000000000000(0000) GS:ffff9ae8aed00000(0000) knlGS:0000000000000000
      [ 5619.720422] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 5619.721848] CR2: 0000000000000004 CR3: 00000001484d8006 CR4: 00000000001706e0
      [ 5619.723406] Call Trace:
      [ 5619.724285] <IRQ>
      [ 5619.725506] process_responses+0x3bd/0x4c0 [cxgb4]
      [ 5619.726933] ? ip_list_rcv+0x135/0x160
      [ 5619.727921] ? __netif_receive_skb_list_core+0x29f/0x2c0
      [ 5619.729009] ? netif_receive_skb_list_internal+0x1e4/0x300
      [ 5619.730085] napi_rx_handler+0x13/0x100 [cxgb4]
      [ 5619.731572] __napi_poll+0x2a/0x170
      [ 5619.732932] net_rx_action+0x233/0x2f0
      [ 5619.733986] __do_softirq+0xca/0x2ac
      [ 5619.735036] __irq_exit_rcu+0xb9/0xf0
      [ 5619.736078] common_interrupt+0x80/0xa0
      [ 5619.737112] </IRQ>
      [ 5619.738389] <TASK>
      [ 5619.767379] rupt+0x22/0x40
      [ 5619.839853] RIP: 0010:cpuidle_enter_state+0xd2/0x400
      [ 5619.841030] Code: 49 89 c5 0f 1f 44 00 00 31 ff e8 69 56 8b ff 45 84 ff 74 12 9c 58 f6 c4 02 0f 85 11 03 00 00 31 ff e8 22 90 91 ff fb 45 85 f6 <0f> 88 15 01 00 00 49 63 d6 4c 2b 2c 24 48 8d 04 52 48 8d 04 82 49
      [ 5619.844253] RSP: 0018:ffffb39f000dbe80 EFLAGS: 00000202
      [ 5619.845495] RAX: ffff9ae8aed30bc0 RBX: 0000000000000001 RCX: 000000000000001f
      [ 5619.847253] RDX: 0000000000000000 RSI: 0000000025bb8b00 RDI: 0000000000000000
      [ 5619.849049] RBP: ffff9ae1c1431800 R08: 0000051c706c31af R09: 0000000000000018
      [ 5619.850837] R10: 0000000000000082 R11: 00000000000000c6 R12: ffffffff9aec34c0
      [ 5619.852659] R13: 0000051c706c31af R14: 0000000000000001 R15: 0000000000000000
      [ 5619.854552] cpuidle_enter+0x29/0x40
      [ 5619.855843] cpuidle_idle_call+0xfa/0x160
      [ 5619.857154] do_idle+0x78/0xe0
      [ 5619.858778] cpu_startup_entry+0x19/0x20
      [ 5619.860106] start_secondary+0x10d/0x130
      [ 5619.861421] secondary_startup_64_no_verify+0xe5/0xeb
      [ 5619.862768] </TASK>
      [ 5619.864229] Modules linked in: nfsd auth_rpcgss nfs_acl lockd grace rpcrdma rdma_ucm ib_srpt ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi scsi_transport_iscsi 8021q garp mrp stp llc rfkill sunrpc ext4 mbcache mi_msghandler gpio_ich lpc_ich acpi_power_meter ie31200_edac drm_shmem_helper drm_kms_helper rapl syscopyarea intel_cstate sysfillrect sysimgblt intel_uncore fb_sys_fops drm fuse xfs libcrc32c csiostor sd_mod t10_pi sg ahci libahci crct10dif_pclmul crc32_pclmul crc32c_intel libata cxgb4 ghash_clmulni_intel tg3 serio_raw hpwdt tls scsi_transport_fc
      [ 5620.375027] CR2: 0000000000000004
      [ 0.000000] Linux version 5.14.0-316.el9.x86_64 (mockbuild@x86-vm-08.build.eng.bos.redhat.com) (gcc (GCC) 11.3.1 20221121 (Red Hat 11.3.1-4), GNU ld version 2.35.2-39.el9) #1 SMP PREEMPT_DYNAMIC Fri May 19 13:18:40 EDT 2023
      [ 0.000000] The list of certified hardware and cloud instances for Red Hat Enterprise Linux 9 can be viewed at the Red Hat Ecosystem Catalog, https://catalog.redhat.com.
      [ 0.000000] Command line: elfcorehdr=0xdd000000 BOOT_IMAGE=(hd0,msdos1)/vmlinuz-5.14.0-316.el9.x86_64 ro intel_idle.max_cstate=0 intremap=no_x2apic_optout processor.max_cstate=0 console=tty0 rd_NO_PLYMOUTH resume=UUID=69aba99a-5910-4b3d-9ed6-28c216879007 console=ttyS1,115200n81 irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off numa=off udev.children-max=2 panic=10 acpi_no_memhotplug transparent_hugepage=never nokaslr hest_disable novmcoredd cma=0 hugetlb_cma=0 disable_cpu_apicid=0 hpwdt.pretimeout=0 hpwdt.kdumptimeout=0
      [ 0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
      [ 0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
      [ 0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
      [ 0.000000] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256
      [ 0.000000] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format.
      [ 0.000000] signal: max sigframe size: 1776
      [ 0.000000] BIOS-provided physical RAM map:
      [ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x0000000000000fff] reserved
      [ 0.000000] BIOS-e820: [mem 0x0000000000001000-0x00000000000997ff] usable
      [ 0.000000] BIOS-e820: [mem 0x0000000000099800-0x0000000000099bff] reserved
      [ 0.000000] BIOS-e820: [mem 0x000000000009e000-0x000000000009ffff] reserved
      [ 0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
      [ 0.000000] BIOS-e820: [mem 0x00000000dd001000-0x00000000ecffffff] usable
      [ 0.000000] BIOS-e820: [mem 0x0000e4000-0x00000000eddedfff] ACPI data
      [ 0.000000] BIOS-e820: [mem 0x00000000eddee00-0x00000000f3ffffff] reserved
      [ 0.000000] BIOS-e820: [mem 0x00000000fec00000-0x00000000fee0ffff] reserved
      [ 0.000000] BIOS-e820: [mem 0x00000000ff800000-0x00000000ffffffff] reserved
      [ 0.000000] NX (Execute Disable) protection: active
      [ 0.000000] SMBIOS 2.7 present.
      [ 0.000000] DMI: HP ProLiant DL320e Gen8, BIOS J05 11/09/2013
      [ 0.000000] tsc: Fast TSC calibration using PIT
      [ 0.000000] tsc: Detected 3392.277 MHz processor
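
A quick cross-check of the oops above, offered as a sketch rather than a root-cause analysis. The byte marked `<8b>` in the Code line is where RIP points. Decoded by hand, `8b 46 04` is a 32-bit load from RSI plus 4, and the register dump shows RSI as 0, which gives the address 4 reported in CR2. The preceding bytes `48 8b 34 c1` load RSI from RCX + RAX*8, so the NULL value appears to come from a table lookup; which driver structure that table corresponds to is not established here. The helper names below are hypothetical and the decoder covers only this one instruction encoding.

```python
# Values copied from the oops above.
CODE_LINE = (
    "3c a5 0f 85 9f 00 00 00 48 8b 8a e8 01 00 00 41 8b 00 0f c8 "
    "25 ff ff 01 00 2b 81 18 84 00 00 48 8b 89 28 84 00 00 48 8b "
    "34 c1 <8b> 46 04 48 83 46 20 01 85 c0 74 1a 48 8d be 90 00 "
    "00 00 f0 48 0f"
)
REGS = {"rsi": 0x0}   # RSI from the register dump
CR2 = 0x4             # faulting address from the oops

# 64-bit register names indexed by the 3-bit ModRM r/m field (no REX.B prefix).
RM_NAMES = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def faulting_bytes(code_line):
    """Return the instruction bytes starting at the <xx> marker (i.e. at RIP)."""
    tokens = code_line.split()
    start = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    return [int(tok.strip("<>"), 16) for tok in tokens[start:]]


def decode_mov_disp8(insn):
    """Decode `8b /r` (mov r32, r/m32) with mod=01: base register + 8-bit offset.

    Hypothetical helper: it rejects every other encoding instead of guessing.
    """
    opcode, modrm, disp = insn[0], insn[1], insn[2]
    mod, rm = modrm >> 6, modrm & 0b111
    if opcode != 0x8B or mod != 0b01 or rm == 0b100:
        raise ValueError("encoding not covered by this sketch")
    if disp >= 0x80:          # disp8 is a signed byte
        disp -= 0x100
    return RM_NAMES[rm], disp


base, disp = decode_mov_disp8(faulting_bytes(CODE_LINE))
fault_addr = (REGS[base] + disp) & 0xFFFFFFFFFFFFFFFF
print(f"faulting load reads {disp:#x}(%{base}) -> address {fault_addr:#018x}")
print("matches CR2:", fault_addr == CR2)
```

For a real analysis, the vmcore captured by kdump and the matching debuginfo would be the authoritative source; the hand decode above only confirms that the register dump and CR2 are mutually consistent.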

      Please refer to the following console log:

      http://lab-02.hosts.prod.upshift.rdu2.redhat.com/beaker/logs/recipes/13968+/13968366/console.log

      RDMA OPENMPI test beaker job ID:
      https://beaker.engineering.redhat.com/jobs/7890428 [ RS:11794138 / R:13968366 ]

      Version-Release number of selected component (if applicable):

      Clients: rdma-perf-06
      Servers: rdma-dev-13

      DISTRO=RHEL-9.3.0-20230521.45

      + [23-05-25 12:02:40] cat /etc/redhat-release
      Red Hat Enterprise Linux release 9.3 Beta (Plow)

      + [23-05-25 12:02:40] uname -a
      Linux rdma-dev-13.rdma.lab.eng.rdu2.redhat.com 5.14.0-316.el9.x86_64 #1 SMP PREEMPT_DYNAMIC Fri May 19 13:18:40 EDT 2023 x86_64 x86_64 x86_64 GNU/Linux

      + [23-05-25 12:02:40] cat /proc/cmdline
      BOOT_IMAGE=(hd0,msdos1)/vmlinuz-5.14.0-316.el9.x86_64 root=UUID=7c453c45-2eaa-4f0f-afbb-79a6d9e70ca3 ro intel_idle.max_cstate=0 intremap=no_x2apic_optout processor.max_cstate=0 console=tty0 rd_NO_PLYMOUTH crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M resume=UUID=69aba99a-5910-4b3d-9ed6-28c216879007 console=ttyS1,115200n81

      + [23-05-25 12:02:40] rpm -q rdma-core linux-firmware
      rdma-core-44.0-2.el9.x86_64
      linux-firmware-20230404-134.el9.noarch

      + [23-05-25 12:02:40] tail /sys/class/infiniband/cxgb4_0/fw_ver
      1.27.1.0

      + [23-05-25 12:02:40] lspci
      + [23-05-25 12:02:40] grep -i -e ethernet -e infiniband -e omni -e ConnectX
      03:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5717 Gigabit Ethernet PCIe (rev 10)
      03:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5717 Gigabit Ethernet PCIe (rev 10)
      05:00.0 Ethernet controller: Chelsio Communications Inc T62100-LP-CR Unified Wire Ethernet Controller
      05:00.1 Ethernet controller: Chelsio Communications Inc T62100-LP-CR Unified Wire Ethernet Controller
      05:00.2 Ethernet controller: Chelsio Communications Inc T62100-LP-CR Unified Wire Ethernet Controller
      05:00.3 Ethernet controller: Chelsio Communications Inc T62100-LP-CR Unified Wire Ethernet Controller
      05:00.4 Ethernet controller: Chelsio Communications Inc T62100-LP-CR Unified Wire Ethernet Controller
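
The `crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M` value in the `/proc/cmdline` output above uses the kernel's range syntax: each `start-end:size` entry reserves `size` for the kdump kernel when system RAM falls in that range (start inclusive, end exclusive, empty end meaning no upper bound). The sketch below shows how such a value resolves. The function names are hypothetical, the RAM sizes in the example are illustrative because the report does not state the server's installed memory, and an optional `@offset` suffix is not handled.

```python
import re

# Binary size suffixes accepted by this sketch.
UNITS = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30, "T": 1 << 40}

# Copied from the /proc/cmdline output in this report.
SPEC = "1G-4G:192M,4G-64G:256M,64G-:512M"


def parse_size(text):
    """Convert a string such as '192M' or '4G' to bytes."""
    match = re.fullmatch(r"(\d+)([KMGT]?)", text.strip(), re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    number, unit = match.groups()
    return int(number) * UNITS.get(unit.upper(), 1)


def crashkernel_reservation(spec, system_ram):
    """Return the reservation in bytes for system_ram, or 0 if no range matches."""
    for entry in spec.split(","):
        range_part, size_part = entry.split(":")
        start_text, end_text = range_part.split("-")
        start = parse_size(start_text)
        end = parse_size(end_text) if end_text else None  # open-ended range
        if system_ram >= start and (end is None or system_ram < end):
            return parse_size(size_part)
    return 0


# Illustrative RAM sizes only; not taken from the report.
for gib in (2, 32, 128):
    reserved = crashkernel_reservation(SPEC, gib << 30)
    print(f"{gib} GiB RAM -> {reserved >> 20} MiB reserved")
```

The "Kdump: loaded" flag in the oops and the kdump kernel boot messages that follow it indicate that a reservation was in effect on this server when it crashed.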

      How reproducible:
      Observed only once so far.

      Steps to Reproduce:
      1. Run the OPENMPI tests from the Beaker job referenced above.

      Actual results:

      Expected results:

      Additional info:

              kheib Kamal Heib
              bchae Brian Chae (Inactive)
              infiniband-qe