RHEL-25773

RHEL 9.4 NVMe/FC host had nvme process crash due to a segmentation fault during path recovery

    • Bug
    • Resolution: Unresolved
    • Normal
    • rhel-9.4.z
    • rhel-9.4
    • nvme-cli
    • None
    • None
    • rhel-sst-storage-io
    • ssg_filesystems_storage_and_HA
    • 4
    • Dev ack
    • False
    • None
    • None
    • None
    • None
    • x86_64
    • None

      What were you trying to do that didn't work?

      While running a pre-release version of RHEL 9.4 (kernel-5.14.0-408.el9.x86_64, nvme-cli-2.6-4.el9.x86_64), an nvme process crashed with a segmentation fault during path recovery, and the affected path never returned:

      ----------------------------------------------------------------------------------------------------------------------------------------

      Feb 14 23:37:55 ictam07s01h01.ict.englab.netapp.com systemd[1]: Started NVMf auto-connect scan upon nvme discovery controller Events.
      Feb 14 23:37:55 ictam07s01h01.ict.englab.netapp.com kernel: nvme_fc: nvme_fc_create_ctrl: nn-0x2004d039ea3ef43c:pn-0x2035d039ea3ef43c - nn-0x200000109b5828d8:pn-0x100000109b5828d8 combination not found
      Feb 14 23:37:55 ictam07s01h01.ict.englab.netapp.com kernel: nvme_fc: nvme_fc_create_ctrl: nn-0x2004d039ea3ef43c:pn-0x2045d039ea3ef43c - nn-0x200000109b5828d8:pn-0x100000109b5828d8 combination not found
      Feb 14 23:37:55 ictam07s01h01.ict.englab.netapp.com kernel: nvme_fc: nvme_fc_create_ctrl: nn-0x2004d039ea3ef43c:pn-0x2065d039ea3ef43c - nn-0x200000109b5828d8:pn-0x100000109b5828d8 combination not found
      Feb 14 23:37:55 ictam07s01h01.ict.englab.netapp.com kernel: nvme_fc: nvme_fc_create_ctrl: nn-0x2004d039ea3ef43c:pn-0x2075d039ea3ef43c - nn-0x200000109b5828d8:pn-0x100000109b5828d8 combination not found
      Feb 14 23:37:55 ictam07s01h01.ict.englab.netapp.com kernel: nvme_fc: nvme_fc_create_ctrl: nn-0x2004d039ea3ef43c:pn-0x2085d039ea3ef43c - nn-0x200000109b5828d8:pn-0x100000109b5828d8 combination not found
      Feb 14 23:37:55 ictam07s01h01.ict.englab.netapp.com kernel: nvme nvme7: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
      Feb 14 23:37:55 ictam07s01h01.ict.englab.netapp.com kernel: nvme[821866]: segfault at fffffffffffffffb ip 00007faf92cb19be sp 00007ffe4504e670 error 5 in libc.so.6[7faf92c28000+175000] likely on CPU 49 (core 1, socket 1)
      Feb 14 23:37:55 ictam07s01h01.ict.englab.netapp.com kernel: Code: f9 ff 48 89 fd e9 b7 fd ff ff 66 90 f3 0f 1e fa 48 85 ff 0f 84 9b 00 00 00 55 48 8d 77 f0 53 48 83 ec 18 48 8b 1d 42 d4 14 00 <48> 8b 47 f8 64 8b 2b a8 02 75 37 48 8b 15 c8 d3 14 00 64 48 83 3a
      Feb 14 23:37:55 ictam07s01h01.ict.englab.netapp.com systemd[1]: Started NVMf auto-connect scan upon nvme discovery controller Events.
      Feb 14 23:37:55 ictam07s01h01.ict.englab.netapp.com systemd[1]: nvmf-connect@-device\x3dnone\ttransport\x3dfc\ttraddr\x3dnn-0x2004d039ea3ef43c:pn-0x2054d039ea3ef43c\ttrsvcid\x3dnone\t-host-traddr\x3dnn-0x200000109bce655b:pn-0x100000109bce655b.service: Deactivated successfully.
      Feb 14 23:37:55 ictam07s01h01.ict.englab.netapp.com systemd[1]: nvmf-connect@-device\x3dnone\ttransport\x3dfc\ttraddr\x3dnn-0x2004d039ea3ef43c:pn-0x2054d039ea3ef43c\ttrsvcid\x3dnone\t-host-traddr\x3dnn-0x200000109b5828d8:pn-0x100000109b5828d8.service: Deactivated successfully.
      Feb 14 23:37:55 ictam07s01h01.ict.englab.netapp.com kernel: nvme nvme5: NVME-FC{4}: create association : host wwpn 0x100000109bce655a  rport wwpn 0x2014d039ea3ef43c: NQN "nqn.2014-08.org.nvmexpress.discovery"
      Feb 14 23:37:55 ictam07s01h01.ict.englab.netapp.com kernel: nvme nvme5: queue_size 128 > ctrl maxcmd 32, reducing to maxcmd
      Feb 14 23:37:55 ictam07s01h01.ict.englab.netapp.com kernel: nvme nvme5: NVME-FC{4}: controller connect complete
      Feb 14 23:37:55 ictam07s01h01.ict.englab.netapp.com kernel: nvme nvme5: NVME-FC{4}: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
      Feb 14 23:37:55 ictam07s01h01.ict.englab.netapp.com kernel: nvme nvme7: NVME-FC{6}: create association : host wwpn 0x100000109bce655a  rport wwpn 0x2014d039ea3ef43c: NQN "nqn.1992-08.com.netapp:6000.6d039ea0003ef43c00000000587f2b7c"
      Feb 14 23:37:55 ictam07s01h01.ict.englab.netapp.com kernel: nvme nvme7: NVME-FC{6}: controller connect complete
      Feb 14 23:37:55 ictam07s01h01.ict.englab.netapp.com kernel: nvme nvme7: NVME-FC{6}: new ctrl: NQN "nqn.1992-08.com.netapp:6000.6d039ea0003ef43c00000000587f2b7c"
      Feb 14 23:37:55 ictam07s01h01.ict.englab.netapp.com kernel: nvme_fc: nvme_fc_create_ctrl: nn-0x2004d039ea3ef43c:pn-0x2024d039ea3ef43c - nn-0x200000109bce655a:pn-0x100000109bce655a combination not found
      Feb 14 23:37:55 ictam07s01h01.ict.englab.netapp.com kernel: nvme_fc: nvme_fc_create_ctrl: nn-0x2004d039ea3ef43c:pn-0x2034d039ea3ef43c - nn-0x200000109bce655a:pn-0x100000109bce655a combination not found
      Feb 14 23:37:55 ictam07s01h01.ict.englab.netapp.com kernel: nvme_fc: nvme_fc_create_ctrl: nn-0x2004d039ea3ef43c:pn-0x2044d039ea3ef43c - nn-0x200000109bce655a:pn-0x100000109bce655a combination not found
      Feb 14 23:37:55 ictam07s01h01.ict.englab.netapp.com kernel: nvme_fc: nvme_fc_create_ctrl: nn-0x2004d039ea3ef43c:pn-0x2054d039ea3ef43c - nn-0x200000109bce655a:pn-0x100000109bce655a combination not found
      Feb 14 23:37:55 ictam07s01h01.ict.englab.netapp.com kernel: nvme_fc: nvme_fc_create_ctrl: nn-0x2004d039ea3ef43c:pn-0x2064d039ea3ef43c - nn-0x200000109bce655a:pn-0x100000109bce655a combination not found
      Feb 14 23:37:55 ictam07s01h01.ict.englab.netapp.com kernel: nvme_fc: nvme_fc_create_ctrl: nn-0x2004d039ea3ef43c:pn-0x2074d039ea3ef43c - nn-0x200000109bce655a:pn-0x100000109bce655a combination not found
      Feb 14 23:37:55 ictam07s01h01.ict.englab.netapp.com kernel: nvme_fc: nvme_fc_create_ctrl: nn-0x2004d039ea3ef43c:pn-0x2084d039ea3ef43c - nn-0x200000109bce655a:pn-0x100000109bce655a combination not found
      Feb 14 23:37:55 ictam07s01h01.ict.englab.netapp.com kernel: nvme_fc: nvme_fc_create_ctrl: nn-0x2004d039ea3ef43c:pn-0x2025d039ea3ef43c - nn-0x200000109bce655a:pn-0x100000109bce655a combination not found
      Feb 14 23:37:55 ictam07s01h01.ict.englab.netapp.com kernel: nvme_fc: nvme_fc_create_ctrl: nn-0x2004d039ea3ef43c:pn-0x2035d039ea3ef43c - nn-0x200000109bce655a:pn-0x100000109bce655a combination not found
      Feb 14 23:37:55 ictam07s01h01.ict.englab.netapp.com kernel: nvme_fc: nvme_fc_create_ctrl: nn-0x2004d039ea3ef43c:pn-0x2045d039ea3ef43c - nn-0x200000109bce655a:pn-0x100000109bce655a combination not found
      Feb 14 23:37:55 ictam07s01h01.ict.englab.netapp.com kernel: nvme_fc: nvme_fc_create_ctrl: nn-0x2004d039ea3ef43c:pn-0x2055d039ea3ef43c - nn-0x200000109bce655a:pn-0x100000109bce655a combination not found
      Feb 14 23:37:55 ictam07s01h01.ict.englab.netapp.com kernel: nvme_fc: nvme_fc_create_ctrl: nn-0x2004d039ea3ef43c:pn-0x2065d039ea3ef43c - nn-0x200000109bce655a:pn-0x100000109bce655a combination not found
      Feb 14 23:37:55 ictam07s01h01.ict.englab.netapp.com kernel: nvme_fc: nvme_fc_create_ctrl: nn-0x2004d039ea3ef43c:pn-0x2075d039ea3ef43c - nn-0x200000109bce655a:pn-0x100000109bce655a combination not found
      Feb 14 23:37:55 ictam07s01h01.ict.englab.netapp.com kernel: nvme_fc: nvme_fc_create_ctrl: nn-0x2004d039ea3ef43c:pn-0x2085d039ea3ef43c - nn-0x200000109bce655a:pn-0x100000109bce655a combination not found
      Feb 14 23:37:55 ictam07s01h01.ict.englab.netapp.com kernel: nvme nvme5: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
      Feb 14 23:37:55 ictam07s01h01.ict.englab.netapp.com systemd[1]: Created slice Slice /system/systemd-coredump.
      Feb 14 23:37:55 ictam07s01h01.ict.englab.netapp.com systemd[1]: Started Process Core Dump (PID 821884/UID 0).
      Feb 14 23:37:55 ictam07s01h01.ict.englab.netapp.com systemd[1]: nvmf-connect@-device\x3dnone\ttransport\x3dfc\ttraddr\x3dnn-0x2004d039ea3ef43c:pn-0x2014d039ea3ef43c\ttrsvcid\x3dnone\t-host-traddr\x3dnn-0x200000109bce655a:pn-0x100000109bce655a.service: Deactivated successfully.
      Feb 14 23:37:57 ictam07s01h01.ict.englab.netapp.com systemd-coredump[821926]: Process 821866 (nvme) of user 0 dumped core.

                                                                                    Stack trace of thread 821866:
                                                                                    #0  0x00007faf92cb19be free (libc.so.6 + 0xb19be)
                                                                                    #1  0x00007faf92f636f6 nvme_ctrl_scan_namespaces.isra.0 (libnvme.so.1 + 0x1c6f6)
                                                                                    #2  0x00007faf92f642fd nvme_scan_ctrl (libnvme.so.1 + 0x1d2fd)
                                                                                    #3  0x00007faf92f5d046 nvme_scan_topology (libnvme.so.1 + 0x16046)
                                                                                    #4  0x000055fd7759d620 nvmf_discover (nvme + 0x17620)
                                                                                    #5  0x000055fd775e1314 handle_plugin (nvme + 0x5b314)
                                                                                    #6  0x000055fd7759a9da main (nvme + 0x149da)
                                                                                    #7  0x00007faf92c3feb0 __libc_start_call_main (libc.so.6 + 0x3feb0)
                                                                                    #8  0x00007faf92c3ff60 __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x3ff60)
                                                                                    #9  0x000055fd7759ab65 _start (nvme + 0x14b65)
                                                                                    ELF object binary architecture: AMD x86-64
      Feb 14 23:37:57 ictam07s01h01.ict.englab.netapp.com systemd[1]: nvmf-connect@-device\x3dnone\ttransport\x3dfc\ttraddr\x3dnn-0x2004d039ea3ef43c:pn-0x2014d039ea3ef43c\ttrsvcid\x3dnone\t-host-traddr\x3dnn-0x200000109b5828d7:pn-0x100000109b5828d7.service: Main process exited, code=dumped, status=11/SEGV
      Feb 14 23:37:57 ictam07s01h01.ict.englab.netapp.com systemd[1]: nvmf-connect@-device\x3dnone\ttransport\x3dfc\ttraddr\x3dnn-0x2004d039ea3ef43c:pn-0x2014d039ea3ef43c\ttrsvcid\x3dnone\t-host-traddr\x3dnn-0x200000109b5828d7:pn-0x100000109b5828d7.service: Failed with result 'core-dump'.
      Feb 14 23:37:57 ictam07s01h01.ict.englab.netapp.com systemd[1]: systemd-coredump@0-821884-0.service: Deactivated successfully.

      ----------------------------------------------------------------------------------------------------------------------------------------

      This occurred while the host was connected to a NetApp E-Series array over NVMe/FC. The array was running a test case that repeatedly upgraded and then downgraded its controller firmware for 12 hours straight. Each firmware change reboots one redundant controller and then the other, so paths to a controller are lost while it reboots and recover once it boots back up with the new firmware. The nvme process crashed during one of these path recoveries.

      A systemd coredump was collected and has been attached to this bug.
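
      The kernel segfault record in the log above can be decoded with a little arithmetic. The following is a minimal Python sketch and not part of the original report; the function name decode_segfault and the result keys are hypothetical. It assumes the x86 page-fault error-code layout (bit 0 = protection violation, bit 1 = write, bit 2 = user mode) and that the byte marked <48> in the "Code:" line starts the faulting instruction, which decodes as mov rax, [rdi-0x8]. Under those assumptions, the pointer passed to free() would be the fault address plus 8.

```python
import re

# Kernel segfault record copied from the journal excerpt in this report.
LINE = (
    "nvme[821866]: segfault at fffffffffffffffb ip 00007faf92cb19be "
    "sp 00007ffe4504e670 error 5 in libc.so.6[7faf92c28000+175000]"
)

SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Decode the numeric fields of an x86-64 kernel segfault record."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("no segfault record found in line")
    addr = int(m.group("addr"), 16)
    ip = int(m.group("ip"), 16)
    base = int(m.group("base"), 16)
    err = int(m.group("err"), 16)  # the kernel prints this field in hex
    return {
        "object": m.group("obj"),
        # Interpret the 64-bit fault address as a signed value.
        "fault_addr_signed": addr - (1 << 64) if addr >= (1 << 63) else addr,
        # Offset of the faulting instruction inside the mapped region.
        "ip_offset_in_mapping": ip - base,
        "protection_violation": bool(err & 0x1),
        "write_access": bool(err & 0x2),
        "user_mode": bool(err & 0x4),
        # ASSUMPTION: the faulting instruction is mov rax, [rdi-0x8],
        # so the argument of free() is the fault address plus 8.
        "freed_pointer_guess": (addr + 8) % (1 << 64),
    }


if __name__ == "__main__":
    for key, value in decode_segfault(LINE).items():
        shown = hex(value) if isinstance(value, int) and not isinstance(value, bool) else value
        print(f"{key}: {shown}")
```

      For this record the sketch reports a user-mode read at signed address -5 and a candidate free() argument of 0x3. That would be consistent with frame #0 of the stack trace, but it does not by itself show where the invalid pointer came from.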

      For this NVMe path, the host used a Broadcom LPe32002 HBA with firmware 14.2.455.11, connected through a Cisco switch.

      Please provide the package NVR for which the bug is seen:

      kernel-5.14.0-408.el9.x86_64

      nvme-cli-2.6-4.el9.x86_64

      How reproducible:

      Unclear. It's only been hit once so far.

      Steps to reproduce

      1. Install latest RHEL 9.4 nightly
      2. Connect to NetApp E-Series array using NVMe/FC
      3. Reboot E-Series controllers until a path fails to return properly

      Expected results

      All paths return successfully

      Actual results

      One path fails to recover because the nvme process responsible for reconnecting it crashes.
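
      To see which path was affected, the name of the nvmf-connect unit that failed with result 'core-dump' can be split into its connection arguments. The following is a minimal Python sketch, not part of the original report; parse_connect_unit and wwpn are hypothetical helper names. It assumes the unit name is laid out as displayed in the log excerpt of this report, with literal "\t" between fields and "\x3d" between key and value. As an extra assumption, it also accepts "\x09" separators and leading dashes on keys, in case the name is rendered differently on a live system.

```python
import re

# Failed unit name as displayed in the journal excerpt of this report.
UNIT = (
    r"nvmf-connect@-device\x3dnone\ttransport\x3dfc"
    r"\ttraddr\x3dnn-0x2004d039ea3ef43c:pn-0x2014d039ea3ef43c"
    r"\ttrsvcid\x3dnone"
    r"\t-host-traddr\x3dnn-0x200000109b5828d7:pn-0x100000109b5828d7.service"
)


def parse_connect_unit(unit):
    """Split an nvmf-connect@ unit name into a dict of connection arguments."""
    prefix, suffix = "nvmf-connect@", ".service"
    if not (unit.startswith(prefix) and unit.endswith(suffix)):
        raise ValueError("not an nvmf-connect unit name")
    instance = unit[len(prefix):-len(suffix)]
    args = {}
    # Fields are separated by a literal backslash-t (or backslash-x09).
    for field in re.split(r"\\t|\\x09", instance):
        key, sep, value = field.partition(r"\x3d")  # literal "\x3d" stands for "="
        if sep:
            args[key.lstrip("-")] = value
    return args


def wwpn(fc_addr):
    """Return the port name from an 'nn-0x...:pn-0x...' FC address."""
    _, _, port = fc_addr.partition(":pn-")
    return port


if __name__ == "__main__":
    args = parse_connect_unit(UNIT)
    print("transport:", args["transport"])
    print("target WWPN:", wwpn(args["traddr"]))
    print("host WWPN:", wwpn(args["host-traddr"]))
```

      For the unit shown in the log, this yields target port 0x2014d039ea3ef43c and host port 0x100000109b5828d7, the pair named in the 'core-dump' failure message.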

              mlombard@redhat.com Maurizio Lombardi
              cxskaggs Clayton Skaggs
              Clayton Skaggs
              NetApp Confidential Group
              Maurizio Lombardi Maurizio Lombardi
              Marco Patalano Marco Patalano
              Votes: 0
              Watchers: 8