RHEL-8063

NVMe controllers do not reconnect for 9 minutes or more during an initiator outage test


    • Bug
    • Resolution: Won't Do
    • Undefined
    • None
    • rhel-8.6.0
    • nvme-cli
    • None
    • Important
    • rhel-storage-io-2
    • ssg_filesystems_storage_and_HA
    • None
    • False
    • False
    • None
    • None
    • None
    • None
    • None
    • 57,005

      Description of problem:
      After the initiator link is disrupted for two minutes, the NVMe controllers and namespaces are not recovered in a timely manner, even though the lpfc driver successfully recovers the FC logins and reregisters the remote ports. Recovery of the NVMe controllers and namespaces can take between 9 and 61 minutes.
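One way to observe the gap between FC login recovery and NVMe controller recovery is to compare the two sets of states in sysfs. The following is a minimal sketch, not something taken from the report: it assumes the standard `/sys/class/fc_remote_ports/*/port_state` and `/sys/class/nvme/*/state` attributes, and it assumes the target ports are visible under `fc_remote_ports`, which may not hold for NVMe-only ports. The function names and the `sysfs_root` parameter are hypothetical.

```python
from pathlib import Path


def read_states(class_dir, attr):
    """Return {device_name: attribute_value} for each device under class_dir."""
    states = {}
    base = Path(class_dir)
    if not base.is_dir():
        return states
    for dev in sorted(base.iterdir()):
        attr_file = dev / attr
        if attr_file.is_file():
            states[dev.name] = attr_file.read_text().strip()
    return states


def snapshot(sysfs_root="/sys"):
    """Collect FC remote port and NVMe controller states (paths are assumptions)."""
    root = Path(sysfs_root)
    return {
        "rports": read_states(root / "class" / "fc_remote_ports", "port_state"),
        "controllers": read_states(root / "class" / "nvme", "state"),
    }


def stalled_controllers(snap):
    """List controllers that are not 'live' although every remote port is 'Online'."""
    rports = snap["rports"]
    if not rports or any(state != "Online" for state in rports.values()):
        return []
    return [name for name, state in snap["controllers"].items() if state != "live"]
```

Calling `stalled_controllers(snapshot())` at intervals after the port is re-enabled would show which controllers lag behind the remote ports, if any.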

      Version-Release number of selected component (if applicable):
      Issue is seen on:

      Linux dhcp-10-231-139-179 4.18.0-372.9.1.el8.x86_64 #1 SMP Fri Apr 15 22:12:19 EDT 2022 x86_64 x86_64 x86_64 GNU/Linux

      [root@dhcp-10-231-139-179 ~]# cat /etc/os-release
      NAME="Red Hat Enterprise Linux"
      VERSION="8.6 (Ootpa)"
      ID="rhel"
      ID_LIKE="fedora"
      VERSION_ID="8.6"
      PLATFORM_ID="platform:el8"
      PRETTY_NAME="Red Hat Enterprise Linux 8.6 (Ootpa)"

      How reproducible:
      Always. Time to reproduction is about 10 minutes.

      Steps to Reproduce:
      1. Map a few SCSI LUNs and NVMe namespaces from a target to both HBA ports. Enable
      multipath and NVMe ANA to detect the multipath devices. The IBM9500 target is in use for ECD, but any target capable of FCP and NVMe will do.

      Zone config:

      Zone1: HBA Port0 + SCSI Tgt Port0 + NVMe Tgt Port0
      Zone2: HBA Port1 + SCSI Tgt Port0 + NVMe Tgt Port0

      [root@dhcp-10-231-133-36 ~]# nvme list-subsys
      nvme-subsys0 - NQN=nqn.1986-03.com.ibm:nvme:2145.00000204E0607C1E
      \
       +- nvme0 fc traddr=nn-0x5005076813003e0f:pn-0x50050768131b3e0f host_traddr=nn-0x200000109bf67eba:pn-0x100000109bf67eba live
       +- nvme1 fc traddr=nn-0x5005076813003e0f:pn-0x50050768131b3e0f host_traddr=nn-0x200000109bf67ebb:pn-0x100000109bf67ebb live

      2. Shut the port from the Cisco 64G switch, then re-enable it after 120
      seconds. This is not a switch issue, so a switch from any vendor should work.

      3. The SCSI LUNs are detected again, but the NVMe controllers are still not
      detected after waiting more than 10 minutes.
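To check step 3 without reading the output by eye, the `nvme list-subsys` text from step 1 can be parsed for controller states. This is a rough sketch that assumes the output layout shown in this report (`+- <ctrl> <transport> traddr=... host_traddr=... <state>`); other nvme-cli versions print a different layout, and the function names here are hypothetical.

```python
import re

# Matches one controller entry; \s+ also spans line breaks, so entries that
# wrap onto a second line (as in the report) are still matched.
CTRL_RE = re.compile(
    r"\+-\s+(?P<name>nvme\d+)\s+(?P<transport>\S+)\s+"
    r"traddr=(?P<traddr>\S+)\s+host_traddr=(?P<host_traddr>\S+)\s+(?P<state>\S+)"
)


def parse_list_subsys(text):
    """Return one dict per controller found in `nvme list-subsys` text output."""
    return [m.groupdict() for m in CTRL_RE.finditer(text)]


def not_live(controllers):
    """Return the names of controllers whose state is anything other than 'live'."""
    return [c["name"] for c in controllers if c["state"] != "live"]
```

Feeding the captured command output to `not_live(parse_list_subsys(output))` returns an empty list only when every listed controller reports `live`. A controller that has been removed entirely does not appear in the output, so the number of parsed entries should also be compared with the expected path count.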

      Actual results:
      NVMe paths do not reappear for long periods (between 9 and 61 minutes).

      Expected results:
      SCSI and NVMe paths should recover in a reasonable amount of time.
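To put a number on the recovery time, a small polling helper can record how long it takes until every controller reports `live`. This is only a sketch: `get_states` is a hypothetical caller-supplied function returning `{controller_name: state}`, and the 600-second default timeout is an arbitrary placeholder, since the report does not define what a reasonable recovery time is.

```python
import time


def wait_for_recovery(get_states, timeout_s=600.0, interval_s=5.0,
                      clock=time.monotonic, sleep=time.sleep):
    """Poll until all controllers are 'live'.

    Returns the elapsed seconds on success, or None if timeout_s passes first.
    clock and sleep are injectable so the loop can be exercised without waiting.
    """
    start = clock()
    while True:
        states = get_states()
        # An empty result means no controllers are present yet, so keep waiting.
        if states and all(state == "live" for state in states.values()):
            return clock() - start
        if clock() - start >= timeout_s:
            return None
        sleep(interval_s)
```

Starting the helper right after the switch port is re-enabled gives a per-run recovery time that can be compared across kernels or driver versions.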

      Additional info:

              mlombard@redhat.com Maurizio Lombardi
              paely Paul Ely
              Broadcom ECD Confidential Group
              Maurizio Lombardi
              Yi Zhang