• Bug
    • Resolution: Unresolved
    • Critical
    • None
    • rhel-8.6.0.z
    • None
    • Critical
    • sst_network_drivers
    • ssg_networking
    • None
    • False
    • Red Hat Enterprise Linux
    • None
    • x86_64
    • None

      What were you trying to do that didn't work?

      Tried to apply firmware settings with a hot reset (mstfwreset), and the reset failed. This issue did not occur on RHEL 8.10.

      Please provide the package NVR for which bug is seen:

      RHEL-8.6.0-updates-20231213.16

      ethtool -i ens1f0
      driver: mlx5_core
      version: 4.18.0-372.82.1.rt7.241.el8_6.x
      firmware-version: 16.35.3006 (MT_0000000080)
      expansion-rom-version: 
      bus-info: 0000:17:00.0
      supports-statistics: yes
      supports-test: yes
      supports-eeprom-access: no
      supports-register-dump: no
      supports-priv-flags: yes

      1. cat /proc/cmdline 
        BOOT_IMAGE=(hd0,msdos1)/vmlinuz-4.18.0-372.82.1.rt7.241.el8_6.x86_64 root=/dev/mapper/rhel_dell--per750--03-root ro intel_iommu=on ksdevice=bootif pci=realloc crashkernel=auto resume=/dev/mapper/rhel_dell--per750--03-swap rd.lvm.lv=rhel_dell-per750-03/root rd.lvm.lv=rhel_dell-per750-03/swap console=ttyS0,115200n81

        How reproducible: 100%

        Steps to reproduce

      1.  mstfwreset -y -d 0000:17:00.0 reset
      Minimal reset level for device, 0000:17:00.0:3: Driver restart and PCI reset
      Continue with reset?[y/N] y
      -I- Sending Reset Command To Fw             -Done
      -I- Stopping Driver                         -Done
      -I- Resetting PCI                           -Done
      -I- Starting Driver                         -Failed
      -E- Failed to start driver! please start driver manually.
       

      dmesg log

      [   61.487145] mlx5_core 0000:17:00.0: E-Switch: cleanup
      [   65.302109] mlx5_core 0000:17:00.1: E-Switch: cleanup
      [   81.287427] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 5
      [   81.287431] {1}[Hardware Error]: event severity: recoverable
      [   81.287433] {1}[Hardware Error]:  Error 0, type: fatal
      [   81.287435] {1}[Hardware Error]:   section_type: PCIe error
      [   81.287436] {1}[Hardware Error]:   port_type: 4, root port
      [   81.287437] {1}[Hardware Error]:   version: 3.0
      [   81.287438] {1}[Hardware Error]:   command: 0x0547, status: 0x4010
      [   81.287441] {1}[Hardware Error]:   device_id: 0000:16:04.0
      [   81.287442] {1}[Hardware Error]:   slot: 1
      [   81.287443] {1}[Hardware Error]:   secondary_bus: 0x17
      [   81.287444] {1}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x347c
      [   81.287446] {1}[Hardware Error]:   class_code: 000406
      [   81.287447] {1}[Hardware Error]:   bridge: secondary_status: 0x2000, control: 0x0003
      [   81.287448] {1}[Hardware Error]:   aer_uncor_status: 0x00002000, aer_uncor_mask: 0x01310000
      [   81.287449] {1}[Hardware Error]:   aer_uncor_severity: 0x044ef030
      [   81.287450] {1}[Hardware Error]:   TLP Header: ffffffff ffffffff ffffffff ffffffff
      [   81.287528] pcieport 0000:16:04.0: AER: aer_status: 0x00002000, aer_mask: 0x01310000
      [   81.287531] pcieport 0000:16:04.0:    [13] FCP                    (First)
      [   81.287533] pcieport 0000:16:04.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
      [   81.287535] pcieport 0000:16:04.0: AER: aer_uncor_severity: 0x044ef030
      [   81.287538] pci 0000:17:00.0: AER: can't recover (no error_detected callback)
      [   81.287539] pci 0000:17:00.1: AER: can't recover (no error_detected callback)
      [   82.304812] pcieport 0000:16:04.0: AER: Root Port link has been reset (0)
      [   82.304842] pcieport 0000:16:04.0: AER: device recovery failed
      [   82.371161] mlx5_core 0000:17:00.0: firmware version: 16.35.3006
      [   82.371195] mlx5_core 0000:17:00.0: 63.008 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x8 link)
      [   82.382019] mlx5_core 0000:17:00.0: mlx5_function_setup:1028:(pid 5): enable hca failed
      [   82.400715] mlx5_core 0000:17:00.0: probe_one:1499:(pid 5): mlx5_init_one failed with error code -5
      [   82.400986] mlx5_core: probe of 0000:17:00.0 failed with error -5
      [   82.504163] mlx5_core 0000:17:00.0: firmware version: 16.35.3006
      [   82.504198] mlx5_core 0000:17:00.0: 63.008 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x8 link)
      [   82.514694] mlx5_core 0000:17:00.0: mlx5_function_setup:1028:(pid 5): enable hca failed
      [   82.533360] mlx5_core 0000:17:00.0: probe_one:1499:(pid 5): mlx5_init_one failed with error code -5
      [   82.533615] mlx5_core: probe of 0000:17:00.0 failed with error -5
      [   82.636753] mlx5_core 0000:17:00.0: firmware version: 16.35.3006
      [   82.636789] mlx5_core 0000:17:00.0: 63.008 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x8 link)
      [   82.647120] mlx5_core 0000:17:00.0: mlx5_function_setup:1028:(pid 5): enable hca failed
      [   82.666190] mlx5_core 0000:17:00.0: probe_one:1499:(pid 5): mlx5_init_one failed with error code -5
      [   82.666450] mlx5_core: probe of 0000:17:00.0 failed with error -5
      [   82.769570] mlx5_core 0000:17:00.0: firmware version: 16.35.3006
      [   82.769605] mlx5_core 0000:17:00.0: 63.008 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x8 link)
      [   82.779849] mlx5_core 0000:17:00.0: mlx5_function_setup:1028:(pid 5): enable hca failed
      [   82.798652] mlx5_core 0000:17:00.0: probe_one:1499:(pid 5): mlx5_init_one failed with error code -5
      [   82.798925] mlx5_core: probe of 0000:17:00.0 failed with error -5
      [   82.902052] mlx5_core 0000:17:00.0: firmware version: 16.35.3006
      [   82.902087] mlx5_core 0000:17:00.0: 63.008 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x8 link)
      [   82.912338] mlx5_core 0000:17:00.0: mlx5_function_setup:1028:(pid 5): enable hca failed
      [   82.931177] mlx5_core 0000:17:00.0: probe_one:1499:(pid 5): mlx5_init_one failed with error code -5
      [   82.931429] mlx5_core: probe of 0000:17:00.0 failed with error -5
      [   83.034554] mlx5_core 0000:17:00.0: firmware version: 16.35.3006
      [   83.034588] mlx5_core 0000:17:00.0: 63.008 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x8 link)
      [   83.044819] mlx5_core 0000:17:00.0: mlx5_function_setup:1028:(pid 5): enable hca failed
      [   83.063535] mlx5_core 0000:17:00.0: probe_one:1499:(pid 5): mlx5_init_one failed with error code -5
      [   83.063786] mlx5_core: probe of 0000:17:00.0 failed with error -5
      [   83.166944] mlx5_core 0000:17:00.0: firmware version: 16.35.3006
      [   83.166978] mlx5_core 0000:17:00.0: 63.008 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x8 link)
      [   83.177197] mlx5_core 0000:17:00.0: mlx5_function_setup:1028:(pid 5): enable hca failed
      [   83.196038] mlx5_core 0000:17:00.0: probe_one:1499:(pid 5): mlx5_init_one failed with error code -5
      [   83.196291] mlx5_core: probe of 0000:17:00.0 failed with error -5
      [   83.299418] mlx5_core 0000:17:00.0: firmware version: 16.35.3006
      [   83.299453] mlx5_core 0000:17:00.0: 63.008 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x8 link)
      [   83.309611] mlx5_core 0000:17:00.0: mlx5_function_setup:1028:(pid 5): enable hca failed
      [   83.328394] mlx5_core 0000:17:00.0: probe_one:1499:(pid 5): mlx5_init_one failed with error code -5
      [   83.328652] mlx5_core: probe of 0000:17:00.0 failed with error -5
      [   83.431797] mlx5_core 0000:17:00.0: firmware version: 16.35.3006
      [   83.431832] mlx5_core 0000:17:00.0: 63.008 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x8 link)
      [   83.441980] mlx5_core 0000:17:00.0: mlx5_function_setup:1028:(pid 5): enable hca failed
      [   83.460737] mlx5_core 0000:17:00.0: probe_one:1499:(pid 5): mlx5_init_one failed with error code -5
      [   83.461011] mlx5_core: probe of 0000:17:00.0 failed with error -5
      [   83.564115] mlx5_core 0000:17:00.0: firmware version: 16.35.3006
      [   83.564150] mlx5_core 0000:17:00.0: 63.008 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x8 link)
      [   83.574281] mlx5_core 0000:17:00.0: mlx5_function_setup:1028:(pid 5): enable hca failed
      [   83.592942] mlx5_core 0000:17:00.0: probe_one:1499:(pid 5): mlx5_init_one failed with error code -5
      [   83.593214] mlx5_core: probe of 0000:17:00.0 failed with error -5
      [   83.696314] mlx5_core 0000:17:00.1: firmware version: 16.35.3006
      [   83.696349] mlx5_core 0000:17:00.1: 63.008 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x8 link)
      [   83.706521] mlx5_core 0000:17:00.1: mlx5_function_setup:1028:(pid 5): enable hca failed
      [   83.725349] mlx5_core 0000:17:00.1: probe_one:1499:(pid 5): mlx5_init_one failed with error code -5
      [   83.725645] mlx5_core: probe of 0000:17:00.1 failed with error -5
      [   83.828770] mlx5_core 0000:17:00.1: firmware version: 16.35.3006
      [   83.828817] mlx5_core 0000:17:00.1: 63.008 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x8 link)
      [   83.838951] mlx5_core 0000:17:00.1: mlx5_function_setup:1028:(pid 5): enable hca failed
      [   83.857571] mlx5_core 0000:17:00.1: probe_one:1499:(pid 5): mlx5_init_one failed with error code -5
      [   83.857881] mlx5_core: probe of 0000:17:00.1 failed with error -5
      [   83.960985] mlx5_core 0000:17:00.1: firmware version: 16.35.3006
      [   83.961020] mlx5_core 0000:17:00.1: 63.008 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x8 link)
      [   83.971221] mlx5_core 0000:17:00.1: mlx5_function_setup:1028:(pid 5): enable hca failed
      [   83.990114] mlx5_core 0000:17:00.1: probe_one:1499:(pid 5): mlx5_init_one failed with error code -5
      [   83.990409] mlx5_core: probe of 0000:17:00.1 failed with error -5
      [   84.093495] mlx5_core 0000:17:00.1: firmware version: 16.35.3006
      [   84.093530] mlx5_core 0000:17:00.1: 63.008 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x8 link)
      [   84.103681] mlx5_core 0000:17:00.1: mlx5_function_setup:1028:(pid 5): enable hca failed
      [   84.122382] mlx5_core 0000:17:00.1: probe_one:1499:(pid 5): mlx5_init_one failed with error code -5
      [   84.122673] mlx5_core: probe of 0000:17:00.1 failed with error -5
      [   84.225821] mlx5_core 0000:17:00.1: firmware version: 16.35.3006
      [   84.225856] mlx5_core 0000:17:00.1: 63.008 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x8 link)
      [   84.235987] mlx5_core 0000:17:00.1: mlx5_function_setup:1028:(pid 5): enable hca failed
      [   84.254519] mlx5_core 0000:17:00.1: probe_one:1499:(pid 5): mlx5_init_one failed with error code -5
      [   84.254856] mlx5_core: probe of 0000:17:00.1 failed with error -5
      [   84.357940] mlx5_core 0000:17:00.1: firmware version: 16.35.3006
      [   84.357975] mlx5_core 0000:17:00.1: 63.008 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x8 link)
      [   84.368098] mlx5_core 0000:17:00.1: mlx5_function_setup:1028:(pid 5): enable hca failed
      [   84.386851] mlx5_core 0000:17:00.1: probe_one:1499:(pid 5): mlx5_init_one failed with error code -5
      [   84.387146] mlx5_core: probe of 0000:17:00.1 failed with error -5
      [   84.490440] mlx5_core 0000:17:00.1: firmware version: 16.35.3006
      [   84.490475] mlx5_core 0000:17:00.1: 63.008 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x8 link)
      [   84.500658] mlx5_core 0000:17:00.1: mlx5_function_setup:1028:(pid 5): enable hca failed
      [   84.519242] mlx5_core 0000:17:00.1: probe_one:1499:(pid 5): mlx5_init_one failed with error code -5
      [   84.519538] mlx5_core: probe of 0000:17:00.1 failed with error -5
      [   84.622661] mlx5_core 0000:17:00.1: firmware version: 16.35.3006
      [   84.622696] mlx5_core 0000:17:00.1: 63.008 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x8 link)
      [   84.632863] mlx5_core 0000:17:00.1: mlx5_function_setup:1028:(pid 5): enable hca failed
      [   84.651507] mlx5_core 0000:17:00.1: probe_one:1499:(pid 5): mlx5_init_one failed with error code -5
      [   84.651837] mlx5_core: probe of 0000:17:00.1 failed with error -5
      [   84.754963] mlx5_core 0000:17:00.1: firmware version: 16.35.3006
      [   84.754998] mlx5_core 0000:17:00.1: 63.008 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x8 link)
      [   84.765103] mlx5_core 0000:17:00.1: mlx5_function_setup:1028:(pid 5): enable hca failed
      [   84.783628] mlx5_core 0000:17:00.1: probe_one:1499:(pid 5): mlx5_init_one failed with error code -5
      [   84.783939] mlx5_core: probe of 0000:17:00.1 failed with error -5
      [   84.887047] mlx5_core 0000:17:00.1: firmware version: 16.35.3006
      [   84.887082] mlx5_core 0000:17:00.1: 63.008 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x8 link)
      [   84.897259] mlx5_core 0000:17:00.1: mlx5_function_setup:1028:(pid 5): enable hca failed
      [   84.916227] mlx5_core 0000:17:00.1: probe_one:1499:(pid 5): mlx5_init_one failed with error code -5
      [   84.916523] mlx5_core: probe of 0000:17:00.1 failed with error -5
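After the reset, both functions fail to probe repeatedly: the log above shows ten failed attempts each for 0000:17:00.0 and 0000:17:00.1, all with error -5 (-EIO), each preceded by "enable hca failed". To tally such attempts from a saved dmesg capture, here is a minimal Python sketch. The helper name count_probe_failures and its regular expression are illustrative assumptions, not part of mlx5_core or any Red Hat tooling.

```python
import errno
import re
from collections import Counter

# Matches the line the kernel prints for each failed driver probe, e.g.
#   mlx5_core: probe of 0000:17:00.0 failed with error -5
# (hypothetical pattern written for this log format only)
PROBE_FAIL = re.compile(r"mlx5_core: probe of (\S+) failed with error (-\d+)")


def count_probe_failures(log_text):
    """Count failed probes per (PCI address, errno name) in a dmesg capture."""
    failures = Counter()
    for match in PROBE_FAIL.finditer(log_text):
        bdf, code = match.group(1), int(match.group(2))
        # Kernel return codes are negated errno values, so -5 maps to EIO.
        name = errno.errorcode.get(-code, str(code))
        failures[(bdf, name)] += 1
    return failures


# Three lines copied verbatim from the first dmesg log.
SAMPLE = """\
[   82.400986] mlx5_core: probe of 0000:17:00.0 failed with error -5
[   82.533615] mlx5_core: probe of 0000:17:00.0 failed with error -5
[   83.725645] mlx5_core: probe of 0000:17:00.1 failed with error -5
"""

for (bdf, name), count in sorted(count_probe_failures(SAMPLE).items()):
    print(f"{bdf}: {count} failed probe(s), error {name}")
```

For a full capture saved to a file (dmesg.txt is a hypothetical name), pass its contents instead: count_probe_failures(open("dmesg.txt").read()). The per-line "probe_one ... mlx5_init_one failed with error code -5" messages are deliberately not matched, so each attempt is counted once.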
       

      I tried rolling back to the stock (non-RT) kernel, 4.18.0-372.85.1.el8_6, but the reset still fails.

      [root@dell-per750-03 ~]# cat /proc/cmdline 
      BOOT_IMAGE=(hd0,msdos1)/vmlinuz-4.18.0-372.85.1.el8_6.x86_64 root=/dev/mapper/rhel_dell--per750--03-root ro intel_iommu=on ksdevice=bootif pci=realloc crashkernel=auto resume=/dev/mapper/rhel_dell--per750--03-swap rd.lvm.lv=rhel_dell-per750-03/root rd.lvm.lv=rhel_dell-per750-03/swap console=ttyS0,115200n81
      [root@dell-per750-03 ~]# mstfwreset -y -d 0000:17:00.0 reset
      Minimal reset level for device, 0000:17:00.0:3: Driver restart and PCI reset
      Continue with reset?[y/N] y
      -I- Sending Reset Command To Fw             -Done
      -I- Stopping Driver                         -Done
      -I- Resetting PCI                           -Done
      -I- Starting Driver                         -Failed
      -E- Failed to start driver! please start driver manually.
       
      dmesg log
      [   55.961991] mlx5_core 0000:17:00.0: E-Switch: cleanup
      [   59.039966] mlx5_core 0000:17:00.1: E-Switch: cleanup
      [   75.006469] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 5
      [   75.014731] {1}[Hardware Error]: event severity: recoverable
      [   75.020391] {1}[Hardware Error]:  Error 0, type: fatal
      [   75.025530] {1}[Hardware Error]:   section_type: PCIe error
      [   75.031103] {1}[Hardware Error]:   port_type: 4, root port
      [   75.036587] {1}[Hardware Error]:   version: 3.0
      [   75.041120] {1}[Hardware Error]:   command: 0x0547, status: 0x4010
      [   75.047299] {1}[Hardware Error]:   device_id: 0000:16:04.0
      [   75.052785] {1}[Hardware Error]:   slot: 1
      [   75.056886] {1}[Hardware Error]:   secondary_bus: 0x17
      [   75.062024] {1}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x347c
      [   75.068637] {1}[Hardware Error]:   class_code: 000406
      [   75.073689] {1}[Hardware Error]:   bridge: secondary_status: 0x2000, control: 0x0003
      [   75.081427] {1}[Hardware Error]:   aer_uncor_status: 0x00002000, aer_uncor_mask: 0x01310000
      [   75.089773] {1}[Hardware Error]:   aer_uncor_severity: 0x044ef030
      [   75.095869] {1}[Hardware Error]:   TLP Header: ffffffff ffffffff ffffffff ffffffff
      [   75.103456] pcieport 0000:16:04.0: AER: aer_status: 0x00002000, aer_mask: 0x01310000
      [   75.111208] pcieport 0000:16:04.0:    [13] FCP                    (First)
      [   75.117999] pcieport 0000:16:04.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
      [   75.126256] pcieport 0000:16:04.0: AER: aer_uncor_severity: 0x044ef030
      [   75.132782] pci 0000:17:00.0: AER: can't recover (no error_detected callback)
      [   75.139914] pci 0000:17:00.1: AER: can't recover (no error_detected callback)
      [   76.218664] pcieport 0000:16:04.0: AER: Root Port link has been reset (0)
      [   76.225473] pcieport 0000:16:04.0: AER: device recovery failed
      [   76.276184] mlx5_core 0000:17:00.0: firmware version: 16.35.3006
      [   76.282231] mlx5_core 0000:17:00.0: 63.008 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x8 link)
      [   76.300373] mlx5_core 0000:17:00.0: mlx5_function_setup:1028:(pid 949): enable hca failed
      [   76.326060] mlx5_core 0000:17:00.0: probe_one:1499:(pid 949): mlx5_init_one failed with error code -5
      [   76.335552] mlx5_core: probe of 0000:17:00.0 failed with error -5
      [   76.444169] mlx5_core 0000:17:00.0: firmware version: 16.35.3006
      [   76.450208] mlx5_core 0000:17:00.0: 63.008 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x8 link)
      [   76.467889] mlx5_core 0000:17:00.0: mlx5_function_setup:1028:(pid 949): enable hca failed
      [   76.493764] mlx5_core 0000:17:00.0: probe_one:1499:(pid 949): mlx5_init_one failed with error code -5
      [   76.503223] mlx5_core: probe of 0000:17:00.0 failed with error -5
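Both dmesg logs contain the same AER record from root port 0000:16:04.0 (aer_uncor_status 0x00002000, aer_uncor_mask 0x01310000, aer_uncor_severity 0x044ef030). Decoding those three registers bit by bit shows that only bit 13 (FCP, Flow Control Protocol Error) is set, that it is not masked, and that its severity bit is set, which matches the "type: fatal" and "[13] FCP (First)" lines. The sketch below does that decoding; decode_uncor is a hypothetical helper, and the bit-to-name table is modeled on the short names the Linux AER driver prints, so treat it as an assumption that may differ slightly between kernel versions.

```python
# Assumed bit positions of the PCIe AER Uncorrectable Error Status register,
# labeled with the short names the Linux AER driver prints (e.g. "[13] FCP").
UNCOR_ERRORS = {
    0: "Undefined",
    4: "DLP",
    5: "SDES",
    12: "TLP",
    13: "FCP",
    14: "CmpltTO",
    15: "CmpltAbrt",
    16: "UnxCmplt",
    17: "RxOF",
    18: "MalfTLP",
    19: "ECRC",
    20: "UnsupReq",
    21: "ACSViol",
    22: "UncorrIntErr",
    23: "BlockedTLP",
    24: "AtomicOpBlocked",
    25: "TLPBlockedErr",
    26: "PoisonTLPBlocked",
}


def decode_uncor(status, mask, severity):
    """Return (bit, name, masked, fatal) for every bit set in the status register."""
    result = []
    for bit in range(32):
        if status & (1 << bit):
            name = UNCOR_ERRORS.get(bit, "Reserved")
            # A set mask bit means the error is not reported.
            masked = bool(mask & (1 << bit))
            # A set severity bit means the error is treated as fatal.
            fatal = bool(severity & (1 << bit))
            result.append((bit, name, masked, fatal))
    return result


# Register values copied from the root port 0000:16:04.0 record in both logs.
for bit, name, masked, fatal in decode_uncor(0x00002000, 0x01310000, 0x044EF030):
    print(f"bit {bit}: {name}, masked={masked}, {'fatal' if fatal else 'non-fatal'}")
```

The same helper can be reused on other AER records by substituting their aer_uncor_status, aer_uncor_mask and aer_uncor_severity values.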
      

      Expected results

      The reset succeeds and the mlx5_core driver starts again.

      Actual results

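      The reset fails at the "Starting Driver" step, and every mlx5_core probe of 0000:17:00.0 and 0000:17:00.1 fails with error -5.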
       

            atzin AMIR TZIN
            mhou@redhat.com Minxi Hou
