OpenShift Bugs / OCPBUGS-28776

Network Packet drop observed on workers with SR-IOV NIC `Intel(R) Ethernet 10G 2P X710`


    • Critical
    • No
    • 5
    • False

      Description of problem:

      The customer is observing packet drops on workers with the Intel(R) Ethernet 10G 2P X710 network card. The drops occur when a Submariner gateway pod with a VXLAN interface is active on the worker.
      No packet drops are observed on workers with Mellanox cards.
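
      To compare the Intel and Mellanox workers with numbers rather than impressions, the kernel's per-interface drop counters can be sampled before and after a test run. The following is a minimal sketch, assuming the standard Linux sysfs counters `rx_dropped` and `tx_dropped` under `/sys/class/net/<iface>/statistics/`. The function names and the `sysfs_root` parameter are hypothetical helpers introduced here for illustration, and these counters may not capture every drop on an SR-IOV virtual function.

```python
from pathlib import Path


def read_drop_counters(iface, sysfs_root="/sys/class/net"):
    """Read the rx/tx drop counters of one interface from sysfs.

    `sysfs_root` is a hypothetical parameter so the function can be
    pointed at a copied or mocked directory tree.
    """
    stats = Path(sysfs_root) / iface / "statistics"
    return {
        name: int((stats / name).read_text().strip())
        for name in ("rx_dropped", "tx_dropped")
    }


def drop_delta(before, after):
    """Return the per-counter increase between two snapshots."""
    return {name: after[name] - before[name] for name in before}
```

      Taking one snapshot with `read_drop_counters` before the Submariner gateway pod becomes active and one afterwards, then passing both to `drop_delta`, would show whether the counters grow only on the X710 workers.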

      While troubleshooting, we found the following message in dmesg, which we suspect may be a cause of the issue:

      "i40e 0000:b1:00.1: The driver for the device detected a newer version of the NVM image v1.15 than expected v1.9. Please install the most recent version of the network driver."
      
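      To check every worker for the same warning, the dmesg output can be scanned for the i40e NVM-version message quoted above. This is a rough sketch that assumes the message wording matches that quoted line exactly. The pattern and the function name `find_nvm_mismatches` are illustrative and do not come from the driver or from any existing tool.

```python
import re

# Pattern for the i40e NVM-version warning as quoted in this report.
NVM_WARNING = re.compile(
    r"i40e (?P<pci>[0-9a-fA-F:.]+): The driver for the device detected "
    r"a newer version of the NVM image v(?P<found>\d+(?:\.\d+)*) "
    r"than expected v(?P<expected>\d+(?:\.\d+)*)"
)


def find_nvm_mismatches(dmesg_text):
    """Return (pci_address, found_version, expected_version) tuples."""
    return [
        (m.group("pci"), m.group("found"), m.group("expected"))
        for m in NVM_WARNING.finditer(dmesg_text)
    ]


sample = (
    "i40e 0000:b1:00.1: The driver for the device detected a newer "
    "version of the NVM image v1.15 than expected v1.9. Please install "
    "the most recent version of the network driver."
)
print(find_nvm_mismatches(sample))  # [('0000:b1:00.1', '1.15', '1.9')]
```

      Running this against the saved dmesg output of each worker would list the PCI functions whose firmware is newer than the in-kernel driver expects.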
      • The output in `omantel-lspci.txt` shows that the NIC in use is embedded on the Dell server but is an Intel X710, which is a supported SR-IOV NIC. Packet loss occurs nonetheless.
      • The customer asked whether the drivers need to be updated and, if so, what the procedure is. We asked them to contact the vendor, and the vendor (Dell) replied:
       Advised to upgrade the OS on one of the hosts to a supported version, for instance to RHEL 8.9 [your current kernel is 8.6].
       Intel firmware should be at the latest version [Family 22.5.0, Vers 22.5.7], and the OS driver would be the one built into the RHEL 8.9 kernel.
      

       

      • Upgrading to OCP 4.13 is not an option for the customer on their timescales, because their application is incompatible with OCP 4.13. They therefore plan to install the latest driver version compatible with OpenShift 4.12.28 using the Kernel Module Management (KMM) Operator. We are attempting this in parallel to resolve the packet drops. However, I am not sure it will suffice, so I have also raised this bug, since we are observing packet loss on an SR-IOV X710 NIC, which is normally a very stable NIC.

      Version-Release number of selected component (if applicable):

      4.12.28    

      How reproducible:

       

      Steps to Reproduce:

         
          

      Actual results:

      The customer is observing packet loss on all servers with the Intel(R) Ethernet 10G 2P X710 network card. No packet drops are observed on servers with Mellanox network cards.

      Expected results:

          No packet loss should occur, because the X710 is a supported and very stable SR-IOV NIC.

      Additional info:

          

            rhn-engineering-ivecera Ivan Vecera
            rhn-support-mmarkand Mridul Markandey
            Zhanqi Zhao Zhanqi Zhao