  1. OpenShift Bugs
  2. OCPBUGS-14140

4.11.z: [Clone of OCPBUGS-8287] Power cycle node and MAC address of NIC not available when VDU application starts on Intel E810-C NIC


Details

    • Important
    • No
    • Rejected
    • False
    • Copy from Release Note text in OCPBUGS-2180.
      5/15: Copy from status of https://issues.redhat.com/browse/OCPBUGS-2180
      3/13: 4.10 copy from OCPBUGSM-47835 / OCPBUGS-9898

    Description

      Description of problem:


      This bug tracks the backport of the fix in BZ-2117049 to OpenShift 4.10, as required by Ericsson.


      Description of problem:
      The VDU application is deployed and the carriers on the cell are enabled on SNO.
      A cold boot or power cycle occurs. The platform and the VDU application start up at the same time. The VDU application fails to start completely because the MAC addresses of the VFs on the E810-C NIC are not available when the VDU application starts.

      During VDU application start-up, the baseband pod uses the rft_dpdk_getport utility to query the MAC address of the llscu VF. If the MAC address is not available, the utility crashes and leaves a core dump.
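
The readiness condition described above (the VF MAC address must be available before the application queries it) can be illustrated with a small polling sketch. This is not how rft_dpdk_getport works and it is not the fix from Intel; it only shows one way a start-up check could wait for a VF MAC. It assumes the MAC is read from the VF lines of `ip link show dev <pf>` output, and that an unavailable MAC appears either as a missing VF line or as an all-zero address. The function names and the `read_output` callback are hypothetical.

```python
import re
import time
from typing import Callable, Dict

# Assumption: an unavailable VF MAC is reported as all zeros.
ZERO_MAC = "00:00:00:00:00:00"

# Matches VF lines as printed by `ip link show dev <pf>`, for example:
#   vf 0     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking on
_VF_LINE = re.compile(
    r"^\s*vf\s+(\d+)\s+link/ether\s+([0-9a-fA-F:]{17})", re.MULTILINE
)


def parse_vf_macs(ip_link_output: str) -> Dict[int, str]:
    """Return {vf_index: mac} for every VF line found in the text."""
    return {int(idx): mac.lower() for idx, mac in _VF_LINE.findall(ip_link_output)}


def vf_mac_ready(ip_link_output: str, vf_index: int) -> bool:
    """True if the VF is listed and its MAC is not all zeros."""
    mac = parse_vf_macs(ip_link_output).get(vf_index)
    return mac is not None and mac != ZERO_MAC


def wait_for_vf_mac(
    read_output: Callable[[], str],
    vf_index: int,
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
) -> bool:
    """Poll read_output() until the VF MAC is available or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        if vf_mac_ready(read_output(), vf_index):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

In practice `read_output` could wrap a call that runs `ip link show dev <pf>` on the node and returns its text; keeping it as a callback lets the parsing and timeout logic be exercised without the hardware.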

      This used to work before the kernel updates picked up new content from Intel in 4.9.37/4.10.17.

      Application pod state and core dump:

      eric-ran-du-baseband-bf6669bd-ksjjv 4/5 CrashLoopBackOff 18 (66s ago) 11h

      core.rft_dpdk_getpor.0.ee6850a4002649698f2770c8080b90d1.84295.1659446944000000.lz4

      Version-Release number of selected component (if applicable):
      SNO clusters v4.9.37 or 4.10.24

      How reproducible:
      Reproducible within customer environment

      Actual results:
      The Baseband pod is in CrashLoopBackOff

      Expected results:
      The baseband pods should spin up without failing or causing delay

      Additional info:
      There was a related case 03089320 which was closed in February with a mitigation fix in the February 15th version of the SR-IOV operator. The real fix from Intel was not available at the time.

      03089320 – SNO: After reboot node, application pods stuck in CreateContainerConfigError state - endpoint not found openshift.io/pci_sriov_net_*


            People

              swsehgal@redhat.com Swati Sehgal
              rhn-support-igarciam Ignacio Garcia Medina
              Shereen Haj Shereen Haj