Uploaded image for project: 'OpenShift Bugs'
  1. OpenShift Bugs
  2. OCPBUGS-14591

bf2-switch-mode.sh fails if multiple devices are detected

    XMLWordPrintable

Details

    • Bug
    • Resolution: Done-Errata
    • Major
    • None
    • 4.14.0
    • Networking / SR-IOV
    • None
    • No
    • 3
    • NHE Sprint 237
    • 1
    • False
    • Hide

      None

      Show
      None

    Description

      Description of problem:

      In the event a user has multiple devices when attempting to switch a bf2 to nic mode, bf2-switch-mode will return terminate with exit code 120 without applying changes. This will generate the misleading error message "Device is not a Bluefield2"

      Version-Release number of selected component (if applicable):

       

      How reproducible:

      Everytime

      Steps to Reproduce:

      1. Create cluster with multiple sriov capable devices
      2. Attempt to switch BF2 to NIC mode using /bindata/scripts/bf2-switch-mode.sh
      3. 
      

      Actual results:

      May 31 01:59:05 helix10.lab.eng.tlv2.redhat.com bash[8162]: Found device: 0000:5e:00.0
      May 31 01:59:05 helix10.lab.eng.tlv2.redhat.com bash[8162]: 0000:d8:00.0
      May 31 01:59:05 helix10.lab.eng.tlv2.redhat.com bash[8162]: Device is not a Bluefield2
      May 31 01:59:05 helix10.lab.eng.tlv2.redhat.com bash[8162]: time="2023-05-31T01:59:05Z" level=fatal msg="execing command in container: command terminated with exit code 120"
      May 31 01:59:05 helix10.lab.eng.tlv2.redhat.com sudo[7656]: pam_unix(sudo:session): session closed for user root
      May 31 01:59:05 helix10.lab.eng.tlv2.redhat.com systemd[1]: Finished Switch BlueField2 card to NIC/DPU mode.

      Expected results:

      Jun 05 09:44:59 worker-226 bash[7668]: Switching to NIC mode.
      Jun 05 09:44:59 worker-226 kubenswrapper[5229]: I0605 09:44:59.303114    5229 kubelet.go:2457] "SyncLoop (PLEG): event for pod" pod="openshift-multus>
      Jun 05 09:44:59 worker-226 bash[7668]: Device #1:
      Jun 05 09:44:59 worker-226 bash[7668]: ----------
      Jun 05 09:44:59 worker-226 bash[7668]: Device type:    BlueField2
      Jun 05 09:44:59 worker-226 bash[7668]: Name:           MBF2H332A-AEEO_Ax_Bx
      Jun 05 09:44:59 worker-226 bash[7668]: Description:    BlueField-2 P-Series DPU 25GbE Dual-Port SFP56; PCIe Gen4 x8; Crypto Enabled; 16GB on-board DD>
      Jun 05 09:44:59 worker-226 bash[7668]: Device:         0000:ca:00.0
      Jun 05 09:44:59 worker-226 bash[7668]: Configurations:                                      Next Boot       New
      Jun 05 09:44:59 worker-226 bash[7668]:          INTERNAL_CPU_MODEL                          EMBEDDED_CPU(1) EMBEDDED_CPU(1)
      Jun 05 09:44:59 worker-226 bash[7668]:          INTERNAL_CPU_PAGE_SUPPLIER                  ECPF(0)         EXT_HOST_PF(1)
      Jun 05 09:44:59 worker-226 bash[7668]:          INTERNAL_CPU_ESWITCH_MANAGER                ECPF(0)         EXT_HOST_PF(1)
      Jun 05 09:44:59 worker-226 bash[7668]:          INTERNAL_CPU_IB_VPORT0                      ECPF(0)         EXT_HOST_PF(1)
      Jun 05 09:44:59 worker-226 bash[7668]:          INTERNAL_CPU_OFFLOAD_ENGINE                 ENABLED(0)      DISABLED(1)
      Jun 05 09:44:59 worker-226 bash[7668]:  Apply new Configuration? (y/n) [n] : y
      Jun 05 09:44:59 worker-226 bash[7668]: Applying... Done!
      Jun 05 09:44:59 worker-226 bash[7668]: -I- Please reboot machine to load new configurations.
      Jun 05 09:44:59 worker-226 bash[7668]: Minimal reset level for device, 0000:ca:00.0:
      Jun 05 09:44:59 worker-226 bash[7668]: 3: Driver restart and PCI reset
      Jun 05 09:44:59 worker-226 bash[7668]: Continue with reset?[y/N] y
      Jun 05 09:44:59 worker-226 bash[7668]: -I- Sending Reset Command To Fw             -Done
      Jun 05 09:45:06 worker-226 bash[7668]: -I- Stopping Driver                         -Done
       Jun 05 09:45:07 worker-226 bash[7668]: -I- Resetting PCI                           -Done
      Jun 05 09:45:15 worker-226 bash[7668]: -I- Starting Driver                         -Done
      Jun 05 09:45:15 worker-226 bash[7668]: -I- FW was loaded successfully.
      Jun 05 09:45:15 worker-226 bash[7668]: Switched to nic mode.

       

       

      Additional info:

       

      Attachments

        Issue Links

          Activity

            People

              sdaniele@redhat.com Salvatore Daniele
              sdaniele@redhat.com Salvatore Daniele
              Zhanqi Zhao Zhanqi Zhao
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: