  1. OpenShift Bugs
  2. OCPBUGS-4585

SR-IOV VFs may get reset after being allocated to other pods


Details

    • Bug
    • Resolution: Done
    • Major
    • None
    • 4.12.0
    • Networking / SR-IOV
    • None
    • CNF Network Sprint 228
    • 1
    • Rejected
    • False

    Description

      Description of problem:

      When pods with an SR-IOV interface from the same resource pool are created and deleted a few times,
      the underlying VF may end up with a default configuration instead of the desired one (i.e., no MAC address, no VLAN).
      
      Upstream issue: https://github.com/k8snetworkplumbingwg/sriov-cni/issues/219
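
      One way to spot a VF that has fallen back to its default configuration is to parse the per-VF lines of `ip link show <pf>` on the node. The sketch below is illustrative only: it assumes the two common iproute2 line shapes ("vf N MAC ..." and "vf N link/ether ..."), and it assumes that "no MAC address" appears as an all-zero MAC and "no VLAN" as a missing or zero VLAN field, which may differ by driver. The names `parse_vfs` and `is_default` are hypothetical.

```python
import re

# One VF line from `ip link show <pf>`. Older iproute2 prints "MAC <addr>",
# newer versions print "link/ether <addr>". The VLAN field is optional.
VF_LINE = re.compile(
    r"^\s*vf\s+(?P<index>\d+)\s+"
    r"(?:MAC|link/ether)\s+(?P<mac>[0-9a-fA-F:]{17})"
    r"(?:.*?\bvlan\s+(?P<vlan>\d+))?"
)

# Assumption: an unconfigured VF reports an all-zero administrative MAC.
ZERO_MAC = "00:00:00:00:00:00"


def parse_vfs(ip_link_output):
    """Return {vf_index: (mac, vlan)} for every VF line in the output."""
    vfs = {}
    for line in ip_link_output.splitlines():
        m = VF_LINE.match(line)
        if m:
            vlan = int(m.group("vlan")) if m.group("vlan") else 0
            vfs[int(m.group("index"))] = (m.group("mac").lower(), vlan)
    return vfs


def is_default(vf):
    """True when the VF has neither a MAC address nor a VLAN configured."""
    mac, vlan = vf
    return mac == ZERO_MAC and vlan == 0
```

      Running this against the output captured in the monitoring terminal after each iteration makes a silently reset VF easy to flag.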

      Version-Release number of selected component (if applicable):

      SR-IOV CNI: v2.6
      SR-IOV device plugin: v3.3
      Multus: v3.8
      Kubernetes: v1.22.2

      How reproducible:

      17%

      Steps to Reproduce:

      1. Terminal 1: Monitor SR-IOV VFs state on node 'worker1'
      2. Terminal 2:
        2.1 Create a pod 'test1' with:
          a nodeSelector targeting node 'worker1'.
          an SR-IOV interface with MAC address '02:02:02:02:02'.
        2.2 Wait for the pod 'test1' to be ready.
        2.3 Delete pod 'test1' in background and immediately create a similar pod 'test2'.
        2.4 Wait for pod 'test2' to be ready.
      
      After a few iterations, pod 'test2' is Running, but the VF state monitored on the node (terminal 1) shows that its VF is not configured with the desired MAC address.
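
      The steps above can be sketched as a small driver around `kubectl`. This is not the content of the linked gist; it is a minimal sketch under several assumptions: the network attachment name, the container image, and the helper names are hypothetical, the node is selected through the `kubernetes.io/hostname` label, and the SR-IOV resource request is assumed to be injected by the cluster (add an explicit request otherwise). The MAC is passed through the Multus `k8s.v1.cni.cncf.io/networks` annotation.

```python
import json
import subprocess

IMAGE = "registry.example.com/tools/sleep:latest"  # hypothetical image


def sriov_pod_manifest(name, node, network, mac):
    """Build a pod pinned to `node` with one SR-IOV interface using `mac`."""
    networks = [{"name": network, "mac": mac}]
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": name,
            "annotations": {"k8s.v1.cni.cncf.io/networks": json.dumps(networks)},
        },
        "spec": {
            "nodeSelector": {"kubernetes.io/hostname": node},
            "containers": [
                {"name": "main", "image": IMAGE, "command": ["sleep", "infinity"]}
            ],
        },
    }


def kubectl(*args, stdin=None):
    """Run kubectl and return its stdout; raises if the command fails."""
    result = subprocess.run(
        ["kubectl", *args], input=stdin, text=True, check=True, capture_output=True
    )
    return result.stdout


def create_and_wait(manifest):
    """Create the pod and block until it reports Ready (steps 2.1/2.2, 2.4)."""
    kubectl("apply", "-f", "-", stdin=json.dumps(manifest))
    name = manifest["metadata"]["name"]
    kubectl("wait", "--for=condition=Ready", f"pod/{name}", "--timeout=120s")


def run_iteration(node, network, mac):
    """One pass over steps 2.1 to 2.4, followed by cleanup of 'test2'."""
    create_and_wait(sriov_pod_manifest("test1", node, network, mac))
    # Step 2.3: delete without waiting, then immediately create 'test2'.
    kubectl("delete", "pod", "test1", "--wait=false")
    create_and_wait(sriov_pod_manifest("test2", node, network, mac))
    # Inspect the VF state on the node here, before cleaning up.
    kubectl("delete", "pod", "test2")
```

      Calling `run_iteration("worker1", "<network-attachment-name>", "<mac>")` in a loop and checking the VF state after each pass reproduces the create/delete overlap described in step 2.3.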
      
      Scripts and manifests I used for reproducing the issue: https://gist.github.com/ormergi/3ddbf901ddc95baf316b604994285a69

       

      Actual results:

      The pod ends up with an unexpected MAC address on its VF.

      Expected results:

      The pod's underlying VF is configured correctly, with the requested MAC address.

      Additional info:

      On the KubeVirt CI SR-IOV test lane, some tests fail because the VMs' SR-IOV interfaces end up with an unexpected MAC address.
      
      It seems that when the VM's underlying pod is deleted and the CNI cmdDEL
      command is executed, it resets the VF whether or not the VF has already
      been allocated to another pod.
      Also, it is not guaranteed that the underlying VF is reset as soon as a pod is disposed of.
      
      The reset takes 1-2 seconds, which seems odd because I would expect all of the pod's resources to be freed by the time it is gone.
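
      The suspected ordering can be made concrete with a small deterministic replay. This is a toy model, not SR-IOV CNI code: the event names, timestamps, and the `replay` helper are hypothetical, and the only behavior modeled is the one suspected above, namely that a late cmdDEL resets the VF without checking who currently owns it.

```python
# Assumed representation of a VF with no MAC address and no VLAN.
DEFAULT = {"mac": "00:00:00:00:00:00", "vlan": 0}


def replay(events):
    """Apply (time, kind, pod, config) events in time order.

    Returns the pod that owns the VF at the end and the VF's final state.
    """
    vf = dict(DEFAULT)
    owner = None
    for _, kind, pod, config in sorted(events, key=lambda e: e[0]):
        if kind == "alloc":    # the VF is handed to a pod
            owner = pod
        elif kind == "add":    # cmdADD applies the requested settings
            vf.update(config)
        elif kind == "del":    # cmdDEL resets the VF without an ownership check
            vf.update(DEFAULT)
    return owner, vf


wanted = {"mac": "02:00:00:00:00:01", "vlan": 100}  # illustrative values
events = [
    (0.0, "alloc", "test1", None),
    (0.1, "add", "test1", wanted),
    # 'test1' is deleted at t=5.0; the VF is reallocated before cmdDEL runs.
    (5.1, "alloc", "test2", None),
    (5.2, "add", "test2", wanted),
    (6.5, "del", "test1", None),  # late reset, about 1.5 s after deletion
]
owner, vf = replay(events)
```

      With this ordering, 'test2' still owns the VF at the end of the replay while the VF itself is back in its default state, which matches the observed symptom.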

       


            People

              carlosgoncalves Carlos Goncalves
              omergi@redhat.com Or Mergi
              Zhanqi Zhao Zhanqi Zhao