OpenShift Bugs / OCPBUGS-16909

Race condition when applying and removing SRIOV policy in quick succession


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Normal
    • None
    • 4.12
    • Networking / SR-IOV
    • None
    • No
    • 5
    • NHE Sprint 240, NHE Sprint 242, NHE Sprint 243, NHE Sprint 247
    • 4
    • False

      Description of problem:

      When a policy that uses vfio-pci is removed and then added back while the original policy is still being applied, the SR-IOV config daemon might not notice that it has to reboot the node to add the required kernel arguments (kargs).

      Version-Release number of selected component (if applicable):

      4.12

      How reproducible:

      sometimes

      Steps to Reproduce:

      1. Set up a cluster with 2 workers.
      2. Apply a policy that targets both workers.
      3. When the policy has been applied to one node and the second node starts to drain, remove the policy and wait.
      4. Add the policy back.

      Actual results:

      The generic plugin inside the config daemon does not reliably ensure that the required kargs are present.

      Expected results:

      The config daemon should ensure that the kargs are in the expected state.

      Additional info:

      tryEnableIommuInKernelArgs in pkg/plugins/generic/generic_plugin.go should add both intel_iommu=on and iommu=pt, and it should ensure that both remain present throughout its lifetime whenever they are needed.
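      The check suggested above could look roughly like the sketch below. It is not the operator's code: missingKargs, needsReboot, and requiredKargs are hypothetical names, and the sketch assumes the current kernel command line (for example, the contents of /proc/cmdline) is passed in as a string. The idea is to re-evaluate the full set of required kargs on every pass rather than assume an earlier pass already handled them.

```go
package main

import "strings"

// requiredKargs lists the two kernel arguments the issue says must both be present.
// (Hypothetical variable; not taken from the operator's source.)
var requiredKargs = []string{"intel_iommu=on", "iommu=pt"}

// missingKargs returns the entries of required that do not appear as whole,
// whitespace-separated tokens in cmdline. Matching whole tokens avoids
// false positives from substrings of other arguments.
func missingKargs(cmdline string, required []string) []string {
	present := make(map[string]bool)
	for _, tok := range strings.Fields(cmdline) {
		present[tok] = true
	}
	var missing []string
	for _, arg := range required {
		if !present[arg] {
			missing = append(missing, arg)
		}
	}
	return missing
}

// needsReboot is a hypothetical decision helper: when IOMMU kargs are wanted
// (for example, a vfio-pci policy targets the node) and any required karg is
// absent from the running kernel's command line, a reboot is still needed.
func needsReboot(cmdline string, wantIOMMU bool) bool {
	return wantIOMMU && len(missingKargs(cmdline, requiredKargs)) > 0
}
```

      Calling needsReboot on every reconcile pass, instead of only when the policy is first seen, is one way to keep the kargs in the expected state even if a policy is removed and re-added mid-apply.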

            wizhao@redhat.com William Zhao
            bnemeth@redhat.com Balazs Nemeth
            Ying Wang