OCPBUGS-60887

[release-4.19] `k8s.v1.cni.cncf.io/network-status` annotation is missing interface details for host-devices bound to the `vfio-pci` driver.


    • Quality / Stability / Reliability
    • Critical
    • Done
    • Bug Fix
      Before this update, when a `host-device` NetworkAttachmentDefinition targeted a NIC bound to a userspace driver, the CNI ADD command did not return a correct result. As a consequence, the `k8s.v1.cni.cncf.io/network-status` annotation was missing the interface information for that NetworkAttachmentDefinition. With this update, the host-device CNI plugin correctly reports the command result.

      Description of problem:

      When a pod is configured with two host-device network attachments, one for a NIC using a standard kernel driver (e.g., i40e) and another for a device bound to the vfio-pci driver, the pod's k8s.v1.cni.cncf.io/network-status annotation fails to report the interface details for the vfio-pci device. 
      
      The network attachment for the vfio-pci device is listed by name, but its corresponding JSON object is empty, lacking the interface and mac fields. This behavior worked correctly in OCP 4.16 but is broken in 4.17 and 4.18, indicating a regression.
      

      Version-Release number of selected component (if applicable):

      Affected versions: 4.18.20, 4.17.35
      
      Working version: 4.16.43
      Kernel: 5.14.0-427.76.1.el9_4.x86_64

      How reproducible:

      Always    

      Steps to Reproduce:

          1. On an OCP 4.17+ cluster, configure a worker node with at least two PCI devices.
          2. Bind one device to a standard kernel driver (e.g., i40e) and the other to the vfio-pci driver.
          3. Create two NetworkAttachmentDefinition manifests using the host-device CNI plugin, one for each PCI device, specifying their pciBusID.
          4. Create a pod and annotate it to attach both networks using k8s.v1.cni.cncf.io/networks.
          5. Inspect the pod's annotations via oc describe pod <pod-name>.     
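
The manifests for steps 3 and 4 can be sketched as follows. This is a minimal illustration, not the reporter's actual configuration: the attachment names, the namespace, the `cniVersion` value, and the PCI addresses are all hypothetical placeholders. The script prints JSON, which `oc apply -f` accepts in place of YAML.

```python
import json


def host_device_nad(name: str, pci_bus_id: str, namespace: str = "default") -> dict:
    """Build a NetworkAttachmentDefinition that passes one PCI device to a pod."""
    cni_config = {
        "cniVersion": "0.3.1",      # assumed version, adjust to the cluster
        "name": name,
        "type": "host-device",
        "pciBusID": pci_bus_id,
    }
    return {
        "apiVersion": "k8s.cni.cncf.io/v1",
        "kind": "NetworkAttachmentDefinition",
        "metadata": {"name": name, "namespace": namespace},
        # The CNI configuration is stored as a JSON string in spec.config.
        "spec": {"config": json.dumps(cni_config)},
    }


def networks_annotation(*nad_names: str) -> dict:
    """Build the pod annotation that requests the given attachments."""
    return {"k8s.v1.cni.cncf.io/networks": ",".join(nad_names)}


# Hypothetical names and PCI addresses: one device on a kernel driver
# (e.g. i40e), one bound to vfio-pci.
nads = [
    host_device_nad("hostdev-kernel", "0000:3b:00.0"),
    host_device_nad("hostdev-vfio", "0000:3b:00.1"),
]
pod_annotations = networks_annotation(*(nad["metadata"]["name"] for nad in nads))

if __name__ == "__main__":
    for nad in nads:
        print(json.dumps(nad, indent=2))
    print(json.dumps(pod_annotations, indent=2))
```

The annotation dictionary goes under `metadata.annotations` of the pod created in step 4.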

      Actual results:

          The k8s.v1.cni.cncf.io/network-status annotation only shows interface details for the i40e device. The entry for the vfio-pci device is present but empty.

      Expected results:

          The k8s.v1.cni.cncf.io/network-status annotation should contain a complete entry for both the i40e and the vfio-pci devices, including the interface name for the vfio-pci device, as was the behavior in OCP 4.16.
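
The difference between the actual and expected results can be checked mechanically. The sketch below is one possible way to do it, assuming the annotation value is a JSON list of per-attachment objects. The sample data is illustrative only, shaped like the symptom described above, and is not taken from a real cluster.

```python
import json

STATUS_KEY = "k8s.v1.cni.cncf.io/network-status"


def attachments_missing_interface(annotations: dict) -> list:
    """Return the names of status entries that carry no "interface" field."""
    statuses = json.loads(annotations.get(STATUS_KEY, "[]"))
    return [s.get("name", "<unnamed>") for s in statuses if not s.get("interface")]


# Hypothetical annotations: the kernel-driver attachment is complete, the
# vfio-pci attachment is listed by name only.
sample_annotations = {
    STATUS_KEY: json.dumps([
        {"name": "default/hostdev-kernel", "interface": "net1",
         "mac": "02:00:00:00:00:01"},
        {"name": "default/hostdev-vfio"},
    ])
}

if __name__ == "__main__":
    print(attachments_missing_interface(sample_annotations))
    # prints ['default/hostdev-vfio']
```

On a live cluster, the input would be `metadata.annotations` from `oc get pod <pod-name> -o json`. An empty list corresponds to the expected result.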

      Additional info:

      Analysis points to a regression introduced in Multus CNI. The Multus logs confirm that the host-device plugin for the vfio-pci device returns a CNI result that does not contain an interfaces array or a sandbox path.
      
      Suspected commit causing regression: https://github.com/openshift/multus-cni/commit/bc6c8d5c76c775c3e8088631e7660e8a927b62c6
      
      The host-device CNI plugin intentionally does not populate the Sandbox field for DPDK/VFIO drivers.
      
      A fix has been merged for OCP 4.19 which ensures the Sandbox field is populated, resolving the issue.
      
      Fix commit: https://github.com/openshift/containernetworking-plugins/commit/bd8da2000a9539fb6f39c4b2a209b687a63a3e7d
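
The interaction described above can be illustrated with a simplified model. This is a sketch under a stated assumption, not the Multus or host-device source code: it assumes the status builder records interface details only for CNI result interfaces that carry a sandbox path. `build_status`, the attachment names, and the sample results are hypothetical.

```python
def build_status(name: str, cni_result: dict) -> dict:
    """Simplified stand-in for a network-status builder (not Multus code)."""
    status = {"name": name}
    for iface in cni_result.get("interfaces", []):
        # Assumption: interfaces without a sandbox path are skipped.
        if iface.get("sandbox"):
            status["interface"] = iface["name"]
            if iface.get("mac"):
                status["mac"] = iface["mac"]
            break
    return status


# Hypothetical CNI results. Before the fix, the vfio-pci result has no
# interfaces array and no sandbox path, as reported in the Multus logs.
kernel_result = {"interfaces": [
    {"name": "net1", "mac": "02:00:00:00:00:01", "sandbox": "/var/run/netns/example"},
]}
vfio_result_before_fix = {}
vfio_result_after_fix = {"interfaces": [
    {"name": "net2", "sandbox": "/var/run/netns/example"},
]}

if __name__ == "__main__":
    print(build_status("default/hostdev-kernel", kernel_result))
    print(build_status("default/hostdev-vfio", vfio_result_before_fix))
    print(build_status("default/hostdev-vfio", vfio_result_after_fix))
```

Under that assumption, the pre-fix result yields an entry with only a name, matching the reported symptom, while a result with a populated sandbox yields an interface name again.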

              pliurh Peng Liu
              rhn-support-arbhagat Arpit Bhagat
              Andrea Panattoni
              Ying Wang Ying Wang