Fast Datapath Product / FDP-1592

VDUSE + KubeVirt: sometimes the TX queue is not ready

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • openvswitch3.5

      Given a nested-virtualization cluster running OCP 4.19 nightly with OVS 3.5 vduse build,

      When a KubeVirt VM using a VDUSE-backed vhost interface is started (and restarted) 20 times,

      Then both vhost RX/TX queues for the interface transition to enabled and the guest observes bidirectional connectivity on 100% of boots without restarting OVS.

    • rhel-9
    • rhel-net-ovs-dpdk
    • ssg_networking
    • FDP-OVS/DPDK Sprint 8

      We’ve identified an issue that happens intermittently in the nested-virtualization environment with the RT kernel (it has not been reproduced in other environments).

      Sometimes the VM comes up and, although it can transmit packets, it cannot receive packets.

      From the guest’s point of view, the port statistics clearly show 0 received packets.
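
      For reference, a minimal sketch of checking that counter from inside the guest; it reads the sysfs statistic behind "ip -s link", and the interface name eth0 is an assumption:

      /*
       * Minimal sketch: read the guest-side RX packet counter from sysfs
       * (the same counter "ip -s link" reports). The interface name
       * "eth0" is an assumption. A counter stuck at 0 while TX keeps
       * growing matches the failure described here.
       */
      #include <stdio.h>

      int main(void)
      {
          unsigned long long rx_packets;
          FILE *f = fopen("/sys/class/net/eth0/statistics/rx_packets", "r");

          if (f == NULL || fscanf(f, "%llu", &rx_packets) != 1) {
              perror("rx_packets");
              return 1;
          }
          fclose(f);
          printf("rx_packets=%llu\n", rx_packets);
          return 0;
      }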

      In the host, OVS logs show:

       

      2025-08-12T08:28:43.614Z|00169|netdev_dpdk(ovs_vhost3)|INFO|State of queue 0 ( tx_qid 0 ) of vhost device '/dev/vduse/vduse18' changed to 'disabled'
      2025-08-12T08:28:43.614Z|00170|netdev_dpdk(ovs_vhost3)|INFO|State of queue 1 ( rx_qid 0 ) of vhost device '/dev/vduse/vduse18' changed to 'enabled'
      

       

      Note "disabled" in tx queue (RX from the guest’s perspective).

       

      In such cases restarting OVS typically restores the traffic.

       Impact Assessment: Describe the severity and impact (e.g., network down, availability of a workaround, etc.).

       

      Network down

       

       Software Versions: Specify the exact versions in use (e.g., openvswitch3.1-3.1.0-147.el8fdp).

      PoC of VDUSE and Kubevirt
      OCP 4.19.0-0.nightly-2025-08-05-174154
      kernel 5.14.0-570.32.1.el9_6.x86_64+rt
      openvswitch3.5-3.5.1-44.el9.vduse.1.x86_64
      Kubevirt HyperConverged 4.19.1

        Issue Type: Indicate whether this is a new issue or a regression (if a regression, state the last known working version).

      New issue in a feature under active development.

       Reproducibility: Confirm if the issue can be reproduced consistently. If not, describe how often it occurs.

      I'd say it happens around 33% of the times I spin up a VM.

       Reproduction Steps: Provide detailed steps or scripts to replicate the issue.

      https://docs.google.com/document/d/1nEP3dt3Kssh-7rVl3yEvBQtixY0pYRAyj4R78w-7aR8/edit?tab=t.0

       Expected Behavior: Describe what should happen under normal circumstances.

      The VM should have both RX and TX connectivity.

       Observed Behavior: Explain what actually happens.

      The VM only has TX connectivity, no RX.

       Troubleshooting Actions: Outline the steps taken to diagnose or resolve the issue so far.

      Restarting OVS seems to re-establish connectivity.

       Logs: If you collected logs, please provide them (e.g. sos report, /var/log/openvswitch/*, testpmd console).

              mcoqueli@redhat.com Maxime Coquelin
              amorenoz@redhat.com Adrian Moreno