-
Task
-
Resolution: Unresolved
This task tracks the test-case writing needed to cover the bug described below.
I observed unexpectedly low throughput in a trafficgen benchmark against a Networker node running OVS-DPDK, with TRex as the traffic generator. The binary search converged on a maximum sustainable rate of ~2.08% of line rate, an aggregate of ~3.09 MPPS across 4 directions (bi-directional traffic on two device pairs), with zero packet loss during a 120-second validation trial. Attempts to push beyond this rate (e.g., 2.081%-2.099%) result in packet loss exceeding the 0.002% tolerance.
This ~3 MPPS aggregate seems very low for a DPDK-accelerated setup, especially with 64-byte UDP packets and isolated CPUs.
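To make the search behaviour concrete, here is a minimal sketch of a loss-bounded binary search over the transmit rate. It is not the bench-trafficgen implementation: `find_max_rate`, `fake_trial`, and the 0.001% search precision are hypothetical names and values chosen for illustration, and the fake DUT simply drops everything above an assumed 2.08% capacity.

```python
from typing import Callable


def find_max_rate(run_trial: Callable[[float], float],
                  max_loss_pct: float = 0.002,
                  low: float = 0.0,
                  high: float = 100.0,
                  precision: float = 0.001) -> float:
    """Return the highest rate (% of line rate) whose trial loss stays within tolerance."""
    best = low
    while high - low > precision:
        rate = (low + high) / 2
        if run_trial(rate) <= max_loss_pct:
            best = low = rate   # trial passed: search the upper half
        else:
            high = rate         # trial failed: search the lower half
    return best


def fake_trial(rate_pct: float, capacity_pct: float = 2.08) -> float:
    """Hypothetical DUT model: no loss up to capacity, all excess traffic dropped."""
    if rate_pct <= capacity_pct:
        return 0.0
    return (rate_pct - capacity_pct) / rate_pct * 100.0


result = find_max_rate(fake_trial)
print(f"max sustainable rate: {result:.3f}% of line rate")
```

With this model, any rate even slightly above the capacity exceeds the 0.002% tolerance, which matches the sharp pass/fail edge seen between 2.08% and 2.081%.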
For context:
Per-direction: ~0.77 MPPS, ~371 Mbps L2, ~519 Mbps L1.
Aggregate L2 throughput: ~1.48 Gbps.
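The figures above can be cross-checked from the packet rate alone. The sketch below assumes the L2 rate counts the 64-byte frame without its 4-byte FCS and the L1 rate adds 20 bytes of preamble and inter-frame gap; these conventions are not stated in the ticket but reproduce the reported numbers. It also assumes the 2.08% rate applies per direction.

```python
FRAME_BYTES = 64                 # Ethernet frame size, FCS included
L2_BYTES = FRAME_BYTES - 4       # assumption: L2 rate excludes the 4-byte FCS
L1_BYTES = FRAME_BYTES + 20      # assumption: L1 adds 8 B preamble/SFD + 12 B inter-frame gap

AGGREGATE_PPS = 3.09e6           # reported aggregate packet rate
DIRECTIONS = 4                   # bi-directional traffic on two device pairs
RATE_PCT = 2.08                  # reported maximum sustainable rate


def mbps(pps: float, bytes_per_packet: int) -> float:
    """Convert a packet rate to megabits per second for a given per-packet size."""
    return pps * bytes_per_packet * 8 / 1e6


per_dir_pps = AGGREGATE_PPS / DIRECTIONS
per_dir_l2 = mbps(per_dir_pps, L2_BYTES)
per_dir_l1 = mbps(per_dir_pps, L1_BYTES)
aggregate_l2_gbps = per_dir_l2 * DIRECTIONS / 1e3

# Line rate implied by the L1 figure, if RATE_PCT is applied per direction.
implied_line_rate_gbps = per_dir_l1 / (RATE_PCT / 100) / 1e3

print(f"per direction: {per_dir_pps / 1e6:.2f} MPPS, "
      f"{per_dir_l2:.0f} Mbps L2, {per_dir_l1:.0f} Mbps L1")
print(f"aggregate L2: {aggregate_l2_gbps:.2f} Gbps")
print(f"implied per-direction line rate: {implied_line_rate_gbps:.1f} Gbps")
```

Under these assumptions the implied per-direction line rate comes out close to 25 Gbps rather than 100 Gbps. That may be worth checking against the actual NIC speed, although it could also just reflect how the rate percentage is defined in the TRex config.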
Environment:
- OS/Platform: RHOSO-18.0.8 (RHEL-like, kernel 5.14.0-427.13.1.el9_4.x86_64).
- OVN/OVS Version: OVS 3.5.1-5.el9fdp.
- Topology: TRex Traffic Generator → OVN-DPDK Networker Node → OVN Logical Router (LR) → DUT Compute Node with 2 Grout VMs (handling application-level routing and return traffic). Bi-directional traffic via paired ports (0:1 and 2:3).
- Hardware: Not confirmed; assumed high-end (e.g., 100 Gbps NICs, inferred from the TRex config). CPU isolation: isolcpus=2-7, nohz_full=2-7 (8 CPUs total). Hugepages: 8 x 1 GB.
- Benchmark Tool: bench-trafficgen (from https://github.com/perftool-incubator/bench-trafficgen/tree/main/trafficgen), driving TRex with software mode off and Mellanox support off.
- Config Highlights:
- Packet: 64-byte UDP, 1024 flows (varying src/dst ports).
- Loss Tolerance: 0.002%.
- Run ID: trafficgen--2025-08-26_06:20:01_UTC--bcc87a7a-6c85-41cb-8c85-a09d2efc9c44 (http://storage.scalelab.redhat.com/psahoo/PerfTaskLog/debuging/networkernode/bug/perfdrop/trafficgen--2025-08-26_06%3A20%3A01_UTC--bcc87a7a-6c85-41cb-8c85-a09d2efc9c44/).
Actual Results:
- Max sustainable: ~3.09 MPPS aggregate with 0% loss.
- Loss starts at ~2.081% rate, with symmetric drops in outbound/inbound directions.
- No TX-queue-full events in TRex; possible OVS/VM bottlenecks (e.g., upcalls, PMD saturation).
- Profiler gaps: Expected OVS output files (e.g., pmd-perf-show, dpctl-dump-flows) are missing, so the diagnostics are incomplete.
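Until the profiler gap is fixed, the missing OVS data could be collected by hand on the Networker node during a trial. The sketch below is one possible way to do that, not part of bench-trafficgen: the mapping from file names to `ovs-appctl` commands is an assumption based on the file names above, and `collect`, `run_command`, and the output directory are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import Callable, Dict, List

# Assumed mapping from output file name to the ovs-appctl command that produces it.
OVS_COMMANDS: Dict[str, List[str]] = {
    "pmd-perf-show": ["ovs-appctl", "dpif-netdev/pmd-perf-show"],
    "pmd-stats-show": ["ovs-appctl", "dpif-netdev/pmd-stats-show"],
    "pmd-rxq-show": ["ovs-appctl", "dpif-netdev/pmd-rxq-show"],
    "dpctl-dump-flows": ["ovs-appctl", "dpctl/dump-flows"],
    "upcall-show": ["ovs-appctl", "upcall/show"],
}


def run_command(argv: List[str]) -> str:
    """Run one command and return its stdout, or an error note if it fails."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return f"ERROR: {exc}\n"
    if proc.returncode != 0:
        return f"ERROR rc={proc.returncode}: {proc.stderr}"
    return proc.stdout


def collect(out_dir: str,
            runner: Callable[[List[str]], str] = run_command) -> List[Path]:
    """Write the output of each command to <out_dir>/<name>.txt and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for name, argv in OVS_COMMANDS.items():
        path = out / f"{name}.txt"
        path.write_text(runner(argv))
        written.append(path)
    return written

# Example (run as root on the Networker node while traffic is flowing):
#   collect("ovs-diagnostics")
```

The `runner` parameter exists so the collection logic can be exercised without a live OVS instance. Failures are written into the output file instead of raising, so one missing command does not hide the others.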
Attachments: