
[2196224] [DPDK checkup] Packet loss when running VM/traffic generator on specific nodes

      Description of problem:
      When running the DPDK checkup, there are certain nodes on which scheduling the traffic generator and the VM causes the checkup to end with packet loss.

      Version-Release number of selected component (if applicable):
      CNV 4.13.0
      container-native-virtualization-kubevirt-dpdk-checkup-rhel9:v4.13.0-37

      How reproducible:
      Most of the time (on specific nodes).

      Steps to Reproduce:
      1. Create a namespace for the job and switch the context to the new namespace.
      $ oc create ns dpdk-checkup-ns
      $ oc project dpdk-checkup-ns

      2. Label the worker nodes with the "worker-dpdk" label, for example as shown below.
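      A minimal labeling sketch; the plain label key worker-dpdk is an assumption here, so adjust it to whatever key your checkup configuration actually expects:
      $ oc label node <node-name> worker-dpdk=""
      Repeat for every worker node that should be eligible to run the traffic generator and the VM.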

      3. Apply the resource manifests from the attached file in their numeric order:
      $ oc apply -f 1-dpdk-checkup-resources.yaml
      $ oc apply -f 2-dpdk-checkup-scc.yaml
      ...
      Adjust the resources in the manifests according to your cluster.

      Please note:
      Due to https://bugzilla.redhat.com/show_bug.cgi?id=2193235, you cannot set which nodes will be used for scheduling the VM and the traffic generator.
      Therefore, you must work around it by either cordoning two of the workers so that only one remains schedulable, or by removing the "worker-dpdk" label from two nodes and keeping it on only one node (see the example commands below).
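
      A sketch of the workaround, assuming three labeled workers and using placeholder node names:
      # Option 1: cordon all workers except one
      $ oc adm cordon <worker-2>
      $ oc adm cordon <worker-3>
      # Option 2: remove the label from all workers except one
      $ oc label node <worker-2> worker-dpdk-
      $ oc label node <worker-3> worker-dpdk-
      Remember to uncordon / re-label the nodes once the checkup run is finished.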

      4. After the job has completed, check the ConfigMap:
      $ oc get cm dpdk-checkup-config -o yaml
      ...
      status.failureReason: 'not all generated packets had reached DPDK VM: Sent from
      traffic generator: 480000000; Received on DPDK VM: 110323573'
      status.result.DPDKRxPacketDrops: "0"
      status.result.DPDKRxTestPackets: "110323573"
      status.result.DPDKTxPacketDrops: "0"
      status.result.DPDKVMNode: cnv-qe-infra-06.cnvqe2.lab.eng.rdu2.redhat.com
      status.result.trafficGeneratorInErrorPackets: "0"
      status.result.trafficGeneratorNode: cnv-qe-infra-06.cnvqe2.lab.eng.rdu2.redhat.com
      status.result.trafficGeneratorOutputErrorPackets: "0"
      status.result.trafficGeneratorTxPackets: "480000000"
      status.startTimestamp: "2023-05-08T09:49:24Z"
      status.succeeded: "false"
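
      To extract only the pass/fail fields from the ConfigMap data, a jsonpath query such as the following can be used (key names taken from the output above; note that the dots inside the key names must be escaped):
      $ oc get cm dpdk-checkup-config -o jsonpath='{.data.status\.succeeded}{"\n"}{.data.status\.failureReason}{"\n"}'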

      Actual results:
      <BUG> Note these fields:
      status.failureReason: 'not all generated packets had reached DPDK VM: Sent from
      traffic generator: 480000000; Received on DPDK VM: 110323573'
      status.succeeded: "false"
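
      For reference, the loss implied by these two counters can be computed with a one-liner (numbers copied from the failureReason above); roughly 77% of the generated packets never reached the VM:
      $ awk 'BEGIN { sent=480000000; recv=110323573; printf "lost=%d (%.1f%%)\n", sent-recv, 100*(sent-recv)/sent }'
      lost=369676427 (77.0%)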

      Expected results:
      Successful job, no packet loss.

      Additional info:
      1. The difference between the Tx bytes and the Rx bytes can be seen in the job log:
      $ oc logs dpdk-checkup-8nhz9
      ...
      2023/05/08 10:08:47 GetPortStats JSON: {
      "id": "a7mhi4qm",
      "jsonrpc": "2.0",
      "result": {
      "ibytes": 0,
      "ierrors": 0,
      "ipackets": 0,
      "m_cpu_util": 0.0,
      "m_total_rx_bps": 0.0,
      "m_total_rx_pps": 0.0,
      "m_total_tx_bps": 4063406080.0,
      "m_total_tx_pps": 7469495.5,
      "obytes": 32640000000,
      "oerrors": 0,
      "opackets": 480000000
      }
      }
      2023/05/08 10:08:48 GetPortStats JSON: {
      "id": "ntnu7u0h",
      "jsonrpc": "2.0",
      "result": {
      "ibytes": 30720000000,
      "ierrors": 844,
      "ipackets": 480000000,
      "m_cpu_util": 0.0,
      "m_total_rx_bps": 1902393984.0,
      "m_total_rx_pps": 3715611.0,
      "m_total_tx_bps": 0.0,
      "m_total_tx_pps": 0.0,
      "obytes": 0,
      "oerrors": 0,
      "opackets": 0

      (compare the obytes in the first summary with the ibytes in the second summary).
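
      A quick way to pull these port-stat summaries out of the (fairly long) job log is to grep for them, for example:
      $ oc logs dpdk-checkup-8nhz9 | grep -A 15 'GetPortStats JSON'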

      2. The issue was found on 2 separate clusters: bm01-cnvqe2-rdu2 and bm02-cnvqe2-rdu2.
      On bm01-cnvqe2 the problematic node is cnv-qe-infra-06.cnvqe2.lab.eng.rdu2.redhat.com.
      On bm02-cnvqe2 the checkup cannot currently run, so I'm not sure which node(s) are problematic.
