Bug
Resolution: Done-Errata
Blocker
None
Description of problem:
When running the DPDK checkup, there are some nodes where, whenever the traffic generator and the VM are scheduled on them, the checkup ends with packet loss.
Version-Release number of selected component (if applicable):
CNV 4.13.0
container-native-virtualization-kubevirt-dpdk-checkup-rhel9:v4.13.0-37
How reproducible:
Most of the time (on specific nodes).
Steps to Reproduce:
1. Create a namespace for the job and switch the context to the new namespace.
$ oc create ns dpdk-checkup-ns
$ oc project dpdk-checkup-ns
2. Label the worker nodes with the "worker-dpdk" label.
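For example (a sketch only; the exact label key depends on how the cluster and the checkup are configured, so adjust as needed):
$ oc label node <node-name> node-role.kubernetes.io/worker-dpdk=""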
3. Apply the resources manifests in the attached file in their numeric order:
$ oc apply -f 1-dpdk-checkup-resources.yaml
$ oc apply -f 2-dpdk-checkup-scc.yaml
...
Change the resources in the manifests according to your cluster.
Please note:
Due to https://bugzilla.redhat.com/show_bug.cgi?id=2193235, you cannot set which nodes will be used for scheduling the VM and the traffic generator.
Therefore, you must work around it by either cordoning 2 of the workers so that only one remains schedulable, or by removing the "dpdk-workers" label from 2 nodes and keeping it on only one node.
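A sketch of the cordon-based workaround (node names are placeholders):
$ oc adm cordon <worker-node-1>
$ oc adm cordon <worker-node-2>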
4. After the job has completed, check the ConfigMap:
$ oc get cm dpdk-checkup-config -o yaml
...
status.failureReason: 'not all generated packets had reached DPDK VM: Sent from
traffic generator: 480000000; Received on DPDK VM: 110323573'
status.result.DPDKRxPacketDrops: "0"
status.result.DPDKRxTestPackets: "110323573"
status.result.DPDKTxPacketDrops: "0"
status.result.DPDKVMNode: cnv-qe-infra-06.cnvqe2.lab.eng.rdu2.redhat.com
status.result.trafficGeneratorInErrorPackets: "0"
status.result.trafficGeneratorNode: cnv-qe-infra-06.cnvqe2.lab.eng.rdu2.redhat.com
status.result.trafficGeneratorOutputErrorPackets: "0"
status.result.trafficGeneratorTxPackets: "480000000"
status.startTimestamp: "2023-05-08T09:49:24Z"
status.succeeded: "false"
Actual results:
<BUG> Note these fields:
status.failureReason: 'not all generated packets had reached DPDK VM: Sent from
traffic generator: 480000000; Received on DPDK VM: 110323573'
status.succeeded: "false"
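In other words, 480000000 - 110323573 = 369676427 packets were lost, i.e. roughly 77% of the generated traffic never reached the DPDK VM.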
Expected results:
Successful job, no packet loss.
Additional info:
1. The difference between Tx bytes and Rx bytes can be seen in the job log:
$ oc logs dpdk-checkup-8nhz9
...
2023/05/08 10:08:47 GetPortStats JSON: {
"id": "a7mhi4qm",
"jsonrpc": "2.0",
"result":
}
2023/05/08 10:08:48 GetPortStats JSON: {
"id": "ntnu7u0h",
"jsonrpc": "2.0",
"result": {
"ibytes": 30720000000,
"ierrors": 844,
"ipackets": 480000000,
"m_cpu_util": 0.0,
"m_total_rx_bps": 1902393984.0,
"m_total_rx_pps": 3715611.0,
"m_total_tx_bps": 0.0,
"m_total_tx_pps": 0.0,
"obytes": 0,
"oerrors": 0,
"opackets": 0
(compare the obytes in the first summary with the ibytes in the second summary).
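To pull these summaries out of the job log directly, a command along these lines should work (the number of trailing context lines is approximate):
$ oc logs dpdk-checkup-8nhz9 | grep -A 14 "GetPortStats JSON"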
2. The issue was found on 2 separate clusters, bm01-cnvqe2-rdu2 and bm02-cnvqe2-rdu2.
On bm01-cnvqe2 the problematic node is cnv-qe-infra-06.cnvqe2.lab.eng.rdu2.redhat.com.
On bm02-cnvqe2 the checkup cannot currently run, so I am not sure which node(s) were problematic.
is blocked by:
CNV-28721 [2196459] [DPDK checkup] Pods are scheduled on reserved instead of isolated CPUs - Closed
links to:
RHEA-2023:116760 OpenShift Virtualization 4.15.0 Images