Type: Bug
Resolution: Unresolved
Affects Version/s: 4.20.0, 4.20
Severity: Important
Architecture: x86_64
Description of problem:
In perf-scale tests that combine control-plane and data-plane workloads in OCP, we measured a significant throughput degradation in OVN when churning as little as 10% of the pods on the same node. More concretely, out of a maximum achievable throughput of 925 Gbps (as reported by the data-plane measurement tool), OVN without churning reaches 830 Gbps (~89%), whereas during churn it can drop as low as 750 Gbps (~81%). This happens when the churned control-plane workload pods run on the same node as the data-plane workload and both workloads share the same NIC. The impact grows with the churn level (%), since churning a larger share of pods takes longer. We tested churn values between 10% and 100%.
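For quick reference, a minimal sketch (Python) that recomputes the percentages quoted above from the measured throughput values; the Gbps figures come from the description, the rest is illustrative.

```python
# Recompute the degradation figures quoted above (Gbps values from the description).
MAX_THROUGHPUT_GBPS = 925.0   # max achievable, as reported by the data-plane tool
NO_CHURN_GBPS = 830.0         # OVN, no churn
CHURN_GBPS = 750.0            # OVN, worst case observed while churning 10% of pods

def fraction_of_max(value_gbps: float) -> float:
    """Throughput expressed as a fraction of the maximum achievable rate."""
    return value_gbps / MAX_THROUGHPUT_GBPS

print(f"no churn:     {fraction_of_max(NO_CHURN_GBPS):.1%}")   # ~89.7%
print(f"during churn: {fraction_of_max(CHURN_GBPS):.1%}")      # ~81.1%
print(f"difference:   {fraction_of_max(NO_CHURN_GBPS) - fraction_of_max(CHURN_GBPS):.1%}")
```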
Version-Release number of selected component (if applicable):
- OCP 4.20.0
- Platform: Bare-metal (Dell r640 servers)
How reproducible:
- Deploy the control-plane load workload: kube-burner-ocp with the rds-core workload.
- Deploy the data-plane load workload: crucible with TCP stream/bulk flows.
- Churn pods/namespaces from the control-plane workload while the data-plane workload is running.
- Collect metrics from the control-plane workload.
- Collect metrics from the data-plane workload (throughput/s); see the analysis sketch after this list.
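A minimal analysis sketch (Python) for the last step, assuming per-second throughput samples (Gbps) have been exported to a CSV with `timestamp,gbps` columns; the file format, file name, and churn window times are illustrative assumptions, not crucible's actual output format.

```python
# Compare mean data-plane throughput during a churn window against the no-churn baseline.
# Assumes a CSV with `timestamp,gbps` columns and naive ISO-8601 timestamps
# (hypothetical export format; crucible's actual result files differ).
import csv
from datetime import datetime
from statistics import mean

def load_samples(path: str) -> list[tuple[datetime, float]]:
    with open(path) as f:
        return [(datetime.fromisoformat(row["timestamp"]), float(row["gbps"]))
                for row in csv.DictReader(f)]

def mean_gbps(samples: list[tuple[datetime, float]], start: datetime, end: datetime) -> float:
    window = [gbps for ts, gbps in samples if start <= ts < end]
    return mean(window) if window else float("nan")

samples = load_samples("throughput.csv")      # hypothetical file name
churn_start = datetime(2025, 1, 1, 12, 0)     # placeholder churn window
churn_end = datetime(2025, 1, 1, 12, 30)

baseline = mean_gbps(samples, datetime.min, churn_start)  # everything before the churn
during = mean_gbps(samples, churn_start, churn_end)
print(f"baseline: {baseline:.1f} Gbps, during churn: {during:.1f} Gbps "
      f"({during / baseline:.1%} of baseline)")
```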
Steps to Reproduce:
Follow the documented steps to deploy the workload tools and environment here.
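For the churn step specifically, a sketch of driving kube-burner-ocp from a helper script is shown below; the churn flag names are assumptions based on common kube-burner options and should be checked against the documented steps and the installed version.

```python
# Sketch of launching the rds-core churn phase. The --churn* flag names are
# assumptions (verify against the installed kube-burner-ocp version and the
# documented procedure before running).
import subprocess

def run_rds_core_churn(churn_percent: int, churn_duration: str = "30m") -> None:
    cmd = [
        "kube-burner-ocp", "rds-core",
        "--churn=true",
        f"--churn-percent={churn_percent}",    # assumed flag name
        f"--churn-duration={churn_duration}",  # assumed flag name
    ]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    # Churn levels exercised in the tests described above.
    for pct in (10, 50, 100):
        run_rds_core_churn(pct)
```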
Actual results:
The results are reported in the slides here and in the spreadsheet (raw data) here.
Expected results:
No or only negligible impact on data-plane throughput when churning pods.
Additional info:
The attached report also includes data-plane tests with MACVLAN and SR-IOV for comparison; the goal of this exercise was to quantify whether control-plane workload operations can impact data-plane workloads.