OpenShift Bugs: OCPBUGS-73758

OVN throughput degradation when churning pods

    • Important
    • Yes
    • x86_64

      Description of problem:

      In perf-scale tests that combine control-plane and data-plane workloads in OCP, we measured a significant throughput degradation with OVN when churning as few as 10% of the pods on the same node. Concretely, out of a maximum achievable throughput of 925 Gbps (as reported by the data-plane measurement tool), OVN achieves 830 Gbps (~90%) without churn and can drop as low as 750 Gbps (~81%) during churn. This happens when the churned control-plane workload pods run on the same node as the data-plane workload and both workloads share the same NIC. The impact increases with the churn level (%), since churning a larger number of pods takes longer. We tested churn values between 10% and 100%.
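
      The percentages above follow directly from the reported throughput figures. A minimal sketch of that arithmetic, using only the three numbers from this report; the variable and function names are illustrative, and the relative drop against the no-churn baseline is a derived figure that the report does not state explicitly:

```python
# Throughput figures taken from the description of this bug (Gbps).
MAX_GBPS = 925.0        # maximum achievable, as reported by the data-plane tool
OVN_IDLE_GBPS = 830.0   # OVN without churn
OVN_CHURN_GBPS = 750.0  # OVN, lowest value observed during churn


def pct_of(value: float, reference: float) -> float:
    """Return value as a percentage of reference."""
    return 100.0 * value / reference


# Share of the maximum achievable throughput in each state.
idle_pct = pct_of(OVN_IDLE_GBPS, MAX_GBPS)
churn_pct = pct_of(OVN_CHURN_GBPS, MAX_GBPS)

# Derived (not in the report): loss caused by churn relative to the
# OVN no-churn baseline rather than to the theoretical maximum.
churn_loss_gbps = OVN_IDLE_GBPS - OVN_CHURN_GBPS
churn_loss_pct = pct_of(churn_loss_gbps, OVN_IDLE_GBPS)

print(f"OVN without churn: {idle_pct:.1f}% of max")
print(f"OVN during churn:  {churn_pct:.1f}% of max")
print(f"Churn-induced loss: {churn_loss_gbps:.0f} Gbps "
      f"({churn_loss_pct:.1f}% of the no-churn baseline)")
```

      Comparing against the no-churn baseline isolates the effect of churn from the overhead OVN already has relative to the maximum.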

      Version-Release number of selected component (if applicable):

      • OCP 4.20.0
      • Platform: Bare-metal (Dell r640 servers)

       

      How reproducible:

      • Deploy the control-plane workload: kube-burner-ocp with rds-core.
      • Deploy the data-plane workload: crucible with TCP stream/bulk flows.
      • Churn pods/namespaces from the control-plane workload while the data-plane workload is running:
        • Collect metrics from the control-plane workload.
        • Collect metrics from the data-plane workload (throughput).

       

      Steps to Reproduce:

      Follow the documented steps to deploy the workload tools and environment here.

       

      Actual results:

      The results are reported in the slides here, and the raw data is in the spreadsheet here.

       

      Expected results:

      None to negligible impact when churning pods.

       

      Additional info:

      In the attached report we also include data-plane tests with MACVLAN and SR-IOV for comparison. The goal of this exercise was to quantify whether control-plane workload operations can impact the data-plane workload.

              bbennett@redhat.com Ben Bennett
              rh-ee-sferlinr Simone Ferlin-Reiter
              Anurag Saxena