OpenShift Bugs / OCPBUGS-34645

Dataplane performance drop for pod traffic when the egress IP is hosted on a different node than the pod


      Description of problem:

      We ran dataplane tests for scenarios in which a pod uses an egress IP. The results are documented at https://docs.google.com/document/d/1l-uhLQSAxdc340pqa8UeX8rHsYqD8WH4qDvWa-Rs424/edit?usp=sharing

      The test involves a client pod sending traffic to an external server using the netperf tool (https://github.com/HewlettPackard/netperf).

      We tested 3 scenarios:

      1. Node_IP: This test doesn't create an egress IP for the client pod, so by default OVN uses the node IP address for traffic from the client pod to the external server.
      2. Local_EIP: This test creates an egress IP for the client pod. The egress IP and the client pod are located on the same node. For example, if the client pod is on node1, the egress IP is also created on node1. OVN uses the egress IP for traffic from the client pod to the external server.
      3. Remote_EIP: This test creates an egress IP for the client pod. However, the client pod and the egress IP are hosted on different nodes. For example, if the client pod is on node1, the egress IP is created on node2. OVN uses the egress IP for traffic from the client pod to the external server.
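
      The three placements can be sketched in Python as below. This is only an illustration, not the test harness used for this report: the helper names, the object name `eip-client`, the labels `perf-test=client` and `app=client`, and the address 192.0.2.10 are hypothetical. It assumes the OVN-Kubernetes `EgressIP` resource (`k8s.ovn.org/v1`) and assumes that labeling exactly one node with `k8s.ovn.org/egress-assignable` is what places the egress IP on that node.

```python
import json

# Node label that marks a node as eligible to host egress IPs in OVN-Kubernetes.
EGRESS_ASSIGNABLE_LABEL = "k8s.ovn.org/egress-assignable"


def egress_ip_manifest(name, egress_ip, namespace_labels, pod_labels):
    """Build an EgressIP object as a plain dict (all values are illustrative)."""
    return {
        "apiVersion": "k8s.ovn.org/v1",
        "kind": "EgressIP",
        "metadata": {"name": name},
        "spec": {
            "egressIPs": [egress_ip],
            "namespaceSelector": {"matchLabels": namespace_labels},
            "podSelector": {"matchLabels": pod_labels},
        },
    }


def scenario_plan(scenario, client_node, other_node):
    """Return where the client pod runs and which node (if any) hosts the egress IP."""
    if scenario == "Node_IP":
        egress_node = None          # no EgressIP object; traffic leaves with the node IP
    elif scenario == "Local_EIP":
        egress_node = client_node   # egress IP on the same node as the pod
    elif scenario == "Remote_EIP":
        egress_node = other_node    # egress IP on a different node than the pod
    else:
        raise ValueError(f"unknown scenario: {scenario}")
    return {"client_node": client_node, "egress_node": egress_node}


if __name__ == "__main__":
    for scenario in ("Node_IP", "Local_EIP", "Remote_EIP"):
        plan = scenario_plan(scenario, "node1", "node2")
        print(scenario, plan)
        if plan["egress_node"] is not None:
            # Only the chosen node would carry the egress-assignable label.
            print(f"  label {plan['egress_node']} with {EGRESS_ASSIGNABLE_LABEL}")
            manifest = egress_ip_manifest(
                "eip-client", "192.0.2.10",
                {"perf-test": "client"}, {"app": "client"},
            )
            print(json.dumps(manifest, indent=2))
```

      In this sketch, Local_EIP and Remote_EIP use the same EgressIP object and differ only in which node is labeled as egress-assignable.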

      Performance drop observations:

      1. Throughput is similar in scenarios 1 and 2, but scenario 2 achieves 30% more throughput than scenario 3.
      2. Scenario 2 achieves a 100% higher request rate than scenario 3.
      3. Scenario 3 has 270% higher latency per request than scenario 2.
      4. Scenario 2 achieves a 70% higher connection rate than scenario 3.

      So throughput, RPS and connections/second all drop in scenario 3, i.e. when the pod and egress IP are on different nodes.
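
      The percentages above are stated as "scenario 2 is X% more than scenario 3". As a small worked example, the sketch below converts them into the drop seen in scenario 3 relative to scenario 2. The function names are hypothetical, and the only inputs are the percentages quoted above; no raw measurements are assumed.

```python
def remote_fraction_of_local(pct_more_with_local):
    """If local is X% more than remote, return remote as a fraction of local."""
    return 1.0 / (1.0 + pct_more_with_local / 100.0)


def drop_pct(pct_more_with_local):
    """Drop of remote relative to local, in percent."""
    return (1.0 - remote_fraction_of_local(pct_more_with_local)) * 100.0


def latency_multiplier(pct_higher_with_remote):
    """If remote latency is X% higher than local, return remote / local."""
    return 1.0 + pct_higher_with_remote / 100.0


if __name__ == "__main__":
    print("throughput drop  :", round(drop_pct(30), 1), "%")   # 23.1 %
    print("request rate drop:", round(drop_pct(100), 1), "%")  # 50.0 %
    print("conn rate drop   :", round(drop_pct(70), 1), "%")   # 41.2 %
    print("latency multiple :", round(latency_multiplier(270), 1), "x")  # 3.7 x
```

      Read this way, Remote_EIP delivers roughly 23% less throughput, half the request rate, and about 41% fewer connections per second than Local_EIP, at about 3.7 times the latency per request.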

      Version-Release number of selected component (if applicable): 4.14.16

      How reproducible: Always

      Steps to Reproduce:

      1. Create the client pod and the egress IP on the same node, and send traffic from the client pod to the external server using netperf.

      2. Create the client pod and the egress IP on different nodes, and send traffic from the client pod to the external server using netperf.
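
      For the traffic step, the sketch below shows one way the netperf invocations could be assembled. The report does not state which netperf tests or durations were used, so the mapping of throughput, RPS and connections/second to `TCP_STREAM`, `TCP_RR` and `TCP_CRR` is an assumption, as are the 30-second duration, the helper name and the server address 192.0.2.20. Only the standard netperf options `-H`, `-l` and `-t` are used.

```python
import shlex

# Assumed mapping from the metrics in this report to netperf test types.
NETPERF_TESTS = {
    "throughput": "TCP_STREAM",            # bulk transfer
    "requests_per_second": "TCP_RR",       # request/response on one connection
    "connections_per_second": "TCP_CRR",   # connect, request/response, close
}


def netperf_command(server, metric, duration_s=30):
    """Return the netperf argv for one metric against the external server."""
    test = NETPERF_TESTS[metric]
    return ["netperf", "-H", server, "-l", str(duration_s), "-t", test]


if __name__ == "__main__":
    # Print the commands to run inside the client pod; nothing is executed here.
    for metric in NETPERF_TESTS:
        print(shlex.join(netperf_command("192.0.2.20", metric)))
```

      Running the same set of commands from the client pod in each placement keeps egress IP placement as the only variable between runs.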

      Actual results:

      Throughput, RPS and connections/second drop, and latency rises, when the pod and egress IP are on different nodes.

      Expected results:

      Egress IP placement on nodes should have very little impact on dataplane performance (i.e. throughput, RPS and connections/second).

      Additional info:

      • We ran the tests on the Performance team's scale lab bare-metal hardware. All OCP nodes are bare-metal nodes. An external bare-metal node (external to OCP but part of the same scale lab allocation) hosts the server that receives the client pod's traffic. All OCP nodes and the external bare-metal server are in the same rack and connected to the same switch.

            sdn-team-bot sdn-team bot
            vkommadi@redhat.com VENKATA ANIL kumar KOMMADDI
            Anurag Saxena Anurag Saxena