- Bug
- Resolution: Unresolved
- Normal
- None
- 4.14.z
- No
- False
Description of problem:
We have run dataplane tests for scenarios where a pod uses an egress IP; the results are documented at https://docs.google.com/document/d/1l-uhLQSAxdc340pqa8UeX8rHsYqD8WH4qDvWa-Rs424/edit?usp=sharing
The test involves a client pod sending traffic to an external server using the netperf tool (https://github.com/HewlettPackard/netperf).
We have 3 scenarios:
- Node_IP: This test does not create an egress IP for the client pod, so by default OVN uses the node IP address for the traffic from the client pod to the external server.
- Local_EIP: This test creates an egress IP for the client pod. The egress IP and the client pod are located on the same node; for example, if the client pod is on node1, the egress IP is also created on node1. OVN uses the egress IP for the traffic from the client pod to the external server.
- Remote_EIP: This test creates an egress IP for the client pod, but the client pod and the egress IP are hosted on different nodes; for example, if the client pod is on node1, the egress IP is created on node2. OVN uses the egress IP for the traffic from the client pod to the external server. (An example egress IP configuration is sketched after this list.)
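A minimal sketch of the egress IP setup used in scenarios 2 and 3 follows, assuming the standard OVN-Kubernetes EgressIP API and a logged-in "oc" CLI; the node name, egress IP address, and namespace label are hypothetical placeholders. The only difference between Local_EIP and Remote_EIP is whether the labeled (egress-assignable) node is the same node that hosts the client pod.

import subprocess

EGRESS_NODE = "worker-1"    # assumption: node that should host the egress IP
EGRESS_IP = "192.0.2.10"    # assumption: a free IP on that node's subnet
NAMESPACE = "netperf-test"  # assumption: namespace of the client pod

# Mark the chosen node as eligible to host egress IPs (OVN-Kubernetes convention).
subprocess.run(
    ["oc", "label", "node", EGRESS_NODE,
     "k8s.ovn.org/egress-assignable=", "--overwrite"],
    check=True,
)

# Apply an EgressIP object that selects the client pod's namespace.
manifest = f"""
apiVersion: k8s.ovn.org/v1
kind: EgressIP
metadata:
  name: netperf-egressip
spec:
  egressIPs:
  - {EGRESS_IP}
  namespaceSelector:
    matchLabels:
      kubernetes.io/metadata.name: {NAMESPACE}
"""
subprocess.run(["oc", "apply", "-f", "-"], input=manifest, text=True, check=True)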
Performance drop observations:
- Throughput is similar in scenarios 1 and 2, but scenario 2 achieves about 30% more throughput than scenario 3.
- Scenario 2 achieves a 100% higher request rate than scenario 3.
- Scenario 3 shows 270% higher latency per request than scenario 2.
- Scenario 2 achieves a 70% higher connection rate than scenario 3.
So we see a drop in throughput, RPS, and connections/second with scenario 3, i.e. when the pod and the egress IP are on different nodes.
Version-Release number of selected component (if applicable): 4.14.16
How reproducible: Always
Steps to Reproduce:
1. Create a client pod and an egress IP on the same node and send traffic from the client pod to the external server using the netperf tool.
2. Create the client pod and the egress IP on different nodes and send traffic from the client pod to the external server using the netperf tool (example invocations are sketched after these steps).
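As a rough illustration of the measurements behind the reported metrics, the runs can be sketched as below, assuming "oc" access to the cluster and a netperf server (netserver) already running on the external host; the pod name, namespace, server IP, and test duration are hypothetical placeholders, and the options used in the documented runs may differ.

import subprocess

CLIENT_POD = "netperf-client"      # assumption: pod with the netperf binary installed
NAMESPACE = "netperf-test"         # assumption: client pod's namespace
EXTERNAL_SERVER = "198.51.100.20"  # assumption: external baremetal server's IP
DURATION = "60"                    # assumption: test length in seconds

# netperf test types map onto the reported metrics:
#   TCP_STREAM -> throughput
#   TCP_RR     -> request rate and per-request latency
#   TCP_CRR    -> connection rate (connect/request/response per second)
for test in ("TCP_STREAM", "TCP_RR", "TCP_CRR"):
    subprocess.run(
        ["oc", "exec", "-n", NAMESPACE, CLIENT_POD, "--",
         "netperf", "-H", EXTERNAL_SERVER, "-t", test, "-l", DURATION],
        check=True,
    )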
Actual results:
Drop in throughput, RPS, and connections/second, and higher latencies:
- Throughput is similar in scenarios 1 and 2, but scenario 2 achieves about 30% more throughput than scenario 3.
- Scenario 2 achieves a 100% higher request rate than scenario 3.
- Scenario 3 shows 270% higher latency per request than scenario 2.
- Scenario 2 achieves a 70% higher connection rate than scenario 3.
Expected results:
Egress IP placement on nodes should have minimal impact on dataplane performance (i.e. throughput, RPS, and connections/second).
Additional info:
- We have run the tests on the Performance team's scale lab baremetal hardware. All OCP nodes are baremetal nodes. An external baremetal node (external to OCP but part of the same scale lab allocation) hosts the server that receives the client pod's traffic. All OCP nodes and the external baremetal server are hosted on the same rack, so they are connected to the same switch.