OpenShift Bugs / OCPBUGS-10335

Reduced throughput when application pod-to-pod communication is tested between bare-metal worker nodes & OCS storage nodes


    • Critical
    • Rejected
    • Customer Escalated

      Description of problem:

      Reduced throughput is observed when application pod-to-pod communication is tested with iperf between bare-metal worker nodes and OCS storage nodes.
      
      They have tested the same with pods on hostNetwork and get the expected throughput of ~20 Gbps, but when the cluster network is used with the OVN-Kubernetes CNI there is a large drop in throughput, to 6-9 Gbps, i.e. close to a 40% drop (relative to the ~15 Gbps they would still consider acceptable with encapsulation overhead; see Expected results).
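
      To confirm which CNI is in use and what MTU the pod network actually gets (OVN-Kubernetes reserves headroom for the Geneve encapsulation), something like the below can be checked; the jsonpath field names are assumptions based on the OCP 4.11 network CRs:

      * cluster-wide network type and effective cluster-network MTU:
      # oc get network.config cluster -o jsonpath='{.status.networkType}{"\n"}{.status.clusterNetworkMTU}{"\n"}'

      * OVN-Kubernetes operator configuration, including any explicitly set overlay MTU:
      # oc get network.operator cluster -o jsonpath='{.spec.defaultNetwork.ovnKubernetesConfig}{"\n"}'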

      Version-Release number of selected component (if applicable):

      OCP v4.11

      How reproducible:

      Occurring in the customer environment.

      Steps to Reproduce:

      They are using the below steps, referred from this KBase: https://access.redhat.com/articles/5233541
      
      1. Create a project and set up the service account and service for the iperf test pods
      
      # oc new-project iperf-test  
      # oc patch namespace iperf-test -p '{"metadata": {"annotations": {"openshift.io/node-selector": ""}}}'
      # oc create serviceaccount iperf
      # oc adm policy add-scc-to-user privileged -z iperf
      # oc create service nodeport iperf-host  --tcp 5201:5201
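
      * the NodePort assigned to the iperf-host service is needed later in step 4; one way to read it back (jsonpath assumed):
      # oc get service iperf-host -o jsonpath='{.spec.ports[0].nodePort}{"\n"}'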
      
      
      2. Pick a node and label it iperf=server, and label another node iperf=client
      
      oc label  node NODEA iperf=server
      oc label  node NODEB iperf=client 
      
      
      3. Start iperf server pods 
      
      # oc run iperf3-sdn-server --serviceaccount=iperf  --image=quay.io/kinvolk/iperf3 --overrides='{"spec": {"tolerations": [{"operator": "Exists"}],"nodeSelector":{"iperf": "server"}}}' -- iperf3 -s
      # oc run iperf3-host-server --serviceaccount=iperf  --labels app=iperf-host --image=quay.io/kinvolk/iperf3 --overrides='{"spec": {"hostNetwork": true,"tolerations": [{"operator": "Exists"}], "nodeSelector":{"iperf": "server"}}}' -- iperf3 -s
      
      * TAKE NOTE of the pod IPs and the NodePort
      # oc get pods,service -o wide 
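
      Note: newer oc/kubectl clients have removed the --serviceaccount flag of "oc run"; if it is rejected, serviceAccountName can be set through --overrides instead. A possible equivalent for the SDN server pod (untested sketch, same image and node selector as above):

      # oc run iperf3-sdn-server --image=quay.io/kinvolk/iperf3 --overrides='{"spec": {"serviceAccountName": "iperf", "tolerations": [{"operator": "Exists"}], "nodeSelector": {"iperf": "server"}}}' -- iperf3 -s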
      
      
      4. Start client pods to test: 
      
      # oc run --rm -it iperf3-sdn-client --serviceaccount=iperf  --image=quay.io/kinvolk/iperf3 --overrides='{"spec": {"tolerations": [{"operator": "Exists"}],"nodeSelector":{"iperf": "client"}}}'
      $ iperf3 -c POD_IP_iperf3-sdn-server
      
      # oc run --rm -it iperf3-host-client --serviceaccount=iperf  --image=quay.io/kinvolk/iperf3 --overrides='{"spec": {"hostNetwork": true,"tolerations": [{"operator": "Exists"}],"nodeSelector":{"iperf": "client"}}}'
      $ iperf3 -c POD_IP_iperf3-host-server -p NODE_PORT
      
      
      5. Clean up 
      # oc delete project iperf-test
      oc label  node NODEA iperf-
      oc label  node NODEB iperf-
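
      As an additional data point beyond the KBase procedure, the pod-network client test from step 4 can be repeated with parallel streams and in the reverse direction to see whether the gap is a single-flow limit or affects the aggregate as well (-P and -R are standard iperf3 options):

      * 4 parallel streams:
      $ iperf3 -c POD_IP_iperf3-sdn-server -P 4
      * reverse direction (server sends to client):
      $ iperf3 -c POD_IP_iperf3-sdn-server -R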

      Actual results:

      They are seeing the below after the iperf test between the pods on the cluster network (OVN-Kubernetes):
      
      [aovamos@vlobepx133 ~]$ oc rsh iperf3-sdn-client
      / # iperf3 -c 10.131.5.55
      Connecting to host 10.131.5.55, port 5201
      [  5] local 10.130.4.25 port 60728 connected to 10.131.5.55 port 5201
      [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
      [  5]   0.00-1.00   sec   969 MBytes  8.12 Gbits/sec  295    429 KBytes
      [  5]   1.00-2.00   sec   985 MBytes  8.26 Gbits/sec   29    324 KBytes
      [  5]   2.00-3.00   sec   742 MBytes  6.23 Gbits/sec  131    326 KBytes
      [  5]   3.00-4.00   sec   652 MBytes  5.47 Gbits/sec  324    387 KBytes
      [  5]   4.00-5.00   sec   808 MBytes  6.78 Gbits/sec   87    324 KBytes
      [  5]   5.00-6.00   sec   837 MBytes  7.02 Gbits/sec    0    366 KBytes
      [  5]   6.00-7.00   sec   847 MBytes  7.11 Gbits/sec    0    373 KBytes
      [  5]   7.00-8.00   sec   825 MBytes  6.92 Gbits/sec    0    394 KBytes
      [  5]   8.00-9.00   sec   861 MBytes  7.23 Gbits/sec    0    396 KBytes
      [  5]   9.00-10.00  sec   784 MBytes  6.58 Gbits/sec  262    294 KBytes
      - - - - - - - - - - - - - - - - - - - - - - - - -
      [ ID] Interval           Transfer     Bitrate         Retr
      [  5]   0.00-10.00  sec  8.12 GBytes  6.97 Gbits/sec  1128             sender
      [  5]   0.00-10.04  sec  8.11 GBytes  6.94 Gbits/sec                  receiver
      
      iperf Done.
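
      The pod-network run also shows a high retransmit count (1128 in the summary) and a much smaller congestion window than the host-network run, so it is worth confirming that the bond members are offloading Geneve/UDP tunnel segmentation and are not dropping packets. A rough check from a node debug shell (NODEA and BOND_SLAVE_NIC are placeholders):

      # oc debug node/NODEA
      # chroot /host
      # ethtool -k BOND_SLAVE_NIC | grep -E 'tx-udp_tnl|tcp-segmentation|generic-receive'
      # ethtool -S BOND_SLAVE_NIC | grep -iE 'drop|discard'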

      Expected results:

      They should get somewhere around 15 Gbps, which would be an acceptable drop since OVN-Kubernetes uses a Geneve tunnel and the encapsulation is expected to cost some throughput. The below output is for pods on the host network:
      
      [aovamos@vlobepx133 ~]$ oc rsh iperf3-host-client
      ~ $ iperf3 -c 10.176.39.14 5201
      Connecting to host 10.176.39.14, port 5201
      [  5] local 10.176.39.12 port 38650 connected to 10.176.39.14 port 5201
      [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
      [  5]   0.00-1.00   sec  2.35 GBytes  20.2 Gbits/sec    0   1.44 MBytes
      [  5]   1.00-2.00   sec  2.76 GBytes  23.7 Gbits/sec    0   1.44 MBytes
      [  5]   2.00-3.00   sec  2.74 GBytes  23.6 Gbits/sec    0   1.44 MBytes
      [  5]   3.00-4.00   sec  2.73 GBytes  23.4 Gbits/sec    0   1.44 MBytes
      [  5]   4.00-5.00   sec  2.73 GBytes  23.4 Gbits/sec    0   1.44 MBytes
      [  5]   5.00-6.00   sec  2.74 GBytes  23.5 Gbits/sec    0   1.44 MBytes
      [  5]   6.00-7.00   sec  2.72 GBytes  23.4 Gbits/sec    0   1.52 MBytes
      [  5]   7.00-8.00   sec  2.75 GBytes  23.6 Gbits/sec    0   1.52 MBytes
      [  5]   8.00-9.00   sec  2.75 GBytes  23.6 Gbits/sec    0   1.52 MBytes
      [  5]   9.00-10.00  sec  2.73 GBytes  23.4 Gbits/sec    0   1.52 MBytes
      - - - - - - - - - - - - - - - - - - - - - - - - -
      [ ID] Interval           Transfer     Bitrate         Retr
      [  5]   0.00-10.00  sec  27.0 GBytes  23.2 Gbits/sec    0             sender
      [  5]   0.00-10.04  sec  27.0 GBytes  23.1 Gbits/sec                  receiver
      
      iperf Done.

      Additional info:

      The customer is using a bond interface on the worker nodes with 2x100 Gbps NICs.
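
      To confirm the bond mode and that both 100 Gbps members are up, the bond state can be read from the node (bond0 is a placeholder for the actual bond name):

      # oc debug node/NODEA
      # chroot /host
      # cat /proc/net/bonding/bond0
      # ethtool bond0 | grep -i Speed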
