OpenShift Bugs / OCPBUGS-32161

High Egress IP failover latency during scale testing


    • No
    • SDN Sprint 253
    • 1
    • False

      Description of problem:

          We created 24000 EIPs for 24000 pods (one EIP and one pod per namespace) on a 120-node bare-metal environment. We then forced a failover of a node hosting 200 EIPs by blocking port 9107 with iptables, and observed high pod connection latencies (ranging from 221 msec to 41 sec) for the pods whose EIPs failed over to other nodes.
      
      
      Pod                               EIP failover latency
      client-1-13103-78c6585bbb-jkr8h4  41.0 sec
      client-1-2777-7d86cd47bf-djgnf    38.0 sec
      client-1-2609-79cfd5ff55-7z446    23.2 sec
      client-1-22868-7bf96cd49-fjrtj    16.0 sec
      client-1-23491-56f499cc69-w5hbr   9.01 sec
      client-1-11301-78b5bbc987-vrs8s   9.01 sec
      client-1-6098-64b7d9d4f4-b62zm    2.00 sec
      client-1-22599-5975f8bdc4-hgng2   2.00 sec
      client-1-15570-86b979d584-j7cpb   221 msec
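
      Because the samples mix sec and msec, it can help to normalize them to seconds before comparing. The following is a minimal sketch, not part of the original test tooling: the function name to_seconds is hypothetical, and it assumes rows of the form "<pod> <value> <unit>" like the sample above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "<pod> <value> <unit>" rows on stdin,
# print each latency in seconds, then a min/max/count summary.
to_seconds() {
  awk '
    $3 == "msec" { v = $2 / 1000 }              # milliseconds -> seconds
    $3 == "sec"  { v = $2 }                     # already in seconds
    $3 != "sec" && $3 != "msec" { next }        # skip headers and unknown units
    {
      if (n == 0 || v < min) min = v
      if (n == 0 || v > max) max = v
      n++
      printf "%s %.3f\n", $1, v
    }
    END { if (n) printf "min=%.3f max=%.3f count=%d\n", min, max, n }
  '
}

# Usage with the samples reported in this issue.
to_seconds <<'EOF'
client-1-13103-78c6585bbb-jkr8h4 41.0 sec
client-1-2777-7d86cd47bf-djgnf 38.0 sec
client-1-2609-79cfd5ff55-7z446 23.2 sec
client-1-22868-7bf96cd49-fjrtj 16.0 sec
client-1-23491-56f499cc69-w5hbr 9.01 sec
client-1-11301-78b5bbc987-vrs8s 9.01 sec
client-1-6098-64b7d9d4f4-b62zm 2.00 sec
client-1-22599-5975f8bdc4-hgng2 2.00 sec
client-1-15570-86b979d584-j7cpb 221 msec
EOF
# Last line printed: min=0.221 max=41.000 count=9
```

      For these nine samples the slowest failover (41.0 sec) is roughly 185 times the fastest (0.221 sec).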

      CPU usage and OVS flow metrics are available in the Grafana dashboard: https://grafana.rdu2.scalelab.redhat.com:3000/d/FwPsenbaa/kube-burner-report-eip?orgId=1&from=1712835501022&to=1712857101023&var-Datasource=AWS+Pro+-+ripsaw-kube-burner&var-workload=egressip&var-uuid=7f8a09af-8ed6-4027-bbc7-0583aa18db10&var-master=f20-h02-000-r640.rdu2.scalelab.redhat.com&var-worker=f20-h11-000-r640.rdu2.scalelab.redhat.com&var-infra=f36-h10-000-r640.rdu2.scalelab.redhat.com&var-namespace=All&var-latencyPercentile=P99

      must-gather: http://storage.scalelab.redhat.com/anilvenkata/eip_failover_mg/must-gather.local.2880304935723177257.tgz

      All resources were created before we triggered the node failover. The node on which port 9107 is blocked hosts 200 pods and 200 EIPs. The only action taken was the following iptables command to block port 9107:

      sudo iptables -A INPUT -p tcp --dport 9107 -j DROP

      We did not delete any conntrack entries, OVS flows, etc. for the failover simulation.
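
      To make the simulation easier to repeat and to undo after a run, the rule can be wrapped in small helpers. This is a minimal sketch under stated assumptions, not the procedure used in this report: the function names and the IPTABLES and PORT variables are hypothetical, and only the -A rule comes from the description above. -C (check) and -D (delete) are standard iptables options that take the same rule specification.

```shell
#!/usr/bin/env bash
# Hypothetical wrappers around the rule used in this report.
# IPTABLES and PORT are illustrative variables, overridable from the environment.
IPTABLES="${IPTABLES:-sudo iptables}"
PORT="${PORT:-9107}"

# Append the DROP rule (same rule as in the description).
block_port() {
  $IPTABLES -A INPUT -p tcp --dport "$PORT" -j DROP
}

# Return 0 if the DROP rule is currently present, non-zero otherwise.
is_blocked() {
  $IPTABLES -C INPUT -p tcp --dport "$PORT" -j DROP >/dev/null 2>&1
}

# Delete the DROP rule to restore the node after the test.
unblock_port() {
  $IPTABLES -D INPUT -p tcp --dport "$PORT" -j DROP
}
```

      A run would then look like: block_port, wait for the EIPs to move, then unblock_port, using is_blocked to confirm the rule state at each step.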

            jcaamano@redhat.com Jaime Caamaño Ruiz
            vkommadi@redhat.com VENKATA ANIL kumar KOMMADDI
            Sachin Ninganure Sachin Ninganure