OCPBUGS-38920

Higher latency while applying network policies at scale


      Description of problem:

      During network policy scale testing on a 24-worker-node ROSA environment, the kube-burner workload creates 10 namespaces in parallel. Each namespace has 138 client pods, 138 server pods, and 1 network policy that allows the server pods to accept traffic on port 8080 from client pods in the same namespace.

      Client pods run goroutines that send requests to the servers in parallel. Some servers are reachable from a client within 10 seconds, but others become reachable only 30 seconds after the network policy object is created. So the same network policy takes longer to be applied for some server pods.

      For example, this is the log https://storage.scalelab.redhat.com/anilvenkata/networkpolicy/bugs/logclientpod.txt from one of the client pods.

      grep "Got 200 response to address" logclientpod.txt
      shows that most of the server pods are reached within 10 seconds. However, it took 20 seconds for the client to reach server pod 10.129.6.168, i.e.
      2024/08/26 05:12:50 Got 200 response to address 10.129.6.168

      In the same client pod log, we can see that the client pod tried to reach server pod 10.129.6.168 multiple times between 05:12:33 and 05:12:50, but the connection test succeeded only at 05:12:50.

      Note: kube-burner creates all the namespaces, client pods, and server pods before creating the network policies, i.e. kube-burner's job1 creates all the namespaces, client pods, and server pods. Once all of them are available, job2 starts and creates the network policies. So all the pods are ready before network policy creation begins.

      Version-Release number of selected component (if applicable):

      4.16

      How reproducible: Always

      Steps to Reproduce:

      We can provide the environment when the developer wants to debug the issue.

       

      Actual results:

      Some server pods take 30 seconds to become reachable, whereas other server pods are reachable from client pods within 5 seconds under the same network policy.

       

      Expected results:

      The latency of connection establishment from a client pod to all the server pods should be as close as possible (for example, 5 to 10 seconds in this case).

       

      Must gather - https://storage.scalelab.redhat.com/anilvenkata/networkpolicy/bugs/must-gather.tar.gz 

              npinaeva@redhat.com Nadia Pinaeva
              vkommadi@redhat.com VENKATA ANIL kumar KOMMADDI
              Anurag Saxena Anurag Saxena