[OCPBUGS-38920] higher latency while applying network policies at scale - Red Hat Issue Tracker

Type: Bug
Resolution: Unresolved
Priority: Normal
Fix Version/s: None
Affects Version/s: 4.16.z
Component/s: Networking / ovn-kubernetes
Labels:
- SDN:OVNK:NetworkPolicy
- SDN:Scale

Regression:
None
Blocked:
False
Blocked Reason:

None

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Description of problem:

During network policy scale testing on a ROSA environment with 24 worker nodes, the kube-burner workload creates 10 namespaces in parallel. Each namespace has 138 client pods, 138 server pods, and 1 network policy that allows the server pods to accept traffic from client pods in the same namespace on port 8080.
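For reference, the per-namespace policy described above could look roughly like the manifest built below. This is a minimal sketch, not the actual kube-burner template: the policy name, the namespace names, and the `app: client` / `app: server` labels are assumptions made for illustration. Only the port (8080) and the object counts come from this report.

```python
import json

# Counts taken from the report.
NAMESPACES = 10
CLIENTS_PER_NS = 138
SERVERS_PER_NS = 138


def build_policy(namespace, port=8080):
    """Return a NetworkPolicy manifest (as a dict) for one namespace.

    Label values and the policy name are hypothetical; the real workload
    may select pods differently.
    """
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "allow-client-to-server", "namespace": namespace},
        "spec": {
            # The policy applies to the server pods (assumed label).
            "podSelector": {"matchLabels": {"app": "server"}},
            "policyTypes": ["Ingress"],
            "ingress": [
                {
                    # A podSelector without a namespaceSelector only matches
                    # pods in the policy's own namespace.
                    "from": [{"podSelector": {"matchLabels": {"app": "client"}}}],
                    "ports": [{"protocol": "TCP", "port": port}],
                }
            ],
        },
    }


# Hypothetical namespace names, one policy per namespace.
policies = [build_policy(f"netpol-scale-{i}") for i in range(NAMESPACES)]
total_pods = NAMESPACES * (CLIENTS_PER_NS + SERVERS_PER_NS)

if __name__ == "__main__":
    print(json.dumps(policies[0], indent=2))
    print("total pods:", total_pods)
```

With these counts the test creates 2760 pods and 10 network policies in total.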

Client pods run goroutines that send requests to the servers in parallel. Some servers are reachable from the client within 10 seconds, but others become reachable only 30 seconds after the network policy object is created. The same network policy therefore takes longer to be applied for some server pods than for others.
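The measurement described above can be sketched as follows. This is an illustrative Python approximation, not the actual client code (which runs goroutines): each server address is polled in its own thread until a request succeeds, and the elapsed time is recorded. The function names, the retry interval, and the timeout are all assumptions.

```python
import http.client
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def http_probe(address, port=8080):
    """Return True if an HTTP GET to the server answers with status 200."""
    try:
        with urllib.request.urlopen(f"http://{address}:{port}/", timeout=1) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, or dropped by the policy: not reachable yet.
        return False


def time_to_reach(probe, address, timeout=60.0, interval=1.0):
    """Poll one address until probe() succeeds.

    Returns the elapsed seconds, or None if the timeout expires first.
    """
    start = time.monotonic()
    while True:
        if probe(address):
            return time.monotonic() - start
        if time.monotonic() - start >= timeout:
            return None
        time.sleep(interval)


def probe_all(probe, addresses, **kwargs):
    """Poll every address in parallel; return {address: seconds or None}."""
    with ThreadPoolExecutor(max_workers=max(1, len(addresses))) as pool:
        futures = {
            addr: pool.submit(time_to_reach, probe, addr, **kwargs)
            for addr in addresses
        }
        return {addr: fut.result() for addr, fut in futures.items()}
```

Calling `probe_all(http_probe, server_ips)` right after the policy is created would give one reachability time per server pod, which makes the spread between fast and slow pods directly comparable.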

For example, this is the log from one of the client pods: https://storage.scalelab.redhat.com/anilvenkata/networkpolicy/bugs/logclientpod.txt

grep "Got 200 response to address" logclientpod.txt
shows that most of the server pods are reached within 10 seconds. However, it took 20 seconds for the client to reach server pod 10.129.6.168, i.e.
2024/08/26 05:12:50 Got 200 response to address 10.129.6.168

In the same client pod log, we can see that the client pod tried to reach server pod 10.129.6.168 multiple times between 05:12:33 and 05:12:50, but the connection test succeeded only at 05:12:50.
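To find every slow server pod in a client log rather than checking them one at a time, the success lines can be parsed and compared against a reference time. The sketch below assumes all success lines follow the format of the line quoted above. The 10-second threshold and the helper names are assumptions, and the reference used here is the first observed attempt (05:12:33), not the policy creation time, which is not stated in the log excerpt.

```python
import re
from datetime import datetime

# Matches lines such as:
# 2024/08/26 05:12:50 Got 200 response to address 10.129.6.168
LINE_RE = re.compile(
    r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) Got 200 response to address (\S+)"
)
TS_FORMAT = "%Y/%m/%d %H:%M:%S"


def first_success_times(lines):
    """Return {address: datetime of the earliest 200 response}."""
    first = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip failed attempts and unrelated lines
        ts = datetime.strptime(match.group(1), TS_FORMAT)
        addr = match.group(2)
        if addr not in first or ts < first[addr]:
            first[addr] = ts
    return first


def delays_from(reference, first):
    """Seconds between the reference time and each first success."""
    return {addr: (ts - reference).total_seconds() for addr, ts in first.items()}


def slow_servers(delays, threshold_s=10.0):
    """Addresses whose first success came later than the threshold."""
    return sorted(addr for addr, d in delays.items() if d > threshold_s)


# The one success line quoted in this report.
sample = ["2024/08/26 05:12:50 Got 200 response to address 10.129.6.168"]
reference = datetime(2024, 8, 26, 5, 12, 33)  # first attempt seen in the log
delays = delays_from(reference, first_success_times(sample))

if __name__ == "__main__":
    print(delays)
    print(slow_servers(delays))
```

For the quoted line this gives a delay of 17 seconds between the first attempt and the first successful response. To run it against the full log, pass the lines of logclientpod.txt instead of `sample`.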

Note: kube-burner creates all the namespaces, client pods, and server pods before creating the network policies, i.e. kube-burner's job1 creates all the namespaces, client pods, and server pods. Once all of them are available, job2 starts and creates the network policies. So all the pods are ready before network policy creation begins.

Version-Release number of selected component (if applicable):

4.16

How reproducible: Always

Steps to Reproduce:

We can provide the environment when the developer wants to debug the issue.

Actual results:

Some server pods take 30 seconds to become reachable, whereas other server pods are reachable from client pods within 5 seconds under the same network policy.

Expected results:

Connection establishment latency from a client pod should be similar across all server pods (for example, 5 to 10 seconds in this case).

Must-gather: https://storage.scalelab.redhat.com/anilvenkata/networkpolicy/bugs/must-gather.tar.gz

Assignee: Nadia Pinaeva

Reporter: VENKATA ANIL kumar KOMMADDI

QA Contact: Anurag Saxena

Votes: 0

Watchers: 4

Created: 2024/08/26 12:08 PM

Updated: 2024/08/27 2:05 PM
