  1. OpenShift Bugs
  2. OCPBUGS-56630

[UDN] a few pods are not able to connect to other pod ip in the same udn


    • Quality / Stability / Reliability
    • Rejected
    • CORENET Sprint 272, CORENET Sprint 273, CORENET Sprint 274, CORENET Sprint 275

      Description of problem:
      After three back-to-back runs of the udn-density[1] test (with a pause), a few pods in a UDN cannot connect to other pods in the same UDN by pod IP. In addition, a few pods that can connect by pod IP cannot connect to the service, either by service name or by service IP.

      # here you can see 2 client pods with 0/1 containers ready
      
      ogp
      NAME                        READY   STATUS    RESTARTS   AGE     IP             NODE                                         NOMINATED NODE   READINESS GATES
      client-1-75f7987479-6ktmp   0/1     Running   0          3h13m   10.128.4.23    ip-10-0-63-9.us-west-2.compute.internal      <none>           <none>
      client-1-75f7987479-7jctz   1/1     Running   0          3h13m   10.131.6.22    ip-10-0-30-0.us-west-2.compute.internal      <none>           <none>
      client-2-77b9944574-llrz8   1/1     Running   0          3h13m   10.128.2.28    ip-10-0-89-147.us-west-2.compute.internal    <none>           <none>
      client-2-77b9944574-mdglh   0/1     Running   0          3h13m   10.129.8.25    ip-10-0-20-114.us-west-2.compute.internal    <none>           <none>
      server-1-6bc7b48b6-9fpqb    1/1     Running   0          3h13m   10.128.10.23   ip-10-0-40-153.us-west-2.compute.internal    <none>           <none>
      server-1-6bc7b48b6-s4f2h    1/1     Running   0          3h13m   10.129.6.24    ip-10-0-103-174.us-west-2.compute.internal   <none>           <none>
      server-2-68b9cd895b-cv294   1/1     Running   0          3h13m   10.128.4.24    ip-10-0-63-9.us-west-2.compute.internal      <none>           <none>
      server-2-68b9cd895b-ql6cf   1/1     Running   0          3h13m   10.128.12.17   ip-10-0-38-140.us-west-2.compute.internal    <none>           <none>
      server-3-677fc86874-j9jbd   1/1     Running   0          3h13m   10.130.2.23    ip-10-0-71-136.us-west-2.compute.internal    <none>           <none>
      server-3-677fc86874-qfbst   1/1     Running   0          3h13m   10.128.8.24    ip-10-0-125-251.us-west-2.compute.internal   <none>           <none>
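
With 160 iterations, spotting the unready pods by eye gets tedious. The following is a minimal sketch of a filter that prints only the pods whose READY column shows fewer ready containers than total. The function name `not_ready_pods` is hypothetical, and the sketch assumes that `ogp` above is a local alias for `oc get pods -o wide`.

```shell
# Reads `oc get pods` table output on stdin and prints the names of pods
# whose READY column (ready/total) has fewer ready containers than total.
# Header lines and anything without an N/M second column are skipped.
not_ready_pods() {
  awk '$1 != "NAME" && $2 ~ /^[0-9]+\/[0-9]+$/ {
         split($2, r, "/")
         if (r[1] != r[2]) print $1
       }'
}
```

Possible usage, run in the affected namespace: `oc get pods -o wide | not_ready_pods`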
      

       

      oc get svc
      NAME            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
      udn-density-1   ClusterIP   172.30.255.89    <none>        80/TCP    3h13m
      udn-density-2   ClusterIP   172.30.22.195    <none>        80/TCP    3h13m
      udn-density-3   ClusterIP   172.30.6.38      <none>        80/TCP    3h13m
      udn-density-4   ClusterIP   172.30.178.203   <none>        80/TCP    3h13m
      udn-density-5   ClusterIP   172.30.125.19    <none>        80/TCP    3h13m 
      # curl the server pods' UDN IPs
      oc rsh client-1-75f7987479-6ktmp ~ 
      
      $ for i in 10.132.9.4 10.132.1.4 10.132.21.5 10.132.13.8 10.132.8.4 10.132.7.5; do curl $i:8080; done 
      
      curl: (7) Failed to connect to 10.132.9.4 port 8080 after 1611 ms: Couldn't connect to server
      curl: (7) Failed to connect to 10.132.1.4 port 8080 after 3067 ms: Couldn't connect to server 
      curl: (7) Failed to connect to 10.132.21.5 port 8080 after 3065 ms: Couldn't connect to server 
      
      <!DOCTYPE html> <html> <head> <title>Welcome 0128B</title> </head> <body> <h1>Welcome 0128B</h1><pre> </pre></body> </html> 
      
      curl: (7) Failed to connect to 10.132.8.4 port 8080 after 3061 ms: Couldn't connect to server 
      curl: (7) Failed to connect to 10.132.7.5 port 8080 after 3067 ms: Couldn't connect to server
      
      
      # the curl to 10.132.13.8 succeeded, but the others did not
       
      
      oc get svc
      NAME            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
      udn-density-1   ClusterIP   172.30.255.89    <none>        80/TCP    3h39m
      udn-density-2   ClusterIP   172.30.22.195    <none>        80/TCP    3h39m
      udn-density-3   ClusterIP   172.30.6.38      <none>        80/TCP    3h39m
      udn-density-4   ClusterIP   172.30.178.203   <none>        80/TCP    3h39m
      udn-density-5   ClusterIP   172.30.125.19    <none>        80/TCP    3h39m
      
      
      for i in 172.30.255.89 172.30.22.195 172.30.6.38 172.30.125.19 172.30.178.203; do curl $i; done 
      curl: (7) Failed to connect to 172.30.255.89 port 80 after 3081 ms: Couldn't connect to server 
      curl: (7) Failed to connect to 172.30.22.195 port 80 after 3056 ms: Couldn't connect to server 
      curl: (7) Failed to connect to 172.30.6.38 port 80 after 3067 ms: Couldn't connect to server 
      curl: (7) Failed to connect to 172.30.125.19 port 80 after 3067 ms: Couldn't connect to server 
      curl: (7) Failed to connect to 172.30.178.203 port 80 after 3067 ms: Couldn't connect to server
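
The manual curl loops above can be wrapped in a small helper that prints one line per target, which makes partial failures easier to compare across client pods. This is a minimal sketch, not the reporter's script: `check_targets` and the `CURL` override variable are hypothetical names, and the 3-second connect timeout is an assumption chosen to roughly match the ~3000 ms failures shown above.

```shell
# CURL can be overridden (for example with a stub); it defaults to the real curl.
CURL="${CURL:-curl}"

# Prints "OK <target>" or "FAIL <target> (curl exit N)" for each argument.
check_targets() {
  for target in "$@"; do
    # -s silences progress output, -o /dev/null discards the response body,
    # --connect-timeout bounds how long each connection attempt may take.
    if "$CURL" -s -o /dev/null --connect-timeout 3 "$target"; then
      echo "OK $target"
    else
      # In the else branch, $? still holds curl's exit status (7 = could not connect).
      echo "FAIL $target (curl exit $?)"
    fi
  done
}
```

Possible usage from inside a client pod, mixing a pod IP and a service IP from the output above: `check_targets 10.132.13.8:8080 172.30.255.89`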
      
      
      
      
      

       

       

      Version-Release number of selected component (if applicable):

      4.19.0-0.nightly-2025-05-19-005500

      How reproducible:

      Always

      Steps to Reproduce:

      1. Run the udn-density test from kube-burner: https://github.com/kube-burner/kube-burner-ocp/tree/main/cmd/config/udn-density-pods
      It sets up a number of UDNs with pods inside them.

      2. For the above test I used the following command:

      ./bin/amd64/kube-burner-ocp udn-density-pods --log-level=info --qps=20 --burst=20 --gc=true  --gc-metrics=true --pod-ready-threshold=140000s --profile-type=both --iterations=160 --churn=false  --job-pause=10m
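
Since the problem appears after three back-to-back runs, one way to script the repetition is sketched below. This is an assumption about how the runs could be automated, not a record of how they were actually launched: `run_back_to_back` and the `KUBE_BURNER` override variable are hypothetical names, while the flags are copied unchanged from the command above.

```shell
# Path to the kube-burner-ocp binary; can be overridden via the environment.
KUBE_BURNER="${KUBE_BURNER:-./bin/amd64/kube-burner-ocp}"

# Runs the udn-density-pods workload the given number of times in sequence,
# stopping at the first run that exits with a non-zero status.
run_back_to_back() {
  runs="$1"
  i=1
  while [ "$i" -le "$runs" ]; do
    echo "run $i of $runs"
    "$KUBE_BURNER" udn-density-pods --log-level=info --qps=20 --burst=20 \
      --gc=true --gc-metrics=true --pod-ready-threshold=140000s \
      --profile-type=both --iterations=160 --churn=false --job-pause=10m \
      || return 1
    i=$((i + 1))
  done
}
```

Possible usage: `run_back_to_back 3`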

       

       

      Actual results: Described above

      Expected results: 

        Expected output when the service names are reachable:
        for i in udn-density-1 udn-density-2 udn-density-3 udn-density-4 udn-density-5; do curl -s -o /dev/null -w "%{http_code}" $i; echo ""; done
        200
        200
        200
        200
        200
        

         

              pdiak@redhat.com Patryk Diak
              mohit-sheth Mohit Jitendra Sheth
              Anurag Saxena Anurag Saxena