OCPBUGS-44174: [Pre-Merge-Testing] Cross-node UDN pod connectivity broken after restarting OVN pods


      Description of problem:
      Cross-node UDN pod connectivity is broken for the layer2 topology after restarting the OVN pods.
      Version-Release number of selected component (if applicable):
      build 4.18.0-0.nightly, openshift/api#1997, openshift/ovn-kubernetes#2334

      How reproducible:
      Always

      Steps to Reproduce:

      1. Create a namespace test2

      2. Create a layer2 UserDefinedNetwork CR in test2 (a sketch follows the listing below)

      3. Create a service and pods in test2

      % oc get svc -n test2
      NAME           TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)     AGE
      test-service   ClusterIP   172.30.94.248   <none>        27017/TCP   31s
      % oc get pods -n test2
      NAME            READY   STATUS    RESTARTS   AGE
      hello-pod       1/1     Running   0          5s
      test-rc-6kf5l   1/1     Running   0          18s
      test-rc-g4nv2   1/1     Running   0          18s
      
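      For reference, step 2 uses a layer2 UserDefinedNetwork CR along these lines; the name, role, and subnet below are illustrative assumptions, not values taken from this cluster:

      % oc create namespace test2
      % cat <<EOF | oc apply -f -
      apiVersion: k8s.ovn.org/v1
      kind: UserDefinedNetwork
      metadata:
        name: udn-l2
        namespace: test2
      spec:
        topology: Layer2
        layer2:
          role: Primary
          subnets:
            - 10.200.0.0/16
      EOF
      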

      Before restarting the ovn pods, check pod-to-service connectivity; there are no issues:

      % oc rsh -n test2 hello-pod 
      ~ $ while true; do curl 172.30.94.248:27017 --connect-timeout 5; sleep 2;echo "";done
      Hello OpenShift!
      
      Hello OpenShift!
      
      Hello OpenShift!
      
      Hello OpenShift!
      
      Hello OpenShift!
      
      Hello OpenShift!
      
      Hello OpenShift!
      
      Hello OpenShift!
      
      Hello OpenShift!
      

      Then restart the ovn pods:

      % oc delete pods --all -n openshift-ovn-kubernetes
      pod "ovnkube-control-plane-58b858b9fd-md59k" deleted
      pod "ovnkube-control-plane-58b858b9fd-wzcjq" deleted
      pod "ovnkube-node-h57tt" deleted
      pod "ovnkube-node-l8jjj" deleted
      pod "ovnkube-node-pbbpz" deleted
      pod "ovnkube-node-pkfbd" deleted
      pod "ovnkube-node-s8djs" deleted
      pod "ovnkube-node-vprtg" deleted
      
      % oc get pods -n openshift-ovn-kubernetes
      NAME                                     READY   STATUS    RESTARTS   AGE
      ovnkube-control-plane-58b858b9fd-4rn8t   2/2     Running   0          98s
      ovnkube-control-plane-58b858b9fd-w24n6   2/2     Running   0          98s
      ovnkube-node-9n7h5                       8/8     Running   0          96s
      ovnkube-node-b8579                       8/8     Running   0          94s
      ovnkube-node-f7t5n                       8/8     Running   0          96s
      ovnkube-node-flzzx                       8/8     Running   0          94s
      ovnkube-node-k9tmd                       8/8     Running   0          95s
      ovnkube-node-nt8st                       8/8     Running   0          94s
      

      Check the pod-to-service connection again; requests are now intermittently dropped:

      % oc rsh -n test2 hello-pod              
      ~ $ while true; do curl 172.30.94.248:27017 --connect-timeout 5; sleep 2;echo "";done
      curl: (28) Connection timeout after 5000 ms
      
      Hello OpenShift!
      
      Hello OpenShift!
      
      Hello OpenShift!
      
      curl: (28) Connection timeout after 5000 ms
      
      curl: (28) Connection timeout after 5000 ms
      
      Hello OpenShift!
      
      Hello OpenShift!
      
      Hello OpenShift!
      
      Hello OpenShift!
      
      curl: (28) Connection timeout after 5001 ms
      
      curl: (28) Connection timeout after 5001 ms
      

      Then check the service's backend pods and their UDN interfaces:

      % oc get pods -n test2 -o wide
      NAME            READY   STATUS    RESTARTS   AGE   IP            NODE                                  NOMINATED NODE   READINESS GATES
      hello-pod       1/1     Running   0          11m   10.131.0.56   huirwang-1104a-q6h7m-worker-b-qtfrk   <none>           <none>
      test-rc-6kf5l   1/1     Running   0          12m   10.129.2.38   huirwang-1104a-q6h7m-worker-c-ggglk   <none>           <none>
      test-rc-g4nv2   1/1     Running   0          12m   10.131.0.55   huirwang-1104a-q6h7m-worker-b-qtfrk   <none>           <none>
      
      % oc exec -n test2 test-rc-6kf5l  -- ip a show ovn-udn1
      3: ovn-udn1@if105: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1360 qdisc noqueue state UP group default 
          link/ether 0a:58:0a:c8:04:03 brd ff:ff:ff:ff:ff:ff link-netnsid 0
          inet 10.200.4.3/24 brd 10.200.4.255 scope global ovn-udn1
             valid_lft forever preferred_lft forever
          inet6 fe80::858:aff:fec8:403/64 scope link 
             valid_lft forever preferred_lft forever
      
      % oc exec  -n test2 test-rc-g4nv2  -- ip a show ovn-udn1    
      3: ovn-udn1@if136: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1360 qdisc noqueue state UP group default 
          link/ether 0a:58:0a:c8:03:04 brd ff:ff:ff:ff:ff:ff link-netnsid 0
          inet 10.200.3.4/24 brd 10.200.3.255 scope global ovn-udn1
             valid_lft forever preferred_lft forever
          inet6 fe80::858:aff:fec8:304/64 scope link 
             valid_lft forever preferred_lft forever
      
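      The Service endpoints can also be listed directly to confirm both backend pods are still registered (a standard check; its output was not captured here):

      % oc get endpoints test-service -n test2
      % oc get endpointslices -n test2 -l kubernetes.io/service-name=test-service
      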

      From the UDN client pod, curl both UDN pods directly: the pod on the same node as the client is reachable, but the pod on the other node is not.

      % oc rsh -n test2 hello-pod 
      ~ $ curl 10.200.3.4:8080
      Hello OpenShift!
      ~ $ curl 10.200.3.4:8080
      Hello OpenShift!
      ~ $ curl 10.200.3.4:8080
      Hello OpenShift!
      ~ $ curl 10.200.3.4:8080
      Hello OpenShift!
      ~ $ curl 10.200.4.3:8080
      
      % oc rsh -n test2 hello-pod 
      ~ $ curl 10.200.4.3:8080 --connect-timeout 10
      curl: (28) Connection timeout after 10000 ms
      
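      A possible follow-up check (not performed in this report) would be to capture Geneve traffic on both workers while repeating the cross-node curl, to see whether the tunneled packets ever leave the source node; this assumes the default OVN-Kubernetes Geneve interface name genev_sys_6081:

      % oc debug node/huirwang-1104a-q6h7m-worker-b-qtfrk -- chroot /host tcpdump -ni genev_sys_6081 -c 20
      % oc debug node/huirwang-1104a-q6h7m-worker-c-ggglk -- chroot /host tcpdump -ni genev_sys_6081 -c 20
      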

      Actual results:
      Cross-node UDN pod connectivity is broken for layer2 after restarting the ovn pods.
      Expected results:
      Pod-to-pod connectivity should not be broken by an ovn pod restart.
      Additional info:
      The same test against a layer3 UDN does not show this issue.

      Please fill in the following template while reporting a bug and provide as much relevant information as possible. Doing so will give us the best chance to find a prompt resolution.

      Affected Platforms:

      •  
