OCPBUGS-20220

[Multi-NIC] Egress traffic connection times out after removing the label from another pod

    • Moderate
    • No
    • SDN Sprint 243, SDN Sprint 245, SDN Sprint 246, SDN Sprint 247, SDN Sprint 248, SDN Sprint 249
    • 6
    • Rejected
    • False
      * Previously, if a pod selected by an EgressIP through a secondary interface had its label removed, another pod in the same namespace would also lose its EgressIP assignment, breaking its connection to the external host. This issue has been resolved. Now, when a pod’s label is removed and it stops using the EgressIP, other pods with the matching label continue to use the EgressIP without interruption. (link:https://issues.redhat.com/browse/OCPBUGS-20220[*OCPBUGS-20220*])
    • Bug Fix
    • Done

      Description of problem:

      [Multi-NIC] Egress traffic from a pod times out after the label is removed from another pod in the same namespace.
      
      

      Version-Release number of selected component (if applicable):

      4.14.0-0.nightly-2023-10-08-024357
      
      
      

      How reproducible:

      Always
      
      

      Steps to Reproduce:

      1. Label one node as an egress node.
      2. Create an EgressIP object. The egress IP is assigned to the secondary interface of the egress node.
      # oc get egressip -o yaml
      apiVersion: v1
      items:
      - apiVersion: k8s.ovn.org/v1
        kind: EgressIP
        metadata:
          annotations:
            kubectl.kubernetes.io/last-applied-configuration: |
              {"apiVersion":"k8s.ovn.org/v1","kind":"EgressIP","metadata":{"annotations":{},"name":"egressip-66293"},"spec":{"egressIPs":["172.22.0.190"],"namespaceSelector":{"matchLabels":{"org":"qe"}},"podSelector":{"matchLabels":{"color":"pink"}}}}
          creationTimestamp: "2023-10-08T07:28:04Z"
          generation: 2
          name: egressip-66293
          resourceVersion: "461590"
          uid: f1ca3483-63f1-4f31-99b0-e6a55161c285
        spec:
          egressIPs:
          - 172.22.0.190
          namespaceSelector:
            matchLabels:
              org: qe
          podSelector:
            matchLabels:
              color: pink
        status:
          items:
          - egressIP: 172.22.0.190
            node: worker-0
      kind: List
      metadata:
        resourceVersion: ""
      3. Create a namespace and two pods in it.
      % oc get pods -n hrw -o wide
      NAME         READY   STATUS    RESTARTS   AGE     IP            NODE       NOMINATED NODE   READINESS GATES
      hello-pod    1/1     Running   0          6m46s   10.129.2.7    worker-1   <none>           <none>
      hello-pod1   1/1     Running   0          6s      10.131.0.14   worker-0   <none>           <none>
      
      4. Add label org=qe to namespace hrw
      # oc get ns hrw --show-labels
      NAME   STATUS   AGE   LABELS
      hrw    Active   21m   kubernetes.io/metadata.name=hrw,org=qe,pod-security.kubernetes.io/audit-version=v1.24,pod-security.kubernetes.io/audit=restricted,pod-security.kubernetes.io/warn-version=v1.24,pod-security.kubernetes.io/warn=restricted
      
      5. At this point, both pods can reach the external endpoint (the 404 response below confirms connectivity).
      % oc rsh -n hrw hello-pod 
      ~ $ curl 172.22.0.1 --connect-timeout 5
      <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
      <html><head>
      <title>404 Not Found</title>
      </head><body>
      <h1>Not Found</h1>
      <p>The requested URL was not found on this server.</p>
      </body></html>
      ~ $ exit
       % oc rsh -n hrw hello-pod1 
      ~ $ curl 172.22.0.1 --connect-timeout 5
      <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
      <html><head>
      <title>404 Not Found</title>
      </head><body>
      <h1>Not Found</h1>
      <p>The requested URL was not found on this server.</p>
      </body></html>
      
      6. Add label color=pink to both pods
       % oc label pod hello-pod color=pink -n hrw
      pod/hello-pod labeled
       % oc label pod hello-pod1 color=pink -n hrw
      pod/hello-pod1 labeled
      
      7. Both pods can access external endpoint.
      8. Remove label color=pink from pod hello-pod
      % oc label pod hello-pod color- -n hrw     
      pod/hello-pod unlabeled
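
The labeling steps above can be condensed into a small script. This is only a sketch under stated assumptions: the EgressIP manifest is reconstructed from the last-applied-configuration annotation shown in step 2, the node label `k8s.ovn.org/egress-assignable` is the standard OVN-Kubernetes egress label and is not quoted in this report, and the pods `hello-pod` and `hello-pod1` are assumed to exist already in namespace `hrw`. The function names `egressip_manifest` and `reproduce` are hypothetical helpers.

```shell
#!/bin/sh
# Sketch of steps 1, 2, 4, 6 and 8. Assumes pods hello-pod and hello-pod1
# already exist in namespace hrw (their manifests are not part of the report).
set -eu

NS=hrw
EGRESS_NODE=worker-0

# Print the EgressIP manifest, reconstructed from the
# last-applied-configuration annotation shown in step 2.
egressip_manifest() {
  cat <<'EOF'
apiVersion: k8s.ovn.org/v1
kind: EgressIP
metadata:
  name: egressip-66293
spec:
  egressIPs:
  - 172.22.0.190
  namespaceSelector:
    matchLabels:
      org: qe
  podSelector:
    matchLabels:
      color: pink
EOF
}

reproduce() {
  # Step 1: mark the node as egress-assignable (label name is an assumption).
  oc label node "$EGRESS_NODE" k8s.ovn.org/egress-assignable="" --overwrite
  # Step 2: create the EgressIP object.
  egressip_manifest | oc apply -f -
  # Step 4: make the namespace match the namespaceSelector.
  oc label ns "$NS" org=qe --overwrite
  # Step 6: make both pods match the podSelector.
  oc label pod hello-pod color=pink -n "$NS" --overwrite
  oc label pod hello-pod1 color=pink -n "$NS" --overwrite
  # Step 8: remove the label from one pod only.
  oc label pod hello-pod color- -n "$NS"
}

# Nothing touches the cluster unless the script is called with "run".
if [ "${1:-}" = "run" ]; then
  reproduce
fi
```

Invoking the script with the argument `run` applies the steps in order; without it, the script only defines the helpers.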
      
      
      
      

      Actual results:

      
      Accessing the external endpoint from the pod that still has the label (hello-pod1) fails with a connection timeout:
       % oc rsh -n hrw hello-pod1            
      ~ $ curl 172.22.0.1 --connect-timeout 5
      curl: (28) Connection timeout after 5000 ms
      ~ $ 
      ~ $ 
      ~ $ curl 172.22.0.1 --connect-timeout 5
      curl: (28) Connection timeout after 5000 ms
      
      Note that the label was removed only from hello-pod. hello-pod1 still has the label, so it should still use the EgressIP and be able to reach the external endpoint.
      
      

      Expected results:

      hello-pod1 should still be able to access the external endpoint through the EgressIP.
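
The expected result can be checked non-interactively with a helper along these lines. It is a sketch, not part of the original report: `check_egress` is a hypothetical function name, it uses `oc exec` instead of the interactive `oc rsh` session shown above, and it assumes `oc exec` passes through the exit status of `curl`. The `-f` flag is deliberately omitted because the external host answers with 404 in this report, and that response still proves connectivity.

```shell
#!/bin/sh
# Print whether a pod can reach the external endpoint.
# Arguments: pod name, namespace (default hrw), target (default 172.22.0.1).
check_egress() {
  pod=$1
  ns=${2:-hrw}
  target=${3:-172.22.0.1}
  # Any HTTP response, including 404, counts as reachable; a timeout does not.
  if oc exec -n "$ns" "$pod" -- curl -s -o /dev/null --connect-timeout 5 "$target"; then
    echo "$pod: reachable"
  else
    rc=$?
    echo "$pod: failed (exit code $rc)"
    return 1
  fi
}
```

Running `check_egress hello-pod1` after step 8 should report the pod as reachable once the fix is in place; on an affected build it reports a failure instead.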
      
      

      Additional info:

      
      

            mkennell@redhat.com Martin Kennelly
            huirwang Huiran Wang
            Huiran Wang Huiran Wang