OpenShift Bugs / OCPBUGS-55366

[UDN L3] NodePort service with ETP=Cluster is not working for UDN in LGW mode



      Description of problem:

      When testing a NodePort service with ETP=Cluster (externalTrafficPolicy: Cluster) in LGW (local gateway) mode, a UDN pod cannot access its own NodePort service.

       

      # oc get pod -n blue -o wide --show-labels
      NAME            READY   STATUS    RESTARTS   AGE   IP            NODE       NOMINATED NODE   READINESS GATES   LABELS
      test-rc-8m5jw   1/1     Running   1          27h   10.129.2.8    worker-2   <none>           <none>            name=client
      test-rc-krdjc   1/1     Running   1          27h   10.131.0.21   worker-1   <none>           <none>            name=test-pods
      
      
      # oc get svc  hello-pod -n blue -o yaml
      apiVersion: v1
      kind: Service
      metadata:
        creationTimestamp: "2025-04-24T07:20:52Z"
        labels:
          name: hello-pod
        name: hello-pod
        namespace: blue
        resourceVersion: "127354"
        uid: df4f173e-823c-473a-ada4-c73f33395375
      spec:
        clusterIP: 172.30.34.214
        clusterIPs:
        - 172.30.34.214
        externalTrafficPolicy: Cluster
        internalTrafficPolicy: Cluster
        ipFamilies:
        - IPv4
        ipFamilyPolicy: SingleStack
        ports:
        - name: http
          nodePort: 32705
          port: 27017
          protocol: TCP
          targetPort: 8080
        selector:
          name: test-pods
        sessionAffinity: None
        type: NodePort
      status:
        loadBalancer: {}
      
      
      

      This works in SGW (shared gateway) mode, but fails after switching to LGW mode.

      # oc rsh -n blue test-rc-8m5jw
      ~ $ curl 192.168.111.20:32705
      ^C
      ~ $ curl 192.168.111.20:32705
      curl: (56) Recv failure: Connection reset by peer
      
      
      

       

      ##### tcpdump from client node
      
      sh-5.1# tcpdump -i any -nn port 32705
      tcpdump: data link type LINUX_SLL2
      dropped privs to tcpdump
      tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
      listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
      10:09:21.881927 c9bf4e28689f2_3 P   IP 20.100.1.4.46378 > 192.168.111.20.32705: Flags [S], seq 690069817, win 65280, options [mss 1360,sackOK,TS val 2475200319 ecr 0,nop,wscale 7], length 0
      10:09:21.882635 ovn-k8s-mp1 In  IP 169.254.0.12.46378 > 192.168.111.20.32705: Flags [S], seq 690069817, win 65280, options [mss 1360,sackOK,TS val 2475200319 ecr 0,nop,wscale 7], length 0
      10:09:21.882684 br-ex Out IP 192.168.111.25.46378 > 192.168.111.20.32705: Flags [S], seq 690069817, win 65280, options [mss 1360,sackOK,TS val 2475200319 ecr 0,nop,wscale 7], length 0
      10:09:21.882886 enp2s0 Out IP 192.168.111.25.46378 > 192.168.111.20.32705: Flags [S], seq 690069817, win 65280, options [mss 1360,sackOK,TS val 2475200319 ecr 0,nop,wscale 7], length 0
      10:09:21.885908 enp2s0 In  IP 192.168.111.20.32705 > 192.168.111.25.46378: Flags [S.], seq 535529498, ack 690069818, win 64704, options [mss 1360,sackOK,TS val 2493850769 ecr 2475200319,nop,wscale 7], length 0
      10:09:21.885914 br-ex In  IP 192.168.111.20.32705 > 192.168.111.25.46378: Flags [S.], seq 535529498, ack 690069818, win 64704, options [mss 1360,sackOK,TS val 2493850769 ecr 2475200319,nop,wscale 7], length 0
      10:09:21.885928 ovn-k8s-mp1 Out IP 192.168.111.20.32705 > 169.254.0.12.46378: Flags [S.], seq 535529498, ack 690069818, win 64704, options [mss 1360,sackOK,TS val 2493850769 ecr 2475200319,nop,wscale 7], length 0
      10:09:21.886423 c9bf4e28689f2_3 Out IP 192.168.111.20.32705 > 20.100.1.4.46378: Flags [S.], seq 535529498, ack 690069818, win 64704, options [mss 1360,sackOK,TS val 2493850769 ecr 2475200319,nop,wscale 7], length 0
      10:09:21.886458 c9bf4e28689f2_3 P   IP 20.100.1.4.46378 > 192.168.111.20.32705: Flags [.], ack 1, win 510, options [nop,nop,TS val 2475200324 ecr 2493850769], length 0
      10:09:21.886505 c9bf4e28689f2_3 P   IP 20.100.1.4.46378 > 192.168.111.20.32705: Flags [P.], seq 1:85, ack 1, win 510, options [nop,nop,TS val 2475200324 ecr 2493850769], length 84
      10:09:21.886749 ovn-k8s-mp1 In  IP 169.254.0.12.46378 > 192.168.111.20.32705: Flags [.], ack 1, win 510, options [nop,nop,TS val 2475200324 ecr 2493850769], length 0
      10:09:21.886768 br-ex Out IP 192.168.111.25.46378 > 192.168.111.20.32705: Flags [.], ack 1, win 510, options [nop,nop,TS val 2475200324 ecr 2493850769], length 0
      10:09:21.886780 ovn-k8s-mp1 In  IP 169.254.0.12.46378 > 192.168.111.20.32705: Flags [P.], seq 1:85, ack 1, win 510, options [nop,nop,TS val 2475200324 ecr 2493850769], length 84
      10:09:21.886798 br-ex Out IP 192.168.111.25.46378 > 192.168.111.20.32705: Flags [P.], seq 1:85, ack 1, win 510, options [nop,nop,TS val 2475200324 ecr 2493850769], length 84
      10:09:21.886941 enp2s0 Out IP 192.168.111.25.46378 > 192.168.111.20.32705: Flags [.], ack 1, win 510, options [nop,nop,TS val 2475200324 ecr 2493850769], length 0
      10:09:21.886964 enp2s0 Out IP 192.168.111.25.46378 > 192.168.111.20.32705: Flags [P.], seq 1:85, ack 1, win 510, options [nop,nop,TS val 2475200324 ecr 2493850769], length 84
      10:09:22.092081 c9bf4e28689f2_3 P   IP 20.100.1.4.46378 > 192.168.111.20.32705: Flags [P.], seq 1:85, ack 1, win 510, options [nop,nop,TS val 2475200530 ecr 2493850769], length 84
      10:09:22.092129 ovn-k8s-mp1 In  IP 169.254.0.12.46378 > 192.168.111.20.32705: Flags [P.], seq 1:85, ack 1, win 510, options [nop,nop,TS val 2475200530 ecr 2493850769], length 84
      10:09:22.092161 br-ex Out IP 192.168.111.25.46378 > 192.168.111.20.32705: Flags [P.], seq 1:85, ack 1, win 510, options [nop,nop,TS val 2475200530 ecr 2493850769], length 84
      10:09:22.092172 enp2s0 Out IP 192.168.111.25.46378 > 192.168.111.20.32705: Flags [P.], seq 1:85, ack 1, win 510, options [nop,nop,TS val 2475200530 ecr 2493850769], length 84
      10:09:22.300071 c9bf4e28689f2_3 P   IP 20.100.1.4.46378 > 192.168.111.20.32705: Flags [P.], seq 1:85, ack 1, win 510, options [nop,nop,TS val 2475200738 ecr 2493850769], length 84
      10:09:22.300125 ovn-k8s-mp1 In  IP 169.254.0.12.46378 > 192.168.111.20.32705: Flags [P.], seq 1:85, ack 1, win 510, options [nop,nop,TS val 2475200738 ecr 2493850769], length 84
      10:09:22.300151 br-ex Out IP 192.168.111.25.46378 > 192.168.111.20.32705: Flags [P.], seq 1:85, ack 1, win 510, options [nop,nop,TS val 2475200738 ecr 2493850769], length 84
      10:09:22.300160 enp2s0 Out IP 192.168.111.25.46378 > 192.168.111.20.32705: Flags [P.], seq 1:85, ack 1, win 510, options [nop,nop,TS val 2475200738 ecr 2493850769], length 84
      10:09:22.716074 c9bf4e28689f2_3 P   IP 20.100.1.4.46378 > 192.168.111.20.32705: Flags [P.], seq 1:85, ack 1, win 510, options [nop,nop,TS val 2475201154 ecr 2493850769], length 84
      10:09:22.716132 ovn-k8s-mp1 In  IP 169.254.0.12.46378 > 192.168.111.20.32705: Flags [P.], seq 1:85, ack 1, win 510, options [nop,nop,TS val 2475201154 ecr 2493850769], length 84
      10:09:22.716156 br-ex Out IP 192.168.111.25.46378 > 192.168.111.20.32705: Flags [P.], seq 1:85, ack 1, win 510, options [nop,nop,TS val 2475201154 ecr 2493850769], length 84
      10:09:22.716166 enp2s0 Out IP 192.168.111.25.46378 > 192.168.111.20.32705: Flags [P.], seq 1:85, ack 1, win 510, options [nop,nop,TS val 2475201154 ecr 2493850769], length 84
      10:09:22.886119 enp2s0 In  IP 192.168.111.20.32705 > 192.168.111.25.46378: Flags [S.], seq 535529498, ack 690069818, win 64704, options [mss 1360,sackOK,TS val 2493851770 ecr 2475200319,nop,wscale 7], length 0
      10:09:22.886133 br-ex In  IP 192.168.111.20.32705 > 192.168.111.25.46378: Flags [S.], seq 535529498, ack 690069818, win 64704, options [mss 1360,sackOK,TS val 2493851770 ecr 2475200319,nop,wscale 7], length 0
      10:09:22.886156 ovn-k8s-mp1 Out IP 192.168.111.20.32705 > 169.254.0.12.46378: Flags [S.], seq 535529498, ack 690069818, win 64704, options [mss 1360,sackOK,TS val 2493851770 ecr 2475200319,nop,wscale 7], length 0
      10:09:22.886202 c9bf4e28689f2_3 Out IP 192.168.111.20.32705 > 20.100.1.4.46378: Flags [S.], seq 535529498, ack 690069818, win 64704, options [mss 1360,sackOK,TS val 2493851770 ecr 2475200319,nop,wscale 7], length 0
      10:09:22.886218 c9bf4e28689f2_3 P   IP 20.100.1.4.46378 > 192.168.111.20.32705: Flags [.], ack 1, win 510, options [nop,nop,TS val 2475201324 ecr 2493850769], length 0
      10:09:22.886249 ovn-k8s-mp1 In  IP 169.254.0.12.46378 > 192.168.111.20.32705: Flags [.], ack 1, win 510, options [nop,nop,TS val 2475201324 ecr 2493850769], length 0 

       

       

       

       

       

      tcpdump from server pod:
      
      sh-5.1# tcpdump -i 08d49b3a853fc_3 -nn 
      dropped privs to tcpdump
      tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
      listening on 08d49b3a853fc_3, link-type EN10MB (Ethernet), snapshot length 262144 bytes
      10:09:21.884614 IP 100.65.0.3.46378 > 20.100.5.5.8080: Flags [S], seq 690069817, win 65280, options [mss 1360,sackOK,TS val 2475200319 ecr 0,nop,wscale 7], length 0
      10:09:21.884665 IP 20.100.5.5.8080 > 100.65.0.3.46378: Flags [S.], seq 535529498, ack 690069818, win 64704, options [mss 1360,sackOK,TS val 2493850769 ecr 2475200319,nop,wscale 7], length 0
      10:09:22.885497 IP 20.100.5.5.8080 > 100.65.0.3.46378: Flags [S.], seq 535529498, ack 690069818, win 64704, options [mss 1360,sackOK,TS val 2493851770 ecr 2475200319,nop,wscale 7], length 0
      10:09:24.933487 IP 20.100.5.5.8080 > 100.65.0.3.46378: Flags [S.], seq 535529498, ack 690069818, win 64704, options [mss 1360,sackOK,TS val 2493853818 ecr 2475200319,nop,wscale 7], length 0
      10:09:26.982417 ARP, Request who-has 20.100.5.1 tell 20.100.5.5, length 28
      10:09:26.983132 ARP, Reply 20.100.5.1 is-at 0a:58:14:64:05:01, length 28
      10:09:28.965464 IP 20.100.5.5.8080 > 100.65.0.3.46378: Flags [S.], seq 535529498, ack 690069818, win 64704, options [mss 1360,sackOK,TS val 2493857850 ecr 2475200319,nop,wscale 7], length 0 
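
The spacing of the repeated SYN+ACK packets in the server-pod capture can be checked with a little arithmetic. The sketch below is illustrative only: the timestamps are copied from the capture above, and the helper name gaps_seconds is hypothetical. Gaps of roughly 1 s, 2 s and 4 s would be consistent with the server retransmitting SYN+ACK with backoff because it never sees the client's ACK.

```python
from datetime import datetime

# Timestamps of the SYN+ACK packets (seq 535529498) in the server-pod capture.
SYN_ACK_TIMES = [
    "10:09:21.884665",
    "10:09:22.885497",
    "10:09:24.933487",
    "10:09:28.965464",
]


def gaps_seconds(stamps):
    """Return the gaps, in seconds, between consecutive HH:MM:SS.ffffff stamps."""
    parsed = [datetime.strptime(s, "%H:%M:%S.%f") for s in stamps]
    return [round((b - a).total_seconds(), 3) for a, b in zip(parsed, parsed[1:])]


if __name__ == "__main__":
    # Each gap is about double the previous one.
    print(gaps_seconds(SYN_ACK_TIMES))
```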

       

       

      The tcpdump captures above suggest the following:

      client --> sends SYN

      server --> replies with SYN+ACK

      client --> receives SYN+ACK and then sends ACK

      The server never receives the ACK and keeps retransmitting SYN+ACK.
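
To see how far each handshake packet gets on the client node, the capture lines can be grouped by TCP flags. This is a minimal sketch, not a diagnostic tool: the sample lines are copied from the client-node capture above with the seq/ack, win and options [...] fields trimmed for brevity, and the regular expression and the function name interfaces_by_flags are assumptions made for illustration. In this sample the final ACK (Flags [.]) is last seen leaving on enp2s0, which would suggest it is lost after it leaves the client node.

```python
import re
from collections import defaultdict

# Matches lines printed by `tcpdump -i any -nn`, for example:
# "10:09:21.886941 enp2s0 Out IP 192.168.111.25.46378 > 192.168.111.20.32705: Flags [.], ..."
LINE_RE = re.compile(
    r"^\s*(?P<ts>\d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<iface>\S+)\s+(?P<dir>In|Out|P)\s+IP\s+"
    r"(?P<src>\S+)\s+>\s+(?P<dst>[^:\s]+):\s+"
    r"Flags\s+\[(?P<flags>[^\]]+)\]"
)

# First SYN, SYN+ACK and ACK from the client-node capture (remaining fields trimmed).
SAMPLE = """\
10:09:21.881927 c9bf4e28689f2_3 P   IP 20.100.1.4.46378 > 192.168.111.20.32705: Flags [S], length 0
10:09:21.882635 ovn-k8s-mp1 In  IP 169.254.0.12.46378 > 192.168.111.20.32705: Flags [S], length 0
10:09:21.882684 br-ex Out IP 192.168.111.25.46378 > 192.168.111.20.32705: Flags [S], length 0
10:09:21.882886 enp2s0 Out IP 192.168.111.25.46378 > 192.168.111.20.32705: Flags [S], length 0
10:09:21.885908 enp2s0 In  IP 192.168.111.20.32705 > 192.168.111.25.46378: Flags [S.], length 0
10:09:21.885914 br-ex In  IP 192.168.111.20.32705 > 192.168.111.25.46378: Flags [S.], length 0
10:09:21.885928 ovn-k8s-mp1 Out IP 192.168.111.20.32705 > 169.254.0.12.46378: Flags [S.], length 0
10:09:21.886423 c9bf4e28689f2_3 Out IP 192.168.111.20.32705 > 20.100.1.4.46378: Flags [S.], length 0
10:09:21.886458 c9bf4e28689f2_3 P   IP 20.100.1.4.46378 > 192.168.111.20.32705: Flags [.], length 0
10:09:21.886749 ovn-k8s-mp1 In  IP 169.254.0.12.46378 > 192.168.111.20.32705: Flags [.], length 0
10:09:21.886768 br-ex Out IP 192.168.111.25.46378 > 192.168.111.20.32705: Flags [.], length 0
10:09:21.886941 enp2s0 Out IP 192.168.111.25.46378 > 192.168.111.20.32705: Flags [.], length 0
"""


def interfaces_by_flags(capture):
    """Map each TCP flag string to the ordered (interface, direction) hops that saw it."""
    seen = defaultdict(list)
    for line in capture.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # skip tcpdump banner lines and anything that is not a TCP packet
        hop = (m["iface"], m["dir"])
        if hop not in seen[m["flags"]]:
            seen[m["flags"]].append(hop)
    return dict(seen)


if __name__ == "__main__":
    for flags, hops in interfaces_by_flags(SAMPLE).items():
        path = " -> ".join(f"{iface} ({direction})" for iface, direction in hops)
        print(f"[{flags}]: {path}")
```

Running the same function over the full capture instead of SAMPLE groups the retransmissions too, since repeated hops are only recorded once per flag string.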

       

      Version-Release number of selected component (if applicable):

      How reproducible:

      Steps to Reproduce:

      1. Create a CUDN, a namespace, and pods (client and server).

      2. Create a NodePort service with ETP=Cluster.

      3. Switch to LGW mode.

      4. From the UDN client pod, run curl <master-node-IP>:<nodePort>.

      Actual results:

      nodePort service cannot be accessed. 

      Expected results:

      Additional info:

      SGW mode works with the same configuration.

       


              sdn-team-bot sdn-team bot
              zzhao1@redhat.com Zhanqi Zhao
              None
              None
              Zhanqi Zhao Zhanqi Zhao