OCPBUGS-48787

EgressIP SNAT rules missing or duplicated after restarting the OVN pod for the default network


      Description of problem:

      EgressIP SNAT rules are missed or duplicated after restarting the OVN pod for the default network.

      Version-Release number of selected component (if applicable):
      Pre-merge testing for 'build openshift/api#2127,openshift/ovn-kubernetes#2422' on AWS

      % oc get clusterversion
      NAME      VERSION                                                   AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.18.0-0.ci.test-2025-01-23-014246-ci-ln-sckty2b-latest   True        False         6h13m   Cluster version is 4.18.0-0.ci.test-2025-01-23-014246-ci-ln-sckty2b-latest
      

      How reproducible:
      Frequently; this pre-merge testing failed 4 out of 4 runs (see below).

      Steps to Reproduce:
      We have one automated case, OCP-47021, that frequently fails. The test restarts the OVN pod located on the same node as the egress node; afterwards, either duplicated SNAT rules are left behind or the egressIP is not applied to the egress node.
      I ran it 4 times for this pre-merge testing and got 4 failures; the results are recorded below. For comparison, the same case passed 4 out of 4 runs against 4.17.0-0.nightly-2025-01-21-205102 (AWS), so this might be a regression in 4.18. There is a similar earlier bug, https://issues.redhat.com/browse/OCPBUGS-16217, which originally tracked Azure; this one might be different.

      Given the comparison with the 4.17 results, and since egressIP is a hot feature that customers broadly use, I raised a new bug for DEV to evaluate whether it needs to be fixed in 4.18.
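
      For orientation, here is a minimal sketch of the reproduction flow, not the exact automation; it assumes an EgressIP object named egressip-47021 already exists, a node is labeled k8s.ovn.org/egress-assignable=true, and a matching test pod is running:

      #!/bin/bash
      # Find the egress node currently assigned to the egressIP.
      EGRESS_NODE=$(oc get egressip egressip-47021 -o jsonpath='{.status.items[0].node}')

      # Restart the ovnkube-node pod running on that node.
      OVN_POD=$(oc get pods -n openshift-ovn-kubernetes -l app=ovnkube-node \
        --field-selector spec.nodeName="${EGRESS_NODE}" -o jsonpath='{.items[0].metadata.name}')
      oc delete pod -n openshift-ovn-kubernetes "${OVN_POD}"
      oc wait --for=condition=Ready pod -l app=ovnkube-node -n openshift-ovn-kubernetes --timeout=300s

      # Re-check the OVN NB state: expect exactly one SNAT row for the egressIP
      # and a priority-100 reroute policy for the selected pod's IP.
      NEW_POD=$(oc get pods -n openshift-ovn-kubernetes -l app=ovnkube-node \
        --field-selector spec.nodeName="${EGRESS_NODE}" -o jsonpath='{.items[0].metadata.name}')
      oc rsh -n openshift-ovn-kubernetes "${NEW_POD}" \
        bash -c "ovn-nbctl --format=csv --no-heading find nat | grep -c egressip-47021"
      oc rsh -n openshift-ovn-kubernetes "${NEW_POD}" \
        bash -c "ovn-nbctl lr-policy-list ovn_cluster_router | grep -v inport"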

      1. First run: the lr-policy-list entries for the egressIP were lost; this is probably the same failure as the 3rd and 4th runs below.

      I0123 14:09:18.673247 46561 client.go:835] Running 'oc --kubeconfig=/tmp/kubeconfig rsh -n openshift-ovn-kubernetes ovnkube-node-s9rrc bash -c ovn-nbctl lr-policy-list ovn_cluster_router | grep -v inport'
         I0123 14:09:22.245998 46561 cloud_egressip_ovn.go:2968] Defaulted container "ovn-controller" out of: ovn-controller, ovn-acl-logging, kube-rbac-proxy-node, kube-rbac-proxy-ovn-metrics, northd, nbdb, sbdb, ovnkube-controller, kubecfg-setup (init)
         Routing Policies
                102 (ip4.src == $a8519615025667110816 || ip4.src == $a13607449821398607916) && ip4.dst == $a712973235162149816           allow               pkt_mark=1008
                102 ip4.src == 10.128.0.0/14 && ip4.dst == 10.128.0.0/14           allow
                102 ip4.src == 10.128.0.0/14 && ip4.dst == 100.64.0.0/16           allow
                102                                     pkt.mark == 42           allow
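
      For reference, a passing run should also show a priority-100 reroute entry for the test pod's IP, as in the second run below. A quick hedged check (the pod name and pod IP in angle brackets are placeholders):

      # Expect exactly one reroute policy for the test pod's IP; zero matches
      # reproduces this failure mode.
      oc rsh -n openshift-ovn-kubernetes <ovnkube-node-pod> \
        bash -c "ovn-nbctl lr-policy-list ovn_cluster_router | grep -c '<pod-ip>'"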
      
      

      2. Second run: the reroute policy was present this time, but the SNAT rules were duplicated.

      I0123 14:18:05.489056 47011 client.go:835] Running 'oc --kubeconfig=/tmp/kubeconfig rsh -n openshift-ovn-kubernetes ovnkube-node-2xdkn bash -c ovn-nbctl lr-policy-list ovn_cluster_router | grep -v inport'
       I0123 14:18:11.074792 47011 cloud_egressip_ovn.go:2968] Defaulted container "ovn-controller" out of: ovn-controller, ovn-acl-logging, kube-rbac-proxy-node, kube-rbac-proxy-ovn-metrics, northd, nbdb, sbdb, ovnkube-controller, kubecfg-setup (init)
       Routing Policies
              102 (ip4.src == $a8519615025667110816 || ip4.src == $a13607449821398607916) && ip4.dst == $a712973235162149816           allow               pkt_mark=1008
              102 ip4.src == 10.128.0.0/14 && ip4.dst == 10.128.0.0/14           allow
              102 ip4.src == 10.128.0.0/14 && ip4.dst == 100.64.0.0/16           allow
              102                                     pkt.mark == 42           allow
              100                             ip4.src == 10.130.2.11         reroute                100.88.0.6
      
      But the SNAT rules are duplicated; note that the second entry references a pod in a different test namespace (e2e-test-networking-tvjstjy9-ppzgs), apparently left over from an earlier run:
      I0123 14:18:22.777244 47011 client.go:835] Running 'oc --kubeconfig=/tmp/kubeconfig rsh -n openshift-ovn-kubernetes ovnkube-node-dl6r8 bash -c ovn-nbctl --format=csv --no-heading find nat | grep egressip-47021'
      I0123 14:18:28.495443 47011 cloud_egressip_ovn.go:2990] Defaulted container "ovn-controller" out of: ovn-controller, ovn-acl-logging, kube-rbac-proxy-node, kube-rbac-proxy-ovn-metrics, northd, nbdb, sbdb, ovnkube-controller, kubecfg-setup (init)
      34884527-2d67-47c7-a2bd-f53f6bc37c90,[],[],"{ip-family=ip4, ""k8s.ovn.org/id""=""default-network-controller:EgressIP:egressip-47021_e2e-test-networking-jcjmrcip-xqdqr/test-rc-nc9b2:ip4"", ""k8s.ovn.org/name""=""egressip-47021_e2e-test-networking-jcjmrcip-xqdqr/test-rc-nc9b2"", ""k8s.ovn.org/owner-controller""=default-network-controller, ""k8s.ovn.org/owner-type""=EgressIP}","""10.0.14.52""",[],"""""",[],"""10.130.2.11""",k8s-ip-10-0-4-178.us-east-2.compute.internal,"""""","{stateless=""false""}",0,snat
      1592e8de-e9b3-4817-b960-f014f47be332,[],[],"{ip-family=ip4, ""k8s.ovn.org/id""=""default-network-controller:EgressIP:egressip-47021_e2e-test-networking-tvjstjy9-ppzgs/test-rc-h9zgm:ip4"", ""k8s.ovn.org/name""=""egressip-47021_e2e-test-networking-tvjstjy9-ppzgs/test-rc-h9zgm"", ""k8s.ovn.org/owner-controller""=default-network-controller, ""k8s.ovn.org/owner-type""=EgressIP}","""10.0.14.52""",[],"""""",[],"""10.131.0.17""",k8s-ip-10-0-4-178.us-east-2.compute.internal,"""""","{stateless=""false""}",0,snat
      

      After the run, manually checking the env:

      % oc get pods -n e2e-test-networking-jcjmrcip-xqdqr -o wide
      NAME            READY   STATUS    RESTARTS   AGE   IP            NODE                                        NOMINATED NODE   READINESS GATES
      test-rc-nc9b2   1/1     Running   0          17m   10.130.2.11   ip-10-0-64-214.us-east-2.compute.internal   <none>           <none>
      % oc get egressip                                          
      NAME             EGRESSIPS    ASSIGNED NODE                              ASSIGNED EGRESSIPS
      egressip-47021   10.0.14.52   ip-10-0-4-178.us-east-2.compute.internal   10.0.14.52
      
      sh-5.1# ovn-nbctl --format=csv --no-heading find nat | grep egressip-47021
      34884527-2d67-47c7-a2bd-f53f6bc37c90,[],[],"{ip-family=ip4, ""k8s.ovn.org/id""=""default-network-controller:EgressIP:egressip-47021_e2e-test-networking-jcjmrcip-xqdqr/test-rc-nc9b2:ip4"", ""k8s.ovn.org/name""=""egressip-47021_e2e-test-networking-jcjmrcip-xqdqr/test-rc-nc9b2"", ""k8s.ovn.org/owner-controller""=default-network-controller, ""k8s.ovn.org/owner-type""=EgressIP}","""10.0.14.52""",[],"""""",[],"""10.130.2.11""",k8s-ip-10-0-4-178.us-east-2.compute.internal,"""""","{stateless=""false""}",0,snat
      1592e8de-e9b3-4817-b960-f014f47be332,[],[],"{ip-family=ip4, ""k8s.ovn.org/id""=""default-network-controller:EgressIP:egressip-47021_e2e-test-networking-tvjstjy9-ppzgs/test-rc-h9zgm:ip4"", ""k8s.ovn.org/name""=""egressip-47021_e2e-test-networking-tvjstjy9-ppzgs/test-rc-h9zgm"", ""k8s.ovn.org/owner-controller""=default-network-controller, ""k8s.ovn.org/owner-type""=EgressIP}","""10.0.14.52""",[],"""""",[],"""10.131.0.17""",k8s-ip-10-0-4-178.us-east-2.compute.internal,"""""","{stateless=""false""}",0,snat
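
      A hedged way to spot such leftovers is to group the SNAT rows by external and logical IP and look for entries whose logical IP belongs to a pod or namespace that no longer exists (the pod name is a placeholder):

      # List SNAT rows keyed by external_ip/logical_ip; stale rows show up as
      # entries for pods that are no longer running.
      oc rsh -n openshift-ovn-kubernetes <ovnkube-node-pod> \
        bash -c "ovn-nbctl --format=csv --no-heading --columns=external_ip,logical_ip,type find nat type=snat | sort | uniq -c | sort -rn"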
      

      3. Third run: failed because the lr-policy-list entries for the egressIP were missing.

      I0123 14:42:59.665372 47401 client.go:835] Running 'oc --kubeconfig=/tmp/kubeconfig rsh -n openshift-ovn-kubernetes ovnkube-node-85gbm bash -c ovn-nbctl lr-policy-list ovn_cluster_router | grep -v inport'
         I0123 14:43:05.083150 47401 cloud_egressip_ovn.go:2968] Defaulted container "ovn-controller" out of: ovn-controller, ovn-acl-logging, kube-rbac-proxy-node, kube-rbac-proxy-ovn-metrics, northd, nbdb, sbdb, ovnkube-controller, kubecfg-setup (init)
         Routing Policies
                102 (ip4.src == $a8519615025667110816 || ip4.src == $a13607449821398607916) && ip4.dst == $a712973235162149816           allow               pkt_mark=1008
                102 ip4.src == 10.128.0.0/14 && ip4.dst == 10.128.0.0/14           allow
                102 ip4.src == 10.128.0.0/14 && ip4.dst == 100.64.0.0/16           allow
                102                                     pkt.mark == 42           allow
      
      

      After the run, manually checking the env: the egressIP was not assigned to the egress node.

      % oc get egressip
      NAME             EGRESSIPS    ASSIGNED NODE   ASSIGNED EGRESSIPS
      egressip-47021   10.0.7.233                   
      
      The egress node does have the egress label:
      % oc get nodes --show-labels | grep egress
      ip-10-0-4-178.us-east-2.compute.internal    Ready    worker                 4h30m   v1.31.4   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m6i.xlarge,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-east-2,failure-domain.beta.kubernetes.io/zone=us-east-2a,k8s.ovn.org/egress-assignable=true,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-0-4-178.us-east-2.compute.internal,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.kubernetes.io/instance-type=m6i.xlarge,node.openshift.io/os_id=rhcos,topology.ebs.csi.aws.com/zone=us-east-2a,topology.k8s.aws/zone-id=use2-az1,topology.kubernetes.io/region=us-east-2,topology.kubernetes.io/zone=us-east-2a
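
      When the status stays empty like this even though the label is present, two hedged places to look (the resource and container names are assumed from a 4.18-era layout, so treat them as assumptions) are the CloudPrivateIPConfig objects and the cluster-manager logs:

      # CloudPrivateIPConfig tracks the cloud-side assignment of each egress IP.
      oc get cloudprivateipconfig

      # The cluster manager decides node assignment; grep its logs for the
      # egressIP name to see why assignment did not happen.
      oc logs -n openshift-ovn-kubernetes -l app=ovnkube-control-plane \
        -c ovnkube-cluster-manager --tail=200 | grep -i egressip-47021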
      
      
      From the run log, the egressIP was assigned to the egress node before the OVN pod restart:
      I0123 14:41:01.466637 47401 client.go:835] Running 'oc --kubeconfig=/tmp/kubeconfig get egressip egressip-47021 -ojsonpath={.status.items}'
      I0123 14:41:02.649955 47401 utils.go:1273] egressIPStatus: [{"egressIP":"10.0.21.18","node":"ip-10-0-4-178.us-east-2.compute.internal"}]
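
      A small polling helper (a sketch, not part of the automated case) can confirm the assignment has converged before and after the restart:

      # Wait up to ~2 minutes for the egressIP status to report an assigned node.
      for i in $(seq 1 24); do
        NODE=$(oc get egressip egressip-47021 -o jsonpath='{.status.items[0].node}')
        [ -n "${NODE}" ] && echo "assigned to ${NODE}" && break
        sleep 5
      done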
      

      4. Fourth run

      Failed at the same point as the third run:
      I0123 15:01:31.048480 47820 client.go:835] Running 'oc --kubeconfig=/tmp/kubeconfig rsh -n openshift-ovn-kubernetes ovnkube-node-w2pr7 bash -c ovn-nbctl lr-policy-list ovn_cluster_router | grep -v inport'
          I0123 15:01:36.473679 47820 cloud_egressip_ovn.go:2968] Defaulted container "ovn-controller" out of: ovn-controller, ovn-acl-logging, kube-rbac-proxy-node, kube-rbac-proxy-ovn-metrics, northd, nbdb, sbdb, ovnkube-controller, kubecfg-setup (init)
          Routing Policies
                 102 (ip4.src == $a8519615025667110816 || ip4.src == $a13607449821398607916) && ip4.dst == $a712973235162149816           allow               pkt_mark=1008
                 102 ip4.src == 10.128.0.0/14 && ip4.dst == 10.128.0.0/14           allow
                 102 ip4.src == 10.128.0.0/14 && ip4.dst == 100.64.0.0/16           allow
                 102                                     pkt.mark == 42           allow
      
      

      Manually checking:

      % oc get egressip
      NAME             EGRESSIPS     ASSIGNED NODE   ASSIGNED EGRESSIPS
      egressip-47021   10.0.18.229     
      
      % oc get nodes --show-labels | grep egress
      ip-10-0-4-178.us-east-2.compute.internal    Ready    worker                 4h47m   v1.31.4   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m6i.xlarge,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-east-2,failure-domain.beta.kubernetes.io/zone=us-east-2a,k8s.ovn.org/egress-assignable=true,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-0-4-178.us-east-2.compute.internal,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.kubernetes.io/instance-type=m6i.xlarge,node.openshift.io/os_id=rhcos,topology.ebs.csi.aws.com/zone=us-east-2a,topology.k8s.aws/zone-id=use2-az1,topology.kubernetes.io/region=us-east-2,topology.kubernetes.io/zone=us-east-2a
      
      % oc describe node ip-10-0-4-178.us-east-2.compute.internal | grep -C3 egress-ipconfig
                          topology.k8s.aws/zone-id=use2-az1
                          topology.kubernetes.io/region=us-east-2
                          topology.kubernetes.io/zone=us-east-2a
      Annotations:        cloud.network.openshift.io/egress-ipconfig:
                            [{"interface":"eni-040e5f15c9a473467","ifaddr":{"ipv4":"10.0.0.0/19"},"capacity":{"ipv4":14,"ipv6":15}}]
                          csi.volume.kubernetes.io/nodeid: {"ebs.csi.aws.com":"i-002c83cf3533356c7"}
                          k8s.ovn.org/bridge-egress-ips: []
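
      The annotation can also be pulled directly; a hedged one-liner (jq is optional pretty-printing):

      # Extract the cloud egress-ipconfig annotation for the egress node; the
      # capacity field (ipv4: 14) confirms the node can still host the egress IP.
      oc get node ip-10-0-4-178.us-east-2.compute.internal \
        -o jsonpath='{.metadata.annotations.cloud\.network\.openshift\.io/egress-ipconfig}' | jq .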
      


      Actual results:

      After restarting the ovnkube-node pod on the egress node, either stale or duplicated SNAT rules remain for the egress IP, or the egressIP reroute policies are missing and the egressIP is no longer assigned to the egress node.

      Expected results:

      After the restart, the egressIP remains assigned to the labeled egress node, with exactly one SNAT rule per selected pod and the corresponding priority-100 reroute policy in ovn_cluster_router.

      Additional info:

      Please fill in the following template while reporting a bug and provide as much relevant information as possible. Doing so will give us the best chance to find a prompt resolution.

      Affected Platforms: AWS (internal Red Hat pre-merge testing; see above).

      Is it an

      1. internal CI failure
      2. customer issue / SD
      3. internal RedHat testing failure

      If it is an internal RedHat testing failure:

      • Please share a kubeconfig or creds to a live cluster for the assignee to debug/troubleshoot along with reproducer steps (especially if it's a telco use case like ICNI, secondary bridges or BM+kubevirt).

      If it is a CI failure:

      • Did it happen in different CI lanes? If so please provide links to multiple failures with the same error instance
      • Did it happen in both sdn and ovn jobs? If so please provide links to multiple failures with the same error instance
      • Did it happen in other platforms (e.g. aws, azure, gcp, baremetal etc) ? If so please provide links to multiple failures with the same error instance
      • When did the failure start happening? Please provide the UTC timestamp of the networking outage window from a sample failure run
      • If it's a connectivity issue,
      • What is the srcNode, srcIP and srcNamespace and srcPodName?
      • What is the dstNode, dstIP and dstNamespace and dstPodName?
      • What is the traffic path? (examples: pod2pod? pod2external?, pod2svc? pod2Node? etc)

      If it is a customer / SD issue:

      • Provide enough information in the bug description that Engineering doesn’t need to read the entire case history.
      • Don’t presume that Engineering has access to Salesforce.
      • Do presume that Engineering will access attachments through supportshell.
      • Describe what each relevant attachment is intended to demonstrate (failed pods, log errors, OVS issues, etc).
      • Referring to the attached must-gather, sosreport or other attachment, please provide the following details:
        • If the issue is in a customer namespace then provide a namespace inspect.
        • If it is a connectivity issue:
          • What is the srcNode, srcNamespace, srcPodName and srcPodIP?
          • What is the dstNode, dstNamespace, dstPodName and dstPodIP?
          • What is the traffic path? (examples: pod2pod? pod2external?, pod2svc? pod2Node? etc)
          • Please provide the UTC timestamp networking outage window from must-gather
          • Please provide tcpdump pcaps taken during the outage filtered based on the above provided src/dst IPs
        • If it is not a connectivity issue:
          • Describe the steps taken so far to analyze the logs from networking components (cluster-network-operator, OVNK, SDN, openvswitch, ovs-configure etc) and the actual component where the issue was seen based on the attached must-gather. Please attach snippets of relevant logs around the window when problem has happened if any.
      • When showing the results from commands, include the entire command in the output.  
      • For OCPBUGS in which the issue has been identified, label with “sbr-triaged”
      • For OCPBUGS in which the issue has not been identified and needs Engineering help for root cause, label with “sbr-untriaged”
      • Do not set the priority, that is owned by Engineering and will be set when the bug is evaluated
      • Note: bugs that do not meet these minimum standards will be closed with label “SDN-Jira-template”
      • For guidance on using this template please see OCPBUGS Template Training for Networking components


            Huiran Wang added a comment - Cannot reproduce the issue in 4.18.0-0.nightly-2025-02-05-033447; moving it to Closed.

            Martin Kennelly added a comment - jechen@redhat.com Any update regarding trying to reproduce this while I was away? Otherwise, we may have to close. Thanks.

            Jaime Caamaño Ruiz added a comment (edited) - Dropped blocker status for the time being, as this is not easy to reproduce; jechen@redhat.com is also not able to reproduce it now.

            Jaime Caamaño Ruiz added a comment - I can't reproduce. I have had a script in a loop continuously doing this and it has not detected any of the issues described in this bug.

            I used the latest upstream, on kind with the default shared gateway configuration.

            ❯ git lo -1
            4e18d5bb3 (HEAD -> master, upstream/master) Check if cluster manager controller has retry pod framework
            

            The script just kills the OVN pod on the egress node and checks for the SNAT of the egress IP and the reroute of the pod:

            #!/bin/bash

            # kind defaults used here: ovn-worker is the egress node, 172.18.0.10 is
            # the egress IP, 10.244.1.3 is the selected pod's IP, and ovnkube-node-hmxtk
            # is the ovnkube-node pod used to query ovn_cluster_router.
            while true; do
              # ovnkube-node pod currently running on the egress node
              POD=$(kubectl get pods -n ovn-kubernetes --field-selector spec.nodeName=ovn-worker -l app=ovnkube-node -o jsonpath='{.items[0].metadata.name}')
              # exactly one SNAT for the egress IP on the egress node's gateway router
              [ "$(kubectl exec -ti -n ovn-kubernetes ${POD} -c ovnkube-controller -- ovn-nbctl lr-nat-list GR_ovn-worker | grep 172.18.0.10 | wc -l)" == "1" ] || break
              # exactly one reroute policy for the pod IP on the cluster router
              [ "$(kubectl exec -ti -n ovn-kubernetes ovnkube-node-hmxtk -c ovnkube-controller -- ovn-nbctl lr-policy-list ovn_cluster_router | grep 10.244.1.3 | wc -l)" == "1" ] || break
              # restart the OVN pod on the egress node and let things settle
              kubectl delete pod -n ovn-kubernetes ${POD}
              sleep 10
            done
            

            It ran endlessly for me for over 100 iterations.


            Jean Chen added a comment - Tested with a pre-merged image built from https://github.com/openshift/ovn-kubernetes/pull/2420. The test passed the first 6 times and failed on the 7th run: https://jenkins-csb-openshift-qe-mastern.dno.corp.redhat.com/job/ocp-common/job/ginkgo-test/285529/console

            Martin Kennelly added a comment - I've tried 10 times so far and cannot reproduce - trying on a fresh cluster.

            Martin Kennelly added a comment - Asked QE for help since Huiran is on PTO: https://redhat-internal.slack.com/archives/CLDDW02SJ/p1737638438581949

            Martin Kennelly added a comment - I can see in the logs that it looks like the bug fixed by https://github.com/openshift/ovn-kubernetes/pull/2420/commits/d940cab5826e5c5415797fb17dc12799aec9a85b

            Martin Kennelly added a comment - Executed the test a few times on the test cluster given to me by huirwang and it passed. Trying to see if there's a flake. I know there's an issue that occurs if ovnkube-controller is restarted while an EIP is assigned to the node; its fix is being brought downstream in https://github.com/openshift/ovn-kubernetes/pull/2420

            Going to inspect the must-gather.
