OpenShift Bugs / OCPBUGS-38785

[GCP] After deleting an egress node, the egressIP did not fail over to another available egress node


    • Important
    • No
    • Rejected
    • False

      Description of problem:

      This issue was found while verifying bug https://issues.redhat.com/browse/OCPBUGS-38653. Because it differs from the original issue, a new bug was opened to track it.
      
          

      Version-Release number of selected component (if applicable):

      The build from openshift/ovn-kubernetes#2265
      
          

      How reproducible:

      
          

      Steps to Reproduce:

      % oc get nodes
      NAME                                  STATUS   ROLES                  AGE     VERSION
      huirwang-08215-tkrrg-master-0         Ready    control-plane,master   154m    v1.30.3
      huirwang-08215-tkrrg-master-1         Ready    control-plane,master   154m    v1.30.3
      huirwang-08215-tkrrg-master-2         Ready    control-plane,master   154m    v1.30.3
      huirwang-08215-tkrrg-worker-a-czr4g   Ready    worker                 144m    v1.30.3
      huirwang-08215-tkrrg-worker-b-hsgxf   Ready    worker                 61s     v1.30.3
      huirwang-08215-tkrrg-worker-b-xd7pv   Ready    worker                 3m10s   v1.30.3
      huirwang-08215-tkrrg-worker-c-7lmrf   Ready    worker                 143m    v1.30.3
      huirwang-08215-tkrrg-worker-f-sgskm   Ready    worker                 27m     v1.30.3
      
      Apply the egress label (k8s.ovn.org/egress-assignable) to nodes huirwang-08215-tkrrg-worker-b-hsgxf, huirwang-08215-tkrrg-worker-f-sgskm, and huirwang-08215-tkrrg-worker-c-7lmrf.
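
The labeling step can be sketched as a small shell function. The label key and its empty value are taken from the --show-labels output in this report; the function name label_egress_nodes is hypothetical and not part of the original reproduction.

```shell
# Hypothetical helper: mark each named node as eligible to host egress IPs.
# The label has an empty value, matching the label visible on the node
# in the --show-labels output of this report.
label_egress_nodes() {
  for node in "$@"; do
    oc label node "$node" k8s.ovn.org/egress-assignable="" --overwrite
  done
}

# Usage (requires cluster access):
#   label_egress_nodes huirwang-08215-tkrrg-worker-b-hsgxf \
#     huirwang-08215-tkrrg-worker-f-sgskm huirwang-08215-tkrrg-worker-c-7lmrf
```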
      
      Create the EgressIP object:
      % oc get egressip -o yaml
      apiVersion: v1
      items:
      - apiVersion: k8s.ovn.org/v1
        kind: EgressIP
        metadata:
          creationTimestamp: "2024-08-21T06:26:29Z"
          generation: 3
          name: egressip-2
          resourceVersion: "99711"
          uid: 082100dc-1012-47e5-95c1-e0aa4faf97d9
        spec:
          egressIPs:
          - 10.0.128.101
          - 10.0.128.100
          namespaceSelector:
            matchLabels:
              name: qe
        status:
          items:
          - egressIP: 10.0.128.100
            node: huirwang-08215-tkrrg-worker-f-sgskm
          - egressIP: 10.0.128.101
            node: huirwang-08215-tkrrg-worker-c-7lmrf
      kind: List
      metadata:
        resourceVersion: ""
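
An object with the spec shown above could be created from a manifest like the following sketch. It uses only the fields visible in the output; the function name egressip_manifest is hypothetical, and the manifest actually used in the reproduction is not shown in this report.

```shell
# Hypothetical helper: print an EgressIP manifest matching the spec
# (name, egressIPs, namespaceSelector) visible in this report.
egressip_manifest() {
  cat <<'EOF'
apiVersion: k8s.ovn.org/v1
kind: EgressIP
metadata:
  name: egressip-2
spec:
  egressIPs:
  - 10.0.128.101
  - 10.0.128.100
  namespaceSelector:
    matchLabels:
      name: qe
EOF
}

# Usage (requires cluster access):
#   egressip_manifest | oc apply -f -
```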
      
      Create namespace test, create pods in it, and apply the label name=qe to the namespace so that it matches the EgressIP namespaceSelector.
      % oc get pods -n test -o wide
      NAME            READY   STATUS    RESTARTS   AGE   IP            NODE                                  NOMINATED NODE   READINESS GATES
      test-rc-547v8   1/1     Running   0          17m   10.130.2.9    huirwang-08215-tkrrg-worker-f-sgskm   <none>           <none>
      test-rc-x9hd4   1/1     Running   0          17m   10.131.0.30   huirwang-08215-tkrrg-worker-a-czr4g   <none>           <none>
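
The namespace part of this step can be sketched as follows. The function name setup_test_namespace is hypothetical. Pod creation is omitted because the report does not show the manifest behind the test-rc pods.

```shell
# Hypothetical helper: create the namespace and apply the label that the
# EgressIP namespaceSelector (name=qe) matches on.
setup_test_namespace() {
  ns="${1:-test}"
  oc create namespace "$ns"
  oc label namespace "$ns" name=qe --overwrite
}

# Usage (requires cluster access):
#   setup_test_namespace test
```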
      
      % oc -n openshift-ovn-kubernetes exec ${ovn_pod} -c northd -- ovn-nbctl find logical_router_policy match="\"ip4.src == ${podip}\""
      _uuid               : 8a52ad1c-ffe6-4039-b4c4-59221fb9da98
      action              : reroute
      bfd_sessions        : []
      external_ids        : {name=egressip-2}
      match               : "ip4.src == 10.131.0.30"
      nexthop             : []
      nexthops            : ["100.88.0.7", "100.88.0.8"]
      options             : {}
      priority            : 100
      
      % echo $LSP_ADDRESSES
      (tstor-huirwang-08215-tkrrg-master-0) 0a:58:64:58:00:02 100.88.0.2/16
      (tstor-huirwang-08215-tkrrg-master-1) 0a:58:64:58:00:03 100.88.0.3/16
      (tstor-huirwang-08215-tkrrg-master-2) 0a:58:64:58:00:04 100.88.0.4/16
      (tstor-huirwang-08215-tkrrg-worker-a-czr4g) 0a:58:64:58:00:05 100.88.0.5/16
      (tstor-huirwang-08215-tkrrg-worker-b-hsgxf) 0a:58:64:58:00:0a 100.88.0.10/16
      (tstor-huirwang-08215-tkrrg-worker-b-xd7pv) 0a:58:64:58:00:09 100.88.0.9/16
      (tstor-huirwang-08215-tkrrg-worker-c-7lmrf) 0a:58:64:58:00:07 100.88.0.7/16
      (tstor-huirwang-08215-tkrrg-worker-f-sgskm) 0a:58:64:58:00:08 100.88.0.8/16
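
The address list above is what ties the reroute policy's nexthops back to nodes. A minimal sketch of that lookup, assuming input lines shaped exactly like the LSP_ADDRESSES output in this report; the function name nexthop_to_node is hypothetical:

```shell
# Hypothetical helper: map a nexthop IP to its node name.
# Reads lines shaped like "(tstor-<node>) <mac> <ip>/<prefix>" on stdin.
nexthop_to_node() {
  ip="$1"
  awk -v ip="$ip" '
    {
      split($3, addr, "/")                # strip the prefix length
      if (addr[1] == ip) {
        name = $1
        gsub(/^\(tstor-|\)$/, "", name)   # drop "(tstor-" and the trailing ")"
        print name
      }
    }'
}

# Usage:
#   printf '%s\n' "$LSP_ADDRESSES" | nexthop_to_node 100.88.0.7
```

With the addresses listed in this report, 100.88.0.7 resolves to huirwang-08215-tkrrg-worker-c-7lmrf and 100.88.0.8 to huirwang-08215-tkrrg-worker-f-sgskm, which are the two nodes in the EgressIP status before the deletion.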
      
      Delete one egress node
      % oc delete node huirwang-08215-tkrrg-worker-c-7lmrf
      node "huirwang-08215-tkrrg-worker-c-7lmrf" deleted
      
      Result:
      % oc get egressip -o yaml
      apiVersion: v1
      items:
      - apiVersion: k8s.ovn.org/v1
        kind: EgressIP
        metadata:
          creationTimestamp: "2024-08-21T06:26:29Z"
          generation: 4
          name: egressip-2
          resourceVersion: "100274"
          uid: 082100dc-1012-47e5-95c1-e0aa4faf97d9
        spec:
          egressIPs:
          - 10.0.128.101
          - 10.0.128.100
          namespaceSelector:
            matchLabels:
              name: qe
        status:
          items:
          - egressIP: 10.0.128.100
            node: huirwang-08215-tkrrg-worker-f-sgskm
      kind: List
      metadata:
        resourceVersion: ""
      % oc -n openshift-ovn-kubernetes exec ${ovn_pod} -c northd -- ovn-nbctl find logical_router_policy match="\"ip4.src == ${podip}\""
      _uuid               : 8a52ad1c-ffe6-4039-b4c4-59221fb9da98
      action              : reroute
      bfd_sessions        : []
      external_ids        : {name=egressip-2}
      match               : "ip4.src == 10.131.0.30"
      nexthop             : []
      nexthops            : ["100.88.0.8"]
      options             : {}
      priority            : 100
      
      The egressIP did not fail over to the other available egress node, huirwang-08215-tkrrg-worker-b-hsgxf, even though the egress label is applied to that node:
      % oc get node huirwang-08215-tkrrg-worker-b-hsgxf  --show-labels
      NAME                                  STATUS   ROLES    AGE   VERSION   LABELS
      huirwang-08215-tkrrg-worker-b-hsgxf   Ready    worker   28m   v1.30.3   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=n2-standard-4,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-b,k8s.ovn.org/egress-assignable=,kubernetes.io/arch=amd64,kubernetes.io/hostname=huirwang-08215-tkrrg-worker-b-hsgxf,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.kubernetes.io/instance-type=n2-standard-4,node.openshift.io/os_id=rhcos,topology.gke.io/zone=us-central1-b,topology.kubernetes.io/region=us-central1,topology.kubernetes.io/zone=us-central1-b
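
The missing failover can also be checked mechanically by comparing the requested egress IPs with the assigned ones. This is a sketch only; the function name unassigned_egress_ips is hypothetical, and it relies on oc printing jsonpath list results separated by spaces.

```shell
# Hypothetical helper: list egress IPs that appear in spec.egressIPs
# but have no entry in status.items.
unassigned_egress_ips() {
  name="$1"
  wanted=$(oc get egressip "$name" -o jsonpath='{.spec.egressIPs[*]}')
  assigned=$(oc get egressip "$name" -o jsonpath='{.status.items[*].egressIP}')
  for ip in $wanted; do
    case " $assigned " in
      *" $ip "*) ;;          # already assigned to a node
      *) echo "$ip" ;;       # requested but not assigned
    esac
  done
}

# Usage (requires cluster access):
#   unassigned_egress_ips egressip-2
```

Against the post-deletion state in this report, where the spec lists 10.0.128.101 and 10.0.128.100 but the status contains only 10.0.128.100, this would report 10.0.128.101.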
      
      
      
          

      Actual results:

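      The EgressIP did not fail over to another egress node. The CloudPrivateIPConfig for 10.0.128.101 targets huirwang-08215-tkrrg-worker-b-hsgxf, but the cloud assignment fails with IP_IN_USE_BY_ANOTHER_RESOURCE:
      % oc get CloudPrivateIPConfig 10.0.128.101 -o yaml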
      apiVersion: cloud.network.openshift.io/v1
      kind: CloudPrivateIPConfig
      metadata:
        annotations:
          k8s.ovn.org/egressip-owner-ref: egressip-2
        creationTimestamp: "2024-08-21T06:26:29Z"
        finalizers:
        - cloudprivateipconfig.cloud.network.openshift.io/finalizer
        generation: 2
        name: 10.0.128.101
        resourceVersion: "103371"
        uid: 357a3e26-254d-40af-a4b2-c95f9e2b7cee
      spec:
        node: huirwang-08215-tkrrg-worker-b-hsgxf
      status:
        conditions:
        - lastTransitionTime: "2024-08-21T06:33:44Z"
          message: 'Error processing cloud assignment request, err: {"errors":[{"code":"IP_IN_USE_BY_ANOTHER_RESOURCE","message":"IP
            ''10.0.128.101/32'' is already being used by another resource. "}]}'
          observedGeneration: 2
          reason: CloudResponseError
          status: "False"
          type: Assigned
        node: huirwang-08215-tkrrg-worker-b-hsgxf
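
To pull out just the assignment state instead of reading the full YAML, a jsonpath filter on the Assigned condition can be used. A minimal sketch; the function name cpic_assigned_state is hypothetical:

```shell
# Hypothetical helper: print "<status> <reason>" for the Assigned condition
# of the CloudPrivateIPConfig named after the given egress IP.
cpic_assigned_state() {
  oc get CloudPrivateIPConfig "$1" \
    -o jsonpath='{range .status.conditions[?(@.type=="Assigned")]}{.status}{" "}{.reason}{"\n"}{end}'
}

# Usage (requires cluster access):
#   cpic_assigned_state 10.0.128.101
```

For the object shown above, this should print the status "False" together with the reason "CloudResponseError".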
      
          

      Expected results:

      The egressIP should fail over to another available egress node.
      
          

      Additional info:

      
          

              pdiak@redhat.com Patryk Diak
              huirwang Huiran Wang
              Huiran Wang Huiran Wang