  1. OpenShift Bugs
  2. OCPBUGS-13802

EgressIP was NOT migrated to the correct worker after deleting the machine it was assigned to in a GCP XPN cluster.


Details

    • Bug
    • Resolution: Done
    • Normal
    • None
    • 4.13
    • None
    • SDN Sprint 236, SDN Sprint 237
    • 2
    • No
    • Rejected
    • False

    Description

      This is a clone of issue OCPBUGS-11187. The following is the description of the original issue:

      Description of problem:

      EgressIP was NOT migrated to the correct worker after deleting the machine it was assigned to in a GCP XPN cluster.
      
      

      Version-Release number of selected component (if applicable):

      4.13.0-0.nightly-2023-03-29-235439
      
      

      How reproducible:

      Always
      
      

      Steps to Reproduce:

      1. Set up a GCP XPN cluster.
      2. Scale up a machineset to create two new worker nodes.
      % oc scale --replicas=2 machineset huirwang-0331a-m4mws-worker-c -n openshift-machine-api        
      machineset.machine.openshift.io/huirwang-0331a-m4mws-worker-c scaled
      
      3. Wait for the two new worker nodes to become ready.
      % oc get machineset -n openshift-machine-api
      NAME                            DESIRED   CURRENT   READY   AVAILABLE   AGE
      huirwang-0331a-m4mws-worker-a   1         1         1       1           86m
      huirwang-0331a-m4mws-worker-b   1         1         1       1           86m
      huirwang-0331a-m4mws-worker-c   2         2         2       2           86m
      huirwang-0331a-m4mws-worker-f   0         0                             86m
      % oc get nodes
      NAME                                                          STATUS   ROLES                  AGE     VERSION
      huirwang-0331a-m4mws-master-0.c.openshift-qe.internal         Ready    control-plane,master   82m     v1.26.2+dc93b13
      huirwang-0331a-m4mws-master-1.c.openshift-qe.internal         Ready    control-plane,master   82m     v1.26.2+dc93b13
      huirwang-0331a-m4mws-master-2.c.openshift-qe.internal         Ready    control-plane,master   82m     v1.26.2+dc93b13
      huirwang-0331a-m4mws-worker-a-hfqsn.c.openshift-qe.internal   Ready    worker                 71m     v1.26.2+dc93b13
      huirwang-0331a-m4mws-worker-b-vbqf2.c.openshift-qe.internal   Ready    worker                 71m     v1.26.2+dc93b13
      huirwang-0331a-m4mws-worker-c-rhbkr.c.openshift-qe.internal   Ready    worker                 8m22s   v1.26.2+dc93b13
      huirwang-0331a-m4mws-worker-c-wnm4r.c.openshift-qe.internal   Ready    worker                 8m22s   v1.26.2+dc93b13
      4. Label one of the new worker nodes as an egress node.
      % oc label node huirwang-0331a-m4mws-worker-c-rhbkr.c.openshift-qe.internal k8s.ovn.org/egress-assignable=""
      node/huirwang-0331a-m4mws-worker-c-rhbkr.c.openshift-qe.internal labeled
      
      5. Create the egressIP object and confirm it is assigned to the labeled node.
      % oc get egressIP
      NAME         EGRESSIPS     ASSIGNED NODE                                                 ASSIGNED EGRESSIPS
      egressip-1   10.0.32.100   huirwang-0331a-m4mws-worker-c-rhbkr.c.openshift-qe.internal   10.0.32.100
      6. Label the second new worker node as an egress node.
      % oc label node huirwang-0331a-m4mws-worker-c-wnm4r.c.openshift-qe.internal k8s.ovn.org/egress-assignable=""
      node/huirwang-0331a-m4mws-worker-c-wnm4r.c.openshift-qe.internal labeled
      7. Delete the machine of the node the egressIP is assigned to.
      % oc delete machines.machine.openshift.io huirwang-0331a-m4mws-worker-c-rhbkr  -n openshift-machine-api
      machine.machine.openshift.io "huirwang-0331a-m4mws-worker-c-rhbkr" deleted
       % oc get nodes
      NAME                                                          STATUS   ROLES                  AGE   VERSION
      huirwang-0331a-m4mws-master-0.c.openshift-qe.internal         Ready    control-plane,master   87m   v1.26.2+dc93b13
      huirwang-0331a-m4mws-master-1.c.openshift-qe.internal         Ready    control-plane,master   86m   v1.26.2+dc93b13
      huirwang-0331a-m4mws-master-2.c.openshift-qe.internal         Ready    control-plane,master   87m   v1.26.2+dc93b13
      huirwang-0331a-m4mws-worker-a-hfqsn.c.openshift-qe.internal   Ready    worker                 76m   v1.26.2+dc93b13
      huirwang-0331a-m4mws-worker-b-vbqf2.c.openshift-qe.internal   Ready    worker                 76m   v1.26.2+dc93b13
      huirwang-0331a-m4mws-worker-c-wnm4r.c.openshift-qe.internal   Ready    worker                 13m   v1.26.2+dc93b13
      W0331 02:48:34.917391       1 egressip_healthcheck.go:162] Could not connect to huirwang-0331a-m4mws-worker-c-rhbkr.c.openshift-qe.internal (10.129.4.2:9107): context deadline exceeded
      W0331 02:48:34.917417       1 default_network_controller.go:903] Node: huirwang-0331a-m4mws-worker-c-rhbkr.c.openshift-qe.internal is not ready, deleting it from egress assignment
      I0331 02:48:34.917590       1 client.go:783]  "msg"="transacting operations" "database"="OVN_Northbound" "operations"="[{Op:update Table:Logical_Switch_Port Row:map[options:{GoMap:map[router-port:rtoe-GR_huirwang-0331a-m4mws-worker-c-rhbkr.c.openshift-qe.internal]}] Rows:[] Columns:[] Mutations:[] Timeout:<nil> Where:[where column _uuid == {6efd3c58-9458-44a2-a43b-e70e669efa72}] Until: Durable:<nil> Comment:<nil> Lock:<nil> UUIDName:}]"
      E0331 02:48:34.920766       1 egressip.go:993] Allocator error: EgressIP: egressip-1 assigned to node: huirwang-0331a-m4mws-worker-c-rhbkr.c.openshift-qe.internal which is not reachable, will attempt rebalancing
      E0331 02:48:34.920789       1 egressip.go:997] Allocator error: EgressIP: egressip-1 assigned to node: huirwang-0331a-m4mws-worker-c-rhbkr.c.openshift-qe.internal which is not ready, will attempt rebalancing
      I0331 02:48:34.920808       1 egressip.go:1212] Deleting pod egress IP status: {huirwang-0331a-m4mws-worker-c-rhbkr.c.openshift-qe.internal 10.0.32.100} for EgressIP: egressip-1
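
The "Create egressIP object" step above does not show the manifest that was applied. As a rough sketch only, an object with the name and address seen in the `oc get egressIP` output could be generated like this. The helper name `egressip_manifest` and the `env: qe` namespace selector are hypothetical placeholders; the report does not say which namespaces were selected.

```shell
#!/bin/sh
# Emit an EgressIP manifest on stdout.
#   $1: object name
#   $2: egress IP address
# ASSUMPTION: the namespaceSelector label (env=qe) is a placeholder; the
# original report does not show which namespaces the object selected.
egressip_manifest() {
  cat <<EOF
apiVersion: k8s.ovn.org/v1
kind: EgressIP
metadata:
  name: $1
spec:
  egressIPs:
  - $2
  namespaceSelector:
    matchLabels:
      env: qe
EOF
}

# Usage against a live cluster (not executed here):
#   egressip_manifest egressip-1 10.0.32.100 | oc apply -f -
```

Writing the manifest to stdout keeps the helper free of side effects, so it can be inspected before it is piped to `oc apply`.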
      
      

      Actual results:

      The egressIP was not migrated to the remaining egress-assignable worker; it is still shown as assigned to the deleted node.
      % oc get egressIP
      NAME         EGRESSIPS     ASSIGNED NODE                                                 ASSIGNED EGRESSIPS
      egressip-1   10.0.32.100   huirwang-0331a-m4mws-worker-c-rhbkr.c.openshift-qe.internal   10.0.32.100
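
The stale assignment above can also be detected mechanically by comparing the `ASSIGNED NODE` column with the nodes that still exist. The following is a minimal sketch, assuming the tabular layout shown in this report and the `node/<name>` lines printed by `oc get nodes -o name`; the function name `stale_egressips` is made up for illustration.

```shell
#!/bin/sh
# Print "<egressip-name> <node>" for every EgressIP whose assigned node is
# not in the list of current nodes.
#   $1: file holding `oc get egressip` table output
#   $2: file holding `oc get nodes -o name` output (one "node/<name>" per line)
stale_egressips() {
  awk '
    # First file on the awk command line: remember each existing node name.
    FILENAME == ARGV[1] { sub(/^node\//, "", $1); nodes[$1] = 1; next }
    # Second file: skip the table header.
    FNR == 1 { next }
    # Column 3 is ASSIGNED NODE; rows without an assignment have fewer fields.
    NF >= 3 && !($3 in nodes) { print $1, $3 }
  ' "$2" "$1"
}

# Usage against a live cluster (not executed here):
#   oc get egressip > /tmp/eip.txt
#   oc get nodes -o name > /tmp/nodes.txt
#   stale_egressips /tmp/eip.txt /tmp/nodes.txt
```

With the outputs from this report, the function would list `egressip-1` together with the deleted `worker-c-rhbkr` node.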
      
      

      Expected results:

      The egressIP should be migrated from the deleted node to the remaining egress-assignable worker.
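
One possible way to verify the expected result is to poll the EgressIP status until it reports the surviving node. This is a sketch under assumptions: the helper names `retry` and `egressip_on_node` are hypothetical, and the attempt count and delay in the usage comment are arbitrary values, not taken from the report. It reads the `node` field of `.status.items` on the EgressIP object.

```shell
#!/bin/sh
# retry <attempts> <delay-seconds> <command...>
# Run the command until it succeeds or the attempts are used up.
retry() {
  attempts=$1; delay=$2; shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    i=$((i + 1))
    # Sleep only if another attempt follows.
    [ "$i" -le "$attempts" ] && sleep "$delay"
  done
  return 1
}

# egressip_on_node <egressip-name> <node>
# Succeeds if the EgressIP status lists <node> as an assigned node.
egressip_on_node() {
  assigned=$(oc get egressip "$1" -o jsonpath='{.status.items[*].node}') || return 1
  case " $assigned " in
    *" $2 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage against a live cluster (not executed here; 30 attempts and a
# 10 second delay are arbitrary choices):
#   retry 30 10 egressip_on_node egressip-1 \
#     huirwang-0331a-m4mws-worker-c-wnm4r.c.openshift-qe.internal
```

Separating the generic retry loop from the cluster check keeps the check reusable for one-off queries.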
      
      

      Additional info:

      
      


            People

              jluhrsen Jamo Luhrsen
              openshift-crt-jira-prow OpenShift Prow Bot
              Huiran Wang Huiran Wang