OpenShift Bugs / OCPBUGS-11803

If the first egressIP node is deleted, the egressIP does not fail over to the second available egressIP node


      Description of problem:

      If the first egressIP node is deleted, the egressIP does not fail over to the second available egressIP node.

      Version-Release number of selected component (if applicable):

       

      How reproducible:

      Label two nodes as egress-assignable, create an EgressIP object, then delete the egress node to which the egress IP was first assigned. The egress IP does not fail over to the second egress node.
      
      $ oc get clusterversion
      NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.14.0-0.nightly-2023-04-03-211601   True        False         31m     Cluster version is 4.14.0-0.nightly-2023-04-03-211601
      
      $ oc get node
      NAME                                                        STATUS   ROLES                  AGE   VERSION
      jechen-0413h-wldhh-master-0.c.openshift-qe.internal         Ready    control-plane,master   57m   v1.26.2+54b5520
      jechen-0413h-wldhh-master-1.c.openshift-qe.internal         Ready    control-plane,master   56m   v1.26.2+54b5520
      jechen-0413h-wldhh-master-2.c.openshift-qe.internal         Ready    control-plane,master   56m   v1.26.2+54b5520
      jechen-0413h-wldhh-worker-a-799mt.c.openshift-qe.internal   Ready    worker                 43m   v1.26.2+54b5520
      jechen-0413h-wldhh-worker-b-2qxk6.c.openshift-qe.internal   Ready    worker                 42m   v1.26.2+54b5520
      jechen-0413h-wldhh-worker-c-qbcls.c.openshift-qe.internal   Ready    worker                 42m   v1.26.2+54b5520
      

      Steps to Reproduce:

      1. Label two nodes as egress-assignable
      $ oc label node jechen-0413h-wldhh-worker-a-799mt.c.openshift-qe.internal "k8s.ovn.org/egress-assignable"=""
      node/jechen-0413h-wldhh-worker-a-799mt.c.openshift-qe.internal labeled
      
      $ oc label node jechen-0413h-wldhh-worker-b-2qxk6.c.openshift-qe.internal "k8s.ovn.org/egress-assignable"=""
      node/jechen-0413h-wldhh-worker-b-2qxk6.c.openshift-qe.internal labeled
      
      
      2. Create a namespace, create some test pods in it, create an EgressIP object, and label the namespace to match the object's namespaceSelector
      $ oc new-project test
      
      $ cat config_egressip1_ovn_ns_team_red.yaml
      apiVersion: k8s.ovn.org/v1
      kind: EgressIP
      metadata:
        name: egressip-red
      spec:
        egressIPs:
        - 10.0.128.201
        namespaceSelector:
          matchLabels:
            team: red 
      
      $ oc create -f ./SDN-1332-test/config_egressip1_ovn_ns_team_red.yaml
      egressip.k8s.ovn.org/egressip-red created
      
      $ oc get egressips.k8s.ovn.org 
      NAME           EGRESSIPS      ASSIGNED NODE                                               ASSIGNED EGRESSIPS
      egressip-red   10.0.128.201   jechen-0413h-wldhh-worker-a-799mt.c.openshift-qe.internal   10.0.128.201
      
      
      $ oc get egressips.k8s.ovn.org egressip-red -oyaml
      apiVersion: k8s.ovn.org/v1
      kind: EgressIP
      metadata:
        creationTimestamp: "2023-04-13T23:08:29Z"
        generation: 2
        name: egressip-red
        resourceVersion: "47901"
        uid: 407676cf-e8f9-4ae0-992d-1046058811cd
      spec:
        egressIPs:
        - 10.0.128.201
        namespaceSelector:
          matchLabels:
            team: red
      status:
        items:
        - egressIP: 10.0.128.201
          node: jechen-0413h-wldhh-worker-a-799mt.c.openshift-qe.internal
      
      
      $ oc label ns test team=red
      
      $ oc get pod -owide
      NAME            READY   STATUS    RESTARTS   AGE     IP            NODE                                                        NOMINATED NODE   READINESS GATES
      test-rc-fc4bg   1/1     Running   0          4m57s   10.131.0.16   jechen-0413h-wldhh-worker-a-799mt.c.openshift-qe.internal   <none>           <none>
      test-rc-hbln6   1/1     Running   0          4m56s   10.131.0.17   jechen-0413h-wldhh-worker-a-799mt.c.openshift-qe.internal   <none>           <none>
      test-rc-nm7vd   1/1     Running   0          4m57s   10.128.2.15   jechen-0413h-wldhh-worker-c-qbcls.c.openshift-qe.internal   <none>           <none>
      test-rc-wzdrl   1/1     Running   0          4m57s   10.129.2.13   jechen-0413h-wldhh-worker-b-2qxk6.c.openshift-qe.internal   <none>           <none>
      
      $ oc exec test-rc-fc4bg -- curl 10.0.0.2:9095
        % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                       Dload  Upload   Total   Spent    Left  Speed
      100    12  100    12    0     0   2000      0 --:--:-- --:--:-- --:--:--  2400
      10.0.128.201
      
      $ oc exec test-rc-hbln6 -- curl 10.0.0.2:9095
        % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                       Dload  Upload   Total   Spent    Left  Speed
      100    12  100    12    0     0   2400      0 --:--:-- --:--:-- --:--:--  2400
      10.0.128.201
      
      $ oc exec test-rc-nm7vd -- curl 10.0.0.2:9095
        % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                       Dload  Upload   Total   Spent    Left  Speed
      100    12  100    12    0     0    923      0 --:--:-- --:--:-- --:--:--  1000
      10.0.128.201
      
      $ oc exec test-rc-wzdrl -- curl 10.0.0.2:9095
        % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                       Dload  Upload   Total   Spent    Left  Speed
      100    12  100    12    0     0   1714      0 --:--:-- --:--:-- --:--:--  1714
      10.0.128.201
      
      EgressIP works at this point: all four pods reach 10.0.0.2:9095 with source IP 10.0.128.201.
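
The four manual curl checks above can be folded into one loop. This is only a minimal sketch: it assumes, as the transcript suggests, that the service at 10.0.0.2:9095 replies with the caller's source IP, and that the pods are in the current project. `all_match` and `check_pods` are hypothetical helper names, not part of `oc`.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not part of oc or OVN-Kubernetes.

# all_match EXPECTED OBSERVED...
# Succeeds only if at least one observed IP is given and every one equals EXPECTED.
all_match() {
  local expected=$1; shift
  [ "$#" -gt 0 ] || return 1
  local ip
  for ip in "$@"; do
    [ "$ip" = "$expected" ] || return 1
  done
}

# check_pods EXPECTED POD...
# Curls the echo service from each pod and compares every reply with EXPECTED.
# curl -s suppresses the progress meter so only the reply is captured.
check_pods() {
  local expected=$1; shift
  local observed=() pod
  for pod in "$@"; do
    observed+=("$(oc exec "$pod" -- curl -s 10.0.0.2:9095)")
  done
  all_match "$expected" "${observed[@]}"
}
```

For the pods in this report, the check could be invoked as `check_pods 10.0.128.201 test-rc-fc4bg test-rc-hbln6 test-rc-nm7vd test-rc-wzdrl`; a non-zero exit status means at least one pod left the cluster with a different source IP.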
      
       
      3. Delete the node to which the egress IP was assigned
      
      $ oc get node jechen-0413h-wldhh-worker-a-799mt.c.openshift-qe.internal -oyaml > backup-worker-a.yaml
      
      $ oc delete node jechen-0413h-wldhh-worker-a-799mt.c.openshift-qe.internal
      node "jechen-0413h-wldhh-worker-a-799mt.c.openshift-qe.internal" deleted
      
      $ oc get node
      NAME                                                        STATUS   ROLES                  AGE   VERSION
      jechen-0413h-wldhh-master-0.c.openshift-qe.internal         Ready    control-plane,master   72m   v1.26.2+54b5520
      jechen-0413h-wldhh-master-1.c.openshift-qe.internal         Ready    control-plane,master   72m   v1.26.2+54b5520
      jechen-0413h-wldhh-master-2.c.openshift-qe.internal         Ready    control-plane,master   71m   v1.26.2+54b5520
      jechen-0413h-wldhh-worker-b-2qxk6.c.openshift-qe.internal   Ready    worker                 58m   v1.26.2+54b5520
      jechen-0413h-wldhh-worker-c-qbcls.c.openshift-qe.internal   Ready    worker                 58m   v1.26.2+54b5520
      
      
      $ oc get egressips.k8s.ovn.org 
      NAME           EGRESSIPS      ASSIGNED NODE                                               ASSIGNED EGRESSIPS
      egressip-red   10.0.128.201   jechen-0413h-wldhh-worker-a-799mt.c.openshift-qe.internal   10.0.128.201
      
      
      $ oc get egressips.k8s.ovn.org -oyaml
      apiVersion: v1
      items:
      - apiVersion: k8s.ovn.org/v1
        kind: EgressIP
        metadata:
          creationTimestamp: "2023-04-13T23:08:29Z"
          generation: 2
          name: egressip-red
          resourceVersion: "47901"
          uid: 407676cf-e8f9-4ae0-992d-1046058811cd
        spec:
          egressIPs:
          - 10.0.128.201
          namespaceSelector:
            matchLabels:
              team: red
        status:
          items:
          - egressIP: 10.0.128.201
            node: jechen-0413h-wldhh-worker-a-799mt.c.openshift-qe.internal
      kind: List
      metadata:
        resourceVersion: ""
      
      
      $ oc get node jechen-0413h-wldhh-worker-b-2qxk6.c.openshift-qe.internal  --show-labels
      NAME                                                        STATUS   ROLES    AGE   VERSION           LABELS
      jechen-0413h-wldhh-worker-b-2qxk6.c.openshift-qe.internal   Ready    worker   60m   v1.26.2+54b5520   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=n1-standard-4,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-b,k8s.ovn.org/egress-assignable=,kubernetes.io/arch=amd64,kubernetes.io/hostname=jechen-0413h-wldhh-worker-b-2qxk6.c.openshift-qe.internal,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.kubernetes.io/instance-type=n1-standard-4,node.openshift.io/os_id=rhcos,topology.gke.io/zone=us-central1-b,topology.kubernetes.io/region=us-central1,topology.kubernetes.io/zone=us-central1-b
      
      
      $ oc get pod -owide
      NAME            READY   STATUS    RESTARTS   AGE   IP            NODE                                                        NOMINATED NODE   READINESS GATES
      test-rc-7rt7x   1/1     Running   0          9s    10.129.2.15   jechen-0413h-wldhh-worker-b-2qxk6.c.openshift-qe.internal   <none>           <none>
      test-rc-bfx6p   1/1     Running   0          9s    10.128.2.17   jechen-0413h-wldhh-worker-c-qbcls.c.openshift-qe.internal   <none>           <none>
      test-rc-nm7vd   1/1     Running   0          12m   10.128.2.15   jechen-0413h-wldhh-worker-c-qbcls.c.openshift-qe.internal   <none>           <none>
      test-rc-wzdrl   1/1     Running   0          12m   10.129.2.13   jechen-0413h-wldhh-worker-b-2qxk6.c.openshift-qe.internal   <none>           <none>
      
      
      $ oc exec test-rc-7rt7x -- curl 10.0.0.2:9095
        % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                       Dload  Upload   Total   Spent    Left  Speed
      100    10  100    10    0     0   2000      0 --:--:-- --:--:-- --:--:--  2000
      10.0.128.3
      
      $ oc exec test-rc-bfx6p -- curl 10.0.0.2:9095
        % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                       Dload  Upload   Total   Spent    Left  Speed
      100    10  100    10    0     0   2500      0 --:--:-- --:--:-- --:--:--  2500
      10.0.128.4
      
      $ oc exec test-rc-nm7vd -- curl 10.0.0.2:9095
        % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                       Dload  Upload   Total   Spent    Left  Speed
      100    10  100    10    0     0   2500      0 --:--:-- --:--:-- --:--:--  3333
      10.0.128.4
      
      $ oc exec test-rc-wzdrl -- curl 10.0.0.2:9095
        % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                       Dload  Upload   Total   Spent    Left  Speed
      100    10  100    10    0     0   2500      0 --:--:-- --:--:-- --:--:--  2500
      10.0.128.3
      
      
      
      

      Actual results:

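      The EgressIP did not fail over to the second egress node: its status still lists the deleted node worker-a, and outbound traffic uses the node IPs (10.0.128.3, 10.0.128.4) as the source IP.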

      Expected results:

      The EgressIP should fail over to the second egress node, and outbound traffic should still use the egress IP as the source IP.
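
The expected behaviour can be checked by polling the EgressIP status after the node is deleted. The following is a sketch under assumptions, not a definitive test: the 120-second default timeout and 5-second interval are arbitrary choices, and `failed_over` and `wait_for_failover` are hypothetical helper names. The jsonpath expression reads the `status.items[].node` field shown in the `-oyaml` output in this report.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not part of oc or OVN-Kubernetes.

# failed_over DELETED_NODE ASSIGNED_NODE...
# Succeeds if at least one node is assigned and none of them is the deleted node.
failed_over() {
  local deleted=$1; shift
  [ "$#" -gt 0 ] || return 1
  local node
  for node in "$@"; do
    [ "$node" != "$deleted" ] || return 1
  done
}

# wait_for_failover EGRESSIP DELETED_NODE [TIMEOUT_SECONDS]
# Polls the EgressIP status until it no longer references the deleted node.
wait_for_failover() {
  local name=$1 deleted=$2 timeout=${3:-120} waited=0 nodes=""
  while [ "$waited" -lt "$timeout" ]; do
    nodes=$(oc get egressips.k8s.ovn.org "$name" -o jsonpath='{.status.items[*].node}')
    # $nodes is left unquoted on purpose: jsonpath joins several nodes with spaces.
    if failed_over "$deleted" $nodes; then
      echo "failed over to: $nodes"
      return 0
    fi
    sleep 5
    waited=$((waited + 5))
  done
  echo "still assigned to: ${nodes:-<none>}" >&2
  return 1
}
```

With the names from this report, the call would be `wait_for_failover egressip-red jechen-0413h-wldhh-worker-a-799mt.c.openshift-qe.internal`. In the failing case described here it would time out, because the status keeps pointing at the deleted node.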

      Additional info:

       

            jluhrsen Jamo Luhrsen
            jechen@redhat.com Jean Chen
            Jean Chen Jean Chen