Bug
Resolution: Duplicate
Major
4.14.z
None
No
Proposed
False
Description of problem:
If the first EgressIP node is deleted, the EgressIP does not fail over to the second available egress node.
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-04-03-211601
How reproducible:
Have two nodes labelled egress-assignable, configure an EgressIP object, and delete the egress node the EgressIP was first assigned to; the EgressIP does not fail over to the second egress node.
$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.14.0-0.nightly-2023-04-03-211601   True        False         31m     Cluster version is 4.14.0-0.nightly-2023-04-03-211601
$ oc get node
NAME                                                        STATUS   ROLES                  AGE   VERSION
jechen-0413h-wldhh-master-0.c.openshift-qe.internal         Ready    control-plane,master   57m   v1.26.2+54b5520
jechen-0413h-wldhh-master-1.c.openshift-qe.internal         Ready    control-plane,master   56m   v1.26.2+54b5520
jechen-0413h-wldhh-master-2.c.openshift-qe.internal         Ready    control-plane,master   56m   v1.26.2+54b5520
jechen-0413h-wldhh-worker-a-799mt.c.openshift-qe.internal   Ready    worker                 43m   v1.26.2+54b5520
jechen-0413h-wldhh-worker-b-2qxk6.c.openshift-qe.internal   Ready    worker                 42m   v1.26.2+54b5520
jechen-0413h-wldhh-worker-c-qbcls.c.openshift-qe.internal   Ready    worker                 42m   v1.26.2+54b5520
Steps to Reproduce:
1. Label two nodes as egress-assignable.
$ oc label node jechen-0413h-wldhh-worker-a-799mt.c.openshift-qe.internal "k8s.ovn.org/egress-assignable"=""
node/jechen-0413h-wldhh-worker-a-799mt.c.openshift-qe.internal labeled
$ oc label node jechen-0413h-wldhh-worker-b-2qxk6.c.openshift-qe.internal "k8s.ovn.org/egress-assignable"=""
node/jechen-0413h-wldhh-worker-b-2qxk6.c.openshift-qe.internal labeled

2. Create a namespace with some test pods, create an EgressIP object, and label the namespace so that it matches the EgressIP's namespaceSelector.
$ oc new-project test
$ cat config_egressip1_ovn_ns_team_red.yaml
apiVersion: k8s.ovn.org/v1
kind: EgressIP
metadata:
  name: egressip-red
spec:
  egressIPs:
  - 10.0.128.201
  namespaceSelector:
    matchLabels:
      team: red
$ oc create -f ./SDN-1332-test/config_egressip1_ovn_ns_team_red.yaml
egressip.k8s.ovn.org/egressip-red created
$ oc get egressips.k8s.ovn.org
NAME           EGRESSIPS      ASSIGNED NODE                                               ASSIGNED EGRESSIPS
egressip-red   10.0.128.201   jechen-0413h-wldhh-worker-a-799mt.c.openshift-qe.internal   10.0.128.201
$ oc get egressips.k8s.ovn.org egressip-red -oyaml
apiVersion: k8s.ovn.org/v1
kind: EgressIP
metadata:
  creationTimestamp: "2023-04-13T23:08:29Z"
  generation: 2
  name: egressip-red
  resourceVersion: "47901"
  uid: 407676cf-e8f9-4ae0-992d-1046058811cd
spec:
  egressIPs:
  - 10.0.128.201
  namespaceSelector:
    matchLabels:
      team: red
status:
  items:
  - egressIP: 10.0.128.201
    node: jechen-0413h-wldhh-worker-a-799mt.c.openshift-qe.internal
$ oc label ns test team=red
$ oc get pod -owide
NAME            READY   STATUS    RESTARTS   AGE     IP            NODE                                                        NOMINATED NODE   READINESS GATES
test-rc-fc4bg   1/1     Running   0          4m57s   10.131.0.16   jechen-0413h-wldhh-worker-a-799mt.c.openshift-qe.internal   <none>           <none>
test-rc-hbln6   1/1     Running   0          4m56s   10.131.0.17   jechen-0413h-wldhh-worker-a-799mt.c.openshift-qe.internal   <none>           <none>
test-rc-nm7vd   1/1     Running   0          4m57s   10.128.2.15   jechen-0413h-wldhh-worker-c-qbcls.c.openshift-qe.internal   <none>           <none>
test-rc-wzdrl   1/1     Running   0          4m57s   10.129.2.13   jechen-0413h-wldhh-worker-b-2qxk6.c.openshift-qe.internal   <none>           <none>
$ oc exec test-rc-fc4bg -- curl 10.0.0.2:9095
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    12  100    12    0     0   2000      0 --:--:-- --:--:-- --:--:--  2400
10.0.128.201
$ oc exec test-rc-hbln6 -- curl 10.0.0.2:9095
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    12  100    12    0     0   2400      0 --:--:-- --:--:-- --:--:--  2400
10.0.128.201
$ oc exec test-rc-nm7vd -- curl 10.0.0.2:9095
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    12  100    12    0     0    923      0 --:--:-- --:--:-- --:--:--  1000
10.0.128.201
$ oc exec test-rc-wzdrl -- curl 10.0.0.2:9095
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    12  100    12    0     0   1714      0 --:--:-- --:--:-- --:--:--  1714
10.0.128.201
The EgressIP works at this point: all four pods egress with source IP 10.0.128.201.

3. Delete the node that the EgressIP was assigned to.
$ oc get node jechen-0413h-wldhh-worker-a-799mt.c.openshift-qe.internal -oyaml > backup-worker-a.yaml
$ oc delete node jechen-0413h-wldhh-worker-a-799mt.c.openshift-qe.internal
node "jechen-0413h-wldhh-worker-a-799mt.c.openshift-qe.internal" deleted
$ oc get node
NAME                                                        STATUS   ROLES                  AGE   VERSION
jechen-0413h-wldhh-master-0.c.openshift-qe.internal         Ready    control-plane,master   72m   v1.26.2+54b5520
jechen-0413h-wldhh-master-1.c.openshift-qe.internal         Ready    control-plane,master   72m   v1.26.2+54b5520
jechen-0413h-wldhh-master-2.c.openshift-qe.internal         Ready    control-plane,master   71m   v1.26.2+54b5520
jechen-0413h-wldhh-worker-b-2qxk6.c.openshift-qe.internal   Ready    worker                 58m   v1.26.2+54b5520
jechen-0413h-wldhh-worker-c-qbcls.c.openshift-qe.internal   Ready    worker                 58m   v1.26.2+54b5520
$ oc get egressips.k8s.ovn.org
NAME           EGRESSIPS      ASSIGNED NODE                                               ASSIGNED EGRESSIPS
egressip-red   10.0.128.201   jechen-0413h-wldhh-worker-a-799mt.c.openshift-qe.internal   10.0.128.201
$ oc get egressips.k8s.ovn.org -oyaml
apiVersion: v1
items:
- apiVersion: k8s.ovn.org/v1
  kind: EgressIP
  metadata:
    creationTimestamp: "2023-04-13T23:08:29Z"
    generation: 2
    name: egressip-red
    resourceVersion: "47901"
    uid: 407676cf-e8f9-4ae0-992d-1046058811cd
  spec:
    egressIPs:
    - 10.0.128.201
    namespaceSelector:
      matchLabels:
        team: red
  status:
    items:
    - egressIP: 10.0.128.201
      node: jechen-0413h-wldhh-worker-a-799mt.c.openshift-qe.internal
kind: List
metadata:
  resourceVersion: ""
$ oc get node jechen-0413h-wldhh-worker-b-2qxk6.c.openshift-qe.internal --show-labels
NAME                                                        STATUS   ROLES    AGE   VERSION           LABELS
jechen-0413h-wldhh-worker-b-2qxk6.c.openshift-qe.internal   Ready    worker   60m   v1.26.2+54b5520   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=n1-standard-4,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-b,k8s.ovn.org/egress-assignable=,kubernetes.io/arch=amd64,kubernetes.io/hostname=jechen-0413h-wldhh-worker-b-2qxk6.c.openshift-qe.internal,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.kubernetes.io/instance-type=n1-standard-4,node.openshift.io/os_id=rhcos,topology.gke.io/zone=us-central1-b,topology.kubernetes.io/region=us-central1,topology.kubernetes.io/zone=us-central1-b
$ oc get pod -owide
NAME            READY   STATUS    RESTARTS   AGE   IP            NODE                                                        NOMINATED NODE   READINESS GATES
test-rc-7rt7x   1/1     Running   0          9s    10.129.2.15   jechen-0413h-wldhh-worker-b-2qxk6.c.openshift-qe.internal   <none>           <none>
test-rc-bfx6p   1/1     Running   0          9s    10.128.2.17   jechen-0413h-wldhh-worker-c-qbcls.c.openshift-qe.internal   <none>           <none>
test-rc-nm7vd   1/1     Running   0          12m   10.128.2.15   jechen-0413h-wldhh-worker-c-qbcls.c.openshift-qe.internal   <none>           <none>
test-rc-wzdrl   1/1     Running   0          12m   10.129.2.13   jechen-0413h-wldhh-worker-b-2qxk6.c.openshift-qe.internal   <none>           <none>
$ oc exec test-rc-7rt7x -- curl 10.0.0.2:9095
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    10  100    10    0     0   2000      0 --:--:-- --:--:-- --:--:--  2000
10.0.128.3
$ oc exec test-rc-bfx6p -- curl 10.0.0.2:9095
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    10  100    10    0     0   2500      0 --:--:-- --:--:-- --:--:--  2500
10.0.128.4
$ oc exec test-rc-nm7vd -- curl 10.0.0.2:9095
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    10  100    10    0     0   2500      0 --:--:-- --:--:-- --:--:--  3333
10.0.128.4
$ oc exec test-rc-wzdrl -- curl 10.0.0.2:9095
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    10  100    10    0     0   2500      0 --:--:-- --:--:-- --:--:--  2500
10.0.128.3
After the deletion, the EgressIP status still names the deleted node jechen-0413h-wldhh-worker-a-799mt, even though jechen-0413h-wldhh-worker-b-2qxk6 is Ready and still labelled k8s.ovn.org/egress-assignable. Pods on worker-b now egress as 10.0.128.3 and pods on worker-c as 10.0.128.4 instead of 10.0.128.201.
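A single `oc get egressips` right after the node deletion cannot distinguish "failover is slow" from "failover never happens". The sketch below polls the EgressIP status until it names a different node or a timeout expires. It assumes the EgressIP is named `egressip-red` (as in step 2) and carries a single egress IP; `assigned_egress_node`, `wait_for_egressip_failover` and `POLL_INTERVAL` are hypothetical helpers for this reproduction, not part of `oc` or OVN-Kubernetes.

```shell
# Hypothetical helpers for this reproduction; meant to be sourced into a shell.
# Not part of oc or OVN-Kubernetes.

# Seconds between polls (positive integer); override in the environment if needed.
POLL_INTERVAL="${POLL_INTERVAL:-5}"

# Print the node(s) currently listed in the EgressIP status.
# Assumption: the EgressIP object is named egressip-red, as in step 2.
assigned_egress_node() {
    oc get egressips.k8s.ovn.org egressip-red \
        -o jsonpath='{.status.items[*].node}'
}

# Usage: wait_for_egressip_failover OLD_NODE LIMIT_SECONDS
# Returns 0 as soon as the status names a node other than OLD_NODE.
# Returns 1 if, when the limit expires, the status still names OLD_NODE
# or is empty. Assumes a single egress IP per EgressIP object.
wait_for_egressip_failover() {
    old_node="$1"
    limit="$2"
    elapsed=0
    current=""
    while [ "$elapsed" -le "$limit" ]; do
        current="$(assigned_egress_node)"
        if [ -n "$current" ] && [ "$current" != "$old_node" ]; then
            echo "EgressIP now assigned to: $current"
            return 0
        fi
        sleep "$POLL_INTERVAL"
        elapsed=$((elapsed + POLL_INTERVAL))
    done
    echo "EgressIP still assigned to: ${current:-<none>}" >&2
    return 1
}
```

For example, `wait_for_egressip_failover jechen-0413h-wldhh-worker-a-799mt.c.openshift-qe.internal 120` should report worker-b if failover works as expected; with the behaviour reported here, where the status kept naming the deleted worker-a node, it would time out and return 1.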
Actual results:
The EgressIP did not fail over to the second egress node; outbound traffic uses the node IP as its source IP.
Expected results:
The EgressIP should fail over to the second egress node, and outbound traffic should still use the EgressIP as its source IP.
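The expected result can be checked for every pod at once instead of running curl by hand per pod. This is a minimal sketch that assumes namespace `test`, the echo endpoint `10.0.0.2:9095` from the steps above, and that this endpoint replies with the caller's source IP (as the logs suggest). `observed_source_ips` and `check_egress_source_ip` are hypothetical helper names; the comparison logic is kept separate from `oc` so it can be checked on its own.

```shell
# Hypothetical helpers; meant to be sourced into a shell.
# Assumptions: namespace "test", echo endpoint 10.0.0.2:9095 that replies
# with the caller's source IP, as in the reproduction steps.

# Emit one "POD OBSERVED_IP" line per pod in namespace test.
# curl -s suppresses the progress meter seen in the logs.
observed_source_ips() {
    for pod in $(oc get pods -n test -o jsonpath='{.items[*].metadata.name}'); do
        printf '%s %s\n' "$pod" \
            "$(oc exec -n test "$pod" -- curl -s --max-time 5 10.0.0.2:9095)"
    done
}

# Usage: ... | check_egress_source_ip EXPECTED_IP
# Reads "POD OBSERVED_IP" lines on stdin, prints one MISMATCH line per pod
# whose observed source IP differs from EXPECTED_IP, and returns 1 if any differ.
check_egress_source_ip() {
    expected="$1"
    status=0
    while read -r pod observed; do
        [ -n "$pod" ] || continue          # skip blank lines
        if [ "$observed" != "$expected" ]; then
            echo "MISMATCH $pod: got ${observed:-<empty>}, want $expected"
            status=1
        fi
    done
    return "$status"
}
```

For example, `observed_source_ips | check_egress_source_ip 10.0.128.201` should print nothing and return 0 once failover works; with the outputs captured after the node deletion above (10.0.128.3 and 10.0.128.4), it would flag all four pods.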
Additional info:
Duplicates: OCPBUGS-11187 EgressIP was NOT migrated to correct workers after deleting machine it was assigned in GCP XPN cluster. (Closed)