- Bug
- Resolution: Unresolved
- Major
- None
- 4.17
- Important
- No
- Rejected
- False
Description of problem:
This issue was found while verifying bug https://issues.redhat.com/browse/OCPBUGS-38653. Because it is not the same as the original issue, a new bug was opened to track it.
Version-Release number of selected component (if applicable):
The build from openshift/ovn-kubernetes#2265
How reproducible:
Steps to Reproduce:
% oc get nodes
NAME                                  STATUS   ROLES                  AGE     VERSION
huirwang-08215-tkrrg-master-0         Ready    control-plane,master   154m    v1.30.3
huirwang-08215-tkrrg-master-1         Ready    control-plane,master   154m    v1.30.3
huirwang-08215-tkrrg-master-2         Ready    control-plane,master   154m    v1.30.3
huirwang-08215-tkrrg-worker-a-czr4g   Ready    worker                 144m    v1.30.3
huirwang-08215-tkrrg-worker-b-hsgxf   Ready    worker                 61s     v1.30.3
huirwang-08215-tkrrg-worker-b-xd7pv   Ready    worker                 3m10s   v1.30.3
huirwang-08215-tkrrg-worker-c-7lmrf   Ready    worker                 143m    v1.30.3
huirwang-08215-tkrrg-worker-f-sgskm   Ready    worker                 27m     v1.30.3

Apply the egress label to nodes huirwang-08215-tkrrg-worker-b-hsgxf, huirwang-08215-tkrrg-worker-f-sgskm, and huirwang-08215-tkrrg-worker-c-7lmrf.

Create an egressIP object:

% oc get egressip -o yaml
apiVersion: v1
items:
- apiVersion: k8s.ovn.org/v1
  kind: EgressIP
  metadata:
    creationTimestamp: "2024-08-21T06:26:29Z"
    generation: 3
    name: egressip-2
    resourceVersion: "99711"
    uid: 082100dc-1012-47e5-95c1-e0aa4faf97d9
  spec:
    egressIPs:
    - 10.0.128.101
    - 10.0.128.100
    namespaceSelector:
      matchLabels:
        name: qe
  status:
    items:
    - egressIP: 10.0.128.100
      node: huirwang-08215-tkrrg-worker-f-sgskm
    - egressIP: 10.0.128.101
      node: huirwang-08215-tkrrg-worker-c-7lmrf
kind: List
metadata:
  resourceVersion: ""

Create namespace test and pods in it, then apply the label name=qe to namespace test:

% oc get pods -n test -o wide
NAME            READY   STATUS    RESTARTS   AGE   IP            NODE                                  NOMINATED NODE   READINESS GATES
test-rc-547v8   1/1     Running   0          17m   10.130.2.9    huirwang-08215-tkrrg-worker-f-sgskm   <none>           <none>
test-rc-x9hd4   1/1     Running   0          17m   10.131.0.30   huirwang-08215-tkrrg-worker-a-czr4g   <none>           <none>

% oc -n openshift-ovn-kubernetes exec ${ovn_pod} -c northd -- ovn-nbctl find logical_router_policy match="\"ip4.src == ${podip}\""
_uuid               : 8a52ad1c-ffe6-4039-b4c4-59221fb9da98
action              : reroute
bfd_sessions        : []
external_ids        : {name=egressip-2}
match               : "ip4.src == 10.131.0.30"
nexthop             : []
nexthops            : ["100.88.0.7", "100.88.0.8"]
options             : {}
priority            : 100

% echo $LSP_ADDRESSES
(tstor-huirwang-08215-tkrrg-master-0) 0a:58:64:58:00:02 100.88.0.2/16
(tstor-huirwang-08215-tkrrg-master-1) 0a:58:64:58:00:03 100.88.0.3/16
(tstor-huirwang-08215-tkrrg-master-2) 0a:58:64:58:00:04 100.88.0.4/16
(tstor-huirwang-08215-tkrrg-worker-a-czr4g) 0a:58:64:58:00:05 100.88.0.5/16
(tstor-huirwang-08215-tkrrg-worker-b-hsgxf) 0a:58:64:58:00:0a 100.88.0.10/16
(tstor-huirwang-08215-tkrrg-worker-b-xd7pv) 0a:58:64:58:00:09 100.88.0.9/16
(tstor-huirwang-08215-tkrrg-worker-c-7lmrf) 0a:58:64:58:00:07 100.88.0.7/16
(tstor-huirwang-08215-tkrrg-worker-f-sgskm) 0a:58:64:58:00:08 100.88.0.8/16

Delete one egress node:

% oc delete node huirwang-08215-tkrrg-worker-c-7lmrf
node "huirwang-08215-tkrrg-worker-c-7lmrf" deleted

Result:

% oc get egressip -o yaml
apiVersion: v1
items:
- apiVersion: k8s.ovn.org/v1
  kind: EgressIP
  metadata:
    creationTimestamp: "2024-08-21T06:26:29Z"
    generation: 4
    name: egressip-2
    resourceVersion: "100274"
    uid: 082100dc-1012-47e5-95c1-e0aa4faf97d9
  spec:
    egressIPs:
    - 10.0.128.101
    - 10.0.128.100
    namespaceSelector:
      matchLabels:
        name: qe
  status:
    items:
    - egressIP: 10.0.128.100
      node: huirwang-08215-tkrrg-worker-f-sgskm
kind: List
metadata:
  resourceVersion: ""

% oc -n openshift-ovn-kubernetes exec ${ovn_pod} -c northd -- ovn-nbctl find logical_router_policy match="\"ip4.src == ${podip}\""
_uuid               : 8a52ad1c-ffe6-4039-b4c4-59221fb9da98
action              : reroute
bfd_sessions        : []
external_ids        : {name=egressip-2}
match               : "ip4.src == 10.131.0.30"
nexthop             : []
nexthops            : ["100.88.0.8"]
options             : {}
priority            : 100

The egress IP did not fail over to the other available egress node, huirwang-08215-tkrrg-worker-b-hsgxf, even though the egress label is applied to that node:
% oc get node huirwang-08215-tkrrg-worker-b-hsgxf --show-labels
NAME                                  STATUS   ROLES    AGE   VERSION   LABELS
huirwang-08215-tkrrg-worker-b-hsgxf   Ready    worker   28m   v1.30.3   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=n2-standard-4,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-b,k8s.ovn.org/egress-assignable=,kubernetes.io/arch=amd64,kubernetes.io/hostname=huirwang-08215-tkrrg-worker-b-hsgxf,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.kubernetes.io/instance-type=n2-standard-4,node.openshift.io/os_id=rhcos,topology.gke.io/zone=us-central1-b,topology.kubernetes.io/region=us-central1,topology.kubernetes.io/zone=us-central1-b
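The nexthops in the logical_router_policy output are transit switch addresses, so reading them as node names means looking them up in the LSP_ADDRESSES list. A minimal sketch of that lookup, assuming LSP_ADDRESSES holds whitespace-separated "(tstor-<node>) <mac> <ip>/<prefix>" triples as in this report; the function name nexthop_to_node is hypothetical and not part of any OVN or OpenShift tooling:

```shell
# Hypothetical helper: print the node that owns a transit switch nexthop IP.
# Assumption: the address list is a sequence of whitespace-separated triples
# "(tstor-<node>) <mac> <ip>/<prefix>", as echoed from LSP_ADDRESSES above.
nexthop_to_node() {
  local ip="$1" addresses="$2"
  # One token per line, then walk the tokens three at a time.
  printf '%s\n' "$addresses" | tr -s '[:space:]' '\n' | awk -v ip="$ip" '
    NF == 0 { next }                 # skip blank lines from leading whitespace
    { n++ }
    n % 3 == 1 {                     # first token of a triple: "(tstor-<node>)"
      name = $0
      sub(/^\(tstor-/, "", name)
      sub(/\)$/, "", name)
    }
    n % 3 == 0 {                     # third token of a triple: "<ip>/<prefix>"
      split($0, a, "/")
      if ((a[1] "") == (ip "")) print name
    }
  '
}
```

For example, `nexthop_to_node 100.88.0.8 "$LSP_ADDRESSES"` prints huirwang-08215-tkrrg-worker-f-sgskm for the list in this report, which matches the only nexthop left after the node deletion.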
Actual results:
The egress IP did not fail over to another egress node. The CloudPrivateIPConfig for 10.0.128.101 targets huirwang-08215-tkrrg-worker-b-hsgxf, but the cloud assignment fails with IP_IN_USE_BY_ANOTHER_RESOURCE:

% oc get CloudPrivateIPConfig 10.0.128.101 -o yaml
apiVersion: cloud.network.openshift.io/v1
kind: CloudPrivateIPConfig
metadata:
  annotations:
    k8s.ovn.org/egressip-owner-ref: egressip-2
  creationTimestamp: "2024-08-21T06:26:29Z"
  finalizers:
  - cloudprivateipconfig.cloud.network.openshift.io/finalizer
  generation: 2
  name: 10.0.128.101
  resourceVersion: "103371"
  uid: 357a3e26-254d-40af-a4b2-c95f9e2b7cee
spec:
  node: huirwang-08215-tkrrg-worker-b-hsgxf
status:
  conditions:
  - lastTransitionTime: "2024-08-21T06:33:44Z"
    message: 'Error processing cloud assignment request, err: {"errors":[{"code":"IP_IN_USE_BY_ANOTHER_RESOURCE","message":"IP ''10.0.128.101/32'' is already being used by another resource. "}]}'
    observedGeneration: 2
    reason: CloudResponseError
    status: "False"
    type: Assigned
  node: huirwang-08215-tkrrg-worker-b-hsgxf
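When re-verifying, the Assigned condition can be checked directly instead of reading the full YAML. A minimal sketch, assuming `oc` is logged in to the affected cluster; the function name cpic_assigned is hypothetical, and the jsonpath filter simply selects the status of the condition whose type is Assigned:

```shell
# Hypothetical helper: succeed only if the CloudPrivateIPConfig for the given
# IP reports its "Assigned" condition as "True".
# Assumption: `oc` is on PATH and authenticated against the cluster.
cpic_assigned() {
  local ip="$1" status
  status="$(oc get CloudPrivateIPConfig "$ip" \
    -o jsonpath='{.status.conditions[?(@.type=="Assigned")].status}')"
  [ "$status" = "True" ]
}
```

For the state captured in this report, `cpic_assigned 10.0.128.101` would return non-zero, because the condition status is "False" with reason CloudResponseError.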
Expected results:
The egress IP should fail over to another available egress node.
Additional info: