- Bug
- Resolution: Not a Bug
- Major
- None
- 4.10.z
- None
- Important
- No
- Rejected
- False
Description of problem:
After scaling the default ingresscontroller down to 0 replicas, which removes all default router pods, one keepalived pod is still in MASTER state and still holds the ingress virtual IP. By comparison, ipfailover (https://docs.openshift.com/container-platform/4.10/networking/configuring-ipfailover.html) uses the same image, and it enters FAULT state when its track_script fails.
Version-Release number of selected component (if applicable):
4.10.58
How reproducible:
always
Steps to Reproduce:
1. Scale the default ingresscontroller down to zero replicas:

    $ oc -n openshift-ingress-operator scale ingresscontroller/default --replicas=0
    $ oc -n openshift-ingress get pod
    No resources found in openshift-ingress namespace.

2. Check the keepalived pods running on the worker nodes and their logs:

    $ oc -n openshift-vsphere-infra get pod -l app=vsphere-infra-vrrp
    NAME                                          READY   STATUS    RESTARTS   AGE
    keepalived-hongli-vs410-k6n7k-master-0        2/2     Running   0          7h50m
    keepalived-hongli-vs410-k6n7k-master-1        2/2     Running   0          7h51m
    keepalived-hongli-vs410-k6n7k-master-2        2/2     Running   0          7h50m
    keepalived-hongli-vs410-k6n7k-worker-2mvxd    2/2     Running   0          7h42m
    keepalived-hongli-vs410-k6n7k-worker-46sjv    2/2     Running   0          7h42m
    keepalived-hongli-vs410-k6n7k-worker-ddmtg    2/2     Running   0          7h42m

    $ oc -n openshift-vsphere-infra logs keepalived-hongli-vs410-k6n7k-worker-ddmtg
    <......>
    Fri Apr 28 04:57:50 2023: (hongli-vs410_INGRESS) Received advert from 172.31.249.204 with lower priority 20, ours 20, forcing new election
    Fri Apr 28 04:57:50 2023: (hongli-vs410_INGRESS) Sending/queueing gratuitous ARPs on ens192 for 172.31.248.53
    Fri Apr 28 04:57:50 2023: Sending gratuitous ARP on ens192 for 172.31.248.53
    Fri Apr 28 04:57:50 2023: Sending gratuitous ARP on ens192 for 172.31.248.53

    $ oc -n openshift-vsphere-infra rsh keepalived-hongli-vs410-k6n7k-worker-ddmtg
    Defaulted container "keepalived" out of: keepalived, keepalived-monitor, render-config-keepalived (init)
    sh-4.4# ip a | grep 172
        inet 172.31.249.245/23 brd 172.31.249.255 scope global dynamic noprefixroute ens192
        inet 172.31.248.53/32 scope global ens192    <<<--- ingress VIP
    sh-4.4#

    $ oc get infrastructures.config.openshift.io cluster -oyaml
    <......>
    status:
      platform: VSphere
      platformStatus:
        type: VSphere
        vsphere:
          apiServerInternalIP: 172.31.248.49
          ingressIP: 172.31.248.53

3.
Actual results:
All keepalived pods on the worker nodes have the same priority (20). The pod with the highest IP address wins the election and takes MASTER state.
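The tie-break in the actual results matches the standard VRRP rule (RFC 5798): when advertised priorities are equal, the router with the higher primary IP address wins. The minimal Python sketch below models that rule. `VrrpNode` and `elect_master` are hypothetical names. The two addresses come from the log and `ip a` output above. The name `peer-204` is made up, because the log does not say which pod sent the advert.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class VrrpNode:
    name: str
    primary_ip: str  # address the node sends VRRP adverts from
    priority: int    # effective priority after track_script adjustments


def elect_master(nodes: list[VrrpNode]) -> VrrpNode:
    """Return the winner: highest priority, ties go to the highest primary IP."""
    return max(
        nodes,
        key=lambda n: (n.priority, ipaddress.IPv4Address(n.primary_ip)),
    )


if __name__ == "__main__":
    # With no router pods, every worker instance stays at its base priority 20.
    workers = [
        VrrpNode("worker-ddmtg", "172.31.249.245", 20),
        VrrpNode("peer-204", "172.31.249.204", 20),  # hypothetical name
    ]
    print(elect_master(workers).name)
```

The addresses are compared as `IPv4Address` objects rather than strings. With string comparison, "172.31.249.9" would sort above "172.31.249.245".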
Expected results:
If no default router pod is running, keepalived should be in FAULT state so that it cannot enter MASTER state.
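You can verify the expected result by confirming that no worker keepalived pod holds the ingress VIP (172.31.248.53 in this cluster). The Python helper below is a hypothetical sketch of that check. It assumes you have already collected each pod's `ip a` output, for example with `oc -n openshift-vsphere-infra exec <pod> -c keepalived -- ip a`. `holds_vip` and `vip_holders` are illustrative names and are not part of any OpenShift tooling.

```python
import ipaddress
import re

# Captures the address part of each "inet A.B.C.D/NN" line in `ip a` output.
# "inet6" lines do not match because "inet" must be followed by whitespace.
INET_RE = re.compile(r"\binet\s+(\d{1,3}(?:\.\d{1,3}){3})/\d{1,2}")


def holds_vip(ip_a_output: str, vip: str) -> bool:
    """True if `vip` is configured on any interface listed in `ip a` output."""
    target = ipaddress.IPv4Address(vip)
    return any(
        ipaddress.IPv4Address(m.group(1)) == target
        for m in INET_RE.finditer(ip_a_output)
    )


def vip_holders(outputs: dict[str, str], vip: str) -> list[str]:
    """Given {pod_name: `ip a` output}, return the sorted pods that hold `vip`."""
    return sorted(pod for pod, text in outputs.items() if holds_vip(text, vip))
```

Under the expected behavior, `vip_holders` should return an empty list for the worker pods once all router pods are gone. In the reported state, it would list `keepalived-hongli-vs410-k6n7k-worker-ddmtg`.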
Additional info:
The keepalived.conf looks like this:

    # TODO: Improve this check. The port is assumed to be alive.
    # Need to assess what is the ramification if the port is not there.
    vrrp_script chk_ingress {
        script "/usr/bin/timeout 0.9 /usr/bin/curl -o /dev/null -Lfs http://localhost:1936/healthz/ready"
        interval 1
        weight 20
        rise 3
        fall 2
    }

    vrrp_script chk_default_ingress {
        script "/usr/bin/timeout 4.9 /etc/keepalived/chk_default_ingress.sh"
        interval 5
        weight 50
        rise 3
        fall 2
    }

    vrrp_instance hongli-vs410_INGRESS {
        state BACKUP
        interface ens192
        virtual_router_id 2
        priority 20
        advert_int 1
        authentication {
            auth_type PASS
            auth_pass hongli-vs410_ingress_vip
        }
        virtual_ipaddress {
            172.31.248.53/32
        }
        track_script {
            chk_ingress
            chk_default_ingress
        }
    }
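The `weight` values in this config explain the actual results. In keepalived, a tracked script with the default weight 0 puts the instance into FAULT when it fails. A positive weight only adds to the priority while the script succeeds. Both checks here have positive weights (20 and 50). So when every router pod is gone, each worker instance falls back to its base priority of 20 and keeps taking part in the election.

The Python model below is a simplified sketch of that rule. `TrackScript` and `instance_state` are hypothetical names, and rise/fall counting and priority clamping are not modeled. The assumption that ipfailover tracks its check with weight 0 is only a guess based on its FAULT behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TrackScript:
    name: str
    weight: int    # weight from the vrrp_script block; keepalived's default is 0
    passing: bool  # current result after rise/fall hysteresis (not modeled here)


def instance_state(
    base_priority: int, scripts: list[TrackScript]
) -> tuple[str, Optional[int]]:
    """Simplified model of how keepalived applies track_script results.

    weight == 0: a failing script forces the instance into FAULT.
    weight > 0:  the weight is added to the priority while the script passes.
    weight < 0:  the weight is subtracted while the script fails.
    """
    priority = base_priority
    for s in scripts:
        if s.weight == 0:
            if not s.passing:
                return "FAULT", None
        elif s.weight > 0:
            if s.passing:
                priority += s.weight
        elif not s.passing:
            priority += s.weight  # weight is negative here
    # "ELECTABLE" means the instance still competes for MASTER.
    return "ELECTABLE", priority


if __name__ == "__main__":
    # Config from this report, with no router pods: both checks fail.
    down = [TrackScript("chk_ingress", 20, False),
            TrackScript("chk_default_ingress", 50, False)]
    print(instance_state(20, down))
```

With the reported config and both checks failing, the model returns `("ELECTABLE", 20)` rather than FAULT. Giving either check weight 0 (hypothetically) would make a failure return `("FAULT", None)`, which is the behavior the expected results describe.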