Type: Bug
Resolution: Not a Bug
Priority: Major
Fix Version/s: None
Affects Version/s: 4.10.z
Component/s: Networking / runtime-cfg
Labels: None
Severity: Important
Regression: No
Release Blocker: Rejected
Blocked: False
Blocked Reason: None
SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Description of problem:

After scaling down the default ingresscontroller replicas to 0 and removing all default router pods, one keepalived pod is still in the MASTER state and holds the ingress virtual IP.

Compare this with ipfailover (https://docs.openshift.com/container-platform/4.10/networking/configuring-ipfailover.html), which uses the same image: there, if the track_script fails, keepalived enters the FAULT state.

Version-Release number of selected component (if applicable):

4.10.58

How reproducible:

always

Steps to Reproduce:

1. Scale down the replicas of the default ingresscontroller to zero
$ oc -n openshift-ingress-operator scale ingresscontroller/default --replicas=0

$ oc -n openshift-ingress get pod
No resources found in openshift-ingress namespace.


2. Check the keepalived pods running on the worker nodes and their logs
$ oc -n openshift-vsphere-infra get pod -l app=vsphere-infra-vrrp
NAME                                         READY   STATUS    RESTARTS   AGE
keepalived-hongli-vs410-k6n7k-master-0       2/2     Running   0          7h50m
keepalived-hongli-vs410-k6n7k-master-1       2/2     Running   0          7h51m
keepalived-hongli-vs410-k6n7k-master-2       2/2     Running   0          7h50m
keepalived-hongli-vs410-k6n7k-worker-2mvxd   2/2     Running   0          7h42m
keepalived-hongli-vs410-k6n7k-worker-46sjv   2/2     Running   0          7h42m
keepalived-hongli-vs410-k6n7k-worker-ddmtg   2/2     Running   0          7h42m

$ oc -n openshift-vsphere-infra logs keepalived-hongli-vs410-k6n7k-worker-ddmtg
<......>
Fri Apr 28 04:57:50 2023: (hongli-vs410_INGRESS) Received advert from 172.31.249.204 with lower priority 20, ours 20, forcing new election
Fri Apr 28 04:57:50 2023: (hongli-vs410_INGRESS) Sending/queueing gratuitous ARPs on ens192 for 172.31.248.53
Fri Apr 28 04:57:50 2023: Sending gratuitous ARP on ens192 for 172.31.248.53
Fri Apr 28 04:57:50 2023: Sending gratuitous ARP on ens192 for 172.31.248.53


$ oc -n openshift-vsphere-infra rsh keepalived-hongli-vs410-k6n7k-worker-ddmtg
Defaulted container "keepalived" out of: keepalived, keepalived-monitor, render-config-keepalived (init)
sh-4.4# ip a | grep 172
    inet 172.31.249.245/23 brd 172.31.249.255 scope global dynamic noprefixroute ens192
    inet 172.31.248.53/32 scope global ens192                <<<---ingress vip
sh-4.4# 

$ oc get infrastructures.config.openshift.io cluster -oyaml
<......>
status:
  platform: VSphere
  platformStatus:
    type: VSphere
    vsphere:
      apiServerInternalIP: 172.31.248.49
      ingressIP: 172.31.248.53



Actual results:

All keepalived pods on the worker nodes have the same priority of 20, and the one with the highest IP address wins the election and takes the MASTER state.
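
The tie-break described here can be illustrated with a short Python sketch. It assumes standard VRRP behaviour, in which nodes with equal priority are ranked by their primary IP address. The function name `elect_master` is hypothetical; the two addresses are the peer seen in the keepalived log and the address of worker-ddmtg from the `ip a` output above.

```python
import ipaddress


def elect_master(candidates):
    """Return the (priority, ip) pair that wins a VRRP-style election.

    candidates: iterable of (priority, ip_string) tuples.
    Higher priority wins; equal priorities are broken by the
    numerically higher IPv4 address (assumed tie-break rule).
    """
    return max(
        candidates,
        key=lambda c: (c[0], int(ipaddress.ip_address(c[1]))),
    )


workers = [
    (20, "172.31.249.204"),  # peer that sent the advert in the log
    (20, "172.31.249.245"),  # worker-ddmtg, per `ip a`
]
print(elect_master(workers))
```

Under that assumption the node at 172.31.249.245 wins, which is consistent with worker-ddmtg holding the ingress VIP.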

Expected results:

If no default router pod is running, keepalived should be in the FAULT state so that it cannot enter the MASTER state.

Additional info:

The keepalived.conf looks like this:

# TODO: Improve this check. The port is assumed to be alive.
# Need to assess what is the ramification if the port is not there.
vrrp_script chk_ingress {
    script "/usr/bin/timeout 0.9 /usr/bin/curl -o /dev/null -Lfs http://localhost:1936/healthz/ready"
    interval 1
    weight 20
    rise 3
    fall 2
}

vrrp_script chk_default_ingress {
    script "/usr/bin/timeout 4.9 /etc/keepalived/chk_default_ingress.sh"
    interval 5
    weight 50
    rise 3
    fall 2
}

vrrp_instance hongli-vs410_INGRESS {
    state BACKUP
    interface ens192
    virtual_router_id 2
    priority 20
    advert_int 1
    
    authentication {
        auth_type PASS
        auth_pass hongli-vs410_ingress_vip
    }
    virtual_ipaddress {
        172.31.248.53/32
    }
    track_script {
        chk_ingress
        chk_default_ingress
    }
}
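
For context on this configuration: in keepalived, a tracked script with a non-zero weight adjusts the instance priority rather than forcing the FAULT state; only a script with weight 0 (the default) causes FAULT on failure. The Python sketch below models that arithmetic under this assumption. `effective_priority` is a hypothetical helper for illustration, not part of keepalived, and it assumes both checks fail when no router pod is running.

```python
def effective_priority(base, scripts):
    """Model the priority of a VRRP instance with tracked scripts.

    scripts: list of (weight, passing) tuples.
      weight > 0: added to the priority while the script passes.
      weight < 0: subtracted from the priority while the script fails.
      weight == 0: a failing script puts the instance in FAULT,
                   reported here as None.
    """
    total = base
    for weight, passing in scripts:
        if weight == 0 and not passing:
            return None  # instance would enter FAULT
        if weight > 0 and passing:
            total += weight
        elif weight < 0 and not passing:
            total += weight
    return total


# Values from the keepalived.conf above: priority 20, weights 20 and 50.
no_router = effective_priority(20, [(20, False), (50, False)])
healthy = effective_priority(20, [(20, True), (50, True)])
print(no_router, healthy)
```

With both checks failing, every worker stays at the base priority of 20, matching the "priority 20, ours 20" line in the log, and none of them enters FAULT because neither weight is 0.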

Assignee: Benjamin Nemec
Reporter: Hongan Li
QA Contact: Zhanqi Zhao
Votes: 0
Watchers: 4
Created: 2023/04/28 7:30 AM
Updated: 2023/05/05 3:30 AM
Resolved: 2023/05/04 3:59 PM
