OpenShift Bugs / OCPBUGS-12893

[IPI vSphere] A keepalived pod stays in MASTER state even when no default router pod is running on that node

    • Type: Bug
    • Resolution: Not a Bug
    • Priority: Major
    • None
    • 4.10.z
    • None
    • Important
    • No
    • Rejected
    • False

      Description of problem:

      After scaling the default ingresscontroller down to 0 replicas, which removes all default router pods, one keepalived pod is still in MASTER state and holds the ingress virtual IP.
      
      By comparison, ipfailover (https://docs.openshift.com/container-platform/4.10/networking/configuring-ipfailover.html), which uses the same image, enters FAULT state when its track_script fails.

      Version-Release number of selected component (if applicable):

      4.10.58

      How reproducible:

      always

      Steps to Reproduce:

      1. Scale the default ingresscontroller down to zero replicas
      $ oc -n openshift-ingress-operator scale ingresscontroller/default --replicas=0
      
      $ oc -n openshift-ingress get pod
      No resources found in openshift-ingress namespace.
      
      
      2. Check the keepalived pods running on the worker nodes and their logs
      $ oc -n openshift-vsphere-infra get pod -l app=vsphere-infra-vrrp
      NAME                                         READY   STATUS    RESTARTS   AGE
      keepalived-hongli-vs410-k6n7k-master-0       2/2     Running   0          7h50m
      keepalived-hongli-vs410-k6n7k-master-1       2/2     Running   0          7h51m
      keepalived-hongli-vs410-k6n7k-master-2       2/2     Running   0          7h50m
      keepalived-hongli-vs410-k6n7k-worker-2mvxd   2/2     Running   0          7h42m
      keepalived-hongli-vs410-k6n7k-worker-46sjv   2/2     Running   0          7h42m
      keepalived-hongli-vs410-k6n7k-worker-ddmtg   2/2     Running   0          7h42m
      
      $ oc -n openshift-vsphere-infra logs keepalived-hongli-vs410-k6n7k-worker-ddmtg
      <......>
      Fri Apr 28 04:57:50 2023: (hongli-vs410_INGRESS) Received advert from 172.31.249.204 with lower priority 20, ours 20, forcing new election
      Fri Apr 28 04:57:50 2023: (hongli-vs410_INGRESS) Sending/queueing gratuitous ARPs on ens192 for 172.31.248.53
      Fri Apr 28 04:57:50 2023: Sending gratuitous ARP on ens192 for 172.31.248.53
      Fri Apr 28 04:57:50 2023: Sending gratuitous ARP on ens192 for 172.31.248.53
      
      
      $ oc -n openshift-vsphere-infra rsh keepalived-hongli-vs410-k6n7k-worker-ddmtg
      Defaulted container "keepalived" out of: keepalived, keepalived-monitor, render-config-keepalived (init)
      sh-4.4# ip a | grep 172
          inet 172.31.249.245/23 brd 172.31.249.255 scope global dynamic noprefixroute ens192
          inet 172.31.248.53/32 scope global ens192                <<<---ingress vip
      sh-4.4# 
      
      $ oc get infrastructures.config.openshift.io cluster -oyaml
      <......>
      status:
        platform: VSphere
        platformStatus:
          type: VSphere
          vsphere:
            apiServerInternalIP: 172.31.248.49
            ingressIP: 172.31.248.53
      
      

      Actual results:

      All keepalived pods on the worker nodes have the same priority (20), so the one with the highest IP address wins the election and takes the MASTER state.
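
The tie-break seen in the log can be sketched as follows. This is a minimal Python model, not keepalived code: `elect_master` is a hypothetical helper, and it assumes the usual VRRP rule that the highest effective priority wins and that equal priorities are resolved in favor of the numerically higher primary IP address. The two addresses are the ones that appear in the log and `ip a` output in this report; the third worker's address is not shown here, so it is left out.

```python
from ipaddress import IPv4Address
from typing import List, Tuple


def elect_master(candidates: List[Tuple[str, int]]) -> str:
    """Return the primary IP of the election winner.

    Each candidate is (primary_ip, effective_priority). The highest
    priority wins; on a tie, the numerically highest IP address wins
    (assumed VRRP tie-break rule).
    """
    winner = max(candidates, key=lambda c: (c[1], IPv4Address(c[0])))
    return winner[0]


# Both workers advertise priority 20 because no router pod is running.
workers = [("172.31.249.204", 20), ("172.31.249.245", 20)]
print(elect_master(workers))  # prints 172.31.249.245
```

Under these assumptions the node at 172.31.249.245 wins, which matches the "forcing new election" log line and the VIP being configured on that node.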

      Expected results:

      If no default router pod is running on a node, keepalived on that node should be in FAULT state so that it cannot enter MASTER state.

      Additional info:

      The keepalived.conf looks like this:
      
      # TODO: Improve this check. The port is assumed to be alive.
      # Need to assess what is the ramification if the port is not there.
      vrrp_script chk_ingress {
          script "/usr/bin/timeout 0.9 /usr/bin/curl -o /dev/null -Lfs http://localhost:1936/healthz/ready"
          interval 1
          weight 20
          rise 3
          fall 2
      }
      vrrp_script chk_default_ingress {
          script "/usr/bin/timeout 4.9 /etc/keepalived/chk_default_ingress.sh"
          interval 5
          weight 50
          rise 3
          fall 2
      }
      
      vrrp_instance hongli-vs410_INGRESS {
          state BACKUP
          interface ens192
          virtual_router_id 2
          priority 20
          advert_int 1
          
          authentication {
              auth_type PASS
              auth_pass hongli-vs410_ingress_vip
          }
          virtual_ipaddress {
              172.31.248.53/32
          }
          track_script {
              chk_ingress
              chk_default_ingress
          }
      }
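
For context on why a failing track_script does not lead to FAULT with this configuration, here is a rough Python sketch of how the `weight` setting of a `vrrp_script` affects an instance, as I understand it from the keepalived.conf man page. A script with weight 0 (the default) puts the instance into FAULT when it fails, while a non-zero weight only adjusts the priority. `TrackScript` and `evaluate` are hypothetical names used for illustration; the model ignores `rise`/`fall` counting and priority clamping, and the ipfailover configuration itself is not shown in this report.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TrackScript:
    name: str
    weight: int  # value of "weight" in the vrrp_script block
    ok: bool     # whether the script currently succeeds


def evaluate(base_priority: int,
             scripts: List[TrackScript]) -> Tuple[str, Optional[int]]:
    """Return ("FAULT", None) or ("ELIGIBLE", effective_priority)."""
    priority = base_priority
    for s in scripts:
        if s.weight == 0:
            # Weight 0: a failing script takes the instance out of the election.
            if not s.ok:
                return ("FAULT", None)
        elif s.weight > 0:
            # Positive weight: added only while the script succeeds.
            if s.ok:
                priority += s.weight
        else:
            # Negative weight: subtracted only while the script fails.
            if not s.ok:
                priority += s.weight
    return ("ELIGIBLE", priority)


# Values from the keepalived.conf in this report, with no router pod running.
no_router = [TrackScript("chk_ingress", 20, False),
             TrackScript("chk_default_ingress", 50, False)]
print(evaluate(20, no_router))  # prints ('ELIGIBLE', 20)
```

With both checks failing, the instance stays eligible at the base priority of 20 in this model, which is consistent with every worker advertising priority 20 and one of them still becoming MASTER.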
       
      
      

       

              bnemec@redhat.com Benjamin Nemec
              rhn-support-hongli Hongan Li
              Zhanqi Zhao Zhanqi Zhao