
OCPBUGS-4605: Keepalived health check causes unnecessary VIP flapping when HAProxy is healthy


      Release Note Text: Previously, the Keepalived health check looked at the status of the load-balanced kube-apiserver. This could be problematic when the cluster was recovering from an outage and the API server was unreliable. Instead, have Keepalived check the readiness of HAProxy and let HAProxy manage the API server backends. This prevents unnecessary API VIP failovers.

      Release Note Type: Bug Fix

      This relates to the recovery of a cluster following an etcd outage.

      The ingress path to kube-apiserver is:

      ───────────> VIP ─────────────────> Local HAProxy ────┬─> kube-apiserver-master-0
          (managed by keepalived)                           │
                                                            ├─> kube-apiserver-master-1
                                                            │
                                                            └─> kube-apiserver-master-2
      

      Each master runs an HAProxy instance that load-balances across the three kube-apiservers. Each HAProxy runs health checks against each kube-apiserver and adds or removes it from the available pool based on its health, as sketched below.
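
      To make the pool behaviour concrete, here is a minimal sketch in Go, purely for illustration; the real behaviour is configured in haproxy.cfg (e.g. an "option httpchk"-style check), not implemented in Go. The backend URLs and the 2-second timings are assumptions, not taken from the actual configuration.

      package main

      import (
          "crypto/tls"
          "fmt"
          "net/http"
          "time"
      )

      // Hypothetical backend list; the real one is rendered into haproxy.cfg.
      var backends = []string{
          "https://master-0:6443",
          "https://master-1:6443",
          "https://master-2:6443",
      }

      // healthy reports whether a single kube-apiserver answers /readyz with 200,
      // which is roughly what HAProxy's per-backend health check establishes.
      func healthy(client *http.Client, backend string) bool {
          resp, err := client.Get(backend + "/readyz")
          if err != nil {
              return false
          }
          defer resp.Body.Close()
          return resp.StatusCode == http.StatusOK
      }

      func main() {
          client := &http.Client{
              Timeout: 2 * time.Second,
              // The apiservers serve cluster-CA-signed certificates; skipping
              // verification here stands in for a "verify none"-style check.
              Transport: &http.Transport{TLSClientConfig: &tls.Config{InsecureSkipVerify: true}},
          }
          for {
              var pool []string
              for _, b := range backends {
                  if healthy(client, b) {
                      pool = append(pool, b) // backend stays in the load-balancing pool
                  }
              }
              fmt.Printf("available kube-apiserver pool: %v\n", pool)
              time.Sleep(2 * time.Second)
          }
      }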

      We only use keepalived to ensure that HAProxy is not a single point of failure. It is the job of keepalived to ensure that incoming traffic is being directed to an HAProxy which is functioning correctly.

      The current keepalived health check polls /readyz through the local HAProxy. While this seems intuitively correct, it is in fact testing the wrong thing: it tests whether the kube-apiserver the request happens to reach is functioning correctly. That is not the purpose of keepalived. HAProxy already runs health checks against the kube-apiserver backends; keepalived simply needs to select a correctly functioning HAProxy.
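
      For contrast, a rough sketch of what the current check effectively does (Go for illustration only; the real keepalived track script is not this code, and the localhost port is an assumed placeholder): it asks the local HAProxy for /readyz, so the result reports on whichever kube-apiserver the request is proxied to.

      package main

      import (
          "crypto/tls"
          "fmt"
          "net/http"
          "os"
          "time"
      )

      func main() {
          client := &http.Client{
              Timeout:   2 * time.Second,
              Transport: &http.Transport{TLSClientConfig: &tls.Config{InsecureSkipVerify: true}},
          }
          // The request lands on the local HAProxy frontend (the port is a
          // placeholder), which forwards it to one of the kube-apiserver
          // backends. A failure therefore means "the apiserver I reached is
          // not ready", not "this HAProxy is broken".
          resp, err := client.Get("https://localhost:9445/readyz")
          if err != nil || resp.StatusCode != http.StatusOK {
              fmt.Println("check failed: keepalived would fail the VIP over to another master")
              os.Exit(1)
          }
          resp.Body.Close()
          // Exit status 0 tells keepalived this node is a valid VIP holder.
      }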

      This becomes important during recovery from an outage. When none of the kube-apiservers are healthy, this health check fails continuously and the API VIP moves uselessly between masters. However, the situation is much worse when only one of the kube-apiservers is up: there is a high probability that it is overloaded and at least rate-limiting incoming connections, which can fail the keepalived health check and fail the VIP over to the next HAProxy. The failover resets all open kube-apiserver connections, including established ones, which increases the load on the kube-apiserver and makes it more likely that the health check will fail again.

      Ideally the keepalived health check would check only the health of HAProxy itself, not the health of the pool of kube-apiservers. In practice it will probably never be necessary to move the VIP while the master is up, regardless of the health of the cluster. A network partition affecting HAProxy would already be handled by VRRP between the masters, so it may be sufficient simply to check that the local HAProxy pod is healthy.
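
      As a sketch of that direction (again Go for illustration only, with a placeholder port): the check below passes as long as the local HAProxy is up and accepting connections, regardless of the state of the kube-apiserver pool. Whether the real check should be a plain TCP connect, an HAProxy monitor endpoint, or the HAProxy pod's own readiness is an open implementation choice.

      package main

      import (
          "fmt"
          "net"
          "os"
          "time"
      )

      func main() {
          // Succeeds whenever the local HAProxy process is listening, even if
          // every kube-apiserver backend is currently down. The VIP then stays
          // put during recovery, because this HAProxy is still the right place
          // to send traffic the moment a backend comes back.
          conn, err := net.DialTimeout("tcp", "localhost:9445", 2*time.Second)
          if err != nil {
              fmt.Println("local HAProxy is not accepting connections; fail the VIP over")
              os.Exit(1)
          }
          conn.Close()
          // Exit status 0: keep the VIP on this node.
      }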

              People: Martin André (maandre@redhat.com), Matthew Booth (rhn-gps-mbooth), Jon Uriarte
