OpenShift Bugs / OCPBUGS-1257

Keepalived health check causes unnecessary VIP flapping when HAProxy is healthy


    • Sprint: ShiftStack Sprint 225, ShiftStack Sprint 226, ShiftStack Sprint 227, ShiftStack Sprint 228
      Previously, the `Keepalived` health check polled the status of the load-balanced `kube-apiserver`. This could cause problems when a cluster was recovering from an outage and the API servers were unreliable, because the health check could not find a healthy `kube-apiserver`.

      For the {product-title} {product-version} release, the `Keepalived` health check instead verifies that the local `HAProxy` load balancer is functioning, and leaves the health checking of each `kube-apiserver` to `HAProxy`. This change prevents unnecessary API Virtual IP (VIP) failovers.

      (link:https://issues.redhat.com/browse/OCPBUGS-1257[*OCPBUGS-1257*])
    • Bug Fix
    • Done

      This relates to the recovery of a cluster following an etcd outage.

      The ingress path to kube-apiserver is:

      ───────────> VIP ─────────────────> Local HAProxy ────┬─> kube-apiserver-master-0
          (managed by keepalived)                           │
                                                            ├─> kube-apiserver-master-1
                                                            │
                                                            └─> kube-apiserver-master-2
      

      Each master runs an HAProxy instance that load balances across the three kube-apiservers. Each HAProxy runs health checks against every kube-apiserver and adds or removes it from the available pool based on its health.
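
      For illustration only, a backend of that kind might look roughly like the following haproxy.cfg fragment. The backend name, addresses, ports and check timings here are invented for the sketch, not taken from the actual template:

        # Hypothetical sketch: HAProxy health checks each kube-apiserver
        # and drops unhealthy ones from the load-balancing pool.
        backend kube_apiserver
            mode tcp
            balance roundrobin
            option httpchk GET /readyz HTTP/1.0
            http-check expect status 200
            server master-0 192.0.2.10:6443 check check-ssl verify none inter 1s fall 2 rise 3
            server master-1 192.0.2.11:6443 check check-ssl verify none inter 1s fall 2 rise 3
            server master-2 192.0.2.12:6443 check check-ssl verify none inter 1s fall 2 rise 3

      Pool membership decisions, in other words, already happen at this layer.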

      We only use keepalived to ensure that HAProxy is not a single point of failure. It is the job of keepalived to ensure that incoming traffic is being directed to an HAProxy which is functioning correctly.

      The current health check we use for keepalived polls /readyz through the local HAProxy. While this seems intuitively correct, it is in fact testing the wrong thing: it tests whether the kube-apiserver the probe happens to reach is functioning correctly. That is not keepalived's purpose. HAProxy already runs health checks against the kube-apiserver backends; keepalived's job is simply to select a correctly functioning HAProxy.
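
      In keepalived.conf terms, that style of check looks roughly like the sketch below. This is illustrative only, not the template the installer ships; the script path, port, interface name and VRRP parameters are made up:

        # Hypothetical sketch: the track script goes through the local
        # HAProxy to /readyz, so it reports on whichever kube-apiserver
        # answered the probe rather than on HAProxy itself.
        vrrp_script chk_api_readyz {
            script "/usr/bin/curl -o /dev/null -kLsf https://localhost:9445/readyz"
            interval 1
            fall 2
            rise 2
        }

        vrrp_instance api_vip {
            state BACKUP
            interface ens3
            virtual_router_id 51
            priority 40
            virtual_ipaddress {
                192.0.2.100
            }
            track_script {
                chk_api_readyz
            }
        }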

      This becomes important during recovery from an outage. When none of the kube-apiservers are healthy, this health check fails continuously and the API VIP moves uselessly between masters. The situation is much worse when only one of the kube-apiservers is up: there is a high probability that it is overloaded and at least rate limiting incoming connections, which can cause the keepalived health check to fail and the VIP to fail over to the next HAProxy. That failover resets all open kube-apiserver connections, even established ones, which increases the load on the kube-apiserver and makes it more likely that the health check fails again.

      Ideally the keepalived health check would check only the health of HAProxy itself, not the health of the pool of kube-apiservers. In practice it will probably never be necessary to move the VIP while the master is up, regardless of the health of the cluster. A network partition affecting HAProxy would already be handled by VRRP between the masters, so it may be sufficient to check that the local HAProxy pod is healthy.
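
      One way to express that, again only as a sketch under the assumption that HAProxy exposes a monitor endpoint (the port and URI below are invented), is to point the keepalived track script at something HAProxy answers itself, independently of backend health:

        # haproxy.cfg: a monitor endpoint answered by HAProxy directly,
        # regardless of the state of the kube-apiserver backends.
        frontend healthz
            bind 127.0.0.1:50936
            mode http
            monitor-uri /haproxy_ready

        # keepalived.conf: track HAProxy itself rather than the apiserver pool.
        vrrp_script chk_haproxy {
            script "/usr/bin/curl -o /dev/null -sf http://127.0.0.1:50936/haproxy_ready"
            interval 1
            fall 2
            rise 2
        }

      A simpler check in the same spirit would be a process check such as pidof haproxy, which likewise ignores backend state.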

            maandre@redhat.com Martin André
            rhn-gps-mbooth Matthew Booth
            Ramón Lobillo
            Darragh Fitzmaurice
            Votes: 1
            Watchers: 10
