  1. Network Edge
  2. NE-709

[Tech Debt] [Perf+Scale] Investigate and improve memory performance of backend server weights with the "random" balancing algorithm


    • Epic
    • Resolution: Done
    • Undefined
    • openshift-4.11
    • None
    • None
    • Investigate implications and performance impact of adjusting backend server weights
    • False
    • False
    • Green
    • In Progress
    • OCPPLAN-7878 - NetEdge - Maintainability and Debugability & Tech Backlog
    • Impediment
    • 0% To Do, 0% In Progress, 100% Done
    • 0
    • 0

      In OCP 4.8, the router was changed to use the "random" balancing algorithm by default for non-passthrough routes. The previous default was "leastconn".

      Bug https://bugzilla.redhat.com/show_bug.cgi?id=2007581 shows that using "random" by default incurs significant memory overhead for each backend that uses it.

      PR https://github.com/openshift/cluster-ingress-operator/pull/663
      reverted the change and made "leastconn" the default again (OCP 4.8 onwards).

      The analysis in https://bugzilla.redhat.com/show_bug.cgi?id=2007581#c40 shows that HAProxy's default behaviour is to multiply the weight (specified in the route CR) by 16 as it builds its data structures for each backend. If no weight is specified, openshift-router sets the weight to 256, which HAProxy then scales to 256 × 16 = 4096 per server. With many thousands of routes this balloons quickly and leads to a significant increase in memory usage, as highlighted by the customer cases attached to BZ#2007581.
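To make the scaling concrete, here is a minimal sketch of the arithmetic described above. The two constants come from the analysis quoted in this issue. The function names and the 10,000-server figure are illustrative assumptions, and the sketch only tracks the scaled weight, not the actual bytes HAProxy allocates.

```python
# Constants taken from the issue text; everything else is illustrative.
HAPROXY_WEIGHT_SCALE = 16    # multiplier HAProxy applies, per the BZ#2007581 analysis
ROUTER_DEFAULT_WEIGHT = 256  # weight openshift-router sets when the route gives none


def scaled_weight(route_weight=None):
    """Return the server weight after the x16 scaling (hypothetical helper)."""
    weight = ROUTER_DEFAULT_WEIGHT if route_weight is None else route_weight
    return weight * HAPROXY_WEIGHT_SCALE


def total_scaled_weight(num_servers, route_weight=None):
    """Sum the scaled weight over num_servers identical server entries."""
    return num_servers * scaled_weight(route_weight)


print(scaled_weight())              # 4096 (default weight of 256)
print(scaled_weight(1))             # 16 (explicit weight of 1)
print(total_scaled_weight(10_000))  # 40960000 (assumed 10,000 default-weight servers)
```

The gap between 4096 and 16 per server entry is the motivation for exploring a smaller default.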

      The purpose of this issue is both to explore changing the openshift-router default weight (i.e., 256) to something smaller, or leaving it unset when no explicit weight has been requested, and to measure the resulting memory usage within the context of the existing perf&scale tests that we use for vetting new HAProxy releases.

      It may be that the low-hanging change is to not default to weight=256 for backends that have only one pod replica (i.e., if no value is specified and there is only 1 pod replica, then don't default to 256 for that single server entry).
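The single-replica rule proposed above could be sketched as follows. This is not the openshift-router implementation: `server_weight` and its parameters are hypothetical names, and returning `None` stands in for "emit no weight for this server entry".

```python
from typing import Optional

ROUTER_DEFAULT_WEIGHT = 256  # current openshift-router default, per the issue text


def server_weight(explicit_weight: Optional[int], replica_count: int) -> Optional[int]:
    """Pick the weight for a server entry under the proposed rule (hypothetical helper).

    Returns None when the weight should be left unset.
    """
    if explicit_weight is not None:
        # An explicitly requested weight is always honoured, including 0.
        return explicit_weight
    if replica_count == 1:
        # Proposed change: do not default to 256 for a lone server entry.
        return None
    # Otherwise keep today's default.
    return ROUTER_DEFAULT_WEIGHT
```

Under this sketch only the unweighted, single-replica case changes behaviour, which keeps the change narrow enough to compare against the existing perf&scale results.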

      Outcome: does changing the [default] weight value make it feasible to switch back to "random" as the default balancing algorithm in a future OCP release?

        There are no Sub-Tasks for this issue.

            amcdermo@redhat.com ANDREW MCDERMOTT
            mmasters1@redhat.com Miciah Masters
            Arvind Iyengar (Inactive)