Fast Datapath Product / FDP-826

Improve load balancing in dp_hash select groups with equal weights


    • Type: Bug
    • Resolution: Done-Errata
    • Priority: Major
    • rhel-8
    • openvswitch3.1
    • openvswitch3.1-3.1.0-143.el8fdp
    • rhel-sst-network-fastdatapath
    • ssg_networking

      OpenShift users report uneven load balancing between service backends:

      1. If the number of service backend pods is a power of two, traffic is distributed evenly.
      2. If the number of service backend pods is not a power of two, the pods are roughly split into two groups, one of which receives half as much traffic as the other (the sizes of these groups also appear to be deterministic).

      Below are the results of sending 5000 requests (20 in parallel) to the service.
      The first column is the number of requests received by the given pod; the second column is the pod name.

      With 8 backend pods, even distribution:

          644 openssl-server-9585ff455-6nbg7
          619 openssl-server-9585ff455-7b8rm
          589 openssl-server-9585ff455-9tnz7
          617 openssl-server-9585ff455-c84hw
          628 openssl-server-9585ff455-mjskl
          618 openssl-server-9585ff455-ptkcc
          602 openssl-server-9585ff455-rl7pk
          683 openssl-server-9585ff455-xd2cb
      

      With 9 backend pods, uneven distribution:

          634 openssl-server-9585ff455-6nbg7
          326 openssl-server-9585ff455-72br8  <-----
          605 openssl-server-9585ff455-7b8rm
          655 openssl-server-9585ff455-9tnz7
          596 openssl-server-9585ff455-c84hw
          622 openssl-server-9585ff455-mjskl
          336 openssl-server-9585ff455-ptkcc  <-----
          592 openssl-server-9585ff455-rl7pk
          634 openssl-server-9585ff455-xd2cb
      

      With 10 backend pods, uneven distribution:

          307 openssl-server-9585ff455-6nbg7  <-----
          309 openssl-server-9585ff455-72br8  <-----
          597 openssl-server-9585ff455-7b8rm
          348 openssl-server-9585ff455-9tnz7  <-----
          631 openssl-server-9585ff455-c84hw
          626 openssl-server-9585ff455-mjskl
          657 openssl-server-9585ff455-n25g5
          316 openssl-server-9585ff455-ptkcc  <-----
          585 openssl-server-9585ff455-rl7pk
          624 openssl-server-9585ff455-xd2cb
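      The counts above can be checked against a simple model: if 16 hash values are shared as evenly as possible among n equal-weight buckets, each bucket holds 1 or 2 hash values and should receive about 5000 * hits / 16 requests (312.5 or 625). The 16-value hash space is taken from the analysis comment in this issue; the function name `predicted_levels` is hypothetical and only for illustration.

```python
# Compare reported per-pod request counts with an "even-as-possible" split
# of a 16-value hash space (assumption taken from the analysis comment).
TOTAL_REQUESTS = 5000
HASH_SPACE = 16

observed = {
    8: [644, 619, 589, 617, 628, 618, 602, 683],
    9: [634, 326, 605, 655, 596, 622, 336, 592, 634],
    10: [307, 309, 597, 348, 631, 626, 657, 316, 585, 624],
}


def predicted_levels(n_buckets: int, hash_space: int = HASH_SPACE) -> dict:
    """Return {expected_requests_per_bucket: number_of_buckets}."""
    low = hash_space // n_buckets    # hash values held by the less-loaded buckets
    n_high = hash_space % n_buckets  # buckets that hold one extra hash value
    levels = {}
    if n_buckets - n_high:
        levels[TOTAL_REQUESTS * low / hash_space] = n_buckets - n_high
    if n_high:
        levels[TOTAL_REQUESTS * (low + 1) / hash_space] = n_high
    return levels


for n, counts in observed.items():
    assert sum(counts) == TOTAL_REQUESTS
    print(n, predicted_levels(n), sorted(counts))
```

      For 9 pods the model predicts 2 pods at about 312 requests and 7 at about 625, and for 10 pods 4 and 6, which matches the pods marked with arrows.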
      

      imaximet@redhat.com:
      The issue is in how OVS allocates the hash space for OpenFlow group buckets and how those hashes are mapped to the buckets. If we have 8 backends, the hash space is 16, and we map those 16 hashes to 8 buckets (backends); the Webster's method used by OVS simply maps them 2 to 1. So the hashes are distributed evenly between buckets.
      If we have 10 backends, the hash space is 16 again, but now 16 different hashes must be mapped to 10 buckets. Inevitably, some buckets get more hashes than others: 6 buckets get 2 hashes each and 4 buckets get 1 hash each (6 x 2 + 4 x 1 = 16). So each of those 6 backends sees about twice the traffic of each of the remaining 4.
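      A minimal Python sketch of the mapping described above, assuming Webster's (Sainte-Lague) highest-averages method with divisors 1, 3, 5, ... and ties going to the lowest bucket index. The sizing rule in `hash_space_size` (smallest power of two >= total weight / minimum weight, never fewer than 16) is an assumption chosen to be consistent with the 16-value hash space quoted for both 8 and 10 backends; both function names are hypothetical and this is a model, not the OVS source.

```python
from collections import Counter


def hash_space_size(weights):
    """Assumed sizing rule: smallest power of two >= ceil(total / min weight),
    with a floor of 16. Expects at least one positive weight."""
    min_w = min(w for w in weights if w > 0)
    min_slots = -(-sum(weights) // min_w)  # ceiling division
    size = 1
    while size < min_slots:
        size *= 2
    return max(16, size)


def webster_map(weights, n_hash):
    """Map hash values 0..n_hash-1 to bucket indices with Webster's method.

    A bucket's priority is weight / divisor; the divisor starts at 1 and grows
    by 2 each time the bucket wins a hash value. max() returns the first
    maximal element, so ties go to the lowest bucket index.
    """
    divisors = [1] * len(weights)
    hash_map = []
    for _ in range(n_hash):
        winner = max(range(len(weights)), key=lambda i: weights[i] / divisors[i])
        hash_map.append(winner)
        divisors[winner] += 2
    return hash_map


for n in (8, 9, 10):
    weights = [100] * n  # equal weights; 100 is an arbitrary example value
    n_hash = hash_space_size(weights)
    hits = Counter(webster_map(weights, n_hash))
    print(n, n_hash, sorted(hits.values(), reverse=True))
# 8 16 [2, 2, 2, 2, 2, 2, 2, 2]
# 9 16 [2, 2, 2, 2, 2, 2, 2, 1, 1]
# 10 16 [2, 2, 2, 2, 2, 2, 1, 1, 1, 1]
```

      With equal weights every bucket first wins one hash value, and the leftover values go one each to the first buckets, so hit counts never differ by more than one. That difference of one is exactly what doubles the traffic when each bucket holds only 1 or 2 hash values.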

      One possible solution is to increase the hash space, so that the maximum difference between buckets (one hash value) has much less relative impact.
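      To illustrate why a larger hash space helps, the sketch below computes the worst-case traffic ratio between the busiest and the least busy of 10 equal-weight buckets when n_hash hash values are split as evenly as possible. The hash-space sizes 64, 256 and 1024 are hypothetical examples, not values chosen by OVS; the helper `max_min_ratio` is also hypothetical.

```python
def max_min_ratio(n_buckets: int, n_hash: int) -> float:
    """Busiest / least busy bucket ratio for an even-as-possible split of
    n_hash hash values over n_buckets equal-weight buckets."""
    low, extra = divmod(n_hash, n_buckets)
    if low == 0:
        return float("inf")  # some buckets get no hash values at all
    high = low + 1 if extra else low
    return high / low


for n_hash in (16, 64, 256, 1024):  # hypothetical hash-space sizes
    print(n_hash, round(max_min_ratio(10, n_hash), 3))
# 16 2.0
# 64 1.167
# 256 1.04
# 1024 1.01
```

      Since the buckets still differ by at most one hash value, the relative imbalance shrinks roughly as 1 / floor(n_hash / n_buckets); the trade-off is a larger hash-to-bucket table per group.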

              Ilya Maximets (imaximet@redhat.com)
              Alessandro Giorgi (rh-ee-algiorgi)
              Minxi Hou
              Votes: 5
              Watchers: 12
