Sub-task
Resolution: Duplicate
rhel-8
rhel-sst-network-fastdatapath
ssg_networking
OpenShift users report uneven load balancing between service backends:
- If the number of service backend pods is a power of two, traffic distribution is even.
- If the number of service backend pods is not a power of two, the pods split into two groups, and each pod in one group receives roughly half the traffic of each pod in the other group (the sizes of these groups also appear to be deterministic).
Here are the results of sending 5000 requests (with 20 parallel calls) to such a service.
The first column is the number of calls received by the given pod; the second column is the pod name.
With 8 backend pods - even distribution:
644 openssl-server-9585ff455-6nbg7
619 openssl-server-9585ff455-7b8rm
589 openssl-server-9585ff455-9tnz7
617 openssl-server-9585ff455-c84hw
628 openssl-server-9585ff455-mjskl
618 openssl-server-9585ff455-ptkcc
602 openssl-server-9585ff455-rl7pk
683 openssl-server-9585ff455-xd2cb
With 9 backend pods, uneven distribution:
634 openssl-server-9585ff455-6nbg7
326 openssl-server-9585ff455-72br8 <-----
605 openssl-server-9585ff455-7b8rm
655 openssl-server-9585ff455-9tnz7
596 openssl-server-9585ff455-c84hw
622 openssl-server-9585ff455-mjskl
336 openssl-server-9585ff455-ptkcc <-----
592 openssl-server-9585ff455-rl7pk
634 openssl-server-9585ff455-xd2cb
With 10 backend pods, uneven distribution:
307 openssl-server-9585ff455-6nbg7 <-----
309 openssl-server-9585ff455-72br8 <-----
597 openssl-server-9585ff455-7b8rm
348 openssl-server-9585ff455-9tnz7 <-----
631 openssl-server-9585ff455-c84hw
626 openssl-server-9585ff455-mjskl
657 openssl-server-9585ff455-n25g5
316 openssl-server-9585ff455-ptkcc <-----
585 openssl-server-9585ff455-rl7pk
624 openssl-server-9585ff455-xd2cb
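For reference, a minimal Python sketch of the measurement loop described above. The service URL is hypothetical, and it assumes each backend replies with its own pod name in the response body; neither detail is taken from the original report:

    import collections
    import urllib.request
    from concurrent.futures import ThreadPoolExecutor

    # Hypothetical endpoint standing in for the service under test.
    SERVICE_URL = "http://openssl-server.default.svc:8080/"

    def call(_):
        # Assumes the backend answers with its pod name in the body.
        with urllib.request.urlopen(SERVICE_URL, timeout=5) as resp:
            return resp.read().decode().strip()

    # 5000 requests with 20 parallel calls, as in the report.
    with ThreadPoolExecutor(max_workers=20) as pool:
        counts = collections.Counter(pool.map(call, range(5000)))

    for calls, pod in sorted((n, p) for p, n in counts.items()):
        print(calls, pod)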
imaximet@redhat.com:
The issue is in the way OVS allocates hash space for OpenFlow group buckets and how those hashes are mapped to the buckets. If we have 8 backends, the hash space is 16, and we map those 16 hashes to 8 buckets (backends); the Webster's method used by OVS simply maps them 2 to 1, so we have an even distribution of hashes between buckets.
If we have 10 backends, the hash space will be 16 again, but now we need to map 16 different hashes to 10 buckets. Obviously, some buckets will get more hashes than others: 6 buckets will get 2 hashes each and 4 buckets will get 1 hash each. So, each of those 6 backends will see about twice the traffic of each of the remaining 4.
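To make the allocation concrete, here is a small Python sketch of Webster's method (also known as the Sainte-Laguë method) applied to equal-weight buckets. This is illustrative code, not the OVS implementation; the function name and the fixed hash space of 16 are assumptions for the example:

    from fractions import Fraction

    def webster_allocate(n_hashes, weights):
        # Webster's method: repeatedly hand the next hash to the bucket
        # with the largest quotient weight / (2 * allocated + 1).
        alloc = [0] * len(weights)
        for _ in range(n_hashes):
            i = max(range(len(weights)),
                    key=lambda b: Fraction(weights[b], 2 * alloc[b] + 1))
            alloc[i] += 1
        return alloc

    for n_buckets in (8, 9, 10):
        print(n_buckets, "buckets:", webster_allocate(16, [1] * n_buckets))

With 8 buckets every bucket gets 2 hashes; with 9 buckets, 7 buckets get 2 hashes and 2 get 1; with 10 buckets, 6 get 2 and 4 get 1. That matches the two underloaded pods in the 9-backend run and the four underloaded pods in the 10-backend run shown above.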
One possible solution is to increase the hash space, so that the maximum difference between buckets has less impact.
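With equal bucket weights, Webster's method reduces to giving each bucket floor(S/N) hashes and handing the S mod N leftover hashes to S mod N buckets, so the worst-case per-bucket skew can be computed directly. A rough sketch of how that skew shrinks as the hash space S grows (the function is illustrative, not OVS code):

    def max_skew(hash_space, n_buckets):
        # Equal-weight buckets: 'extra' buckets get base+1 hashes,
        # the remaining buckets get 'base'.
        base, extra = divmod(hash_space, n_buckets)
        return (base + 1) / base if extra else 1.0

    for bits in (4, 6, 8, 10):
        space = 2 ** bits
        print(f"hash space {space}: worst skew {max_skew(space, 10):.2f}x")

For 10 buckets this gives a 2.00x skew at a hash space of 16, but only about 1.17x at 64 and 1.04x at 256.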