In DIST mode, when a request is made on a key that is not mapped locally, a remote get is sent to all data owners of that key and the first response is used. This can add unnecessary load on the network as all nodes still eventually respond, and if values are large this can cause a lot of unnecessary network traffic.
The purpose of broadcasting to all data owners is so that (1) if one is down, another could still respond (2) if one is overloaded, others may respond faster.
A solution around this could be based on either (or both) of:
- Provide a configurable stagger timeout, e.g. 100ms. E.g., RPC to (random) Owner1. Wait for timeout t. If no response, RPC to Owner2. etc.
- Always broadcast to a (configurable) subset of owners, e.g., always 2 even if numOwners is 5.
Needs careful thought and design.