GCP cloud UPerf between 2 nodes (c3-standard-8), 100% reads
With `tcp_nodelay=false` (default)
- size=1000, 1 sender thread: 83us RTT
- size=1500, 1 sender thread: 2400us RTT
- size=1000, 100 sender threads: 276us RTT
- size=1500, 100 sender threads: 276us RTT
With `tcp_nodelay=true`
- size=1000, 1 sender thread: 66us RTT
- size=1500, 1 sender thread: 72us RTT
- size=1000, 100 sender threads: 280us RTT
- size=1500, 100 sender threads: 284us RTT
It seems the effect of setting tcp_nodelay=false is only seen for a single thread; when many senders are sending messages, this has no effect.
The single thread is delayed by Nagling, which affects the sending of both the request and the response.
The reason why the 1000-byte message was not delayed is that TCP sent it as a single segment, which went out immediately because it was large enough. The 1500-byte message created 2 TCP segments: 1480 and 20 bytes (roughly), and the second segment was not sent right away because Nagling was waiting for more small payloads to pack into the same segment. This delayed both the send (small payload) and the response (1500 bytes, creating 2 TCP segments).
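The split described above can be sketched with a few lines of arithmetic. This is only an illustration: the helper `segment_sizes` is hypothetical, and the 1480-byte maximum segment size is the rough figure used in these notes, not a measured value (the real MSS depends on the MTU and TCP options).

```python
def segment_sizes(payload_len, mss=1480):
    """Split a payload into TCP segment sizes, assuming a fixed MSS.

    mss=1480 is the rough figure from the notes above (an assumption);
    the actual value depends on the path MTU and TCP options.
    """
    full, rest = divmod(payload_len, mss)
    # 'full' full-sized segments, plus one trailing partial segment if needed
    return [mss] * full + ([rest] if rest else [])


for size in (1000, 1500):
    print(size, segment_sizes(size))
```

Under these assumptions, the 1000-byte message fits in one segment, while the 1500-byte message leaves a small trailing segment, which is the one Nagling holds back.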
Let's therefore turn tcp_nodelay back on!
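For reference, a minimal sketch of what disabling Nagling looks like at the socket level. It assumes that `tcp_nodelay=true` ends up setting the standard `TCP_NODELAY` socket option; Python and the helper name `make_nodelay_socket` are used purely for illustration and are not part of the benchmark setup.

```python
import socket


def make_nodelay_socket():
    """Create a TCP socket with Nagle's algorithm disabled (TCP_NODELAY=1)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # A non-zero value disables Nagling, so small segments are sent
    # immediately instead of waiting to be coalesced.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


sock = make_nodelay_socket()
# Read the option back; a non-zero value means Nagling is off.
print(sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0)
sock.close()
```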