The state transfer in STATE writes into a ByteArrayDataOutputStream instead of a ByteArrayOutputStream, as is the case in STATE_TRANSFER. In our case, with replicated maps containing thousands of complex class objects, this now takes an extremely long time. The reason, I would say, is that the stream buffer grows in very small steps (as you have documented), which results in very expensive array copying on every expansion. Wrapping it in a BufferedOutputStream does not help much once the buffer becomes large.
Just by letting the ByteArrayDataOutputStream grow exponentially (for example, doubling its capacity each time it runs out of space), we get a massive speed improvement.
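
To illustrate the difference, here is a minimal sketch of a growable byte buffer with both growth policies. This is not the JGroups code: `GrowableBuffer`, its fields, and the `increment` parameter are hypothetical names made up for the example, and the fixed-increment policy is only my assumption of how a "small steps" strategy behaves. The `bytesCopied` counter records how many bytes each policy has to move while reallocating.

```java
import java.util.Arrays;

/** Hypothetical growable byte buffer for illustration; not the JGroups class. */
final class GrowableBuffer {
    private byte[] buf;
    private int pos;                   // number of valid bytes written so far
    private final int increment;       // slack added by the fixed-increment policy
    private final boolean exponential; // true: double the capacity; false: grow by a small step
    long bytesCopied;                  // total bytes moved by all reallocations

    GrowableBuffer(int initialCapacity, int increment, boolean exponential) {
        this.buf = new byte[initialCapacity];
        this.increment = increment;
        this.exponential = exponential;
    }

    void write(byte[] data) {
        ensureCapacity(pos + data.length);
        System.arraycopy(data, 0, buf, pos, data.length);
        pos += data.length;
    }

    private void ensureCapacity(int needed) {
        if (needed <= buf.length) {
            return; // enough room, no reallocation
        }
        // Exponential: at least double, so reallocations become rarer as the buffer grows.
        // Fixed increment: just enough plus a small slack, so nearly every write reallocates.
        int newCapacity = exponential
                ? Math.max(buf.length * 2, needed)
                : needed + increment;
        bytesCopied += pos;            // Arrays.copyOf moves the existing contents
        buf = Arrays.copyOf(buf, newCapacity);
    }

    int size() {
        return pos;
    }
}
```

Writing the same chunks into one instance of each kind (say 1000 chunks of 64 bytes, starting from a 32-byte buffer) and comparing `bytesCopied` shows the effect: with a fixed increment the copied volume grows roughly quadratically with the final size, while with doubling it stays within a small constant factor of the final size.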