Looks like a problem in entry forwarding.
Here is the test scenario:
- DIST, numOwners=2; start with a 4-node cluster, then shut down 1 node normally during load
- HotRod putIfAbsent calls from 40 threads (1 process, 1 remote cache instance), 40000 entries total
After the test run, the numberOfEntries on each node is:
- node1: 26608
- node2: 26622
- node3: 26746
- node4: 0
The total is 79976, against an expected 40000 * 2 = 80000 copies. The HotRod client received 11 errors, so 79976 + (11 * 2) = 79998. This means 1 entry (2 copies) is completely missing.
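The accounting above can be double-checked with a few lines of Python. This is only a sketch: the function name is made up for illustration, and it assumes, as the figures above do, that each of the 11 client errors corresponds to an entry that ended up on neither owner.

```python
def missing_entries(counts, total_entries, num_owners, client_errors):
    """Return how many entries are unaccounted for after the run."""
    expected_copies = total_entries * num_owners
    stored_copies = sum(counts.values())
    # Assumption: a failed putIfAbsent left no copy on any owner.
    failed_copies = client_errors * num_owners
    missing_copies = expected_copies - stored_copies - failed_copies
    return missing_copies // num_owners


# numberOfEntries per node, as reported after the test run
counts = {"node1": 26608, "node2": 26622, "node3": 26746, "node4": 0}

print(sum(counts.values()))                    # copies actually stored
print(missing_entries(counts, 40000, 2, 11))   # entries lost without an error
```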
Let's take a look at the missing entry, hash(thread16key59) = 574ff563.
Current CH: owners(574ff563) are [node4, node1]
The sequence of events is:
- hotrod -> node1
- node1 forwards it to the primary owner, node4
- node4 shuts down without processing the forwarded entry
As a result, owners(574ff563) is empty. The entry is lost completely, without any error.
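The sequence can be illustrated with a toy model. This is not Infinispan code: the `Node` class and both functions are hypothetical. It assumes that the receiving node forwards the write to the primary owner, that the primary then writes to the backups, and that a node which is shutting down drops the forwarded command without reporting anything back.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.store = {}
        self.stopping = False

    def handle_forwarded_put(self, key, value, backups):
        # Hypothetical failure mode: a stopping node silently drops
        # the forwarded command instead of rejecting it.
        if self.stopping:
            return
        self.store.setdefault(key, value)
        for backup in backups:
            backup.store.setdefault(key, value)


def put_if_absent(owners, key, value):
    # The receiving node forwards the write to the primary owner,
    # which is then responsible for updating the backups.
    primary, backups = owners[0], owners[1:]
    primary.handle_forwarded_put(key, value, backups)


def holders(nodes, key):
    return [n.name for n in nodes if key in n.store]


node1, node4 = Node("node1"), Node("node4")
owners = [node4, node1]   # owners(574ff563) = [node4, node1]

node4.stopping = True     # node4 begins its shutdown
put_if_absent(owners, "thread16key59", "value")   # hotrod -> node1 -> node4

print(holders([node1, node4], "thread16key59"))   # no node holds the key
```

In this model the client sees no error because nothing on the forwarding path ever reports the dropped command, which matches what was observed in the test.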