Bug
Resolution: Done
Critical
7.0.0.Alpha5, 7.1.1.Final
None
Topology T: coordinator = A, owners(k) = [C, D], pending_owners(k) = null
B sends prepareCommand(tx1, put(k, v)) to C, D
D adds backup locks and replies
C acquires lock, ready to send reply to B
A starts installing topology T+1: owners(k) = [C, D], pending_owners(k) = [C, E]
A, C and E install topology T+1, B and D do not
E requests and receives tx data from C, including tx1
C leaves
B sees a SuspectException, sends rollbackCommand(tx1) to C, D
D removes tx1
C has already left, so B ignores the missing response from C
B reports to the user that the tx has been rolled back
B and D install topology T+1 (optional)
A starts installing topology T+2: owners(k) = [D], pending_owners(k) = [E]
A, B, D, E all install topology T+2
E requests and receives state from D, but E does not remove tx1 (which it received from C earlier)
A starts installing topology T+3: owners(k) = [E], pending_owners(k) = null
E now has a stale backup lock on k
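The sequence above can be replayed with a toy model. This is only a sketch, not Infinispan code: the class `StaleBackupLockDemo`, its per-node transaction table, and the `transferTxData` helper are hypothetical names made up for illustration, and the model assumes that state transfer only ever adds transactions on the receiver.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model of the scenario. All names are hypothetical; nothing here is Infinispan API.
class StaleBackupLockDemo {
    // node name -> ids of transactions holding a (primary or backup) lock on k
    static final Map<String, Set<String>> txTable = new HashMap<>();

    static Set<String> txs(String node) {
        return txTable.computeIfAbsent(node, n -> new HashSet<>());
    }

    // Assumption: state transfer copies in-flight transactions and never removes any.
    static void transferTxData(String source, String target) {
        txs(target).addAll(txs(source));
    }

    // Replays the steps and returns the transactions still locked on E.
    static Set<String> run() {
        txTable.clear();
        Set<String> members = new HashSet<>(List.of("A", "B", "C", "D", "E"));

        // Topology T: B's prepare for tx1 reaches both owners of k.
        txs("D").add("tx1"); // backup lock
        txs("C").add("tx1"); // primary lock

        // Topology T+1: E becomes a pending owner and pulls tx data from C.
        transferTxData("C", "E");

        // C leaves the cluster.
        members.remove("C");
        txTable.remove("C");

        // B still uses the owners from topology T, so the rollback goes to [C, D].
        // The leaver is skipped, and E is never contacted.
        for (String target : List.of("C", "D")) {
            if (members.contains(target)) {
                txs(target).remove("tx1");
            }
        }

        // Topology T+2: E pulls state from D, which no longer knows tx1.
        transferTxData("D", "E");

        // Topology T+3: E is the only owner of k.
        return new HashSet<>(txs("E"));
    }

    public static void main(String[] args) {
        System.out.println("Transactions still holding a lock on E: " + run());
    }
}
```

In this model `run()` returns a set that still contains `tx1`, while D's table is empty: the rollback was addressed to the owners B knew about, and none of them passed it on to E.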
This seems very hard to reproduce in production: C would have to leave early enough that B and D have not yet received topology T+1, but late enough that it has already sent its transaction data to E.
A possible solution would be to catch any SuspectException during prepare/commit/rollback (without ignoring leavers), wait for a new topology, and send the command again to the new owners. Obviously, this wouldn't work with asynchronous prepare/commit/rollback, since the originator does not wait for responses.
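A rough sketch of that retry idea follows. It is not the actual fix: `SuspectedNodeException`, `Transport`, `TopologySource` and `RetryingSender` are hypothetical stand-ins defined here so the example compiles on its own, and `ownersOf` is assumed to return the owners (including pending owners) in the currently installed topology.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the exception raised when a target has left.
class SuspectedNodeException extends RuntimeException {
    SuspectedNodeException(String node) {
        super("suspected: " + node);
    }
}

// Hypothetical synchronous transport; throws if any target is gone.
interface Transport {
    void send(String command, List<String> targets);
}

// Hypothetical view of the cluster topology.
interface TopologySource {
    int currentTopologyId();
    // Waits until a topology newer than olderThan is installed; returns its id.
    int awaitTopologyNewerThan(int olderThan);
    // Assumed to include pending owners of the key.
    List<String> ownersOf(String key);
}

class RetryingSender {
    // Sends the command to the key's owners; on a suspected node, waits for a
    // newer topology and resends to the owners in that topology.
    // Returns the number of attempts that were needed.
    static int sendWithRetry(String command, String key, Transport transport,
                             TopologySource topology, int maxAttempts) {
        int topologyId = topology.currentTopologyId();
        for (int attempt = 1; ; attempt++) {
            List<String> owners = topology.ownersOf(key);
            try {
                transport.send(command, owners);
                return attempt;
            } catch (SuspectedNodeException e) {
                if (attempt >= maxAttempts) {
                    throw e; // give up and surface the failure
                }
                topologyId = topology.awaitTopologyNewerThan(topologyId);
            }
        }
    }

    public static void main(String[] args) {
        final int[] installed = {0};
        final List<String> delivered = new ArrayList<>();
        TopologySource topology = new TopologySource() {
            public int currentTopologyId() { return installed[0]; }
            public int awaitTopologyNewerThan(int olderThan) {
                installed[0] = olderThan + 1; // pretend the next topology arrives
                return installed[0];
            }
            public List<String> ownersOf(String key) {
                return installed[0] == 0 ? List.of("C", "D") : List.of("D", "E");
            }
        };
        Transport transport = (command, targets) -> {
            if (targets.contains("C")) {
                throw new SuspectedNodeException("C"); // C has left
            }
            delivered.addAll(targets);
        };
        int attempts = sendWithRetry("rollback(tx1)", "k", transport, topology, 3);
        System.out.println("attempts=" + attempts + ", delivered to " + delivered);
    }
}
```

With the fakes in `main`, the first attempt targets C and fails, and the second attempt reaches D and E, so E would also learn about the rollback. The attempt limit is a design choice of this sketch to avoid retrying forever while the cluster keeps changing.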