5.2.7.Final, 6.0.2.Final, 7.0.0.CR2
This is a strange one:
A pessimistic transaction started on the primary owner during state transfer can fail because a backup owner issues a ClusteredGetCommand to the primary while processing the write command. This can happen if the backup needs to perform a remote get (due to a DELTA_WRITE, conditional, or reliable return values). In this case, it's a DELTA_WRITE.
In this chain of events, state transfer is ongoing and the union CH is (east-dht5, west-dht5, east-dht6). The pessimistic transaction that originated on east-dht5 must therefore be invoked on west-dht5 and east-dht6.
Let's examine east-dht6. It gets the write request. It needs to perform a remote get to pull the full key context, since it receives only the delta.
east-dht5 receives the ClusteredGetCommand request. Because the condition at TxDistributionInterceptor line 325 evaluates to true, the acquireRemoteLock flag is set to true.
Thus, when east-dht5 runs the ClusteredGetCommand, it will create and invoke a LockControlCommand (LCC). Because the LCC is created inline, it will not have its origin set, but it will still create a RemoteTxInvocationContext. When the LCC runs through the interceptor chain, TxInterceptor::invokeNextInterceptorAndVerifyTransaction will check the originator (because of the remote context). When it doesn't find it, it will force a rollback.
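To make the failure mode concrete, here is a minimal, self-contained sketch of the originator check as described above. It is a toy model and not Infinispan code: the class OriginCheckSketch, its nested Command type, and the methods forcesRollback and inlineLockCommand are all hypothetical names, and the membership test is an assumption about what "doesn't find the originator" means.

```java
import java.util.Set;

// Hypothetical, simplified model of the behaviour described above.
// None of these types or methods are real Infinispan classes.
final class OriginCheckSketch {

    /** Stand-in for a replicable command that may or may not carry an origin address. */
    static final class Command {
        final String name;
        final String origin; // null models "origin was never set"

        Command(String name, String origin) {
            this.name = name;
            this.origin = origin;
        }
    }

    /**
     * Stand-in for the originator verification: with a remote context,
     * a missing or unknown originator is treated as grounds for rollback.
     */
    static boolean forcesRollback(Command cmd, boolean remoteContext, Set<String> members) {
        if (!remoteContext) {
            return false; // local contexts are not subject to the originator check
        }
        return cmd.origin == null || !members.contains(cmd.origin);
    }

    /** Models the primary building an LCC inline while serving a remote get: no origin is set. */
    static Command inlineLockCommand() {
        return new Command("LockControlCommand", null);
    }

    public static void main(String[] args) {
        Set<String> members = Set.of("east-dht5", "west-dht5", "east-dht6");

        // LCC created inline on east-dht5 while handling the get from east-dht6.
        Command inlineLcc = inlineLockCommand();
        System.out.println("inline LCC forces rollback: "
                + forcesRollback(inlineLcc, true, members));

        // For comparison, an LCC that arrived over the wire carries its sender as origin.
        Command remoteLcc = new Command("LockControlCommand", "east-dht6");
        System.out.println("remote LCC forces rollback: "
                + forcesRollback(remoteLcc, true, members));
    }
}
```

The point of the sketch is only that a remote context combined with a null origin is indistinguishable, from the check's point of view, from an originator that has left the cluster.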
When the transaction attempts to commit, it fails due to the spurious rollback. For instance:
So it looks like there are a couple of ways to handle this. One would be to only acquire a remote lock in TxDistributionInterceptor if the current node was the originator. Another, possibly less intrusive, would be to set the origin of the invoked LCC to that of the ClusteredGetCommand. The former should be preferable, but this code tends to be a bit labyrinthine with the various pessimistic use cases out there.
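The two options can be sketched roughly as follows. This is a toy model under stated assumptions, not a patch: RemoteLockFixSketch and both of its methods are hypothetical names, and reading "the current node was the originator" as "the request being served did not come from another node" (modelled as a null request origin) is an interpretation rather than something the actual code confirms.

```java
// Hypothetical sketch of the two proposed fixes; not Infinispan code.
final class RemoteLockFixSketch {

    /**
     * Option 1: only acquire the remote lock when the request originated locally.
     * requestOrigin == null models a locally originated invocation (an assumption).
     */
    static boolean shouldAcquireRemoteLock(boolean existingCondition, String requestOrigin) {
        return existingCondition && requestOrigin == null;
    }

    /**
     * Option 2: keep acquiring the lock, but give the inline LCC the origin
     * of the ClusteredGetCommand that triggered it.
     */
    static String originForInlineLock(String clusteredGetOrigin) {
        return clusteredGetOrigin;
    }

    public static void main(String[] args) {
        // east-dht5 serving a ClusteredGetCommand sent by east-dht6.
        String getOrigin = "east-dht6";

        System.out.println("option 1, acquire remote lock: "
                + shouldAcquireRemoteLock(true, getOrigin));
        System.out.println("option 2, LCC origin: "
                + originForInlineLock(getOrigin));
    }
}
```

Under option 1 no LCC is created for the remote get at all, so the originator check never runs; under option 2 the LCC is still created but carries a known cluster member as its origin.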