- Bug
- Resolution: Done
- Major
- 5.0.1.FINAL
- None
There is an issue with stale locks when a new node joins a running cluster.
There are three caches, all using eagerLockSingleNode=true, and the transaction runs on the primary data owner. The local locks for the affected keys are acquired, and the prepare is sent to the backup owner. During this time, the new node is detected and joins the cluster. The transaction times out waiting for the transaction lock, and a rollback is attempted. Although the local keys are unlocked, the remotely acquired locks are never released.
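For reference, the snippet below is a minimal sketch of the kind of cache configuration described above, not the actual configuration from the affected deployment. It assumes distribution mode with two owners per key (one primary, one backup) and synchronous replication. The cache name "exampleCache" and the numOwners value are illustrative assumptions. As far as I know, eagerLockSingleNode only takes effect in distribution mode and when useEagerLocking is also enabled, which is why both attributes are set.

```xml
<!-- Hypothetical cache definition; the name and numOwners are assumptions -->
<namedCache name="exampleCache">
   <clustering mode="distribution">
      <!-- Two owners: the primary data owner plus one backup owner -->
      <hash numOwners="2"/>
      <sync/>
   </clustering>
   <!-- Eager locking, with the lock acquired on a single node only -->
   <transaction useEagerLocking="true" eagerLockSingleNode="true"/>
</namedCache>
```

In the scenario reported here, three caches configured along these lines take part in the same transaction.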
I have full trace log files at:
http://dl.dropbox.com/u/10929737/5.0.1-stale-lock/data-grid-4/server.log.2011-11-09.log.31.gz
http://dl.dropbox.com/u/10929737/5.0.1-stale-lock/data-grid-4/server.log.2011-11-09.log.32.gz
http://dl.dropbox.com/u/10929737/5.0.1-stale-lock/data-grid-5/server.log.2011-11-09.log.rar
The transaction in question is GlobalTransaction:<data-grid-4-61247>:169901, found on data-grid-4. I have verified that the primary owner of the keys in question is data-grid-4.