Details
- Bug
- Resolution: Done
- Critical
- 13.0.0.Dev02
Description
ISPN-10753 changed StateTransferLockImpl to use a StampedLock, which is not reentrant.
Acquiring the read lock twice from the same thread is possible because the read lock is not exclusive, but it brings a deadlock risk when another thread requests the write lock between the two acquisitions:
- thread 1 acquires the read lock in EntryWrappingInterceptor.applyChanges()
- thread 2 tries to acquire the write lock in StateConsumerImpl.onTopologyUpdate() and blocks
- thread 1 tries to acquire the read lock a second time in ClusteringDependentLogic.DistributionLogic.commitSingleEntry() and also blocks
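The three steps above can be replayed in isolation. The sketch below is a simplified model, not Infinispan code and not StampedLock itself: WriterPreferringLock is a hypothetical toy lock that is non-reentrant and makes new readers wait whenever a writer is queued, which is the behaviour the scenario depends on. A timeout stands in for "blocks forever" so that the program terminates.

```java
class ReadLockDeadlockSketch {

    /** Hypothetical toy lock: non-reentrant, and a waiting writer blocks new readers. */
    static final class WriterPreferringLock {
        private int readers;
        private int waitingWriters;
        private boolean writing;

        /** Returns true if the read lock was acquired before the timeout elapsed. */
        synchronized boolean tryReadLock(long timeoutMillis) throws InterruptedException {
            long deadline = System.currentTimeMillis() + timeoutMillis;
            while (writing || waitingWriters > 0) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) {
                    return false;
                }
                wait(remaining);
            }
            readers++;
            return true;
        }

        synchronized void unlockRead() {
            readers--;
            notifyAll();
        }

        synchronized void writeLock() throws InterruptedException {
            waitingWriters++;
            try {
                while (writing || readers > 0) {
                    wait();
                }
                writing = true;
            } finally {
                waitingWriters--;
            }
        }

        synchronized void unlockWrite() {
            writing = false;
            notifyAll();
        }

        synchronized int waitingWriters() {
            return waitingWriters;
        }
    }

    /** Replays the scenario and reports whether the second read acquisition succeeded. */
    static boolean secondReadSucceeds() {
        WriterPreferringLock lock = new WriterPreferringLock();
        try {
            // Step 1: "thread 1" (the caller) acquires the read lock.
            lock.tryReadLock(0);

            // Step 2: "thread 2" requests the write lock and blocks behind the reader.
            Thread writer = new Thread(() -> {
                try {
                    lock.writeLock();
                    lock.unlockWrite();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            writer.start();
            while (lock.waitingWriters() == 0) {
                Thread.sleep(1); // wait until the writer is queued
            }

            // Step 3: thread 1 requests the read lock again while still holding it.
            boolean acquired = lock.tryReadLock(200);
            if (acquired) {
                lock.unlockRead();
            }

            // Release the first read lock so that the writer can finish.
            lock.unlockRead();
            writer.join();
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("second read acquired: " + secondReadSucceeds());
    }
}
```

In this model the second acquisition times out: the writer waits for the first read lock to be released, and the reader waits for the writer. Without the timeout neither thread could make progress.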
This actually happens in the test suite, causing random failures in NonTxPrimaryOwnerBecomingNonOwnerTest.testPrimaryOwnerChangingDuringPut():
11:35:10,740 TRACE (non-blocking-thread-Test-NodeA-p39961-t1:[]) [StateConsumerImpl] Received new topology for cache defaultcache, isRebalance = false, isMember = true, topology = CacheTopology{id=8, phase=READ_NEW_WRITE_ALL, rebalanceId=3, currentCH=DefaultConsistentHash{ns=1, owners = (2)[Test-NodeA: 1+0, Test-NodeB: 0+1]}, pendingCH=DefaultConsistentHash{ns=1, owners = (3)[Test-NodeA: 0+1, Test-NodeB: 0+0, Test-NodeC: 1+0]}, unionCH=DefaultConsistentHash{ns=1, owners = (3)[Test-NodeA: 0+1, Test-NodeB: 0+1, Test-NodeC: 1+0]}, actualMembers=[Test-NodeA, Test-NodeB, Test-NodeC], persistentUUIDs=[a861c235-ccf1-4f01-857e-5810a9bbced0, 48f2b81e-7f18-456c-ab36-b587e8a3a235, a57502c1-343f-47f8-86ed-d87a6cda39b0]}
### This message is logged between the 2 read locks
11:35:10,740 TRACE (jgroups-8,Test-NodeA:[]) [EntryWrappingInterceptor] About to commit entry ReadCommittedEntry(f3a7a72){key=testkey, value=v1, oldValue=null, isCreated=true, isChanged=true, isRemoved=false, isExpired=false, isCommited=false, skipLookup=false, metadata=EmbeddedExpirableMetadata{version=null, lifespan=-1, maxIdle=-1}, oldMetadata=null, internalMetadata=null}
11:35:10,740 TRACE (non-blocking-thread-Test-NodeA-p39961-t1:[]) [DefaultSegmentedDataContainer] Ensuring segments {0} are started
11:36:10,770 ERROR (testng-Test:[]) [TestingUtil] Cache defaultcache timed out waiting for rebalancing to complete on node Test-NodeA, expected member list is [Test-NodeA, Test-NodeB, Test-NodeC], current member list is [Test-NodeA, Test-NodeB]!
11:36:10,771 ERROR (testng-Test:[]) [TestSuiteProgress] Test failed: org.infinispan.distribution.rehash.NonTxPrimaryOwnerBecomingNonOwnerTest.testPrimaryOwnerChangingDuringPut
java.lang.RuntimeException: Cache defaultcache timed out waiting for rebalancing to complete on node Test-NodeA, expected member list is [Test-NodeA, Test-NodeB, Test-NodeC], current member list is [Test-NodeA, Test-NodeB]!
    at org.infinispan.test.TestingUtil.waitForNoRebalance(TestingUtil.java:452) ~[test-classes/:?]
    at org.infinispan.distribution.rehash.NonTxPrimaryOwnerBecomingNonOwnerTest.doTest(NonTxPrimaryOwnerBecomingNonOwnerTest.java:168) ~[test-classes/:?]
    at org.infinispan.distribution.rehash.NonTxPrimaryOwnerBecomingNonOwnerTest.testPrimaryOwnerChangingDuringPut(NonTxPrimaryOwnerBecomingNonOwnerTest.java:68) ~[test-classes/:?]
The read lock in EntryWrappingInterceptor.applyChanges() seems to be obsolete: there are other code paths in EntryWrappingInterceptor that also commit entries, but do not acquire the state transfer read lock. Removing this read lock acquisition should fix the deadlock.
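The shape of the proposed change can be sketched as follows. This is an illustration under stated assumptions, not the actual Infinispan code: the class SingleAcquisitionSketch, the field names, and the method bodies are hypothetical, and only the method names echo the ones mentioned in this issue. The point is that the outer method no longer touches the lock, so the thread holds the read lock at most once on the commit path.

```java
import java.util.concurrent.locks.StampedLock;

class SingleAcquisitionSketch {
    // Hypothetical stand-in for the state transfer lock.
    private final StampedLock stateTransferLock = new StampedLock();
    private int maxReadHolds;

    /** Outer step: after the proposed change it does not acquire the read lock itself. */
    void applyChanges(Runnable commitAction) {
        commitSingleEntry(commitAction);
    }

    /** Inner step: the only place on this path that acquires the read lock. */
    void commitSingleEntry(Runnable commitAction) {
        long stamp = stateTransferLock.readLock();
        try {
            // Record how many read holds exist while committing (for illustration only).
            maxReadHolds = Math.max(maxReadHolds, stateTransferLock.getReadLockCount());
            commitAction.run();
        } finally {
            stateTransferLock.unlockRead(stamp);
        }
    }

    int maxReadHolds() {
        return maxReadHolds;
    }

    boolean isReadLocked() {
        return stateTransferLock.isReadLocked();
    }

    public static void main(String[] args) {
        SingleAcquisitionSketch sketch = new SingleAcquisitionSketch();
        sketch.applyChanges(() -> { });
        System.out.println("max read holds: " + sketch.maxReadHolds());
    }
}
```

With a single acquisition, a writer that queues up during the commit only has to wait for one unlockRead() call, and the committing thread never waits behind that writer.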
Issue Links
- relates to ISPN-10753 StateTransferLockImpl should use a StampedLock (Closed)