Bug
Resolution: Done
Critical
13.0.0.Final
ISPN-10753 changed StateTransferLockImpl to use a StampedLock, which is not reentrant.
Acquiring the read lock twice from the same thread is possible because the read lock is not exclusive, but it carries a deadlock risk:
- thread 1 acquires the read lock in EntryWrappingInterceptor.applyChanges()
- thread 2 tries to acquire the write lock in StateConsumerImpl.onTopologyUpdate() and blocks
- thread 1 tries to acquire the read lock a second time in ClusteringDependentLogic.DistributionLogic.commitSingleEntry() and also blocks
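
The interleaving above can be sketched outside Infinispan with a plain java.util.concurrent.locks.StampedLock. This is only an illustration, not the Infinispan code: the class name StampedLockReadReentryDemo, the method secondReadBlocks() and the 200 ms pause that gives the writer time to queue are all hypothetical. StampedLock does not document a fixed scheduling policy, so whether the second read acquisition really queues behind the waiting writer may depend on the JDK version and on timing. The sketch therefore reports what it observed instead of assuming an outcome, and it uses the interruptible lock methods so that it always terminates.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.StampedLock;

// Hypothetical demo class, not part of Infinispan.
class StampedLockReadReentryDemo {

    // Returns true if the second read acquisition had not completed
    // within timeoutMillis, i.e. the reader appeared to be stuck.
    static boolean secondReadBlocks(StampedLock lock, long timeoutMillis)
            throws InterruptedException {
        CountDownLatch firstReadHeld = new CountDownLatch(1);
        CountDownLatch writerStarted = new CountDownLatch(1);
        CountDownLatch secondReadDone = new CountDownLatch(1);

        // Plays "thread 1": takes the read lock, then takes it again.
        Thread reader = new Thread(() -> {
            long first = lock.readLock();
            firstReadHeld.countDown();
            try {
                writerStarted.await();
                // Heuristic pause so the writer is likely parked in the queue.
                Thread.sleep(200);
                long second = lock.readLockInterruptibly();
                lock.unlockRead(second);
                secondReadDone.countDown();
            } catch (InterruptedException e) {
                // The main thread gave up waiting and interrupted us.
            } finally {
                lock.unlockRead(first);
            }
        });

        // Plays "thread 2": requests the write lock while the read lock is held.
        Thread writer = new Thread(() -> {
            try {
                firstReadHeld.await();
                writerStarted.countDown();
                long stamp = lock.writeLockInterruptibly();
                lock.unlockWrite(stamp);
            } catch (InterruptedException e) {
                // Not expected in this sketch.
            }
        });

        reader.start();
        writer.start();
        boolean completed = secondReadDone.await(timeoutMillis, TimeUnit.MILLISECONDS);

        // Break the cycle if there is one: the interrupted reader releases
        // its first read lock, which lets the writer proceed.
        reader.interrupt();
        reader.join();
        writer.join();
        return !completed;
    }

    public static void main(String[] args) throws InterruptedException {
        boolean blocked = secondReadBlocks(new StampedLock(), 2000);
        System.out.println("second read acquisition blocked: " + blocked);
    }
}
```

If the run reports that the second acquisition blocked, the reader was waiting behind a writer that was itself waiting for the reader's first read lock, which is the cycle described in the three steps above.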
This actually happens in the test suite, causing random failures in NonTxPrimaryOwnerBecomingNonOwnerTest.testPrimaryOwnerChangingDuringPut():
11:35:10,740 TRACE (non-blocking-thread-Test-NodeA-p39961-t1:[]) [StateConsumerImpl] Received new topology for cache defaultcache, isRebalance = false, isMember = true, topology = CacheTopology{id=8, phase=READ_NEW_WRITE_ALL, rebalanceId=3, currentCH=DefaultConsistentHash{ns=1, owners = (2)[Test-NodeA: 1+0, Test-NodeB: 0+1]}, pendingCH=DefaultConsistentHash{ns=1, owners = (3)[Test-NodeA: 0+1, Test-NodeB: 0+0, Test-NodeC: 1+0]}, unionCH=DefaultConsistentHash{ns=1, owners = (3)[Test-NodeA: 0+1, Test-NodeB: 0+1, Test-NodeC: 1+0]}, actualMembers=[Test-NodeA, Test-NodeB, Test-NodeC], persistentUUIDs=[a861c235-ccf1-4f01-857e-5810a9bbced0, 48f2b81e-7f18-456c-ab36-b587e8a3a235, a57502c1-343f-47f8-86ed-d87a6cda39b0]}
### This message is logged between the 2 read locks
11:35:10,740 TRACE (jgroups-8,Test-NodeA:[]) [EntryWrappingInterceptor] About to commit entry ReadCommittedEntry(f3a7a72){key=testkey, value=v1, oldValue=null, isCreated=true, isChanged=true, isRemoved=false, isExpired=false, isCommited=false, skipLookup=false, metadata=EmbeddedExpirableMetadata{version=null, lifespan=-1, maxIdle=-1}, oldMetadata=null, internalMetadata=null}
11:35:10,740 TRACE (non-blocking-thread-Test-NodeA-p39961-t1:[]) [DefaultSegmentedDataContainer] Ensuring segments {0} are started
11:36:10,770 ERROR (testng-Test:[]) [TestingUtil] Cache defaultcache timed out waiting for rebalancing to complete on node Test-NodeA, expected member list is [Test-NodeA, Test-NodeB, Test-NodeC], current member list is [Test-NodeA, Test-NodeB]!
11:36:10,771 ERROR (testng-Test:[]) [TestSuiteProgress] Test failed: org.infinispan.distribution.rehash.NonTxPrimaryOwnerBecomingNonOwnerTest.testPrimaryOwnerChangingDuringPut
java.lang.RuntimeException: Cache defaultcache timed out waiting for rebalancing to complete on node Test-NodeA, expected member list is [Test-NodeA, Test-NodeB, Test-NodeC], current member list is [Test-NodeA, Test-NodeB]!
    at org.infinispan.test.TestingUtil.waitForNoRebalance(TestingUtil.java:452) ~[test-classes/:?]
    at org.infinispan.distribution.rehash.NonTxPrimaryOwnerBecomingNonOwnerTest.doTest(NonTxPrimaryOwnerBecomingNonOwnerTest.java:168) ~[test-classes/:?]
    at org.infinispan.distribution.rehash.NonTxPrimaryOwnerBecomingNonOwnerTest.testPrimaryOwnerChangingDuringPut(NonTxPrimaryOwnerBecomingNonOwnerTest.java:68) ~[test-classes/:?]
The read lock in EntryWrappingInterceptor.applyChanges() seems to be obsolete: there are other code paths in EntryWrappingInterceptor that also commit entries, but do not acquire the state transfer read lock. Removing this read lock acquisition should fix the deadlock.
relates to: ISPN-10753 StateTransferLockImpl should use a StampedLock (Closed)