Infinispan / ISPN-13160

Write may block topology update forever

    Description

      ISPN-10753 changed StateTransferLockImpl to use a StampedLock, which is not reentrant.

      Acquiring the read lock twice from the same thread is possible because the read lock is not exclusive, but it carries a deadlock risk when a writer starts waiting between the two acquisitions:

      1. thread 1 acquires the read lock in EntryWrappingInterceptor.applyChanges()
      2. thread 2 tries to acquire the write lock in StateConsumerImpl.onTopologyUpdate() and blocks
      3. thread 1 tries to acquire the read lock a second time in ClusteringDependentLogic.DistributionLogic.commitSingleEntry() and also blocks
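
The three steps above can be reproduced outside Infinispan with a plain java.util.concurrent.locks.StampedLock. The following is a minimal sketch, not the Infinispan code: the class and method names are hypothetical, and the second read acquisition uses the timed tryReadLock() instead of readLock() so that the demo terminates rather than hanging. The StampedLock documentation does not promise a particular scheduling policy, so the blocking in step 3 is the behaviour observed with common JDK implementations, not a documented guarantee.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.StampedLock;

// Hypothetical demo class; it only models the locking pattern from the steps above.
class NestedReadLockSketch {

    /**
     * Takes a read lock, optionally lets a writer queue up behind it, then
     * tries to take a second read lock on the same thread.
     * Returns true if the second acquisition succeeds within the timeout.
     */
    static boolean nestedReadSucceeds(boolean writerQueued) {
        StampedLock lock = new StampedLock();
        long outer = lock.readLock();                      // step 1: first read lock
        Thread writer = new Thread(() -> {
            try {
                long w = lock.writeLockInterruptibly();    // step 2: waits for the reader
                lock.unlockWrite(w);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        try {
            if (writerQueued) {
                writer.start();
                // Bounded wait until the writer thread is parked on the lock.
                long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
                while (writer.getState() != Thread.State.WAITING
                        && System.nanoTime() < deadline) {
                    Thread.sleep(10);
                }
            }
            // step 3: second read lock, timed so that the demo cannot hang
            long inner = lock.tryReadLock(500, TimeUnit.MILLISECONDS);
            if (inner != 0L) {
                lock.unlockRead(inner);
            }
            return inner != 0L;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            lock.unlockRead(outer);                        // lets the writer proceed
            if (writerQueued) {
                try {
                    writer.join();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("no writer queued: " + nestedReadSucceeds(false));
        System.out.println("writer queued:    " + nestedReadSucceeds(true));
    }
}
```

With an untimed readLock() in step 3, as in the scenario above, the timeout would never fire: the reader would wait behind the writer while the writer waits for the reader's first lock.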

      This actually happens in the test suite, causing random failures in NonTxPrimaryOwnerBecomingNonOwnerTest.testPrimaryOwnerChangingDuringPut():

      11:35:10,740 TRACE (non-blocking-thread-Test-NodeA-p39961-t1:[]) [StateConsumerImpl] Received new topology for cache defaultcache, isRebalance = false, isMember = true, topology = CacheTopology{id=8, phase=READ_NEW_WRITE_ALL, rebalanceId=3, currentCH=DefaultConsistentHash{ns=1, owners = (2)[Test-NodeA: 1+0, Test-NodeB: 0+1]}, pendingCH=DefaultConsistentHash{ns=1, owners = (3)[Test-NodeA: 0+1, Test-NodeB: 0+0, Test-NodeC: 1+0]}, unionCH=DefaultConsistentHash{ns=1, owners = (3)[Test-NodeA: 0+1, Test-NodeB: 0+1, Test-NodeC: 1+0]}, actualMembers=[Test-NodeA, Test-NodeB, Test-NodeC], persistentUUIDs=[a861c235-ccf1-4f01-857e-5810a9bbced0, 48f2b81e-7f18-456c-ab36-b587e8a3a235, a57502c1-343f-47f8-86ed-d87a6cda39b0]}
      ### This message is logged between the 2 read locks
      11:35:10,740 TRACE (jgroups-8,Test-NodeA:[]) [EntryWrappingInterceptor] About to commit entry ReadCommittedEntry(f3a7a72){key=testkey, value=v1, oldValue=null, isCreated=true, isChanged=true, isRemoved=false, isExpired=false, isCommited=false, skipLookup=false, metadata=EmbeddedExpirableMetadata{version=null, lifespan=-1, maxIdle=-1}, oldMetadata=null, internalMetadata=null}
      11:35:10,740 TRACE (non-blocking-thread-Test-NodeA-p39961-t1:[]) [DefaultSegmentedDataContainer] Ensuring segments {0} are started
      11:36:10,770 ERROR (testng-Test:[]) [TestingUtil] Cache defaultcache timed out waiting for rebalancing to complete on node Test-NodeA, expected member list is [Test-NodeA, Test-NodeB, Test-NodeC], current member list is [Test-NodeA, Test-NodeB]!
      11:36:10,771 ERROR (testng-Test:[]) [TestSuiteProgress] Test failed: org.infinispan.distribution.rehash.NonTxPrimaryOwnerBecomingNonOwnerTest.testPrimaryOwnerChangingDuringPut
      java.lang.RuntimeException: Cache defaultcache timed out waiting for rebalancing to complete on node Test-NodeA, expected member list is [Test-NodeA, Test-NodeB, Test-NodeC], current member list is [Test-NodeA, Test-NodeB]!
      	at org.infinispan.test.TestingUtil.waitForNoRebalance(TestingUtil.java:452) ~[test-classes/:?]
      	at org.infinispan.distribution.rehash.NonTxPrimaryOwnerBecomingNonOwnerTest.doTest(NonTxPrimaryOwnerBecomingNonOwnerTest.java:168) ~[test-classes/:?]
      	at org.infinispan.distribution.rehash.NonTxPrimaryOwnerBecomingNonOwnerTest.testPrimaryOwnerChangingDuringPut(NonTxPrimaryOwnerBecomingNonOwnerTest.java:68) ~[test-classes/:?]
      

      The read lock in EntryWrappingInterceptor.applyChanges() seems to be obsolete: there are other code paths in EntryWrappingInterceptor that also commit entries, but do not acquire the state transfer read lock. Removing this read lock acquisition should fix the deadlock.
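
A rough sketch of the locking shape this proposal leads to, assuming the inner commit step remains the only place that takes the read lock. The class below is a hypothetical stand-in that borrows the method names from this issue for readability; it is not the Infinispan implementation, and the counter only stands in for the real commit work.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.StampedLock;

// Hypothetical stand-in; these are not the Infinispan classes.
class SingleReadLockSketch {
    private final StampedLock stateTransferLock = new StampedLock();
    private final AtomicInteger committed = new AtomicInteger();

    // Outer step: deliberately takes no lock, so nothing is held across the inner call.
    void applyChanges() {
        commitSingleEntry();
    }

    // Inner step: the only read lock acquisition, released before returning.
    void commitSingleEntry() {
        long stamp = stateTransferLock.readLock();
        try {
            committed.incrementAndGet();   // placeholder for committing the entry
        } finally {
            stateTransferLock.unlockRead(stamp);
        }
    }

    // Stand-in for the topology update: true if the write lock was obtained in time.
    boolean onTopologyUpdate(long timeoutMillis) {
        try {
            long stamp = stateTransferLock.tryWriteLock(timeoutMillis, TimeUnit.MILLISECONDS);
            if (stamp == 0L) {
                return false;
            }
            stateTransferLock.unlockWrite(stamp);
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    int committedCount() {
        return committed.get();
    }
}
```

Because no thread ever requests the read lock while already holding it, a waiting writer only has to outlast a single short read section.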

            People

              dberinde@redhat.com Dan Berindei (Inactive)