Infinispan / ISPN-799

JoinTask as it invalidates L1 entries should be given precedence in acquiring locks

Details

    Description

      The SingleJoinTest transaction test fails only intermittently because of the way addresses are organised on the hash wheel, so you are correct that it is a timing issue. It is still a very real problem, though. To reiterate and make sure we are talking about the same thing:

      1. The view is {A, B, C}.
      2. K is mapped to {A, B}.
      3. A tx starts to update K and is prepared. Locks are now held for K on {A, B}.
      4. D joins and is placed on the hash wheel between A and B, so the new view is {A, D, B, C}.
      5. As per the test (artificial, I know, but it could still happen), the tx waits a long time before committing. In the test, a latch makes it wait until D has finished joining.
      6. D never finishes joining: even though it receives the prepare for the tx and could potentially commit it itself (as a new owner), the join fails because D is unable to invalidate K on B, where K is still locked by the prepared tx.
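The ownership change in steps 1 to 4 can be illustrated with a toy hash wheel. This is only a sketch under stated assumptions: the class name `HashWheelSketch`, the hand-picked wheel positions, and the "walk clockwise and take the first two nodes" rule are hypothetical and are not Infinispan's actual consistent hash implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;

// Toy hash wheel. Positions are hypothetical and chosen by hand,
// not derived from real node addresses.
class HashWheelSketch {
    private final TreeMap<Integer, String> wheel = new TreeMap<>();

    void join(String node, int position) {
        wheel.put(position, node);
    }

    // Walk clockwise from the key's position and collect numOwners nodes.
    List<String> owners(int keyPosition, int numOwners) {
        Set<String> result = new LinkedHashSet<>();
        // Nodes at or after the key position come first...
        for (String node : wheel.tailMap(keyPosition, true).values()) {
            if (result.size() < numOwners) result.add(node);
        }
        // ...then wrap around to the start of the wheel.
        for (String node : wheel.headMap(keyPosition, false).values()) {
            if (result.size() < numOwners) result.add(node);
        }
        return new ArrayList<>(result);
    }

    public static void main(String[] args) {
        HashWheelSketch w = new HashWheelSketch();
        w.join("A", 100);
        w.join("B", 200);
        w.join("C", 300);
        int k = 50; // hypothetical position of key K
        System.out.println(w.owners(k, 2)); // prints [A, B]
        w.join("D", 150); // D lands between A and B
        System.out.println(w.owners(k, 2)); // prints [A, D]
    }
}
```

In this model B stops being an owner of K once D joins, which is why the join needs to invalidate K on B while the prepared tx still holds the lock there.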

      There are a few solutions here:

      1) This is pretty easy to detect. Attempt to acquire the lock with a shorter lock acquisition timeout; if the transaction is still stuck, abort it and proceed with the join.
      2) If the blocking node is not the transaction originator (as in this case: the tx was started on A), then just force lock removal and tx rollback on B only. Let the tx complete on A, since the new joiner will receive the transactional event and will be able to apply it as a new owner.

      My vote is for solution 1. It is a bit more crude, but 2 would be very complex to implement, and even then it would only cover the case where the invalidation is blocked on a node that did not originate the transaction, e.g. the tx originated on A but the lock issue was on B. If, however, the tx originated on B, and B no longer owns the entry in question, then 2 does not apply and the only solution is 1.
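Solution 1 could look roughly like the following. This is a minimal sketch, not Infinispan code: `JoinLockSketch`, its lock table, the `"JOIN"` owner id, and the `rollback` helper are all hypothetical names introduced for illustration, and a real rollback would do far more than release locks.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy lock table: cache key -> id of the owner holding the lock.
// All names are hypothetical; this is not Infinispan's lock manager.
class JoinLockSketch {
    private final Map<String, String> lockOwners = new ConcurrentHashMap<>();

    // Try to lock `key` for `owner`, polling until the timeout expires.
    boolean tryLock(String key, String owner, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            String current = lockOwners.putIfAbsent(key, owner);
            if (current == null || current.equals(owner)) return true;
            if (System.nanoTime() - deadline >= 0) return false;
            try {
                Thread.sleep(1);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    // "Roll back" a transaction: in this toy model, just drop all its locks.
    void rollback(String txId) {
        lockOwners.values().removeIf(txId::equals);
    }

    String ownerOf(String key) {
        return lockOwners.get(key);
    }

    // Solution 1: try with a short timeout; if still blocked, abort the
    // blocking tx and retry. Returns the aborted tx id, or null if none.
    String lockForJoin(String key, long shortTimeoutMillis) {
        if (tryLock(key, "JOIN", shortTimeoutMillis)) return null;
        String blocker = lockOwners.get(key);
        if (blocker != null) rollback(blocker);
        if (!tryLock(key, "JOIN", shortTimeoutMillis)) {
            throw new IllegalStateException("could not lock " + key + " for join");
        }
        return blocker;
    }
}
```

For example, if `tx1` holds the lock on `"K"`, then `lockForJoin("K", 20)` waits about 20 ms, aborts `tx1`, and returns `"tx1"` with the join now holding the lock. The crude part is visible here: the blocking transaction is aborted regardless of where it originated.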

          People

            vblagoje Vladimir Blagojevic (Inactive)