Infinispan / ISPN-425

Stale data read when L1 invalidation happens while UnionConsistentHash is in use

    • Type: Bug
    • Resolution: Done
    • Priority: Blocker
    • 4.1.0.BETA1
    • 4.1.0.ALPHA3
    • Core
    • None

      See below:

      ----- "Manik Surtani" <manik@jboss.org> wrote:

      > On 3 May 2010, at 08:51, Galder Zamarreno wrote:
      >
      > > Resending without log until the message is approved.
      > >
      > > –
      > > Galder Zamarreño
      > > Sr. Software Engineer
      > > Infinispan, JBoss Cache
      > >
      > > ----- Forwarded Message -----
      > > From: galder@redhat.com
      > > To: "infinispan -Dev List" <infinispan-dev@lists.jboss.org>
      > > Sent: Friday, April 30, 2010 6:30:05 PM GMT +01:00 Amsterdam /
      > Berlin / Bern / Rome / Stockholm / Vienna
      > > Subject: Stale data read when L1 invalidation happens while
      > UnionConsistentHash is in use
      > >
      > > Hi,
      > >
      > > I've spent all day chasing down a random Hot Rod testsuite failure
      > related to distribution. This is the last hurdle to close
      > https://jira.jboss.org/jira/browse/ISPN-411. In
      > HotRodDistributionTest, which is still to be committed, I test adding
      > a new node, doing a put on this node, and then doing a get in a
      > different node and making sure that I get what was put. The test
      > randomly fails, saying that the get returns the old value. The failure
      > has nothing to do with Hot Rod itself; it is a race condition that
      > occurs while the union consistent hash is in use. Let me explain:
      > >
      > > 1. An earlier operation had set
      > "k-testDistributedPutWithTopologyChanges" key to
      > "v5-testDistributedPutWithTopologyChanges".
      > > 2. Start a new hot rod server in eq-7969.
      > > 2. eq-7969 node calls a put on that key with
      > "v6-testDistributedPutWithTopologyChanges". Recipients for the put
      > are: eq-7969 and eq-61332.
      > > 3. eq-7969 sends an invalidate L1 to all, including eq-13415
      > > 4. eq-13415 should invalidate
      > "k-testDistributedPutWithTopologyChanges" but it doesn't, since it
      > considers that "k-testDistributedPutWithTopologyChanges" is local to
      > eq-13415:
      > >
      > > 2010-04-30 18:02:19,907 6046 TRACE
      > [org.infinispan.distribution.DefaultConsistentHash]
      > (OOB-2,Infinispan-Cluster,eq-13415 Hash code for key
      > CacheKey{data=ByteArray{size=39, hashCode=17b1683, array=[107, 45,
      > 116, 101, 115, 116, 68, 105, 115, 116, ..]}} is 344897059
      > > 2010-04-30 18:02:19,907 6046 TRACE
      > [org.infinispan.distribution.DefaultConsistentHash]
      > (OOB-2,Infinispan-Cluster,eq-13415 Candidates for key
      > CacheKey{data=ByteArray{size=39, hashCode=17b1683, array=[107, 45,
      > 116, 101, 115, 116, 68, 105, 115, 116, ..]}} are
      > {5458=eq-7969, 6831=eq-61332}
      > > 2010-04-30 18:02:19,907 6046 TRACE
      > [org.infinispan.distribution.DistributionManagerImpl]
      > (OOB-2,Infinispan-Cluster,eq-13415 Is local
      > CacheKey{data=ByteArray{size=39, hashCode=17b1683, array=[107, 45,
      > 116, 101, 115, 116, 68, 105, 115, 116, ..]}} to eq-13415 query returns
      > true and consistentHash is
      > org.infinispan.distribution.UnionConsistentHash@10747b4
      > >
      > > This is a log with log messages that I added to debug it. The key
      > factor here is that UnionConsistentHash is in use, probably due to
      > rehashing not having fully finished.
      > >
      > > 5. The end result is that a read of
      > "k-testDistributedPutWithTopologyChanges" in eq-13415 returns
      > "v5-testDistributedPutWithTopologyChanges".
      > >
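The sequence in steps 1 to 5 can be illustrated with a small toy model. This is only a sketch: `ToyHash`, `FixedHash`, `ToyUnionHash` and `ToyNode` are hypothetical stand-ins, not Infinispan's classes or signatures. It assumes that the union hash reports the owners of both the old and the new topology while a rehash is in flight, and that eq-13415 was an owner of the key in the old topology (the thread only infers this from the `Is local ... returns true` log line).

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy stand-in for a consistent hash; NOT Infinispan's real interface.
interface ToyHash {
    List<String> locate(Object key);
}

// Always returns the same owners, standing in for one fixed topology.
final class FixedHash implements ToyHash {
    private final List<String> owners;

    FixedHash(List<String> owners) {
        this.owners = owners;
    }

    @Override
    public List<String> locate(Object key) {
        return owners;
    }
}

// Assumption: during a rehash, owners are the union of old and new owners.
final class ToyUnionHash implements ToyHash {
    private final ToyHash oldHash;
    private final ToyHash newHash;

    ToyUnionHash(ToyHash oldHash, ToyHash newHash) {
        this.oldHash = oldHash;
        this.newHash = newHash;
    }

    @Override
    public List<String> locate(Object key) {
        Set<String> union = new LinkedHashSet<>(oldHash.locate(key));
        union.addAll(newHash.locate(key));
        return new ArrayList<>(union);
    }
}

// A node holding a single value, enough to show the stale read.
final class ToyNode {
    final String name;
    ToyHash hash;
    String stored;

    ToyNode(String name, ToyHash hash) {
        this.name = name;
        this.hash = hash;
    }

    boolean isLocal(Object key) {
        return hash.locate(key).contains(name);
    }

    // The invalidation is skipped for keys the node believes it owns.
    void invalidateL1(Object key) {
        if (!isLocal(key)) {
            stored = null;
        }
    }
}
```

With the old owners set to eq-13415 and eq-61332 and the new owners to eq-7969 and eq-61332, a `ToyNode` named eq-13415 that uses the union hash keeps its old value after `invalidateL1`, which corresponds to the stale read in step 5. Once the node switches to the new hash alone, the same call clears the value.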
      > > I thought that maybe we could be more conservative here: if
      > rehashing is in progress (or UnionConsistentHash is in use), invalidate
      > regardless. Assuming that in distribution a put always follows an
      > invalidation and not vice versa, that would be fine. The only downside
      > is that you'd invalidate too much, but the put would then replace the
      > data on any node where the invalidation happened unnecessarily, so
      > that is not a problem.
      > >
      > > Thoughts? Alternatively, maybe I need to shape my test so that I
      > wait for rehashing to finish, but the problem would still be there.
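The "invalidate regardless" idea proposed above could look roughly like the following. This is a minimal sketch of the proposal only, not the fix that was eventually applied; `ConservativeNode`, its `rehashInProgress` flag and the `believedLocal` set are hypothetical names introduced for illustration.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical node that drops entries on invalidation whenever a rehash is running.
final class ConservativeNode {
    private final Map<String, String> store = new ConcurrentHashMap<>();
    // Keys this node currently believes it owns (may be stale during a rehash).
    private final Set<String> believedLocal = ConcurrentHashMap.newKeySet();
    private volatile boolean rehashInProgress;

    void setRehashInProgress(boolean inProgress) {
        this.rehashInProgress = inProgress;
    }

    void markLocal(String key) {
        believedLocal.add(key);
    }

    void put(String key, String value) {
        store.put(key, value);
    }

    String get(String key) {
        return store.get(key);
    }

    // Conservative rule: while rehashing, ownership information is not trusted,
    // so the entry is removed even if the node thinks the key is local.
    void invalidateL1(String key) {
        if (rehashInProgress || !believedLocal.contains(key)) {
            store.remove(key);
        }
    }
}
```

The trade-off is the one described in the message: a genuine owner may lose its entry during a rehash, which is harmless only if the put reaches that owner after the invalidation.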
      >
      > Yes, this seems to be a bug with concurrent rehashing and invalidation
      > rather than HotRod.
      >
      > Could you modify your test to do the following:
      >
      > 1. start 2 caches C1 and C2.
      > 2. put a key K such that K maps on to C1 and C2
      > 3. add a new node, C3. K should now map to C1 and C3.
      > 4. Modify the value on C1 before rehashing completes.
      > 5. See if we see the stale value on C2.
      >
      > To do this you would need a custom object for K that hashes the way
      > you would expect (this could be hardcoded) and a value which blocks
      > when serializing so we can control how long rehashing takes.
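The two helper objects suggested here, a key with a hardcoded hash and a value that blocks while being serialized, might be sketched as below. `FixedHashKey` and `BlockingValue` are hypothetical test classes, and the sketch assumes the marshaller in use honours the standard Java serialization `writeObject` hook, which the thread does not confirm.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.concurrent.CountDownLatch;

// Key whose hash code is chosen by the test instead of derived from its content.
final class FixedHashKey implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String name;
    private final int fixedHash;

    FixedHashKey(String name, int fixedHash) {
        this.name = name;
        this.fixedHash = fixedHash;
    }

    @Override
    public int hashCode() {
        return fixedHash;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof FixedHashKey)) {
            return false;
        }
        FixedHashKey other = (FixedHashKey) o;
        return fixedHash == other.fixedHash && name.equals(other.name);
    }
}

// Value whose serialization waits until the test releases the latch.
final class BlockingValue implements Serializable {
    private static final long serialVersionUID = 1L;
    // Static, so it is not part of the serialized state and the test can reach it.
    static final CountDownLatch RELEASE = new CountDownLatch(1);
    private final String payload;

    BlockingValue(String payload) {
        this.payload = payload;
    }

    String payload() {
        return payload;
    }

    // Standard Java serialization hook: block here to stretch out the state transfer.
    private void writeObject(ObjectOutputStream out) throws IOException {
        try {
            RELEASE.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new InterruptedIOException("interrupted while holding serialization");
        }
        out.defaultWriteObject();
    }
}
```

A test would store a `BlockingValue` under a `FixedHashKey`, start the third node, modify the value on C1 while the transfer is held, and only then call `BlockingValue.RELEASE.countDown()`.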

      Since logical addresses are used underneath and these change from one run to the next, I'm not sure how I can generate such a key programmatically. It's even more complicated to figure out a key that will later, when C3 starts, map to it. Without having these addresses, or their hash codes, locked somehow, I can't see how this is doable. IOW, to be able to do this, I need to mock these addresses into giving fixed hash codes. I'll dig further into this.
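One way to picture "addresses with fixed hash codes" is the toy hash wheel below. It is a sketch under stated assumptions, not Infinispan's algorithm: `FixedAddress` and `ToyWheel` are hypothetical, and the wheel simply takes the first `numOwners` addresses at or after the key's hash, wrapping around at the end.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical test-only address whose position on the wheel is pinned by the test.
final class FixedAddress {
    final String name;
    final int position;

    FixedAddress(String name, int position) {
        this.name = name;
        this.position = position;
    }

    @Override
    public int hashCode() {
        return position;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof FixedAddress)) {
            return false;
        }
        FixedAddress other = (FixedAddress) o;
        return position == other.position && name.equals(other.name);
    }

    @Override
    public String toString() {
        return name;
    }
}

// Toy wheel keyed by address hash code; an assumption-laden model of owner lookup.
final class ToyWheel {
    private final TreeMap<Integer, FixedAddress> wheel = new TreeMap<>();

    void add(FixedAddress address) {
        wheel.put(address.hashCode(), address);
    }

    List<FixedAddress> locate(Object key, int numOwners) {
        List<FixedAddress> owners = new ArrayList<>();
        int start = key.hashCode();
        // Walk clockwise from the key's position to the end of the wheel.
        for (FixedAddress a : wheel.tailMap(start, true).values()) {
            if (owners.size() == numOwners) {
                return owners;
            }
            owners.add(a);
        }
        // Wrap around and continue from the beginning of the wheel.
        for (FixedAddress a : wheel.headMap(start, false).values()) {
            if (owners.size() == numOwners) {
                return owners;
            }
            owners.add(a);
        }
        return owners;
    }
}
```

With C1 at position 100 and C2 at 300, a key hashing to 50 is owned by C1 and C2. Adding C3 at position 200 moves ownership to C1 and C3, which is the topology change the requested test needs, and it stays the same on every run.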

      >
      > I never promised the test would be simple!
      >
      > Cheers
      > Manik
      > –
      > Manik Surtani
      > manik@jboss.org
      > Lead, Infinispan
      > Lead, JBoss Cache
      > http://www.infinispan.org
      > http://www.jbosscache.org
      >
      >
      >
      >
      >
      > _______________________________________________
      > infinispan-dev mailing list
      > infinispan-dev@lists.jboss.org
      > https://lists.jboss.org/mailman/listinfo/infinispan-dev


              rh-ee-galder Galder Zamarreño
              Votes: 0
              Watchers: 2
