Type: Bug
Resolution: Done
Priority: Critical
Fix Version/s: 5.3.0.Final
This causes random failures in ConcurrentOverlappingLeaveTest and ConcurrentNonOverlappingLeaveTest.
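In embedded-API terms, the failing scenario looks roughly like the sketch below. This is illustrative only: the real tests use the Infinispan test harness, kill members at controlled points to hit the race window reliably, and the cache and key names here are made up.

{code:java}
// Illustrative reproduction sketch, NOT the actual test code: four DIST_SYNC
// nodes, then two of them leave in quick succession while state transfer for
// the first leave is still in flight.
import java.util.ArrayList;
import java.util.List;

import org.infinispan.Cache;
import org.infinispan.configuration.cache.CacheMode;
import org.infinispan.configuration.cache.ConfigurationBuilder;
import org.infinispan.configuration.global.GlobalConfigurationBuilder;
import org.infinispan.manager.DefaultCacheManager;
import org.infinispan.manager.EmbeddedCacheManager;

public class ConcurrentLeaveRepro {
   public static void main(String[] args) {
      ConfigurationBuilder cfg = new ConfigurationBuilder();
      cfg.clustering().cacheMode(CacheMode.DIST_SYNC).hash().numOwners(2);

      List<EmbeddedCacheManager> managers = new ArrayList<>();
      List<Cache<String, String>> caches = new ArrayList<>();
      for (int i = 0; i < 4; i++) {            // E, F, G, H
         EmbeddedCacheManager cm = new DefaultCacheManager(
               GlobalConfigurationBuilder.defaultClusteredBuilder().build());
         cm.defineConfiguration("dist", cfg.build());
         managers.add(cm);
         caches.add(cm.getCache("dist"));
      }

      Cache<String, String> cacheE = caches.get(0);
      for (int i = 0; i < 100; i++) {
         cacheE.put("k" + i, "v" + i);
      }

      // F leaves, then H leaves while the rebalance triggered by F's
      // departure is still running.
      managers.get(1).stop();   // F
      managers.get(3).stop();   // H

      // With the bug, entries that H owned can end up missing on E.
      for (int i = 0; i < 100; i++) {
         if (cacheE.get("k" + i) == null) {
            System.out.println("lost k" + i);
         }
      }
   }
}
{code}

Step by step: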
1. Starting with a 4-node cluster: [E, F, G, H] (topology 7).
2. F leaves, and E sends a REBALANCE_START command with nodes [E, G, H] (topology 8). Some segments are owned by [H] in the current CH and by [H, G] in the pending CH.
3. E reports that it has finished receiving state with a REBALANCE_CONFIRM command.
4. H leaves, and E sends a CH_UPDATE command with nodes [E, G] (topology 9).
The segments that were owned by [H] in the previous currentCH are assigned to [E, G] in the new currentCH (otherwise they wouldn't have any owners).
5. The StateConsumerImpl on E requests state for the "lost" segments from G.
6. G confirms the end of the rebalance as well, and E sends a CH_UPDATE command to end the rebalance (topology 10).
7. E sends a REBALANCE_START command to assign all segments for [E, G] (topology 11).
8. While the StateConsumerImpl on E is starting the state transfer, it also receives a StateResponseCommand for the lost segments from G.
9. Because the structures that keep track of the received state are not yet initialized for the new rebalance, E considers that it has finished receiving state for topology 11.
10. E then receives a StateResponseCommand from G with the actual data, but ignores it because StateConsumerImpl.updatedKeys == null (see the sketch after this list).
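The interleaving in steps 8-10 can be pictured with a simplified model of StateConsumerImpl. Only updatedKeys is a real field name; everything else below is a paraphrase of the bookkeeping, not the actual implementation:

{code:java}
// Sketch of the race: tracking structures for the new rebalance are set up
// on a transport thread while state responses arrive on a remote thread.
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class StateConsumerSketch {
   // Keys written by state transfer; null means "no transfer in progress".
   private volatile Set<Object> updatedKeys;
   // Inbound segments still expected; stand-in for the real tracking.
   private final Set<Integer> inboundSegments = ConcurrentHashMap.newKeySet();

   // Step 7: REBALANCE_START for topology 11, on a transport thread.
   void startRebalance(Set<Integer> segmentsToReceive) {
      updatedKeys = ConcurrentHashMap.newKeySet();
      // <-- window: a StateResponseCommand from the topology-9 transfer can
      //     be processed on a remote thread before the next line runs
      inboundSegments.addAll(segmentsToReceive);
   }

   // Steps 8-10: StateResponseCommand, on a remote thread.
   void onStateResponse(Set<Integer> completedSegments, Map<Object, Object> chunk) {
      Set<Object> keys = updatedKeys;
      if (keys == null) {
         // Step 10: real topology-11 data lands here and is silently dropped,
         // so dc.get(key) on NodeE later returns null.
         return;
      }
      for (Map.Entry<Object, Object> e : chunk.entrySet()) {
         keys.add(e.getKey());
         // ... write the entry into the data container ...
      }
      inboundSegments.removeAll(completedSegments);
      if (inboundSegments.isEmpty()) {
         // Step 9: the stale topology-9 response finds the (still empty)
         // tracking for topology 11 and declares the rebalance finished.
         updatedKeys = null;
         // ... send REBALANCE_CONFIRM to the coordinator ...
      }
   }
}
{code}

In the log below, this is visible as "Finished receiving of segments for cache dist for topology 11" at 11:30:39,864, triggered while perform() is running on StateResponseCommand{..., topologyId=9}, before any topology-11 data has been applied.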
{noformat}
11:30:39,807 DEBUG (transport-thread-4,NodeE:dist) [LocalTopologyManagerImpl] Updating local consistent hash(es) for cache dist: new topology = CacheTopology{id=7, currentCH=DefaultConsistentHash{numSegments=60, numOwners=2, members=[NodeE-51027, NodeG-6339, NodeH-47370]}, pendingCH=null}
11:30:39,810 DEBUG (transport-thread-3,NodeE:dist) [LocalTopologyManagerImpl] Starting local rebalance for cache dist, topology = CacheTopology{id=8, currentCH=DefaultConsistentHash{numSegments=60, numOwners=2, members=[NodeE-51027, NodeG-6339, NodeH-47370]}, pendingCH=DefaultConsistentHash{numSegments=60, numOwners=2, members=[NodeE-51027, NodeG-6339, NodeH-47370]}}
11:30:39,817 DEBUG (transport-thread-3,NodeE:dist) [StateConsumerImpl] Finished receiving of segments for cache dist for topology 8.
11:30:39,832 DEBUG (transport-thread-4,NodeE:dist) [LocalTopologyManagerImpl] Updating local consistent hash(es) for cache dist: new topology = CacheTopology{id=9, currentCH=DefaultConsistentHash{numSegments=60, numOwners=2, members=[NodeE-51027, NodeG-6339]}, pendingCH=DefaultConsistentHash{numSegments=60, numOwners=2, members=[NodeE-51027, NodeG-6339]}}
11:30:39,834 DEBUG (transport-thread-4,NodeE:dist) [StateConsumerImpl] Adding inbound state transfer for segments [38, 36, 47, 44, 45] of cache dist
11:30:39,853 DEBUG (transport-thread-3,NodeE:dist) [LocalTopologyManagerImpl] Starting local rebalance for cache dist, topology = CacheTopology{id=11, currentCH=DefaultConsistentHash{numSegments=60, numOwners=2, members=[NodeE-51027, NodeG-6339]}, pendingCH=DefaultConsistentHash{numSegments=60, numOwners=2, members=[NodeE-51027, NodeG-6339]}}
11:30:39,859 TRACE (remote-thread-1,NodeE:) [InboundInvocationHandlerImpl] Calling perform() on StateResponseCommand{cache=dist, origin=NodeG-6339, topologyId=9}
11:30:39,864 DEBUG (remote-thread-1,NodeE:dist) [StateConsumerImpl] Finished receiving of segments for cache dist for topology 11.
11:30:39,866 TRACE (transport-thread-5,NodeE:dist) [LocalTopologyManagerImpl] Ignoring consistent hash update 10 for cache dist, we have already received a newer topology 11
11:30:39,868 TRACE (remote-thread-1,NodeE:) [InboundInvocationHandlerImpl] Calling perform() on StateResponseCommand{cache=dist, origin=NodeG-6339, topologyId=11}
11:30:39,872 TRACE (remote-thread-1,NodeE:dist) [EntryWrappingInterceptor] State transfer will not write key/value MagicKey#k3{672f69c9@NodeG-6339}/v3 because it was already updated by somebody else
11:30:40,582 ERROR (testng-ConcurrentNonOverlappingLeaveTest:) [UnitTestTestNGListener] Test testTransactional(org.infinispan.distribution.rehash.ConcurrentNonOverlappingLeaveTest) failed.
java.lang.AssertionError: Fail on owner cache NodeE-51027: dc.get(MagicKey#k3{672f69c9@NodeG-6339}) returned null!
{noformat}
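For illustration, a guard along the following lines would close this particular window. This is purely a sketch; it does not imply that the actual fix in 5.3.0.Final took this shape, and all names below are hypothetical:

{code:java}
// Hypothetical guard (illustrative only): never let a response from an older
// transfer drain the completion accounting of a newer one.
import java.util.Map;
import java.util.Set;

class GuardedStateConsumerSketch {
   private int activeTransferTopologyId = -1;

   // Initialize all tracking under the same lock that responses take, so a
   // response can never observe a half-initialized rebalance.
   synchronized void startRebalance(int topologyId, Set<Integer> segments) {
      activeTransferTopologyId = topologyId;
      // ... initialize updatedKeys and the inbound-segment tracking here ...
   }

   synchronized void onStateResponse(int responseTopologyId,
                                     Set<Integer> completedSegments,
                                     Map<Object, Object> chunk) {
      if (responseTopologyId != activeTransferTopologyId) {
         // A topology-9 response must not complete the topology-11 rebalance;
         // account for it under its own transfer's bookkeeping instead.
         return;
      }
      // ... apply chunk, mark completedSegments, and confirm only when the
      //     transfer registered for this topology is really done ...
   }
}
{code}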