


Type: Bug
Resolution: Done
Priority: Critical
Fix Version/s: 8.1.5.Final, 8.2.2.Final, 9.0.0.Final
Affects Version/s: 8.0.1.Final, 7.2.5.Final, 8.1.0.Alpha2, 9.0.0.Final
Component/s: Core, Test Suite
Labels:
- testsuite_stability

Git Pull Request:
https://github.com/infinispan/infinispan/pull/3782, https://github.com/infinispan/infinispan/pull/3877, https://github.com/infinispan/infinispan/pull/4321, https://github.com/infinispan/infinispan/pull/4327, https://github.com/infinispan/infinispan/pull/4328

LocalTopologyManagerImpl sends the response to ClusterTopologyControlCommand(GET_STATUS), but before sending it, it doesn't compare the local view id with the new coordinator's view id. If the old coordinator sent a topology update before the merge, the node may process that update only after it has already sent its status response. The new coordinator then sends a topology update with topology id max(status response topology ids) + 1. The node applies the topology update from the old coordinator, and then ignores the update from the new coordinator because it carries the same topology id.

This is especially common in the partition handling tests, e.g. BasePessimisticTxPartitionAndMergeTest subclasses, because the test "injects" the JGroups view on each node serially, and the fourth node often sends its status response before it gets the new view.

22:16:37,776 DEBUG (remote-thread-NodeD-p26-t6:[]) [LocalTopologyManagerImpl] Sending cluster status response for view 10
// Topology from NodeC
22:16:37,778 DEBUG (transport-thread-NodeD-p28-t2:[]) [LocalTopologyManagerImpl] Updating local topology for cache pes-cache: CacheTopology{id=8, rebalanceId=3, currentCH=DefaultConsistentHash{ns=60, owners = (4)[NodeA-37631: 15+15, NodeB-47846: 15+15, NodeC-46467: 15+15, NodeD-30486: 15+15]}, pendingCH=null, unionCH=null, actualMembers=[NodeC-46467, NodeD-30486]}
// Later, topology from NodeA
22:16:37,827 DEBUG (transport-thread-NodeD-p28-t1:[]) [LocalTopologyManagerImpl] Ignoring late consistent hash update for cache pes-cache, current topology is 8: CacheTopology{id=8, rebalanceId=3, currentCH=DefaultConsistentHash{ns=60, owners = (4)[NodeA-37631: 15+15, NodeB-47846: 15+15, NodeC-46467: 15+15, NodeD-30486: 15+15]}, pendingCH=null, unionCH=null, actualMembers=[NodeA-37631, NodeB-47846, NodeC-46467, NodeD-30486]}
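To make the race concrete, here is a minimal single-threaded model of NodeD's side of the sequence in the log. It is a sketch under stated assumptions, not Infinispan code: the `Node` class and its methods are hypothetical, NodeD's topology id before the merge is assumed to be 7, and only one status response is modelled, so max(status response topology ids) + 1 is simply the reported id + 1. The update rule mirrors the two checks described in this issue: updates from anyone but the current coordinator are rejected, and updates whose id is not newer than the current one are ignored as late.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of the race on NodeD. Node and its methods are
 * illustrative names, not Infinispan's LocalTopologyManagerImpl API.
 */
public class LateTopologyRace {

    static final class Node {
        String coordinator;               // coordinator of the currently installed view
        int topologyId;                   // id of the last applied cache topology
        List<String> members = List.of(); // members of the last applied topology
        final List<String> log = new ArrayList<>();

        Node(String coordinator, int topologyId) {
            this.coordinator = coordinator;
            this.topologyId = topologyId;
        }

        /** Buggy behaviour: answers GET_STATUS at once, whatever view is installed. */
        int handleStatusRequest() {
            log.add("Sending cluster status response, topology " + topologyId);
            return topologyId;
        }

        void installView(String newCoordinator) {
            coordinator = newCoordinator;
            log.add("Installed view, coordinator " + newCoordinator);
        }

        /** Applies an update only if it comes from the current coordinator and is newer. */
        void handleTopologyUpdate(int updateId, String sender, List<String> updateMembers) {
            if (!sender.equals(coordinator)) {
                log.add("Ignoring update " + updateId + " from non-coordinator " + sender);
            } else if (updateId <= topologyId) {
                log.add("Ignoring late update " + updateId + " from " + sender
                        + ", current topology is " + topologyId);
            } else {
                topologyId = updateId;
                members = updateMembers;
                log.add("Updating local topology to " + updateId + " from " + sender + ": " + updateMembers);
            }
        }
    }

    /** Replays the NodeD sequence from the log excerpt (starting topology id 7 is assumed). */
    static Node simulate() {
        Node nodeD = new Node("NodeC", 7);
        // 1. NodeA's GET_STATUS is answered while NodeD still has the pre-merge view
        int reported = nodeD.handleStatusRequest();
        int newCoordinatorTopologyId = reported + 1; // max(status topology ids) + 1 = 8
        // 2. The old coordinator's update (also 7 + 1 = 8) is processed afterwards and accepted
        nodeD.handleTopologyUpdate(8, "NodeC", List.of("NodeC", "NodeD"));
        // 3. NodeD installs the merged view; NodeA's update has the same id and is dropped
        nodeD.installView("NodeA");
        nodeD.handleTopologyUpdate(newCoordinatorTopologyId, "NodeA",
                List.of("NodeA", "NodeB", "NodeC", "NodeD"));
        return nodeD;
    }

    public static void main(String[] args) {
        Node nodeD = simulate();
        nodeD.log.forEach(System.out::println);
        System.out.println("Final members: " + nodeD.members);
    }
}
```

The model ends with NodeD stuck on topology 8 with only NodeC and NodeD as members, which matches the "Ignoring late consistent hash update" line above.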

As a solution, we can delay sending the status response until we have installed the same view as the coordinator (or a later one). We already check that the sender is the current coordinator before applying a topology update, so this guarantees that we don't apply any more topology updates from the old coordinator. Since the coordinator only sends the status request after it has installed the new view, this will not introduce any delays in the vast majority of cases.
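The proposed fix can be sketched with the same kind of model. This is a hypothetical illustration, not Infinispan's API: `Node`, `handleStatusRequest`, the timeout parameter, and the starting view id 9 / topology id 7 are all assumptions. The status handler waits on the node's monitor until the locally installed view id is at least the coordinator's view id, and `installView` wakes it up. Because the response is built only after the merged view is installed, the old coordinator's update is either applied first (and reflected in the reported topology id) or arrives later and fails the coordinator check.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical model of the fix; names are illustrative, not Infinispan's API. */
public class DelayedStatusResponse {

    static final class Node {
        private int viewId;
        private String coordinator;
        private int topologyId;
        private List<String> members = List.of();

        Node(int viewId, String coordinator, int topologyId) {
            this.viewId = viewId;
            this.coordinator = coordinator;
            this.topologyId = topologyId;
        }

        /** Called when a new view is installed; wakes any waiting status requests. */
        synchronized void installView(int newViewId, String newCoordinator) {
            viewId = newViewId;
            coordinator = newCoordinator;
            notifyAll();
        }

        /** Answers GET_STATUS only once this node has installed coordinatorViewId or later. */
        synchronized int handleStatusRequest(int coordinatorViewId, long timeoutMillis)
                throws InterruptedException, TimeoutException {
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
            while (viewId < coordinatorViewId) {
                long remaining = deadline - System.nanoTime();
                if (remaining <= 0) {
                    throw new TimeoutException("view " + coordinatorViewId + " not installed, still on " + viewId);
                }
                TimeUnit.NANOSECONDS.timedWait(this, remaining); // releases the monitor while waiting
            }
            return topologyId;
        }

        /** Applies an update only if it comes from the current coordinator and is newer. */
        synchronized boolean handleTopologyUpdate(int updateId, String sender, List<String> updateMembers) {
            if (!sender.equals(coordinator) || updateId <= topologyId) {
                return false;
            }
            topologyId = updateId;
            members = updateMembers;
            return true;
        }

        synchronized int topologyId() { return topologyId; }

        synchronized List<String> members() { return members; }
    }

    /** Replays the NodeD scenario with the fix (view 9 / topology 7 before the merge are assumed). */
    static Node simulate() throws Exception {
        Node nodeD = new Node(9, "NodeC", 7);
        ExecutorService remoteThread = Executors.newSingleThreadExecutor();
        try {
            // NodeA, already on view 10, asks for status; the request waits on NodeD
            Future<Integer> status = remoteThread.submit(() -> nodeD.handleStatusRequest(10, 5_000));
            // The old coordinator's late update is still accepted: NodeD is still on view 9
            nodeD.handleTopologyUpdate(8, "NodeC", List.of("NodeC", "NodeD"));
            // Installing merged view 10 releases the status response, which now reports 8
            nodeD.installView(10, "NodeA");
            int reported = status.get();
            // New coordinator sends max(reported ids) + 1 = 9, and NodeD applies it
            nodeD.handleTopologyUpdate(reported + 1, "NodeA", List.of("NodeA", "NodeB", "NodeC", "NodeD"));
            // Any further update from NodeC fails the coordinator check
            nodeD.handleTopologyUpdate(10, "NodeC", List.of("NodeC", "NodeD"));
        } finally {
            remoteThread.shutdown();
        }
        return nodeD;
    }

    public static void main(String[] args) throws Exception {
        Node nodeD = simulate();
        System.out.println("topology " + nodeD.topologyId() + " members " + nodeD.members());
    }
}
```

The outcome does not depend on thread scheduling: the status task cannot return before `installView(10, ...)` runs, and by then the old coordinator's update has been applied. The timeout is an assumption added so the sketch cannot block a thread indefinitely.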

Issue Links:

is incorporated by
- ISPN-6455 ClusterTopologyManagerTest timing out randomly (Closed)
- JBEAP-6986 (7.0.z) ISPN-5883 - Node can apply new topology after sending status response (Closed)
- JBEAP-4841 (7.1.0) Upgrade Infinispan to 8.1.5.Final (Closed)

relates to
- ISPN-5956 OptimisticTxPartitionAndMergeDuringRollbackTest.testDegradedPartition random failures (Closed)

Assignee: Dan Berindei (Inactive)
Reporter: Dan Berindei (Inactive)
Archiver: Amol Dongare
Created: 2015/10/26 2:46 AM
Updated: 2024/07/15 9:27 AM
Resolved: 2016/05/16 2:58 PM
Archived: 2024/11/28 6:21 AM
