- Bug
- Resolution: Duplicate
- Major
- None
- 8.2.6.Final, 9.0.0.Final
- None
- None
Scenario:
- 3 nodes, server mode with Partition handling enabled
- 2 nodes are killed and brought back online
- the nodes are unable to merge and the cluster remains in degraded mode.
I suspect that the FORK channel/protocol is the culprit: the heartbeat command is never handled on the joiner node, yet the coordinator receives a CacheNotFoundResponse quickly (i.e. without a timeout). The request is received and "delivered" but never reaches Infinispan.
When starting node 1 (logs from coordinator):
Received new cluster view: 5, isCoordinator = true, old status = COORDINATOR
Received new cluster view: 5, isCoordinator = true, old status = COORDINATOR
// heartbeat sent, ClusterTopologyManagerImpl.confirmMembersAvailable();
Responses: value=CacheNotFoundResponse, received=true, suspected=false
Node node01-47572 left while updating cache members
// the view is not handled
When I started node 2:
Received new cluster view: 6, isCoordinator = true, old status = COORDINATOR
Updating cluster members for all the caches. New list is [node03-48579, node01-47572, node02-32959]
// heartbeat sent, ClusterTopologyManagerImpl.confirmMembersAvailable();
Responses: Responses{
  node01-47572: value=SuccessfulResponse{responseValue=true}, received=true, suspected=false
  node02-32959: value=CacheNotFoundResponse, received=true, suspected=false}
Node node02-32959 left while updating cache members
// the view is not handled
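To make the suspected failure mode concrete, here is a simplified model of what the two logs above suggest: the coordinator treats any heartbeat response other than a successful one as "node left", and then does not handle the view. This is only a sketch of my reading of the logs, not Infinispan's actual code; the class HeartbeatCheckSketch, the Response enum, and both method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class HeartbeatCheckSketch {
    // Hypothetical stand-ins for the two response types seen in the logs.
    public enum Response { SUCCESSFUL, CACHE_NOT_FOUND }

    // Returns the nodes whose heartbeat response was not successful.
    public static List<String> unconfirmedMembers(Map<String, Response> responses) {
        List<String> unconfirmed = new ArrayList<>();
        for (Map.Entry<String, Response> entry : responses.entrySet()) {
            if (entry.getValue() != Response.SUCCESSFUL) {
                unconfirmed.add(entry.getKey());
            }
        }
        return unconfirmed;
    }

    // Assumption: the view is only handled when every member confirmed.
    public static boolean viewHandled(Map<String, Response> responses) {
        return unconfirmedMembers(responses).isEmpty();
    }

    public static void main(String[] args) {
        // Responses as reported by the coordinator for view 6.
        Map<String, Response> view6 = new LinkedHashMap<>();
        view6.put("node01-47572", Response.SUCCESSFUL);
        view6.put("node02-32959", Response.CACHE_NOT_FOUND);

        for (String node : unconfirmedMembers(view6)) {
            System.out.println("Node " + node + " left while updating cache members");
        }
        System.out.println("view handled: " + viewHandled(view6));
    }
}
```

Under this model, a single CacheNotFoundResponse from a joiner is enough to keep the view from being handled, which matches the cluster staying in degraded mode.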
It is always reproducible. The configuration is:
<replicated-cache name="default" mode="SYNC" batching="true">
    <partition-handling enabled="true"/>
    <locking isolation="REPEATABLE_READ"/>
    <state-transfer enabled="false"/>
</replicated-cache>
- is related to
- ISPN-5290 Better automatic merge for caches with enabled partition handling
- To Do