Infinispan / ISPN-7800

Cluster always in Degraded Mode


    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major
    • None
    • 8.2.6.Final, 9.0.0.Final
    • None
    • None

      Scenario:

      • 3 nodes, server mode with Partition handling enabled
      • 2 nodes are killed and brought back online
      • the nodes are unable to merge and the cluster remains in degraded mode.

      I suspect that the FORK channel/protocol is the culprit: the heartbeat command is never handled on the joiner node, yet the coordinator receives a CacheNotFoundResponse quickly (i.e. without a timeout). The request is received and "delivered" but never reaches Infinispan.

      When starting node 1 (logs from coordinator):

      Received new cluster view: 5, isCoordinator = true, old status = COORDINATOR
      Received new cluster view: 5, isCoordinator = true, old status = COORDINATOR
      //heartbeat sent, ClusterTopologyManagerImpl.confirmMembersAvailable();
      Responses: value=CacheNotFoundResponse, received=true, suspected=false
      Node node01-47572 left while updating cache members
      //the view is not handled
      

      When I started node 2:

      Received new cluster view: 6, isCoordinator = true, old status = COORDINATOR
      Updating cluster members for all the caches. New list is [node03-48579, node01-47572, node02-32959]
      //heartbeat sent, ClusterTopologyManagerImpl.confirmMembersAvailable();
      Responses: Responses{
        node01-47572: value=SuccessfulResponse{responseValue=true} , received=true, suspected=false
        node02-32959: value=CacheNotFoundResponse, received=true, suspected=false}
      Node node02-32959 left while updating cache members
      //the view is not handled
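
      To make the log excerpts above easier to follow, here is a minimal Java sketch of the behaviour they suggest: a member that answers the heartbeat with anything other than a successful response is treated as having left. This is an illustration only, not Infinispan code. The class HeartbeatSketch, the enum Reply and the method confirmedMembers are hypothetical names, and the rule "only a successful reply confirms a member" is an assumption inferred from the "left while updating cache members" messages.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in types for illustration; these are NOT Infinispan classes.
public class HeartbeatSketch {

    /** Possible heartbeat outcomes, mirroring the response values seen in the logs. */
    enum Reply { SUCCESSFUL, CACHE_NOT_FOUND, TIMED_OUT }

    /**
     * Returns the members the coordinator would keep after one heartbeat round.
     * Assumption: only a successful reply confirms a member; any other reply
     * (including an immediate "cache not found") counts as the member having left.
     */
    static List<String> confirmedMembers(Map<String, Reply> replies) {
        List<String> confirmed = new ArrayList<>();
        for (Map.Entry<String, Reply> entry : replies.entrySet()) {
            if (entry.getValue() == Reply.SUCCESSFUL) {
                confirmed.add(entry.getKey());
            }
        }
        return confirmed;
    }

    public static void main(String[] args) {
        // Replies as reported by the coordinator when node 2 was started.
        Map<String, Reply> replies = new LinkedHashMap<>();
        replies.put("node01-47572", Reply.SUCCESSFUL);
        replies.put("node02-32959", Reply.CACHE_NOT_FOUND);

        // node02-32959 is dropped even though it is up and reachable.
        System.out.println(confirmedMembers(replies)); // prints [node01-47572]
    }
}
```

      Under this assumption, a joiner whose heartbeat request never reaches Infinispan can never be confirmed, which would match the cluster staying in degraded mode.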
      

      It is always reproducible. The configuration is

      <replicated-cache name="default" mode="SYNC" batching="true">
        <partition-handling enabled="true"/>
        <locking isolation="REPEATABLE_READ"/>
        <state-transfer enabled="false"/>
      </replicated-cache>
      

            Assignee: Unassigned
            Reporter: Pedro Ruivo (pruivo@redhat.com)