Infinispan / ISPN-4766

Cache can't start if coordinator leaves during join and joiner becomes the new coordinator

Details

    Description

      When the joiner becomes the coordinator, it tries to recover the current cache topologies, but it receives just one expected member and no current topology. This causes a NullPointerException while ClusterCacheStatus merges the partitions:

      22:51:49,547 ERROR (transport-thread-NodeB-p21124-t1:) [ClusterCacheStatus] ISPN000228: Failed to recover cache dist state after the current node became the coordinator
      java.lang.NullPointerException
      	at org.infinispan.partionhandling.impl.PreferAvailabilityStrategy.onPartitionMerge(PreferAvailabilityStrategy.java:104)
      	at org.infinispan.topology.ClusterCacheStatus.doMergePartitions(ClusterCacheStatus.java:452)
      	at org.infinispan.topology.ClusterTopologyManagerImpl.recoverClusterStatus(ClusterTopologyManagerImpl.java:260)
      	at org.infinispan.topology.ClusterTopologyManagerImpl.handleClusterView(ClusterTopologyManagerImpl.java:180)
      	at org.infinispan.topology.ClusterTopologyManagerImpl$ClusterViewListener$1.run(ClusterTopologyManagerImpl.java:427)
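
      To illustrate the failure mode, here is a minimal sketch of a merge step that tolerates status responses with no current topology. It is not the Infinispan code or the actual fix: TopologySketch, StatusResponseSketch, MergeSketch and pickMergedTopology are hypothetical names invented for this example, and picking the highest topology id is an assumed simplification.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical stand-ins for illustration only; these are NOT Infinispan classes.
final class TopologySketch {
    final int topologyId;
    final List<String> members;

    TopologySketch(int topologyId, List<String> members) {
        this.topologyId = topologyId;
        this.members = members;
    }
}

final class StatusResponseSketch {
    final String sender;
    // Assumption: null while the sender is still joining and has no topology installed.
    final TopologySketch currentTopology;

    StatusResponseSketch(String sender, TopologySketch currentTopology) {
        this.sender = sender;
        this.currentTopology = currentTopology;
    }
}

final class MergeSketch {
    // Returns the topology with the highest id among the responses that carry one,
    // or an empty Optional when no responder has a topology yet.
    static Optional<TopologySketch> pickMergedTopology(List<StatusResponseSketch> responses) {
        TopologySketch best = null;
        for (StatusResponseSketch response : responses) {
            TopologySketch topology = response.currentTopology;
            if (topology == null) {
                continue; // a joiner without a topology contributes nothing to the merge
            }
            if (best == null || topology.topologyId > best.topologyId) {
                best = topology;
            }
        }
        return Optional.ofNullable(best);
    }
}
```

      With a single response from the joiner itself, pickMergedTopology returns an empty Optional instead of dereferencing a null topology, so the caller has to decide explicitly how to start the cache from scratch.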
      

      The LocalTopologyManagerImpl waits a bit after receiving the SuspectException and tries again, but this time it receives a null initial topology, causing another NPE:

      22:51:51,319 DEBUG (testng-GlobalKeySetTaskTest:) [LocalTopologyManagerImpl] Error sending join request for cache dist to coordinator
      java.lang.NullPointerException
      	at org.infinispan.topology.LocalTopologyManagerImpl.resetLocalTopologyBeforeRebalance(LocalTopologyManagerImpl.java:222)
      	at org.infinispan.topology.LocalTopologyManagerImpl.handleTopologyUpdate(LocalTopologyManagerImpl.java:191)
      	at org.infinispan.topology.LocalTopologyManagerImpl.join(LocalTopologyManagerImpl.java:105)
      	at org.infinispan.statetransfer.StateTransferManagerImpl.start(StateTransferManagerImpl.java:108)
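
      The retry behaviour described above can be sketched as a loop that treats a null initial topology as "not ready yet" rather than passing it on. This is only an illustration under stated assumptions: JoinRetrySketch, joinWithRetry and its parameters are hypothetical and do not correspond to the LocalTopologyManagerImpl API.

```java
import java.util.Optional;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical helper for illustration only; NOT part of Infinispan.
final class JoinRetrySketch {
    // Calls sendJoinRequest until it yields a non-null initial topology or the
    // timeout expires. A null result or a runtime failure triggers another attempt.
    static <T> Optional<T> joinWithRetry(Supplier<T> sendJoinRequest,
                                         long timeoutMillis,
                                         long retryDelayMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            T initialTopology = null;
            try {
                initialTopology = sendJoinRequest.get();
            } catch (RuntimeException e) {
                // Assumption: failures such as a suspected coordinator are retriable.
            }
            if (initialTopology != null) {
                return Optional.of(initialTopology);
            }
            if (System.nanoTime() - deadline >= 0) {
                return Optional.empty(); // timed out without a usable topology
            }
            try {
                Thread.sleep(retryDelayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return Optional.empty();
            }
        }
    }
}
```

      The null check happens before the result is handed to any code that dereferences it, so a missing topology leads to another attempt or a clean timeout instead of a NullPointerException.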
      

      The join keeps failing and being retried in this way until the state transfer timeout expires.

          People

            dberinde@redhat.com Dan Berindei (Inactive)