Infinispan / ISPN-932

Failed nodes remain in the topology.


    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • 4.2.1.FINAL
    • None
    • Core
    • None

      A node will remain in the cluster topology even if it never enters the RUNNING state.

      1. CacheDelegate.start
      2. ComponentRegistry.start
      3. AbstractComponentRegistry.start
      4. AbstractComponentRegistry.internalStart
      5. AbstractComponentRegistry.handleLifecycleTransitionFailure

      The last start method in this chain executes the @Start methods of the components. If any of those methods throws an exception, the node enters the FAILED state.
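To make the lifecycle behaviour concrete, here is a minimal sketch of a registry that runs start callbacks and moves to a failed state when one of them throws. The names `SketchState` and `SketchRegistry` are hypothetical and are not Infinispan classes; rethrowing the exception after recording the failure is also an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified lifecycle states; not Infinispan's own enum.
enum SketchState { INSTANTIATED, STARTING, RUNNING, FAILED }

// Hypothetical stand-in for a component registry that runs start callbacks.
class SketchRegistry {
    private final List<Runnable> startMethods = new ArrayList<>();
    private SketchState state = SketchState.INSTANTIATED;

    // Registers a callback that plays the role of a component's @Start method.
    void registerStartMethod(Runnable startMethod) {
        startMethods.add(startMethod);
    }

    SketchState getState() {
        return state;
    }

    // Runs every start callback in order. The first exception moves the
    // registry to FAILED and is rethrown (assumed behaviour for this sketch).
    void start() {
        state = SketchState.STARTING;
        try {
            for (Runnable startMethod : startMethods) {
                startMethod.run();
            }
            state = SketchState.RUNNING;
        } catch (RuntimeException e) {
            state = SketchState.FAILED;
            throw e;
        }
    }
}
```

In this model a registry whose callbacks all return normally ends in RUNNING, while a single throwing callback leaves it in FAILED. Nothing here informs the rest of the cluster, which is the gap this issue describes.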

      The problem is that, in distributed mode, the node is added to the cluster topology before rehashing takes place. If an exception is thrown during the rehash, the join still completes successfully, so the failed node stays in the topology.

      1. Broadcast new consistent hash.
      2. Get state.
      3. Invalidate state. (This is in a finally block. Occurs even if get state fails.)
      4. Complete join. (This is in a finally block. Occurs even if get state/invalidation fail.)
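The four steps above can be modelled roughly as follows. This is a sketch only: `JoinSequenceSketch` is a hypothetical class, the exact nesting of the finally blocks is assumed, and the strings in the log merely label the steps.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the join sequence; not the actual Infinispan join code.
class JoinSequenceSketch {
    final List<String> log = new ArrayList<>();
    boolean joinCompleted = false;

    // Runs the join steps. When failDuringGetState is true, the state
    // transfer step throws, yet both finally blocks still execute.
    void join(boolean failDuringGetState) {
        // Step 1: after this point the joiner is visible in the topology.
        log.add("broadcast new consistent hash");
        try {
            try {
                // Step 2: state transfer, which may fail.
                if (failDuringGetState) {
                    throw new IllegalStateException("rehash failed");
                }
                log.add("get state");
            } finally {
                // Step 3: runs even if step 2 threw.
                log.add("invalidate state");
            }
        } finally {
            // Step 4: runs even if steps 2 or 3 threw.
            joinCompleted = true;
            log.add("complete join");
        }
    }
}
```

Calling `join(true)` on this sketch throws, but `joinCompleted` is still true afterwards and the log contains the broadcast, invalidation, and completion entries. That mirrors the reported behaviour: the join is recorded as complete although the node never received its state.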

      There needs to be a way to remove a node from the topology if it enters the FAILED state. Alternatively, the node could be added to the topology only once it enters the RUNNING state.
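One possible shape for the first suggestion, removing the node when the rehash fails, is sketched below. `TopologySketch` and `GuardedJoinSketch` are hypothetical names used for illustration, and the membership set stands in for the real cluster topology; this is not a proposed patch.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-in for the cluster topology: a set of member names.
class TopologySketch {
    private final Set<String> members = new LinkedHashSet<>();

    void add(String node) {
        members.add(node);
    }

    void remove(String node) {
        members.remove(node);
    }

    boolean contains(String node) {
        return members.contains(node);
    }
}

// Hypothetical join helper that rolls back membership when the rehash fails.
class GuardedJoinSketch {
    // Adds the node, runs the rehash, and removes the node again if the
    // rehash throws. Returns true only when the node stays in the topology.
    static boolean joinWithRollback(TopologySketch topology, String node, Runnable rehash) {
        topology.add(node);
        try {
            rehash.run();
            return true;
        } catch (RuntimeException e) {
            // The node never became usable, so it must not remain a member.
            topology.remove(node);
            return false;
        }
    }
}
```

The second suggestion would instead move the `add` call to after the rehash succeeds. Whether that is workable depends on whether other nodes need to see the joiner during state transfer, which this sketch does not model.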

            manik_jira Manik Surtani (Inactive)
            shane_dev_jira Shane Johnson (Inactive)