Uploaded image for project: 'Infinispan'
  1. Infinispan
  2. ISPN-4831

Node cannot join cluster correctly after restart

    XMLWordPrintable

Details

    • Bug
    • Resolution: Done
    • Major
    • None
    • 5.2.6.Final
    • State Transfer
    • None

    Description

      Node is not able to join the cluster again after restart.
      Node 1 was restarted, later it seems that the cluster was disconnected for a while and node1 is failed to become a coordinator.
      Then the node1 was restarted again, but some state transfer timeouts and the cluster was not connected any more, until both nodes restart:

      Node 1 (the one with restart at 3:05 and 3:07):

      Sep 29, 2014 3:05:21 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport viewAccepted
      INFO: ISPN000094: Received new cluster view: [LOUAPPWPS984-56770|15] [LOUAPPWPS984-56770, LOUAPPWPS983-56765]
      Sep 29, 2014 3:05:21 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport startJGroupsChannelIfNeeded
      INFO: ISPN000079: Cache local address is LOUAPPWPS983-56765, physical addresses are [133.27.18.204:7800]
      Sep 29, 2014 3:05:21 AM org.infinispan.factories.GlobalComponentRegistry start
      INFO: ISPN000128: Infinispan version: Infinispan 'Delirium' 5.2.6.Final
      Sep 29, 2014 3:05:22 AM org.infinispan.factories.TransactionManagerFactory construct
      INFO: ISPN000161: Using a batchMode transaction manager
      Sep 29, 2014 3:05:22 AM org.infinispan.jmx.CacheJmxRegistration start
      INFO: ISPN000031: MBeans were successfully registered to the platform MBean server.
      Sep 29, 2014 3:05:23 AM org.infinispan.factories.TransactionManagerFactory construct
      INFO: ISPN000161: Using a batchMode transaction manager
      Sep 29, 2014 3:05:23 AM org.infinispan.jmx.CacheJmxRegistration start
      INFO: ISPN000031: MBeans were successfully registered to the platform MBean server.
      Sep 29, 2014 3:05:23 AM org.infinispan.factories.TransactionManagerFactory construct
      INFO: ISPN000161: Using a batchMode transaction manager
      Sep 29, 2014 3:05:23 AM org.infinispan.jmx.CacheJmxRegistration start
      INFO: ISPN000031: MBeans were successfully registered to the platform MBean server.
      Sep 29, 2014 3:06:18 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport viewAccepted
      INFO: ISPN000094: Received new cluster view: [LOUAPPWPS983-56765|16] [LOUAPPWPS983-56765]
      Sep 29, 2014 3:06:38 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport viewAccepted
      INFO: ISPN000093: Received new, MERGED cluster view: MergeView::[LOUAPPWPS983-56765|17] [LOUAPPWPS983-56765, LOUAPPWPS984-56770], subgroups=[LOUAPPWPS984-56770|15] [LOUAPPWPS984-56770], [LOUAPPWPS983-56765|16] [LOUAPPWPS983-56765]
      Sep 29, 2014 3:07:33 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport stop
      INFO: ISPN000080: Disconnecting and closing JGroups Channel
      Sep 29, 2014 3:07:33 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport stop
      INFO: ISPN000082: Stopping the RpcDispatcher
      Sep 29, 2014 3:08:45 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport start
      INFO: ISPN000078: Starting JGroups Channel
      Sep 29, 2014 3:08:46 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport viewAccepted
      INFO: ISPN000094: Received new cluster view: [LOUAPPWPS984-56770|19] [LOUAPPWPS984-56770, LOUAPPWPS983-54866]
      Sep 29, 2014 3:08:47 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport startJGroupsChannelIfNeeded
      INFO: ISPN000079: Cache local address is LOUAPPWPS983-54866, physical addresses are [133.27.18.204:7800]
      Sep 29, 2014 3:08:47 AM org.infinispan.factories.GlobalComponentRegistry start
      INFO: ISPN000128: Infinispan version: Infinispan 'Delirium' 5.2.6.Final
      Sep 29, 2014 3:08:47 AM org.infinispan.factories.TransactionManagerFactory construct
      INFO: ISPN000161: Using a batchMode transaction manager
      Sep 29, 2014 3:08:47 AM org.infinispan.jmx.CacheJmxRegistration start
      INFO: ISPN000031: MBeans were successfully registered to the platform MBean server.
      Sep 29, 2014 3:12:47 AM org.apache.catalina.core.ApplicationContext log
      SEVERE: StandardWrapper.Throwable
      org.infinispan.CacheException: Unable to invoke method public void org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete() throws java.lang.InterruptedException on object of type StateTransferManagerImpl
      at org.infinispan.util.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:205)
      at org.infinispan.factories.AbstractComponentRegistry$PrioritizedMethod.invoke(AbstractComponentRegistry.java:886)
      at org.infinispan.factories.AbstractComponentRegistry.invokeStartMethods(AbstractComponentRegistry.java:657)
      at org.infinispan.factories.AbstractComponentRegistry.internalStart(AbstractComponentRegistry.java:646)
      at org.infinispan.factories.AbstractComponentRegistry.start(AbstractComponentRegistry.java:549)
      at org.infinispan.factories.ComponentRegistry.start(ComponentRegistry.java:217)
      at org.infinispan.CacheImpl.start(CacheImpl.java:582)
      at org.infinispan.manager.DefaultCacheManager.wireAndStartCache(DefaultCacheManager.java:686)
      at org.infinispan.manager.DefaultCacheManager.createCache(DefaultCacheManager.java:649)
      at org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:545)
      at org.infinispan.rest.StartupListener$$anonfun$init$1.apply(StartupListener.scala:66)
      at org.infinispan.rest.StartupListener$$anonfun$init$1.apply(StartupListener.scala:65)
      at scala.collection.Iterator$class.foreach(Iterator.scala:727)
      at scala.collection.AbstractIterator.foreach(Iterator.scala:1156)
      at org.infinispan.rest.StartupListener.init(StartupListener.scala:65)
      at org.apache.catalina.core.StandardWrapper.initServlet(StandardWrapper.java:1280)
      at org.apache.catalina.core.StandardWrapper.loadServlet(StandardWrapper.java:1193)
      at org.apache.catalina.core.StandardWrapper.load(StandardWrapper.java:1088)
      at org.apache.catalina.core.StandardContext.loadOnStartup(StandardContext.java:5176)
      at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5460)
      at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
      at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:901)
      at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:877)
      at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:633)
      at org.apache.catalina.startup.HostConfig.deployDirectory(HostConfig.java:1120)
      at org.apache.catalina.startup.HostConfig$DeployDirectory.run(HostConfig.java:1678)
      at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
      at java.util.concurrent.FutureTask.run(Unknown Source)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
      at java.lang.Thread.run(Unknown Source)
      Caused by: org.infinispan.CacheException: Initial state transfer timed out for cache eu.ysoft.safeq.core.cache.entity.CacheableJobInfo_index on LOUAPPWPS983-54866
      at org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete(StateTransferManagerImpl.java:216)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
      at java.lang.reflect.Method.invoke(Unknown Source)
      at org.infinispan.util.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:203)
      ... 30 more

      Node 2 (without restart):
      Sep 29, 2014 3:03:54 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport viewAccepted
      INFO: ISPN000094: Received new cluster view: [LOUAPPWPS984-56770|14] [LOUAPPWPS984-56770]
      Sep 29, 2014 3:05:21 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport viewAccepted
      INFO: ISPN000094: Received new cluster view: [LOUAPPWPS984-56770|15] [LOUAPPWPS984-56770, LOUAPPWPS983-56765]
      Sep 29, 2014 3:06:38 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport viewAccepted
      INFO: ISPN000093: Received new, MERGED cluster view: MergeView::[LOUAPPWPS983-56765|17] [LOUAPPWPS983-56765, LOUAPPWPS984-56770], subgroups=[LOUAPPWPS984-56770|15] [LOUAPPWPS984-56770], [LOUAPPWPS983-56765|16] [LOUAPPWPS983-56765]
      Sep 29, 2014 3:07:33 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport viewAccepted
      INFO: ISPN000094: Received new cluster view: [LOUAPPWPS984-56770|18] [LOUAPPWPS984-56770]
      Sep 29, 2014 3:07:33 AM org.infinispan.topology.ClusterTopologyManagerImpl handleNewView
      ERROR: ISPN000196: Failed to recover cluster state after the current node became the coordinator
      java.lang.IllegalStateException: Trying to set a topology with invalid members for cache eu.ysoft.safeq.core.cache.entity.CacheableJobInfo_index: members = [LOUAPPWPS984-56770], topology = CacheTopology{id=37, currentCH=DefaultConsistentHash

      {numSegments=60, numOwners=1, members=[LOUAPPWPS984-56770, LOUAPPWPS983-56765]}

      , pendingCH=null}
      at org.infinispan.topology.ClusterCacheStatus.updateCacheTopology(ClusterCacheStatus.java:165)
      at org.infinispan.topology.ClusterTopologyManagerImpl.updateCacheStatusAfterMerge(ClusterTopologyManagerImpl.java:315)
      at org.infinispan.topology.ClusterTopologyManagerImpl.handleNewView(ClusterTopologyManagerImpl.java:236)
      at org.infinispan.topology.ClusterTopologyManagerImpl$ClusterViewListener.handleViewChange(ClusterTopologyManagerImpl.java:579)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
      at java.lang.reflect.Method.invoke(Unknown Source)
      at org.infinispan.notifications.AbstractListenerImpl$ListenerInvocation$1.run(AbstractListenerImpl.java:212)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
      at java.lang.Thread.run(Unknown Source)

      Sep 29, 2014 3:08:46 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport viewAccepted
      INFO: ISPN000094: Received new cluster view: [LOUAPPWPS984-56770|19] [LOUAPPWPS984-56770, LOUAPPWPS983-54866]
      Sep 29, 2014 3:12:48 AM org.infinispan.statetransfer.StateProviderImpl getCacheTopology
      WARN: ISPN000211: Transactions were requested by node LOUAPPWPS983-54866 with topology 30, older than the local topology (35)
      Sep 29, 2014 3:12:48 AM org.infinispan.statetransfer.StateProviderImpl getCacheTopology
      WARN: ISPN000212: Segments were requested by node LOUAPPWPS983-54866 with topology 30, older than the local topology (35)
      Sep 29, 2014 12:42:27 PM org.infinispan.transaction.TransactionTable shutDownGracefully
      WARN: ISPN000100: Stopping, but there are 0 local transactions and 16 remote transactions that did not finish in time.
      Sep 29, 2014 12:42:27 PM org.infinispan.remoting.transport.jgroups.JGroupsTransport stop
      INFO: ISPN000080: Disconnecting and closing JGroups Channel
      Sep 29, 2014 12:42:29 PM org.jgroups.logging.JDKLogImpl warn
      WARNING: LOUAPPWPS984-56770: failed to collect all ACKs (expected=1) for view [LOUAPPWPS983-54866|20] after 2000ms, missing ACKs from [LOUAPPWPS983-54866]
      Sep 29, 2014 12:42:30 PM org.jgroups.logging.JDKLogImpl error

      Attachments

        1. configuration.txt
          1 kB
        2. node1-configuration.xml
          2 kB
        3. node1WithRestart.txt
          593 kB
        4. node2-configuration.xml
          2 kB
        5. node2WithoutRestart.log
          35 kB

        Issue Links

          Activity

            People

              Unassigned Unassigned
              maruta.s Marta Sedlakova (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: