Infinispan / ISPN-9198

Node X left the cluster - SuspectException: ISPN000400: Node X was suspected

Details

    • Type: Bug
    • Resolution: Obsolete
    • Priority: Critical
    • Sprint 9.4.0.CR3, Sprint 10.0.0.Alpha1, Sprint 10.0.0.Alpha2, Sprint 9.4.0.Final, Sprint 10.0.0.Alpha0, Sprint 10.0.0.Beta1, DataGrid Sprint #31, DataGrid Sprint #32, DataGrid Sprint #33

    Description

      After commit df9ffb5ba46752d2509aa3a08c59519469cc929a in Infinispan, the tests regression-cs-hotrod-dist-reads and regression-cs-hotrod-repl-reads fail.

      I ran the same tests three times at commit df9ffb5ba46752d2509aa3a08c59519469cc929a and they passed.

      Running "regression-cs-hotrod-dist-reads" against master fails with the following log:

      11:18:04,874 INFO  [org.radargun.RemoteMasterConnection] (sc-main) Message successfully sent to the master
      11:22:06,470 INFO  [org.radargun.Slave] (sc-main) Stage 'BasicOperationsTest' should not be executed
      11:22:06,472 INFO  [org.radargun.RemoteMasterConnection] (sc-main) Message successfully sent to the master
      11:22:34,182 INFO  [org.infinispan.CLUSTER] (jgroups-112,slave0) ISPN000094: Received new cluster view for channel cluster: [slave3|8] (7) [slave3, slave6, slave0, slave5, slave4, slave2, slave1]
      11:22:34,190 INFO  [org.infinispan.CLUSTER] (jgroups-112,slave0) ISPN100001: Node slave7 left the cluster
      11:22:48,182 INFO  [org.infinispan.CLUSTER] (jgroups-115,slave0) ISPN000094: Received new cluster view for channel cluster: [slave3|9] (6) [slave3, slave6, slave0, slave5, slave4, slave1]
      11:22:48,191 INFO  [org.infinispan.CLUSTER] (jgroups-115,slave0) ISPN100001: Node slave2 left the cluster
      11:23:09,176 INFO  [org.infinispan.CLUSTER] (jgroups-111,slave0) ISPN000094: Received new cluster view for channel cluster: [slave3|10] (5) [slave3, slave6, slave0, slave5, slave1]
      11:23:09,179 INFO  [org.infinispan.CLUSTER] (jgroups-111,slave0) ISPN100001: Node slave4 left the cluster
      11:23:20,173 INFO  [org.infinispan.CLUSTER] (jgroups-121,slave0) ISPN000094: Received new cluster view for channel cluster: [slave6|11] (4) [slave6, slave0, slave5, slave1]
      11:23:20,178 INFO  [org.infinispan.CLUSTER] (jgroups-121,slave0) ISPN100001: Node slave3 left the cluster
      11:23:20,199 WARN  [org.infinispan.statetransfer.InboundTransferTask] (stateTransferExecutor-thread--p5-t60) ISPN000210: Failed to request state of cache memcachedCache from node slave3, segments {114 184 190-191}: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node slave3 was suspected
      	at org.infinispan.remoting.transport.ResponseCollectors.remoteNodeSuspected(ResponseCollectors.java:33)
      	at org.infinispan.remoting.transport.impl.SingleResponseCollector.targetNotFound(SingleResponseCollector.java:31)
      	at org.infinispan.remoting.transport.impl.SingleResponseCollector.targetNotFound(SingleResponseCollector.java:17)
      	at org.infinispan.remoting.transport.ValidSingleResponseCollector.addResponse(ValidSingleResponseCollector.java:23)
      	at org.infinispan.remoting.transport.impl.SingleTargetRequest.receiveResponse(SingleTargetRequest.java:52)
      	at org.infinispan.remoting.transport.impl.SingleTargetRequest.onNewView(SingleTargetRequest.java:42)
      	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$null$3(JGroupsTransport.java:672)
      	at org.infinispan.remoting.transport.impl.RequestRepository.lambda$forEach$0(RequestRepository.java:60)
      	at java.util.concurrent.ConcurrentHashMap.forEach(ConcurrentHashMap.java:1597)
      	at org.infinispan.remoting.transport.impl.RequestRepository.forEach(RequestRepository.java:60)
      	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$receiveClusterView$4(JGroupsTransport.java:672)
      	at org.infinispan.util.concurrent.BlockingTaskAwareExecutorServiceImpl$RunnableWrapper.run(BlockingTaskAwareExecutorServiceImpl.java:212)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at java.lang.Thread.run(Thread.java:748)
      	Suppressed: java.util.concurrent.ExecutionException: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node slave3 was suspected
      		at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
      		at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
      		at org.infinispan.util.concurrent.CompletableFutures.await(CompletableFutures.java:92)
      		at org.infinispan.remoting.rpc.RpcManagerImpl.blocking(RpcManagerImpl.java:261)
      		at org.infinispan.statetransfer.InboundTransferTask.startTransfer(InboundTransferTask.java:134)
      		at org.infinispan.statetransfer.InboundTransferTask.requestSegments(InboundTransferTask.java:113)
      		at org.infinispan.statetransfer.StateConsumerImpl.lambda$addTransfer$7(StateConsumerImpl.java:1073)
      		at org.infinispan.executors.LimitedExecutor.lambda$executeAsync$1(LimitedExecutor.java:130)
      		at org.infinispan.executors.LimitedExecutor.runTasks(LimitedExecutor.java:175)
      		at org.infinispan.executors.LimitedExecutor.access$100(LimitedExecutor.java:37)
      		at org.infinispan.executors.LimitedExecutor$Runner.run(LimitedExecutor.java:227)
      		... 3 more
      	Caused by: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node slave3 was suspected
      		at org.infinispan.remoting.transport.ResponseCollectors.remoteNodeSuspected(ResponseCollectors.java:33)
      		at org.infinispan.remoting.transport.impl.SingleResponseCollector.targetNotFound(SingleResponseCollector.java:31)
      		at org.infinispan.remoting.transport.impl.SingleResponseCollector.targetNotFound(SingleResponseCollector.java:17)
      		at org.infinispan.remoting.transport.ValidSingleResponseCollector.addResponse(ValidSingleResponseCollector.java:23)
      		at org.infinispan.remoting.transport.impl.SingleTargetRequest.receiveResponse(SingleTargetRequest.java:52)
      		at org.infinispan.remoting.transport.impl.SingleTargetRequest.onNewView(SingleTargetRequest.java:42)
      		at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$null$3(JGroupsTransport.java:672)
      		at org.infinispan.remoting.transport.impl.RequestRepository.lambda$forEach$0(RequestRepository.java:60)
      		at java.util.concurrent.ConcurrentHashMap.forEach(ConcurrentHashMap.java:1597)
      		at org.infinispan.remoting.transport.impl.RequestRepository.forEach(RequestRepository.java:60)
      		at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$receiveClusterView$4(JGroupsTransport.java:672)
      		at org.infinispan.util.concurrent.BlockingTaskAwareExecutorServiceImpl$RunnableWrapper.run(BlockingTaskAwareExecutorServiceImpl.java:212)
      		... 3 more
      	[CIRCULAR REFERENCE:java.util.concurrent.ExecutionException: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node slave3 was suspected]
      
      11:23:23,916 WARN  [org.jgroups.protocols.pbcast.NAKACK2] (jgroups-117,slave0) JGRP000011: slave0: dropped message 319143 from non-member slave3 (view=[slave6|11] (4) [slave6, slave0, slave5, slave1])
      11:23:34,142 INFO  [org.infinispan.CLUSTER] (jgroups-111,slave0) ISPN000094: Received new cluster view for channel cluster: [slave6|12] (3) [slave6, slave0, slave5]
      11:23:34,145 INFO  [org.infinispan.CLUSTER] (jgroups-111,slave0) ISPN100001: Node slave1 left the cluster
      11:23:34,154 WARN  [org.infinispan.statetransfer.InboundTransferTask] (stateTransferExecutor-thread--p5-t61) ISPN000210: Failed to request state of cache hotrodDist from node slave1, segments {59-60 71-74 78 81 146 180-181 185 192 217}: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node slave1 was suspected
      	at org.infinispan.remoting.transport.ResponseCollectors.remoteNodeSuspected(ResponseCollectors.java:33)
      	at org.infinispan.remoting.transport.impl.SingleResponseCollector.targetNotFound(SingleResponseCollector.java:31)
      	at org.infinispan.remoting.transport.impl.SingleResponseCollector.targetNotFound(SingleResponseCollector.java:17)
      	at org.infinispan.remoting.transport.ValidSingleResponseCollector.addResponse(ValidSingleResponseCollector.java:23)
      	at org.infinispan.remoting.transport.impl.SingleTargetRequest.receiveResponse(SingleTargetRequest.java:52)
      	at org.infinispan.remoting.transport.impl.SingleTargetRequest.onNewView(SingleTargetRequest.java:42)
      	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$null$3(JGroupsTransport.java:672)
      	at org.infinispan.remoting.transport.impl.RequestRepository.lambda$forEach$0(RequestRepository.java:60)
      	at java.util.concurrent.ConcurrentHashMap.forEach(ConcurrentHashMap.java:1597)
      	at org.infinispan.remoting.transport.impl.RequestRepository.forEach(RequestRepository.java:60)
      	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$receiveClusterView$4(JGroupsTransport.java:672)
      	at org.infinispan.util.concurrent.BlockingTaskAwareExecutorServiceImpl$RunnableWrapper.run(BlockingTaskAwareExecutorServiceImpl.java:212)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at java.lang.Thread.run(Thread.java:748)
      	Suppressed: java.util.concurrent.ExecutionException: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node slave1 was suspected
      		at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
      		at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
      		at org.infinispan.util.concurrent.CompletableFutures.await(CompletableFutures.java:92)
      		at org.infinispan.remoting.rpc.RpcManagerImpl.blocking(RpcManagerImpl.java:261)
      		at org.infinispan.statetransfer.InboundTransferTask.startTransfer(InboundTransferTask.java:134)
      		at org.infinispan.statetransfer.InboundTransferTask.requestSegments(InboundTransferTask.java:113)
      		at org.infinispan.statetransfer.StateConsumerImpl.lambda$addTransfer$7(StateConsumerImpl.java:1073)
      		at org.infinispan.executors.LimitedExecutor.lambda$executeAsync$1(LimitedExecutor.java:130)
      		at org.infinispan.executors.LimitedExecutor.runTasks(LimitedExecutor.java:175)
      		at org.infinispan.executors.LimitedExecutor.access$100(LimitedExecutor.java:37)
      		at org.infinispan.executors.LimitedExecutor$Runner.run(LimitedExecutor.java:227)
      		... 3 more
      	Caused by: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node slave1 was suspected
      		at org.infinispan.remoting.transport.ResponseCollectors.remoteNodeSuspected(ResponseCollectors.java:33)
      		at org.infinispan.remoting.transport.impl.SingleResponseCollector.targetNotFound(SingleResponseCollector.java:31)
      		at org.infinispan.remoting.transport.impl.SingleResponseCollector.targetNotFound(SingleResponseCollector.java:17)
      		at org.infinispan.remoting.transport.ValidSingleResponseCollector.addResponse(ValidSingleResponseCollector.java:23)
      		at org.infinispan.remoting.transport.impl.SingleTargetRequest.receiveResponse(SingleTargetRequest.java:52)
      		at org.infinispan.remoting.transport.impl.SingleTargetRequest.onNewView(SingleTargetRequest.java:42)
      		at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$null$3(JGroupsTransport.java:672)
      		at org.infinispan.remoting.transport.impl.RequestRepository.lambda$forEach$0(RequestRepository.java:60)
      		at java.util.concurrent.ConcurrentHashMap.forEach(ConcurrentHashMap.java:1597)
      		at org.infinispan.remoting.transport.impl.RequestRepository.forEach(RequestRepository.java:60)
      		at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$receiveClusterView$4(JGroupsTransport.java:672)
      		at org.infinispan.util.concurrent.BlockingTaskAwareExecutorServiceImpl$RunnableWrapper.run(BlockingTaskAwareExecutorServiceImpl.java:212)
      		... 3 more
      	[CIRCULAR REFERENCE:java.util.concurrent.ExecutionException: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node slave1 was suspected]
      
      11:23:34,183 WARN  [org.infinispan.statetransfer.InboundTransferTask] (stateTransferExecutor-thread--p5-t58) ISPN000210: Failed to request state of cache rest from node slave1, segments {59-60 71-74 78 81 146 180-181 185 192 217}: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node slave1 was suspected
      	at org.infinispan.remoting.transport.ResponseCollectors.remoteNodeSuspected(ResponseCollectors.java:33)
      	at org.infinispan.remoting.transport.impl.SingleResponseCollector.targetNotFound(SingleResponseCollector.java:31)
      	at org.infinispan.remoting.transport.impl.SingleResponseCollector.targetNotFound(SingleResponseCollector.java:17)
      	at org.infinispan.remoting.transport.ValidSingleResponseCollector.addResponse(ValidSingleResponseCollector.java:23)
      	at org.infinispan.remoting.transport.impl.SingleTargetRequest.receiveResponse(SingleTargetRequest.java:52)
      	at org.infinispan.remoting.transport.impl.SingleTargetRequest.onNewView(SingleTargetRequest.java:42)
      	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$null$3(JGroupsTransport.java:672)
      	at org.infinispan.remoting.transport.impl.RequestRepository.lambda$forEach$0(RequestRepository.java:60)
      	at java.util.concurrent.ConcurrentHashMap.forEach(ConcurrentHashMap.java:1597)
      	at org.infinispan.remoting.transport.impl.RequestRepository.forEach(RequestRepository.java:60)
      	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$receiveClusterView$4(JGroupsTransport.java:672)
      	at org.infinispan.util.concurrent.BlockingTaskAwareExecutorServiceImpl$RunnableWrapper.run(BlockingTaskAwareExecutorServiceImpl.java:212)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at java.lang.Thread.run(Thread.java:748)
      	Suppressed: java.util.concurrent.ExecutionException: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node slave1 was suspected
      		at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
      		at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
      		at org.infinispan.util.concurrent.CompletableFutures.await(CompletableFutures.java:92)
      		at org.infinispan.remoting.rpc.RpcManagerImpl.blocking(RpcManagerImpl.java:261)
      		at org.infinispan.statetransfer.InboundTransferTask.startTransfer(InboundTransferTask.java:134)
      		at org.infinispan.statetransfer.InboundTransferTask.requestSegments(InboundTransferTask.java:113)
      		at org.infinispan.statetransfer.StateConsumerImpl.lambda$addTransfer$7(StateConsumerImpl.java:1073)
      		at org.infinispan.executors.LimitedExecutor.lambda$executeAsync$1(LimitedExecutor.java:130)
      		at org.infinispan.executors.LimitedExecutor.runTasks(LimitedExecutor.java:175)
      		at org.infinispan.executors.LimitedExecutor.access$100(LimitedExecutor.java:37)
      		at org.infinispan.executors.LimitedExecutor$Runner.run(LimitedExecutor.java:227)
      		... 3 more
      	Caused by: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node slave1 was suspected
      		at org.infinispan.remoting.transport.ResponseCollectors.remoteNodeSuspected(ResponseCollectors.java:33)
      		at org.infinispan.remoting.transport.impl.SingleResponseCollector.targetNotFound(SingleResponseCollector.java:31)
      		at org.infinispan.remoting.transport.impl.SingleResponseCollector.targetNotFound(SingleResponseCollector.java:17)
      		at org.infinispan.remoting.transport.ValidSingleResponseCollector.addResponse(ValidSingleResponseCollector.java:23)
      		at org.infinispan.remoting.transport.impl.SingleTargetRequest.receiveResponse(SingleTargetRequest.java:52)
      		at org.infinispan.remoting.transport.impl.SingleTargetRequest.onNewView(SingleTargetRequest.java:42)
      		at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$null$3(JGroupsTransport.java:672)
      		at org.infinispan.remoting.transport.impl.RequestRepository.lambda$forEach$0(RequestRepository.java:60)
      		at java.util.concurrent.ConcurrentHashMap.forEach(ConcurrentHashMap.java:1597)
      		at org.infinispan.remoting.transport.impl.RequestRepository.forEach(RequestRepository.java:60)
      		at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$receiveClusterView$4(JGroupsTransport.java:672)
      		at org.infinispan.util.concurrent.BlockingTaskAwareExecutorServiceImpl$RunnableWrapper.run(BlockingTaskAwareExecutorServiceImpl.java:212)
      		... 3 more
      	[CIRCULAR REFERENCE:java.util.concurrent.ExecutionException: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node slave1 was suspected]
      
      11:23:34,177 WARN  [org.infinispan.statetransfer.InboundTransferTask] (stateTransferExecutor-thread--p5-t54) ISPN000210: Failed to request state of cache memcachedCache from node slave1, segments {59-60 71-74 78 81 146 180-181 185 192 217}: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node slave1 was suspected
      	at org.infinispan.remoting.transport.ResponseCollectors.remoteNodeSuspected(ResponseCollectors.java:33)
      	at org.infinispan.remoting.transport.impl.SingleResponseCollector.targetNotFound(SingleResponseCollector.java:31)
      	at org.infinispan.remoting.transport.impl.SingleResponseCollector.targetNotFound(SingleResponseCollector.java:17)
      	at org.infinispan.remoting.transport.ValidSingleResponseCollector.addResponse(ValidSingleResponseCollector.java:23)
      	at org.infinispan.remoting.transport.impl.SingleTargetRequest.receiveResponse(SingleTargetRequest.java:52)
      	at org.infinispan.remoting.transport.impl.SingleTargetRequest.onNewView(SingleTargetRequest.java:42)
      	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$null$3(JGroupsTransport.java:672)
      	at org.infinispan.remoting.transport.impl.RequestRepository.lambda$forEach$0(RequestRepository.java:60)
      	at java.util.concurrent.ConcurrentHashMap.forEach(ConcurrentHashMap.java:1597)
      	at org.infinispan.remoting.transport.impl.RequestRepository.forEach(RequestRepository.java:60)
      	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$receiveClusterView$4(JGroupsTransport.java:672)
      	at org.infinispan.util.concurrent.BlockingTaskAwareExecutorServiceImpl$RunnableWrapper.run(BlockingTaskAwareExecutorServiceImpl.java:212)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at java.lang.Thread.run(Thread.java:748)
      	Suppressed: java.util.concurrent.ExecutionException: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node slave1 was suspected
      		at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
      		at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
      		at org.infinispan.util.concurrent.CompletableFutures.await(CompletableFutures.java:92)
      		at org.infinispan.remoting.rpc.RpcManagerImpl.blocking(RpcManagerImpl.java:261)
      		at org.infinispan.statetransfer.InboundTransferTask.startTransfer(InboundTransferTask.java:134)
      		at org.infinispan.statetransfer.InboundTransferTask.requestSegments(InboundTransferTask.java:113)
      		at org.infinispan.statetransfer.StateConsumerImpl.lambda$addTransfer$7(StateConsumerImpl.java:1073)
      		at org.infinispan.executors.LimitedExecutor.lambda$executeAsync$1(LimitedExecutor.java:130)
      		at org.infinispan.executors.LimitedExecutor.runTasks(LimitedExecutor.java:175)
      		at org.infinispan.executors.LimitedExecutor.access$100(LimitedExecutor.java:37)
      		at org.infinispan.executors.LimitedExecutor$Runner.run(LimitedExecutor.java:227)
      		... 3 more
      	Caused by: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node slave1 was suspected
      		at org.infinispan.remoting.transport.ResponseCollectors.remoteNodeSuspected(ResponseCollectors.java:33)
      		at org.infinispan.remoting.transport.impl.SingleResponseCollector.targetNotFound(SingleResponseCollector.java:31)
      		at org.infinispan.remoting.transport.impl.SingleResponseCollector.targetNotFound(SingleResponseCollector.java:17)
      		at org.infinispan.remoting.transport.ValidSingleResponseCollector.addResponse(ValidSingleResponseCollector.java:23)
      		at org.infinispan.remoting.transport.impl.SingleTargetRequest.receiveResponse(SingleTargetRequest.java:52)
      		at org.infinispan.remoting.transport.impl.SingleTargetRequest.onNewView(SingleTargetRequest.java:42)
      		at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$null$3(JGroupsTransport.java:672)
      		at org.infinispan.remoting.transport.impl.RequestRepository.lambda$forEach$0(RequestRepository.java:60)
      		at java.util.concurrent.ConcurrentHashMap.forEach(ConcurrentHashMap.java:1597)
      		at org.infinispan.remoting.transport.impl.RequestRepository.forEach(RequestRepository.java:60)
      		at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$receiveClusterView$4(JGroupsTransport.java:672)
      		at org.infinispan.util.concurrent.BlockingTaskAwareExecutorServiceImpl$RunnableWrapper.run(BlockingTaskAwareExecutorServiceImpl.java:212)
      		... 3 more
      	[CIRCULAR REFERENCE:java.util.concurrent.ExecutionException: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node slave1 was suspected]
      
      11:23:34,203 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p4-t5) ISPN000208: No live owners found for segments {59-60 71 73-74 78 81 146 180-181 185 192 217} of cache rest. Excluded owners: []
      11:23:34,318 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p4-t11) ISPN000208: No live owners found for segments {59-60 71 73-74 78 81 146 180-181 185 192 217} of cache memcachedCache. Excluded owners: []
      11:23:34,332 WARN  [org.infinispan.statetransfer.StateConsumerImpl] (stateTransferExecutor-thread--p5-t61) Discarding received cache entries for segment 72 of cache memcachedCache because they do not belong to this node.
      11:23:34,396 WARN  [org.infinispan.statetransfer.StateConsumerImpl] (stateTransferExecutor-thread--p5-t57) Discarding received cache entries for segment 72 of cache memcachedCache because they do not belong to this node.
      11:23:34,398 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p4-t1) ISPN000208: No live owners found for segments {71 74 146} of cache hotrodDist. Excluded owners: []
      11:23:34,521 WARN  [org.jgroups.protocols.pbcast.NAKACK2] (jgroups-123,slave0) JGRP000011: slave0: dropped message 338000 from non-member slave1 (view=[slave6|12] (3) [slave6, slave0, slave5])
      11:23:51,210 WARN  [org.jgroups.protocols.pbcast.GMS] (jgroups-124,slave0) slave0: not member of view [slave6|13]; discarding it
      11:24:02,223 WARN  [org.jgroups.protocols.pbcast.GMS] (jgroups-119,slave0) slave0: failed to create view from delta-view; dropping view: java.lang.IllegalStateException: the view-id of the delta view ([slave6|13]) doesn't match the current view-id ([slave6|12]); discarding delta view [slave6|14], ref-view=[slave6|13], left=[slave5]
      11:24:02,231 WARN  [org.jgroups.protocols.pbcast.GMS] (jgroups-119,slave0) slave0: not member of view [slave6|14]; discarding it
      11:24:11,932 WARN  [org.jgroups.protocols.pbcast.GMS] (jgroups-119,slave0) slave0: not member of view [slave5|15]; discarding it
      11:24:12,485 INFO  [org.infinispan.CLUSTER] (VERIFY_SUSPECT.TimerThread-129,slave0) ISPN000094: Received new cluster view for channel cluster: [slave0|16] (2) [slave0, slave5]
      11:24:12,488 INFO  [org.infinispan.CLUSTER] (VERIFY_SUSPECT.TimerThread-129,slave0) ISPN100001: Node slave6 left the cluster
      11:24:14,492 WARN  [org.jgroups.protocols.pbcast.GMS] (VERIFY_SUSPECT.TimerThread-129,slave0) slave0: failed to collect all ACKs (expected=1) for view [slave0|16] after 2000ms, missing 1 ACKs from (1) slave5
      11:24:34,209 WARN  [org.jgroups.protocols.pbcast.NAKACK2] (jgroups-128,slave0) JGRP000011: slave0: dropped message 319152 from non-member slave3 (view=[slave0|16] (2) [slave0, slave5]) (received 11 identical messages from slave3 in the last 70294 ms)
      11:25:10,121 INFO  [org.infinispan.CLUSTER] (jgroups-128,slave0) ISPN000093: Received new, MERGED cluster view for channel cluster: MergeView::[slave3|17] (7) [slave3, slave5, slave2, slave0, slave4, slave1, slave7], 6 subgroups: [slave5|15] (1) [slave5], [slave3|15] (2) [slave3, slave0], [slave0|16] (2) [slave0, slave5], [slave3|7] (8) [slave3, slave6, slave0, slave5, slave4, slave2, slave7, slave1], [slave3|8] (7) [slave3, slave6, slave0, slave5, slave4, slave2, slave1], [slave3|9] (6) [slave3, slave6, slave0, slave5, slave4, slave1]
      11:25:10,124 INFO  [org.infinispan.CLUSTER] (jgroups-128,slave0) ISPN100000: Node slave3 joined the cluster
      11:25:10,127 INFO  [org.infinispan.CLUSTER] (jgroups-128,slave0) ISPN100000: Node slave2 joined the cluster
      11:25:10,128 INFO  [org.infinispan.CLUSTER] (jgroups-128,slave0) ISPN100000: Node slave4 joined the cluster
      11:25:10,129 INFO  [org.infinispan.CLUSTER] (jgroups-128,slave0) ISPN100000: Node slave1 joined the cluster
      11:25:10,130 INFO  [org.infinispan.CLUSTER] (jgroups-128,slave0) ISPN100000: Node slave7 joined the cluster
      11:25:10,362 WARN  [org.infinispan.partitionhandling.impl.PreferAvailabilityStrategy] (stateTransferExecutor-thread--p5-t58) ISPN000517: Ignoring cache topology from [slave0] during merge: CacheTopology{id=49, phase=NO_REBALANCE, rebalanceId=14, currentCH=DefaultConsistentHash{ns=256, owners = (3)[slave6: 86+75, slave5: 82+89, slave0: 88+92]}, pendingCH=null, unionCH=null, actualMembers=[slave6, slave5, slave0], persistentUUIDs=[c1c2227d-2656-431e-a5b5-721459759a7f, 30123e75-2b33-46bc-a2e2-15f882782719, 2d550811-e842-4382-b8ae-3f36973f49f9]}
      11:25:10,374 INFO  [org.infinispan.CLUSTER] (stateTransferExecutor-thread--p5-t58) [Context=rest]ISPN100007: After merge (or coordinator change), recovered members [slave5] with topology id 57
      11:25:10,374 INFO  [org.infinispan.CLUSTER] (stateTransferExecutor-thread--p5-t73) [Context=__global_tx_table__]ISPN100007: After merge (or coordinator change), recovered members [slave5] with topology id 43
      11:25:10,374 INFO  [org.infinispan.CLUSTER] (stateTransferExecutor-thread--p5-t70) [Context=___hotRodTopologyCache]ISPN100007: After merge (or coordinator change), recovered members [slave5] with topology id 43
      11:25:10,374 INFO  [org.infinispan.CLUSTER] (stateTransferExecutor-thread--p5-t71) [Context=___protobuf_metadata]ISPN100007: After merge (or coordinator change), recovered members [slave5] with topology id 43
      11:25:10,374 INFO  [org.infinispan.CLUSTER] (stateTransferExecutor-thread--p5-t72) [Context=org.infinispan.CONFIG]ISPN100007: After merge (or coordinator change), recovered members [slave5] with topology id 43
      11:25:10,394 INFO  [org.infinispan.CLUSTER] (stateTransferExecutor-thread--p5-t58) [Context=rest]ISPN100008: Updating cache members list [slave5], topology id 58
      11:25:10,415 INFO  [org.infinispan.CLUSTER] (stateTransferExecutor-thread--p5-t70) [Context=___hotRodTopologyCache]ISPN100002: Starting rebalance with members [slave5, slave0], phase READ_OLD_WRITE_ALL, topology id 44
      11:25:10,415 INFO  [org.infinispan.CLUSTER] (stateTransferExecutor-thread--p5-t71) [Context=___protobuf_metadata]ISPN100002: Starting rebalance with members [slave5, slave0], phase READ_OLD_WRITE_ALL, topology id 44
      11:25:10,416 INFO  [org.infinispan.CLUSTER] (stateTransferExecutor-thread--p5-t73) [Context=__global_tx_table__]ISPN100002: Starting rebalance with members [slave5, slave0], phase READ_OLD_WRITE_ALL, topology id 44
      11:25:10,419 INFO  [org.infinispan.CLUSTER] (stateTransferExecutor-thread--p5-t71) [Context=hotrodRepl]ISPN100007: After merge (or coordinator change), recovered members [slave5] with topology id 43
      11:25:10,420 INFO  [org.infinispan.CLUSTER] (stateTransferExecutor-thread--p5-t58) [Context=rest]ISPN100002: Starting rebalance with members [slave5, slave0], phase READ_OLD_WRITE_ALL, topology id 59
      11:25:10,420 INFO  [org.infinispan.CLUSTER] (stateTransferExecutor-thread--p5-t73) [Context=___script_cache]ISPN100007: After merge (or coordinator change), recovered members [slave5] with topology id 42
      11:25:10,419 WARN  [org.infinispan.partitionhandling.impl.PreferAvailabilityStrategy] (stateTransferExecutor-thread--p5-t70) ISPN000517: Ignoring cache topology from [slave0] during merge: CacheTopology{id=47, phase=READ_ALL_WRITE_ALL, rebalanceId=14, currentCH=DefaultConsistentHash{ns=256, owners = (3)[slave6: 83+30, slave5: 88+37, slave0: 85+35]}, pendingCH=DefaultConsistentHash{ns=256, owners = (3)[slave6: 86+75, slave5: 82+89, slave0: 88+92]}, unionCH=null, actualMembers=[slave6, slave5, slave0], persistentUUIDs=[c1c2227d-2656-431e-a5b5-721459759a7f, 30123e75-2b33-46bc-a2e2-15f882782719, 2d550811-e842-4382-b8ae-3f36973f49f9]}
      11:25:10,422 INFO  [org.infinispan.CLUSTER] (stateTransferExecutor-thread--p5-t70) [Context=memcachedCache]ISPN100007: After merge (or coordinator change), recovered members [slave5] with topology id 52
      11:25:10,424 INFO  [org.infinispan.CLUSTER] (stateTransferExecutor-thread--p5-t72) [Context=org.infinispan.CONFIG]ISPN100002: Starting rebalance with members [slave5, slave0], phase READ_OLD_WRITE_ALL, topology id 44
      11:25:10,423 WARN  [org.infinispan.partitionhandling.impl.PreferAvailabilityStrategy] (stateTransferExecutor-thread--p5-t58) ISPN000517: Ignoring cache topology from [slave0] during merge: CacheTopology{id=44, phase=READ_OLD_WRITE_ALL, rebalanceId=13, currentCH=DefaultConsistentHash{ns=256, owners = (3)[slave6: 83+30, slave5: 88+37, slave0: 85+35]}, pendingCH=DefaultConsistentHash{ns=256, owners = (3)[slave6: 86+75, slave5: 82+89, slave0: 88+92]}, unionCH=null, actualMembers=[slave6, slave5, slave0], persistentUUIDs=[c1c2227d-2656-431e-a5b5-721459759a7f, 30123e75-2b33-46bc-a2e2-15f882782719, 2d550811-e842-4382-b8ae-3f36973f49f9]}
      11:25:10,426 INFO  [org.infinispan.CLUSTER] (stateTransferExecutor-thread--p5-t70) [Context=memcachedCache]ISPN100008: Updating cache members list [slave5], topology id 53
      11:25:10,426 INFO  [org.infinispan.CLUSTER] (stateTransferExecutor-thread--p5-t58) [Context=hotrodDist]ISPN100007: After merge (or coordinator change), recovered members [slave5] with topology id 49
      11:25:10,430 INFO  [org.infinispan.CLUSTER] (stateTransferExecutor-thread--p5-t58) [Context=hotrodDist]ISPN100008: Updating cache members list [slave5], topology id 50
      11:25:10,431 INFO  [org.infinispan.CLUSTER] (stateTransferExecutor-thread--p5-t71) [Context=hotrodRepl]ISPN100002: Starting rebalance with members [slave5, slave0], phase READ_OLD_WRITE_ALL, topology id 44
      11:25:10,431 INFO  [org.infinispan.CLUSTER] (stateTransferExecutor-thread--p5-t73) [Context=___script_cache]ISPN100002: Starting rebalance with members [slave5, slave0], phase READ_OLD_WRITE_ALL, topology id 43
      11:25:10,435 INFO  [org.infinispan.CLUSTER] (stateTransferExecutor-thread--p5-t70) [Context=memcachedCache]ISPN100002: Starting rebalance with members [slave5, slave0], phase READ_OLD_WRITE_ALL, topology id 54
      11:25:10,438 INFO  [org.infinispan.CLUSTER] (stateTransferExecutor-thread--p5-t58) [Context=hotrodDist]ISPN100002: Starting rebalance with members [slave5, slave0], phase READ_OLD_WRITE_ALL, topology id 51
      11:25:12,134 WARN  [org.jgroups.protocols.pbcast.GMS] (jgroups-128,slave0) slave0: failed to collect all ACKs (expected=1) for view [slave3|17] after 2000ms, missing 1 ACKs from (1) slave5
      11:26:11,452 INFO  [org.radargun.Slave] (sc-main) Starting stage ClusterSplitVerify
      11:26:11,453 ERROR [org.radargun.stages.monitor.ClusterSplitVerifyStage] (sc-main) Cluster size at the beginning of the test was 8 but changed to 7 during the test! Perhaps a split occured, or a new node joined?
      11:26:11,454 INFO  [org.radargun.Slave] (sc-main) Finished stage ClusterSplitVerify
      11:26:11,455 INFO  [org.radargun.RemoteMasterConnection] (sc-main) Message successfully sent to the master
      11:26:11,571 INFO  [org.radargun.Slave] (sc-main) Starting stage ScenarioDestroy
      11:26:11,573 INFO  [org.radargun.stages.ScenarioDestroyStage] (sc-main) Scenario finished, destroying...
      11:26:11,575 INFO  [org.radargun.stages.ScenarioDestroyStage] (sc-main) Memory before cleanup: 
      Runtime free: 1,484,671 kb
      Runtime max:27,960,320 kb
      Runtime total:1,974,784 kb
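
      To see how each cache's membership evolved in a log like the one above, the per-cache org.infinispan.CLUSTER lines can be grouped by their Context. The following is a minimal Python sketch, not part of Infinispan or RadarGun: the function name summarize_topology_events and the regular expression are assumptions based only on the line format shown in this report (ISPN100002, ISPN100007 and ISPN100008 messages).

```python
import re
from collections import defaultdict

# Assumed line shape, taken from the log excerpt in this report:
#   <time> INFO  [org.infinispan.CLUSTER] (<thread>) [Context=<cache>]ISPN1xxxxx: ... [<members>] ... topology id <n>
EVENT_RE = re.compile(
    r"(?P<time>\d{2}:\d{2}:\d{2},\d{3})\s+INFO\s+\[org\.infinispan\.CLUSTER\].*?"
    r"\[Context=(?P<cache>[^\]]+)\]\s*(?P<code>ISPN\d{6}):.*?"
    r"\[(?P<members>[^\]]*)\].*?topology id (?P<topology>\d+)"
)


def summarize_topology_events(lines):
    """Group per-cache topology events by cache name, preserving log order.

    Each event is a tuple (time, message code, member list, topology id).
    Lines that do not match the assumed shape are skipped.
    """
    events = defaultdict(list)
    for line in lines:
        match = EVENT_RE.search(line)
        if match is None:
            continue
        members = [m.strip() for m in match.group("members").split(",") if m.strip()]
        events[match.group("cache")].append(
            (
                match.group("time"),
                match.group("code"),
                members,
                int(match.group("topology")),
            )
        )
    return dict(events)
```

      Feeding the slave0 log through this function (for example via open(path) as the lines argument) would list, per cache, the sequence of member lists and topology ids, which makes it easier to spot the point where a cache drops to a single member such as [slave5].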
      

      If we run with master for "regression-cs-hotrod-dist-writes", it fails because of:

      21:39:52,651 INFO  [org.radargun.RemoteSlaveConnection] (main) Master started and listening for connection on: /172.18.1.18:2103
      21:39:52,651 INFO  [org.radargun.RemoteSlaveConnection] (main) Waiting 5 seconds for server socket to open completely
      21:39:57,655 INFO  [org.radargun.RemoteSlaveConnection] (main) Awaiting registration from 16 slaves.
      21:39:57,666 INFO  [org.radargun.RemoteSlaveConnection] (main) Awaiting registration from 15 slaves.
      21:39:57,667 INFO  [org.radargun.RemoteSlaveConnection] (main) Awaiting registration from 14 slaves.
      21:39:57,668 INFO  [org.radargun.RemoteSlaveConnection] (main) Awaiting registration from 13 slaves.
      21:39:57,669 INFO  [org.radargun.RemoteSlaveConnection] (main) Awaiting registration from 12 slaves.
      21:39:57,670 INFO  [org.radargun.RemoteSlaveConnection] (main) Awaiting registration from 11 slaves.
      21:39:57,671 INFO  [org.radargun.RemoteSlaveConnection] (main) Awaiting registration from 10 slaves.
      21:39:57,672 INFO  [org.radargun.RemoteSlaveConnection] (main) Awaiting registration from 9 slaves.
      21:39:57,674 INFO  [org.radargun.RemoteSlaveConnection] (main) Awaiting registration from 8 slaves.
      21:39:57,675 INFO  [org.radargun.RemoteSlaveConnection] (main) Awaiting registration from 7 slaves.
      21:39:57,676 INFO  [org.radargun.RemoteSlaveConnection] (main) Awaiting registration from 6 slaves.
      21:39:57,677 INFO  [org.radargun.RemoteSlaveConnection] (main) Awaiting registration from 5 slaves.
      21:39:57,678 INFO  [org.radargun.RemoteSlaveConnection] (main) Awaiting registration from 4 slaves.
      21:39:57,708 INFO  [org.radargun.RemoteSlaveConnection] (main) Awaiting registration from 3 slaves.
      21:39:57,710 INFO  [org.radargun.RemoteSlaveConnection] (main) Awaiting registration from 2 slaves.
      21:39:57,711 INFO  [org.radargun.RemoteSlaveConnection] (main) Awaiting registration from 1 slaves.
      21:44:57,668 ERROR [org.radargun.Master] (main) Exception in Master.run: 
      java.io.IOException: 1 slaves haven't connected within timeout!
      	at org.radargun.RemoteSlaveConnection.establish(RemoteSlaveConnection.java:112) ~[radargun-core-3.0.0-SNAPSHOT.jar:?]
      	at org.radargun.Master.run(Master.java:59) [radargun-core-3.0.0-SNAPSHOT.jar:?]
      	at org.radargun.LaunchMaster.main(LaunchMaster.java:34) [radargun-core-3.0.0-SNAPSHOT.jar:?]
      21:44:57,697 WARN  [org.radargun.RemoteSlaveConnection] (main) Failed to send termination to slaves.
      java.lang.NullPointerException: null
      	at org.radargun.RemoteSlaveConnection$SlaveRecord.access$100(RemoteSlaveConnection.java:63) ~[radargun-core-3.0.0-SNAPSHOT.jar:?]
      	at org.radargun.RemoteSlaveConnection.mcastBuffer(RemoteSlaveConnection.java:201) ~[radargun-core-3.0.0-SNAPSHOT.jar:?]
      	at org.radargun.RemoteSlaveConnection.mcastObject(RemoteSlaveConnection.java:211) ~[radargun-core-3.0.0-SNAPSHOT.jar:?]
      	at org.radargun.RemoteSlaveConnection.release(RemoteSlaveConnection.java:357) [radargun-core-3.0.0-SNAPSHOT.jar:?]
      	at org.radargun.Master.run(Master.java:155) [radargun-core-3.0.0-SNAPSHOT.jar:?]
      	at org.radargun.LaunchMaster.main(LaunchMaster.java:34) [radargun-core-3.0.0-SNAPSHOT.jar:?]
      21:44:57,703 INFO  [org.radargun.ShutDownHook] (Thread-1) Master process is being shutdown
      Master 16071 finished with value 127
      kill: sending signal to 16071 failed: No such process
      kill: sending signal to 16071 failed: No such process
      

      The tests exercise HotRod read operations against a replicated cache and a distributed cache.

      The scenario is:
      8 - Servers
      8 - Slaves

      Commits:
      13/May - df9ffb5ba46752d2509aa3a08c59519469cc929a - Passed - Executed 3 times
      14/May - df9ffb5ba46752d2509aa3a08c59519469cc929a - Passed - Executed 3 times (I didn't execute this one again because it is the same commit as above; it is listed just to keep the history)
      15/May - 26ba1aeb1d66cf65fb5c410ec98629093c29ab0b - Need to double check (It passed)
      16/May - 92a5e4f62c39d63221aed2ed5763081b626874e6 - Need to double check (It starts failing here)
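
      If there are further commits between the last passing and the first failing one, the range can be narrowed by bisection. The following is a minimal Python sketch of that idea, not a tool used in this report: first_failing_commit and the fails callback are hypothetical names, and the sketch assumes the commits are ordered oldest to newest, that the first one passes, that the last one fails, and that the failure persists once introduced. Because the failure here is a cluster split that may be intermittent, fails would probably need to run the test several times per commit (as was done with the 3 executions above) before declaring a commit good.

```python
def first_failing_commit(commits, fails):
    """Return the first commit in `commits` for which fails(commit) is True.

    Assumptions (not verified by this function): `commits` is ordered oldest
    to newest, commits[0] passes, commits[-1] fails, and every commit after
    the first failing one also fails.
    """
    low, high = 0, len(commits) - 1  # low: known good, high: known bad
    while high - low > 1:
        mid = (low + high) // 2
        if fails(commits[mid]):
            high = mid  # failure already present at mid
        else:
            low = mid   # mid still passes, look later
    return commits[high]
```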

      Attachments

        1. round1.zip
          89 kB
        2. round2.zip
          89 kB
        3. Screenshot from 2018-05-30 13-21-32.png
          217 kB
        4. Screenshot from 2018-05-30 13-25-09.png
          242 kB

            People

              dberinde@redhat.com Dan Berindei (Inactive)
              dlovison@redhat.com Diego Lovison