-
Bug
-
Resolution: Cannot Reproduce
-
Major
-
None
-
5.1.2.FINAL
-
None
This could be categorized as a performance problem.
It happened in a resilience test run (originally intended to verify ISPN-1826): http://hudson.qa.jboss.com/hudson/view/EDG6/view/EDG-REPORTS-RESILIENCE/job/edg-60-failover-dist-basic/23
It was run with a special Infinispan build from Galder's branch (https://github.com/galderz/infinispan/tree/t_1826_5):
http://hudson.qa.jboss.com/hudson/view/EDG6/view/EDG-QE/job/edg-60-build-infinispan-from-source/45/
The test starts 4 nodes, kills node2, restarts node2, and observes what happens.
Trace logging was enabled on the server side. There were two runs:
200 clients, 10K entries
http://hudson.qa.jboss.com/hudson/view/EDG6/view/EDG-REPORTS-RESILIENCE/job/edg-60-failover-dist-basic/23
20 clients, 1K entries
http://hudson.qa.jboss.com/hudson/view/EDG6/view/EDG-REPORTS-RESILIENCE/job/edg-60-failover-dist-basic/24
In run 24 everything looks fine:
http://hudson.qa.jboss.com/hudson/view/EDG6/view/EDG-REPORTS-RESILIENCE/job/edg-60-failover-dist-basic/24/artifact/report/stats-throughput.png
In run 23 the state transfer takes more than 10 min.
These views are installed on the coordinator (node03):
2012-02-02 05:11:00,560 TRACE [BaseStateTransferManagerImpl] (transport-thread-1) Received new cache view: testCache CacheView{viewId=6, members=[edg-perf04-45788, edg-perf03-36944, edg-perf02-51026, edg-perf01-47003]}
2012-02-02 05:15:13,591 TRACE [BaseStateTransferManagerImpl] (transport-thread-9) Received new cache view: testCache CacheView{viewId=7, members=[edg-perf04-45788, edg-perf03-36944, edg-perf01-47003]}
2012-02-02 05:18:17,219 TRACE [BaseStateTransferManagerImpl] (transport-thread-1) Received new cache view: testCache CacheView{viewId=8, members=[edg-perf04-45788, edg-perf03-36944, edg-perf01-47003, edg-perf02-21799]}
2012-02-02 05:28:17,511 TRACE [BaseStateTransferManagerImpl] (transport-thread-22) Received new cache view: testCache CacheView{viewId=10, members=[edg-perf04-45788, edg-perf03-36944, edg-perf01-47003, edg-perf02-21799]}
viewId=8 is the one that takes 10 min to prepare, after which the prepare fails:
2012-02-02 05:28:17,219 ERROR [CacheViewsManagerImpl] (CacheViewInstaller-9,edg-perf03-36944) ISPN000172: Failed to prepare view CacheView{viewId=8, members=[edg-perf04-45788, edg-perf03-36944, edg-perf01-47003, edg-perf02-21799]} for cache testCache, ro..
java.util.concurrent.TimeoutException
at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:228)
at java.util.concurrent.FutureTask.get(FutureTask.java:91)
at org.infinispan.cacheviews.CacheViewsManagerImpl.clusterPrepareView(CacheViewsManagerImpl.java:319)
at org.infinispan.cacheviews.CacheViewsManagerImpl.clusterInstallView(CacheViewsManagerImpl.java:250)
at org.infinispan.cacheviews.CacheViewsManagerImpl$ViewInstallationTask.call(CacheViewsManagerImpl.java:877)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
viewId=10 is a retry, and it succeeds quite quickly, but the test is already ending around that time.
It might be worth looking at the trace logs since they're already there...
10K entries and 200 clients isn't such a big load...
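The timings in the log lines quoted above can be checked with a short script. This is only a sketch: the timestamps are copied from this report, while the event labels and the helper names `parse` and `gaps` are made up for illustration. It shows that the gap between the viewId=8 TRACE line and the ISPN000172 error is exactly 600 s, which would be consistent with a 10-minute prepare timeout (the actual timeout setting is not stated in this report), and that viewId=10 is received about 0.3 s after the failure.

```python
from datetime import datetime

# Timestamps copied from the coordinator (node03) log lines quoted in this
# report. The labels are illustrative only, not taken from the logs.
EVENTS = [
    ("view 6 received", "2012-02-02 05:11:00,560"),
    ("view 7 received", "2012-02-02 05:15:13,591"),
    ("view 8 received", "2012-02-02 05:18:17,219"),
    ("view 8 prepare failed", "2012-02-02 05:28:17,219"),
    ("view 10 received", "2012-02-02 05:28:17,511"),
]


def parse(ts):
    """Parse a log timestamp of the form 'YYYY-MM-DD HH:MM:SS,mmm'."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S,%f")


def gaps(events):
    """Return (from_label, to_label, seconds) for each consecutive pair."""
    parsed = [(label, parse(ts)) for label, ts in events]
    return [
        (a_label, b_label, (b_time - a_time).total_seconds())
        for (a_label, a_time), (b_label, b_time) in zip(parsed, parsed[1:])
    ]


if __name__ == "__main__":
    for start, end, seconds in gaps(EVENTS):
        print(f"{start} -> {end}: {seconds:.3f} s")
```
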
- is blocked by
  - JGRP-1428 UnicastRequest and GroupRequest should mark a target as suspected if the target has already left the cluster at creation time (Resolved)
- relates to
  - ISPN-1872 Coordinator hangs when cache is loaded to it and l1cache enabled in cluster (Closed)
  - ISPN-1933 State transfer in REPL mode takes more than 10 min (Closed)
  - ISPN-1878 Remove the NO_FC flag in the CommandAwareRpcDispatcher (Closed)
  - ISPN-1879 Update UNICAST to UNICAST2 and add RSVP message flag to state transfer RPCs (Closed)