Type: Bug
Resolution: Cannot Reproduce
Priority: Major
Fix Version/s: None
Affects Version/s: 5.1.2.FINAL
Component/s: State Transfer
Labels: None

Git Pull Request:
https://github.com/infinispan/infinispan/pull/984
Bugzilla References:
https://bugzilla.redhat.com/show_bug.cgi?id=786202

This could be categorized as a performance problem.

It happened in a resilience test run that was originally meant to verify ISPN-1826: http://hudson.qa.jboss.com/hudson/view/EDG6/view/EDG-REPORTS-RESILIENCE/job/edg-60-failover-dist-basic/23
The run used a special Infinispan build from Galder's branch (https://github.com/galderz/infinispan/tree/t_1826_5):
http://hudson.qa.jboss.com/hudson/view/EDG6/view/EDG-QE/job/edg-60-build-infinispan-from-source/45/

The test starts 4 nodes, kills node2, restarts node2 and observes what happens.
Trace logging was enabled on the server side. There were two runs:

200 clients, 10K entries
http://hudson.qa.jboss.com/hudson/view/EDG6/view/EDG-REPORTS-RESILIENCE/job/edg-60-failover-dist-basic/23

20 clients, 1K entries
http://hudson.qa.jboss.com/hudson/view/EDG6/view/EDG-REPORTS-RESILIENCE/job/edg-60-failover-dist-basic/24

In run 24 everything looks fine:
http://hudson.qa.jboss.com/hudson/view/EDG6/view/EDG-REPORTS-RESILIENCE/job/edg-60-failover-dist-basic/24/artifact/report/stats-throughput.png
In run 23 the state transfer takes forever (more than 10 min).

These are the relevant cache views installed on the coordinator (node03):

2012-02-02 05:11:00,560 TRACE [BaseStateTransferManagerImpl] (transport-thread-1) Received new cache view: testCache CacheView{viewId=6, members=[edg-perf04-45788, edg-perf03-36944, edg-perf02-51026, edg-perf01-47003]}
2012-02-02 05:15:13,591 TRACE [BaseStateTransferManagerImpl] (transport-thread-9) Received new cache view: testCache CacheView{viewId=7, members=[edg-perf04-45788, edg-perf03-36944, edg-perf01-47003]}
2012-02-02 05:18:17,219 TRACE [BaseStateTransferManagerImpl] (transport-thread-1) Received new cache view: testCache CacheView{viewId=8, members=[edg-perf04-45788, edg-perf03-36944, edg-perf01-47003, edg-perf02-21799]}
2012-02-02 05:28:17,511 TRACE [BaseStateTransferManagerImpl] (transport-thread-22) Received new cache view: testCache CacheView{viewId=10, members=[edg-perf04-45788, edg-perf03-36944, edg-perf01-47003, edg-perf02-21799]}
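
To make the timing between these views easier to read, here is a minimal Python sketch that parses the four log lines above and prints the gap between consecutive views. The regular expression and the helper names (`parse_views`, `gaps`) are my own assumptions for illustration; they are not part of Infinispan or of the test harness.

```python
import re
from datetime import datetime

# The four coordinator log lines quoted in this report, copied verbatim.
LOG_LINES = [
    "2012-02-02 05:11:00,560 TRACE [BaseStateTransferManagerImpl] (transport-thread-1) Received new cache view: testCache CacheView{viewId=6, members=[edg-perf04-45788, edg-perf03-36944, edg-perf02-51026, edg-perf01-47003]}",
    "2012-02-02 05:15:13,591 TRACE [BaseStateTransferManagerImpl] (transport-thread-9) Received new cache view: testCache CacheView{viewId=7, members=[edg-perf04-45788, edg-perf03-36944, edg-perf01-47003]}",
    "2012-02-02 05:18:17,219 TRACE [BaseStateTransferManagerImpl] (transport-thread-1) Received new cache view: testCache CacheView{viewId=8, members=[edg-perf04-45788, edg-perf03-36944, edg-perf01-47003, edg-perf02-21799]}",
    "2012-02-02 05:28:17,511 TRACE [BaseStateTransferManagerImpl] (transport-thread-22) Received new cache view: testCache CacheView{viewId=10, members=[edg-perf04-45788, edg-perf03-36944, edg-perf01-47003, edg-perf02-21799]}",
]

# Hypothetical pattern, written only against the line format shown above.
VIEW_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) .*"
    r"Received new cache view: (?P<cache>\S+) "
    r"CacheView\{viewId=(?P<view_id>\d+), members=\[(?P<members>[^\]]*)\]\}"
)


def parse_views(lines):
    """Return (view_id, timestamp, members) for every matching log line."""
    views = []
    for line in lines:
        match = VIEW_RE.search(line)
        if match is None:
            continue  # not a "Received new cache view" line
        ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
        members = [m.strip() for m in match.group("members").split(",") if m.strip()]
        views.append((int(match.group("view_id")), ts, members))
    return views


def gaps(views):
    """Return (from_view_id, to_view_id, seconds) for consecutive views."""
    return [
        (a[0], b[0], (b[1] - a[1]).total_seconds())
        for a, b in zip(views, views[1:])
    ]


views = parse_views(LOG_LINES)
for from_id, to_id, seconds in gaps(views):
    print(f"viewId={from_id} -> viewId={to_id}: {seconds:.3f} s")
```

The gap between viewId=8 and viewId=10 is the one that stands out: it is just over 10 minutes, while the earlier views are 3 to 4 minutes apart.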

viewId=8 is the one that takes 10 min to prepare, and after that the prepare fails:

2012-02-02 05:28:17,219 ERROR [CacheViewsManagerImpl] (CacheViewInstaller-9,edg-perf03-36944) ISPN000172: Failed to prepare view CacheView{viewId=8, members=[edg-perf04-45788, edg-perf03-36944, edg-perf01-47003, edg-perf02-21799]} for cache  testCache, ro..
java.util.concurrent.TimeoutException
	at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:228)
	at java.util.concurrent.FutureTask.get(FutureTask.java:91)
	at org.infinispan.cacheviews.CacheViewsManagerImpl.clusterPrepareView(CacheViewsManagerImpl.java:319)
	at org.infinispan.cacheviews.CacheViewsManagerImpl.clusterInstallView(CacheViewsManagerImpl.java:250)
	at org.infinispan.cacheviews.CacheViewsManagerImpl$ViewInstallationTask.call(CacheViewsManagerImpl.java:877)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
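
A quick check of the timestamps, as a small Python sketch: the values are copied from the log lines in this report, and the helper `elapsed_seconds` is a name I made up for the example.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S,%f"


def elapsed_seconds(start, end):
    """Seconds between two timestamps in the log format used above."""
    return (datetime.strptime(end, FMT) - datetime.strptime(start, FMT)).total_seconds()


# Timestamps copied from the coordinator log in this report.
VIEW_8_RECEIVED = "2012-02-02 05:18:17,219"
VIEW_8_PREPARE_FAILED = "2012-02-02 05:28:17,219"
VIEW_10_RECEIVED = "2012-02-02 05:28:17,511"

prepare_duration = elapsed_seconds(VIEW_8_RECEIVED, VIEW_8_PREPARE_FAILED)
retry_delay = elapsed_seconds(VIEW_8_PREPARE_FAILED, VIEW_10_RECEIVED)

print(f"viewId=8 prepare ran for {prepare_duration:.3f} s before failing")
print(f"viewId=10 arrived {retry_delay:.3f} s after the failure")
```

The prepare for viewId=8 fails exactly 600 s after the view was received, to the millisecond. Together with the `TimeoutException` from `FutureTask.get`, that looks like a 10-minute timeout expiring rather than the prepare finishing slowly. The configured timeout value itself is not shown in this report, so that part is an assumption.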

viewId=10 is a retry, and it succeeds quite quickly, but by then the test is already ending.
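
For readers who do not know the pattern behind the stack trace, the following is a generic Python analogy of a bounded wait on a future followed by a retry. It is not Infinispan's code: `install_with_retry`, `slow_then_fast`, the attempt numbering and the timeout values are all hypothetical, and the real `CacheViewsManagerImpl` logic may differ in every detail.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def install_with_retry(prepare, timeout_s, max_attempts=2):
    """Run prepare(attempt) with a bounded wait and retry once it times out.

    Returns a list of (attempt, outcome) tuples. A timed-out task is not
    cancelled; it keeps running until the pool is shut down on exit.
    """
    outcomes = []
    with ThreadPoolExecutor(max_workers=max_attempts) as pool:
        for attempt in range(max_attempts):
            future = pool.submit(prepare, attempt)
            try:
                # Same idea as FutureTask.get(timeout, unit) in the stack trace.
                future.result(timeout=timeout_s)
            except FutureTimeout:
                outcomes.append((attempt, "prepare timed out"))
                continue  # try again with the next attempt
            outcomes.append((attempt, "installed"))
            break
    return outcomes


def slow_then_fast(attempt):
    """Hypothetical prepare step: the first attempt hangs, the retry is quick."""
    if attempt == 0:
        time.sleep(0.5)


outcomes = install_with_retry(slow_then_fast, timeout_s=0.1)
print(outcomes)
```

The point of the sketch is only that the caller learns nothing until the full timeout has elapsed, which matches what the test sees: nothing happens for 10 minutes, then the retry goes through.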

It might be worth looking at the trace logs since they're already there...

10K entries and 200 clients isn't such a big load ...

Attachments:
apply_state.log (6 kB, 2012/02/05 1:15 PM)
apply_state.txt (5 kB, 2012/02/14 10:23 AM)
retransmissions.txt (3 kB, 2012/02/14 10:23 AM)
uuperf-tcp.txt (1 kB, 2012/02/15 2:19 AM)
uuperf-udp.txt (1 kB, 2012/02/15 2:19 AM)
uuperf-unicast1.txt (1 kB, 2012/02/15 3:36 AM)
dan.xml (4 kB, 2012/02/17 5:06 AM)

is blocked by
JGRP-1428 UnicastRequest and GroupRequest should mark a target as suspected if the target has already left the cluster at creation time (Resolved)

relates to
ISPN-1872 Coordinator hangs when cache is loaded to it and l1cache enabled in cluster (Closed)
ISPN-1933 State transfer in REPL mode takes more than 10 min (Closed)
ISPN-1878 Remove the NO_FC flag in the CommandAwareRpcDispatcher (Closed)
ISPN-1879 Update UNICAST to UNICAST2 and add RSVP message flag to state transfer RPCs (Closed)

Assignee: Dan Berindei (Inactive)
Reporter: Michal Linhard (Inactive)
Archiver: Amol Dongare
Created: 2012/02/02 1:07 PM
Updated: 2020/09/14 5:34 AM
Resolved: 2012/04/13 10:31 AM
Archived: 2024/11/28 6:21 AM
