Infinispan: ISPN-1838

State transfer takes more than 10 minutes with only 10K entries.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: 5.1.2.FINAL
    • Fix Version/s: None
    • Component/s: State Transfer
    • Labels: None

      Description

      This could be categorized as a performance problem.

      It happened in a resilience test run that was originally intended to verify ISPN-1826: http://hudson.qa.jboss.com/hudson/view/EDG6/view/EDG-REPORTS-RESILIENCE/job/edg-60-failover-dist-basic/23
      The run used a special Infinispan build from Galder's branch (https://github.com/galderz/infinispan/tree/t_1826_5):
      http://hudson.qa.jboss.com/hudson/view/EDG6/view/EDG-QE/job/edg-60-build-infinispan-from-source/45/

      The test starts 4 nodes, kills node2, restarts node2, and observes what happens.
      Trace logging was enabled on the server side. There were two runs:

      200 clients, 10K entries
      http://hudson.qa.jboss.com/hudson/view/EDG6/view/EDG-REPORTS-RESILIENCE/job/edg-60-failover-dist-basic/23

      20 clients, 1K entries
      http://hudson.qa.jboss.com/hudson/view/EDG6/view/EDG-REPORTS-RESILIENCE/job/edg-60-failover-dist-basic/24

      In run 24, everything looks fine:
      http://hudson.qa.jboss.com/hudson/view/EDG6/view/EDG-REPORTS-RESILIENCE/job/edg-60-failover-dist-basic/24/artifact/report/stats-throughput.png
      In run 23, state transfer takes far too long (more than 10 minutes).

      These are the relevant cache views installed on the coordinator (node03):

      2012-02-02 05:11:00,560 TRACE [BaseStateTransferManagerImpl] (transport-thread-1) Received new cache view: testCache CacheView{viewId=6, members=[edg-perf04-45788, edg-perf03-36944, edg-perf02-51026, edg-perf01-47003]}
      2012-02-02 05:15:13,591 TRACE [BaseStateTransferManagerImpl] (transport-thread-9) Received new cache view: testCache CacheView{viewId=7, members=[edg-perf04-45788, edg-perf03-36944, edg-perf01-47003]}
      2012-02-02 05:18:17,219 TRACE [BaseStateTransferManagerImpl] (transport-thread-1) Received new cache view: testCache CacheView{viewId=8, members=[edg-perf04-45788, edg-perf03-36944, edg-perf01-47003, edg-perf02-21799]}
      2012-02-02 05:28:17,511 TRACE [BaseStateTransferManagerImpl] (transport-thread-22) Received new cache view: testCache CacheView{viewId=10, members=[edg-perf04-45788, edg-perf03-36944, edg-perf01-47003, edg-perf02-21799]}
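      To make the view timeline easier to read, here is a small sketch that computes the elapsed time between consecutive view installations on the coordinator, using the four timestamps copied from the log lines above. The class and method names are hypothetical; it relies only on java.time from the JDK and assumes the log timestamp format shown (yyyy-MM-dd HH:mm:ss,SSS).

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ViewTimeline {
    // Timestamp format of the trace lines above, e.g. "2012-02-02 05:11:00,560"
    static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    /** Returns the elapsed time between each pair of consecutive timestamps. */
    static List<Duration> gaps(List<String> timestamps) {
        List<Duration> result = new ArrayList<>();
        for (int i = 1; i < timestamps.size(); i++) {
            LocalDateTime prev = LocalDateTime.parse(timestamps.get(i - 1), FMT);
            LocalDateTime next = LocalDateTime.parse(timestamps.get(i), FMT);
            result.add(Duration.between(prev, next));
        }
        return result;
    }

    public static void main(String[] args) {
        // Times at which the coordinator received viewId=6, 7, 8 and 10
        List<String> ts = Arrays.asList(
                "2012-02-02 05:11:00,560",
                "2012-02-02 05:15:13,591",
                "2012-02-02 05:18:17,219",
                "2012-02-02 05:28:17,511");
        int[] viewIds = {6, 7, 8, 10};
        List<Duration> d = gaps(ts);
        for (int i = 0; i < d.size(); i++) {
            System.out.printf("viewId=%d -> viewId=%d: %d ms%n",
                    viewIds[i], viewIds[i + 1], d.get(i).toMillis());
        }
    }
}
```

      With these timestamps, the gap between viewId=8 and viewId=10 is 600,292 ms (about 10 minutes), while the two earlier gaps are roughly 4 and 3 minutes.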
      

      viewId=8 is the view that takes 10 minutes to prepare, after which the prepare fails:

      2012-02-02 05:28:17,219 ERROR [CacheViewsManagerImpl] (CacheViewInstaller-9,edg-perf03-36944) ISPN000172: Failed to prepare view CacheView{viewId=8, members=[edg-perf04-45788, edg-perf03-36944, edg-perf01-47003, edg-perf02-21799]} for cache  testCache, ro..
      java.util.concurrent.TimeoutException
      	at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:228)
      	at java.util.concurrent.FutureTask.get(FutureTask.java:91)
      	at org.infinispan.cacheviews.CacheViewsManagerImpl.clusterPrepareView(CacheViewsManagerImpl.java:319)
      	at org.infinispan.cacheviews.CacheViewsManagerImpl.clusterInstallView(CacheViewsManagerImpl.java:250)
      	at org.infinispan.cacheviews.CacheViewsManagerImpl$ViewInstallationTask.call(CacheViewsManagerImpl.java:877)
      	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
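      The stack trace shows a java.util.concurrent.TimeoutException thrown from a timed FutureTask.get inside CacheViewsManagerImpl.clusterPrepareView (an untimed get never throws TimeoutException), and the ERROR line is logged exactly 10 minutes after viewId=8 was received (05:18:17,219 vs 05:28:17,219). The sketch below illustrates that general pattern, a bounded wait on a Future that gives up when the task does not finish in time, using only the standard java.util.concurrent API. It is not Infinispan's implementation: the class name, method name and timeout values are hypothetical, and the slow prepare is simulated with Thread.sleep.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class PrepareTimeoutSketch {

    /**
     * Hypothetical stand-in for a "prepare view" step: waits at most
     * timeoutMillis for the task and reports whether it finished in time.
     */
    static boolean prepareWithTimeout(Callable<Void> prepareTask, long timeoutMillis)
            throws InterruptedException, ExecutionException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<Void> future = executor.submit(prepareTask);
            try {
                // Timed get: throws TimeoutException if the task is still running
                future.get(timeoutMillis, TimeUnit.MILLISECONDS);
                return true;
            } catch (TimeoutException e) {
                // Same exception type as in the ISPN000172 stack trace above;
                // interrupt the simulated prepare so the worker thread stops.
                future.cancel(true);
                return false;
            }
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // A slow prepare (simulated by sleeping 2 s) exceeds a 200 ms timeout...
        Callable<Void> slow = () -> { Thread.sleep(2_000); return null; };
        // ...while a trivial one completes well within it.
        Callable<Void> fast = () -> null;
        System.out.println("slow prepare succeeded: " + prepareWithTimeout(slow, 200));
        System.out.println("fast prepare succeeded: " + prepareWithTimeout(fast, 200));
    }
}
```

      The design choice shown here, cancelling the future once the wait expires, keeps the simulated worker from running on after the caller has given up; how the real view installation cleans up after a failed prepare is not covered by this report.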
      

      viewId=10 is a retry, which succeeds quite quickly, but by then the test is already ending.

      It might be worth looking at the trace logs, since they are already available.

      10K entries and 200 clients is not a particularly heavy load.

        Attachments

        1. apply_state.log (6 kB)
        2. apply_state.txt (5 kB)
        3. dan.xml (4 kB)
        4. retransmissions.txt (3 kB)
        5. uuperf-tcp.txt (1 kB)
        6. uuperf-udp.txt (1 kB)
        7. uuperf-unicast1.txt (1 kB)

              People

              Assignee:
              Dan Berindei (dan.berindei)
              Reporter:
              Michal Linhard (mlinhard)
              Votes:
              0
              Watchers:
              7