WildFly / WFLY-8614

Clustering performance regression in read-heavy SYNC stress scenarios

      Description

      There is a performance regression in the clustering stress tests that measure cluster performance under increasing load (number of concurrent clients). All tests use a 4-node EAP cluster and 5 nodes that generate load.
      During the heavyread tests, the majority of requests only read the session and do not write to it. This simulates real client behaviour.

      Scenarios affected:
      stress-heavyread-session-dist-sync
      stress-heavyread-session-repl-sync
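
      The heavy-read access pattern these scenarios exercise can be sketched as follows. This is a minimal, self-contained illustration, not the actual SmartFrog load driver; the 90/10 read/write ratio and the map standing in for the session cache are illustrative assumptions:

      ```java
      import java.util.Map;
      import java.util.concurrent.ConcurrentHashMap;
      import java.util.concurrent.ThreadLocalRandom;
      import java.util.concurrent.atomic.AtomicLong;

      // Illustrative sketch of a heavy-read session workload: most requests only
      // read the session, a small fraction write to it. The real tests drive HTTP
      // requests against a 4-node EAP cluster; here a map stands in for the cache.
      public class HeavyReadSketch {
          static final Map<String, Integer> sessions = new ConcurrentHashMap<>();
          static final AtomicLong reads = new AtomicLong();
          static final AtomicLong writes = new AtomicLong();

          static void request(String sessionId) {
              if (ThreadLocalRandom.current().nextInt(100) < 10) {
                  // ~10% of requests mutate the session (assumed ratio)
                  sessions.merge(sessionId, 1, Integer::sum);
                  writes.incrementAndGet();
              } else {
                  // ~90% of requests only read, which is what "heavyread" simulates
                  sessions.getOrDefault(sessionId, 0);
                  reads.incrementAndGet();
              }
          }

          public static void main(String[] args) {
              // Simulate 10,000 requests spread across 100 client sessions
              for (int i = 0; i < 10_000; i++) {
                  request("client-" + (i % 100));
              }
              System.out.println("reads=" + reads + " writes=" + writes);
          }
      }
      ```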

      There is a difference in the throughput graphs between 7.1.0.DR15 and 7.1.0.DR16 in both scenarios. This issue is probably causing the performance drop after the number of clients reaches 4000 (in the dist-sync scenario). There is also a significant number of sampling errors at that point (compared to no sampling errors in 7.1.0.DR15).

      See the performance report for the heavyread-session-dist-sync scenario:
      http://download.eng.brq.redhat.com/scratch/dcihak/2017-04-19_14-28-37/stress.html

      and for the heavyread-session-repl-sync scenario:
      http://download.eng.brq.redhat.com/scratch/dcihak/2017-04-19_14-40-09/stress.html

      Results of another run of the heavyread-session-dist-sync scenario:
      http://download.eng.brq.redhat.com/scratch/dcihak/2017-04-19_16-10-26/stress.html

      When the number of sessions reached 4400, the clients started logging the following WARN message:

      2017/04/06 18:14:07:058 EDT [WARN ][Runner - 15] HOST dev220.mw.lab.eng.bos.redhat.com:rootProcess:c - Error sampling data:  <org.jboss.smartfrog.loaddriver.RequestProcessingException: Invalid response code: 503 Content: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
      <html><head>
      <title>503 Service Temporarily Unavailable</title>
      </head><body>
      <h1>Service Temporarily Unavailable</h1>
      <p>The server is temporarily unable to service your
      request due to maintenance downtime or capacity
      problems. Please try again later.</p>
      <hr>
      <address>Apache/2.2.26 (@VENDOR@) Server at dev224 Port 8080</address>
      </body></html>
      >
      

      Link to client log:
      http://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/job/eap-7x-stress-heavyread-session-dist-sync/41/console-dev224/

      At the same time, the server starts logging:

      18:13:18,784 WARN  [org.jgroups.protocols.pbcast.GMS] (thread-13) dev213: not member of view [dev212|4]; discarding it
      [JBossINF] 18:13:18,886 ERROR [org.infinispan.interceptors.InvocationContextInterceptor] (default task-75) ISPN000136: Error executing command PrepareCommand, writing keys [SessionAccessMetaDataKey(fS0csQoHO532jD_tHe4XmfRoPQkCI0gg6xv0tvp-), SessionCreationMetaDataKey(fS0csQoHO532jD_tHe4XmfRoPQkCI0gg6xv0tvp-)]: org.infinispan.remoting.RemoteException: ISPN000217: Received exception from dev212, see cause for remote stack trace
      [JBossINF] 	at org.infinispan.remoting.transport.AbstractTransport.checkResponse(AbstractTransport.java:44)
      [JBossINF] 	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.checkRsp(JGroupsTransport.java:821)
      [JBossINF] 	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$invokeRemotelyAsync$1(JGroupsTransport.java:648)
      [JBossINF] 	at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
      [JBossINF] 	at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
      [JBossINF] 	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
      [JBossINF] 	at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
      [JBossINF] 	at org.infinispan.remoting.transport.jgroups.RspListFuture.futureDone(RspListFuture.java:40)
      [JBossINF] 	at org.jgroups.blocks.Request.checkCompletion(Request.java:152)
      [JBossINF] 	at org.jgroups.blocks.GroupRequest.receiveResponse(GroupRequest.java:115)
      [JBossINF] 	at org.jgroups.blocks.RequestCorrelator.dispatch(RequestCorrelator.java:427)
      [JBossINF] 	at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:357)
      [JBossINF] 	at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:245)
      [JBossINF] 	at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:664)
      [JBossINF] 	at org.jgroups.JChannel.up(JChannel.java:738)
      [JBossINF] 	at org.jgroups.fork.ForkProtocolStack.up(ForkProtocolStack.java:120)
      [JBossINF] 	at org.jgroups.stack.Protocol.up(Protocol.java:380)
      [JBossINF] 	at org.jgroups.protocols.FORK.up(FORK.java:114)
      [JBossINF] 	at org.jgroups.protocols.FRAG2.up(FRAG2.java:165)
      [JBossINF] 	at org.jgroups.protocols.FlowControl.up(FlowControl.java:390)
      [JBossINF] 	at org.jgroups.protocols.FlowControl.up(FlowControl.java:390)
      [JBossINF] 	at org.jgroups.protocols.pbcast.GMS.up(GMS.java:1037)
      [JBossINF] 	at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:234)
      [JBossINF] 	at org.jgroups.protocols.UNICAST3.deliverMessage(UNICAST3.java:1077)
      [JBossINF] 	at org.jgroups.protocols.UNICAST3.handleDataReceived(UNICAST3.java:792)
      [JBossINF] 	at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:433)
      [JBossINF] 	at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:649)
      [JBossINF] 	at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:155)
      [JBossINF] 	at org.jgroups.protocols.FD_ALL.up(FD_ALL.java:200)
      [JBossINF] 	at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:325)
      [JBossINF] 	at org.jgroups.protocols.MERGE3.up(MERGE3.java:292)
      [JBossINF] 	at org.jgroups.protocols.Discovery.up(Discovery.java:296)
      [JBossINF] 	at org.jgroups.protocols.TP.passMessageUp(TP.java:1657)
      [JBossINF] 	at org.jgroups.protocols.TP$SingleMessageHandler.run(TP.java:1872)
      [JBossINF] 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      [JBossINF] 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      [JBossINF] 	at org.jboss.as.clustering.jgroups.ClassLoaderThreadFactory.lambda$newThread$0(ClassLoaderThreadFactory.java:52)
      [JBossINF] 	at java.lang.Thread.run(Thread.java:745)
      [JBossINF] Caused by: org.infinispan.commons.CacheException: ISPN000332: Remote transaction GlobalTransaction:<dev213>:7915312:remote rolled back because originator is no longer in the cluster
      [JBossINF] 	at org.infinispan.interceptors.TxInterceptor.verifyRemoteTransaction(TxInterceptor.java:518)
      [JBossINF] 	at org.infinispan.interceptors.TxInterceptor.invokeNextInterceptorAndVerifyTransaction(TxInterceptor.java:161)
      [JBossINF] 	at org.infinispan.interceptors.TxInterceptor.visitPrepareCommand(TxInterceptor.java:145)
      [JBossINF] 	at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:176)
      [JBossINF] 	at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:99)
      [JBossINF] 	at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:113)
      [JBossINF] 	at org.infinispan.commands.AbstractVisitor.visitPrepareCommand(AbstractVisitor.java:112)
      [JBossINF] 	at org.infinispan.statetransfer.TransactionSynchronizerInterceptor.visitPrepareCommand(TransactionSynchronizerInterceptor.java:39)
      [JBossINF] 	at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:176)
      [JBossINF] 	at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:99)
      [JBossINF] 	at org.infinispan.statetransfer.StateTransferInterceptor.handleTxCommand(StateTransferInterceptor.java:229)
      [JBossINF] 	at org.infinispan.statetransfer.StateTransferInterceptor.visitPrepareCommand(StateTransferInterceptor.java:87)
      [JBossINF] 	at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:176)
      [JBossINF] 	at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:99)
      [JBossINF] 	at org.infinispan.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:110)
      [JBossINF] 	at org.infinispan.interceptors.InvocationContextInterceptor.handleDefault(InvocationContextInterceptor.java:79)
      [JBossINF] 	at org.infinispan.commands.AbstractVisitor.visitPrepareCommand(AbstractVisitor.java:112)
      [JBossINF] 	at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:176)
      [JBossINF] 	at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:99)
      [JBossINF] 	at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:113)
      [JBossINF] 	at org.infinispan.commands.AbstractVisitor.visitPrepareCommand(AbstractVisitor.java:112)
      [JBossINF] 	at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:176)
      [JBossINF] 	at org.infinispan.interceptors.InterceptorChain.invoke(InterceptorChain.java:335)
      [JBossINF] 	at org.infinispan.commands.tx.PrepareCommand.perform(PrepareCommand.java:100)
      [JBossINF] 	at org.infinispan.remoting.inboundhandler.BasePerCacheInboundInvocationHandler.invokePerform(BasePerCacheInboundInvocationHandler.java:92)
      [JBossINF] 	at org.infinispan.remoting.inboundhandler.BaseBlockingRunnable.run(BaseBlockingRunnable.java:34)
      [JBossINF] 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      [JBossINF] 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      [JBossINF] 	at org.wildfly.clustering.service.concurrent.ClassLoaderThreadFactory.lambda$newThread$0(ClassLoaderThreadFactory.java:47)
      [JBossINF] 	... 1 more
      
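
      The root-cause line (ISPN000332) indicates that the node which originated the transaction (dev213) was dropped from the JGroups view, so the node receiving the PrepareCommand rolls the transaction back rather than committing on behalf of a departed member. The decision can be sketched as below; the names and types are illustrative assumptions, not Infinispan's actual internals:

      ```java
      import java.util.List;

      // Illustrative sketch of why the prepare is rolled back: if the transaction
      // originator is no longer in the current cluster view, the remote node
      // refuses the prepare instead of committing for a departed member.
      // This is a hypothetical helper, not Infinispan's real API.
      public class OriginatorCheckSketch {

          static boolean shouldRollback(String originator, List<String> currentView) {
              // dev213 sent the PrepareCommand, but the installed view [dev212|4]
              // no longer contains dev213, so the transaction cannot complete.
              return !currentView.contains(originator);
          }

          public static void main(String[] args) {
              List<String> view = List.of("dev212", "dev214", "dev215");
              System.out.println(shouldRollback("dev213", view)); // prints "true"
          }
      }
      ```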

      Link to server log:
      http://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/job/eap-7x-stress-heavyread-session-dist-sync/41/console-dev213/

      Assignee: Paul Ferraro (pferraro)
      Reporter: Paul Ferraro (pferraro)