-
Bug
-
Resolution: Done
-
Blocker
-
7.1.0.DR2
-
None
Seen in http-session scenarios with DIST cache only. REPL cache scenarios seem not to be affected.
TimeoutExceptions occur right away after sending first request and definitely result in client getting 500:
[JBossINF] [0m[31m04:31:39,033 ERROR [org.infinispan.interceptors.InvocationContextInterceptor] (default task-1) ISPN000136: Error executing command GetKeyValueCommand, writing keys []: org.infinispan.util.concurrent.TimeoutException: Replication timeout for perf19 [JBossINF] at org.infinispan.remoting.transport.jgroups.JGroupsTransport.checkRsp(JGroupsTransport.java:801) [JBossINF] at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$invokeRemotelyAsync$1(JGroupsTransport.java:642) [JBossINF] at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602) [JBossINF] at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577) [JBossINF] at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) [JBossINF] at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962) [JBossINF] at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.staggeredProcessNext(CommandAwareRpcDispatcher.java:375) [JBossINF] at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.lambda$processCallsStaggered$3(CommandAwareRpcDispatcher.java:357) [JBossINF] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [JBossINF] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [JBossINF] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [JBossINF] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [JBossINF] at java.lang.Thread.run(Thread.java:745)
Client:
2016/08/08 04:31:39:068 EDT [WARN ][Runner - 4] HOST perf17.mw.lab.eng.bos.redhat.com:rootProcess:c - Error sampling data: Internal Server Error> org.jboss.smartfrog.loaddriver.RequestProcessingException: Invalid response code: 500 Content: Internal Server Error at org.jboss.smartfrog.loaddriver.http.HttpRequestProcessorFactoryImpl$HttpRequestProcessor.processRequest(HttpRequestProcessorFactoryImpl.java:163) at org.jboss.smartfrog.loaddriver.CompoundRequestProcessorFactoryImpl$CompoundRequestProcessor.processRequest(CompoundRequestProcessorFactoryImpl.java:52) at org.jboss.smartfrog.loaddriver.Runner.run(Runner.java:103) at java.lang.Thread.run(Thread.java:745)
I was able to reproduce it even with a very small load of 30 sessions.
Server link (run with 30 sessions):
http://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/mvinkler_eap-7x-failover-http-session-jvmkill-dist-sync/27/console-perf21/consoleText
Client link (run with 30 sessions):
http://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/mvinkler_eap-7x-failover-http-session-jvmkill-dist-sync/27/console-perf17/consoleText
This issue occurs under different conditions than JBEAP-3779 (which is similar), that's why I opened new issue.
Number of occurrences of TimeoutExceptions is very high. The server logs are full of them.
This issue makes clustering with DIST cache unusable, thus giving blocker priority.
Also, there is a issue with shutting down the servers in the end of the test, I don't know if it can be related, but I am hitting it constantly in the http-session DIST scenarios.
The server shutdown usually gets stuck on this command:
[JBossINF] [0m[0m04:38:51,694 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-7) ISPN000082: Stopping the RpcDispatcher for channel web
EDIT:
Also ejb-remote tests with DIST cache are affected (even if only 20 sessions are created).
Server log for ejb-remote run:
http://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/mvinkler_eap-7x-failover-ejb-ejbremote-shutdown-dist-async/3/console-perf20/
Client log for ejb-remote run:
http://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/mvinkler_eap-7x-failover-ejb-ejbremote-shutdown-dist-async/3/console-perf17/
EDIT #2:
Can be very easily reproduced locally. Just start a cluster of at least 3 servers with clusterbench deployed. Then do few requests to various nodes until you get 500 as a response. Also notice, when shutting down a node, which logged TimeoutException before, the node won't stop and get stuck during the shutdown.
- causes
-
JBEAP-4398 ERROR MSC000001: Failed to start service jboss.clustering.registry.ejb.client-mappings due to TimeoutException: Replication timeout for <nodename>
- Verified
- is caused by
-
ISPN-6925 Race condition in staggered gets
- Closed
- is cloned by
-
WFLY-6926 TimeoutException: Replication timeout when handling request with DIST cache
- Closed