Loading...

Type: Bug
Resolution: Done
Priority: Blocker
Fix Version/s: 7.1.0.DR4
Affects Version/s: 7.1.0.DR2
Component/s: Clustering
Labels:
None

CDW devel_ack:
CDW docs_ack:
CDW pm_ack:
CDW qa_ack:
CDW release:
Target Release:

7.1.0.GA
Git Pull Request:
https://github.com/jbossas/jboss-eap7/pull/696

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Seen in http-session scenarios with DIST cache only. REPL cache scenarios seem not to be affected.

TimeoutExceptions occur right away after sending first request and definitely result in client getting 500:

[JBossINF] [0m[31m04:31:39,033 ERROR [org.infinispan.interceptors.InvocationContextInterceptor] (default task-1) ISPN000136: Error executing command GetKeyValueCommand, writing keys []: org.infinispan.util.concurrent.TimeoutException: Replication timeout for perf19
[JBossINF] 	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.checkRsp(JGroupsTransport.java:801)
[JBossINF] 	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$invokeRemotelyAsync$1(JGroupsTransport.java:642)
[JBossINF] 	at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
[JBossINF] 	at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
[JBossINF] 	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
[JBossINF] 	at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
[JBossINF] 	at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.staggeredProcessNext(CommandAwareRpcDispatcher.java:375)
[JBossINF] 	at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.lambda$processCallsStaggered$3(CommandAwareRpcDispatcher.java:357)
[JBossINF] 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
[JBossINF] 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[JBossINF] 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
[JBossINF] 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
[JBossINF] 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[JBossINF] 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[JBossINF] 	at java.lang.Thread.run(Thread.java:745)

Client:

2016/08/08 04:31:39:068 EDT [WARN ][Runner - 4] HOST perf17.mw.lab.eng.bos.redhat.com:rootProcess:c - Error sampling data:  Internal Server Error>
        org.jboss.smartfrog.loaddriver.RequestProcessingException: Invalid response code: 500 Content: Internal Server Error
	at org.jboss.smartfrog.loaddriver.http.HttpRequestProcessorFactoryImpl$HttpRequestProcessor.processRequest(HttpRequestProcessorFactoryImpl.java:163)
	at org.jboss.smartfrog.loaddriver.CompoundRequestProcessorFactoryImpl$CompoundRequestProcessor.processRequest(CompoundRequestProcessorFactoryImpl.java:52)
	at org.jboss.smartfrog.loaddriver.Runner.run(Runner.java:103)
	at java.lang.Thread.run(Thread.java:745)

I was able to reproduce it even with a very small load of 30 sessions.

Server link (run with 30 sessions):
http://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/mvinkler_eap-7x-failover-http-session-jvmkill-dist-sync/27/console-perf21/consoleText

Client link (run with 30 sessions):
http://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/mvinkler_eap-7x-failover-http-session-jvmkill-dist-sync/27/console-perf17/consoleText

This issue occurs under different conditions than ~~JBEAP-3779~~ (which is similar), that's why I opened new issue.
Number of occurrences of TimeoutExceptions is very high. The server logs are full of them.
This issue makes clustering with DIST cache unusable, thus giving blocker priority.

Also, there is a issue with shutting down the servers in the end of the test, I don't know if it can be related, but I am hitting it constantly in the http-session DIST scenarios.
The server shutdown usually gets stuck on this command:

[JBossINF] [0m[0m04:38:51,694 INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-7) ISPN000082: Stopping the RpcDispatcher for channel web

EDIT:
Also ejb-remote tests with DIST cache are affected (even if only 20 sessions are created).

Server log for ejb-remote run:
http://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/mvinkler_eap-7x-failover-ejb-ejbremote-shutdown-dist-async/3/console-perf20/

Client log for ejb-remote run:
http://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/mvinkler_eap-7x-failover-ejb-ejbremote-shutdown-dist-async/3/console-perf17/

EDIT #2:
Can be very easily reproduced locally. Just start a cluster of at least 3 servers with clusterbench deployed. Then do few requests to various nodes until you get 500 as a response. Also notice, when shutting down a node, which logged TimeoutException before, the node won't stop and get stuck during the shutdown.

causes

JBEAP-4398 ERROR MSC000001: Failed to start service jboss.clustering.registry.ejb.client-mappings due to TimeoutException: Replication timeout for <nodename>

Verified

is caused by

ISPN-6925 Race condition in staggered gets

Closed

is cloned by

WFLY-6926 TimeoutException: Replication timeout when handling request with DIST cache

Closed

Details

Description

Attachments

Issue Links

Easy Agile Planning Poker

Activity

People

Dates