
Type: Bug
Resolution: Done
Priority: Critical
Fix Version/s: 7.1.0.DR17
Affects Version/s: 7.1.0.DR13
Component/s: JCA
Labels:
- KK-DR17

CDW devel_ack:
CDW docs_ack:
CDW pm_ack:
CDW qa_ack:
CDW release:
Target Release: 7.1.0.GA
Steps to Reproduce:

To reproduce, you'll need two EAP instances. Both need to have a distributed workmanager configured:

batch
/subsystem=jca/distributed-workmanager=newdwm:add(name=newdwm)
/subsystem=jca/distributed-workmanager=newdwm/short-running-threads=newdwm:add(queue-length=10,max-threads=10)
/subsystem=jca/bootstrap-context=customContext1:add(name=customContext1,workmanager=newdwm)
run-batch
reload

Then start both of the instances, for example:

bin/standalone.sh -c standalone-ha.xml -Djboss.node.name=node1
bin/standalone.sh -c standalone-ha.xml -Djboss.node.name=node2 -Djboss.socket.binding.port-offset=100 -Djboss.server.base.dir=standalone2
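The node2 command above assumes a standalone2 base directory already exists next to standalone. A minimal sketch of one way to create it, assuming the commands are run from the EAP installation directory and that the CLI batch has already been applied to the first server, so the copied configuration already contains newdwm. The function name prepare_second_base_dir is hypothetical. It copies only configuration/ and, if present, deployments/, and leaves node1's runtime data/, log/ and tmp/ behind.

```shell
#!/bin/sh
# Hypothetical helper (not part of EAP): create the standalone2 base directory
# used by node2 from the first server's already-configured base directory.
# Assumption: $1 is the EAP installation directory (defaults to the current one).
prepare_second_base_dir() {
    jboss_home="${1:-.}"
    src="$jboss_home/standalone"
    dst="$jboss_home/standalone2"

    # Without a configuration directory there is nothing to copy.
    if [ ! -d "$src/configuration" ]; then
        echo "no configuration directory under $src" >&2
        return 1
    fi

    mkdir -p "$dst" || return 1
    # Copy the server configuration, which should already contain the
    # distributed workmanager added by the CLI batch.
    cp -R "$src/configuration" "$dst/" || return 1
    # Copy deployments too, if any; data/, log/ and tmp/ are deliberately
    # skipped so node2 does not start with node1's runtime state.
    if [ -d "$src/deployments" ]; then
        cp -R "$src/deployments" "$dst/" || return 1
    fi
}
```

For example, run prepare_second_base_dir . from the installation directory once, then start node2 with the second command.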

After the second instance is started, the first one will log the error.

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

When starting two EAP instances with a distributed workmanager configured, the following exception is logged on the first instance ~6 seconds after the second instance starts:

16:11:24,204 INFO  [org.jboss.as] (Controller Boot Thread) WFLYSRV0025: JBoss EAP 7.1.0.Alpha1 (WildFly Core 3.0.0.Beta6-redhat-1) started in 5905ms - Started 467 of 700 services (472 services are lazy, passive or on-demand)
16:11:42,066 ERROR [org.jboss.jca.core.workmanager.transport.remote.jgroups.JGroupsTransport] (thread-2) ViewAccepted: org.jgroups.TimeoutException: timeout waiting for response from node2, request: UnicastRequest, mode=GET_ALL, target=node2: javax.resource.spi.work.WorkException: org.jgroups.TimeoutException: timeout waiting for response from node2, request: UnicastRequest, mode=GET_ALL, target=node2
	at org.jboss.jca.core.workmanager.transport.remote.jgroups.JGroupsTransport.sendMessage(JGroupsTransport.java:589)
	at org.jboss.jca.core.workmanager.transport.remote.jgroups.JGroupsTransport.viewAccepted(JGroupsTransport.java:943)
	at org.jgroups.blocks.MessageDispatcher.handleUpEvent(MessageDispatcher.java:618)
	at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:666)
	at org.jgroups.JChannel.up(JChannel.java:738)
	at org.jgroups.fork.ForkProtocolStack.up(ForkProtocolStack.java:124)
	at org.jgroups.stack.Protocol.up(Protocol.java:380)
	at org.jgroups.protocols.FORK.up(FORK.java:118)
	at org.jgroups.protocols.FRAG2.up(FRAG2.java:165)
	at org.jgroups.protocols.FlowControl.up(FlowControl.java:390)
	at org.jgroups.protocols.FlowControl.up(FlowControl.java:390)
	at org.jgroups.protocols.pbcast.GMS.installView(GMS.java:727)
	at org.jgroups.protocols.pbcast.CoordGmsImpl.handleViewChange(CoordGmsImpl.java:225)
	at org.jgroups.protocols.pbcast.GMS.up(GMS.java:917)
	at org.jgroups.stack.Protocol.up(Protocol.java:418)
	at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:294)
	at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:487)
	at org.jgroups.protocols.pbcast.NAKACK2.deliverBatch(NAKACK2.java:989)
	at org.jgroups.protocols.pbcast.NAKACK2.removeAndPassUp(NAKACK2.java:919)
	at org.jgroups.protocols.pbcast.NAKACK2.handleMessage(NAKACK2.java:851)
	at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:611)
	at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:155)
	at org.jgroups.protocols.FD_ALL.up(FD_ALL.java:200)
	at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:325)
	at org.jgroups.protocols.MERGE3.up(MERGE3.java:292)
	at org.jgroups.protocols.Discovery.up(Discovery.java:296)
	at org.jgroups.protocols.TP.passMessageUp(TP.java:1657)
	at org.jgroups.protocols.TP$3.run(TP.java:1591)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at org.jboss.as.clustering.jgroups.ClassLoaderThreadFactory.lambda$newThread$0(ClassLoaderThreadFactory.java:52)
	at java.lang.Thread.run(Thread.java:745)
Caused by: org.jgroups.TimeoutException: timeout waiting for response from node2, request: UnicastRequest, mode=GET_ALL, target=node2
	at org.jgroups.blocks.MessageDispatcher.sendMessage(MessageDispatcher.java:442)
	at org.jgroups.blocks.RpcDispatcher.callRemoteMethod(RpcDispatcher.java:241)
	at org.jboss.jca.core.workmanager.transport.remote.jgroups.JGroupsTransport.sendMessage(JGroupsTransport.java:579)
	... 31 more
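When reproducing this repeatedly, it can help to check node1's log for this specific error instead of reading it by eye. A minimal sketch, assuming node1 logs to the default standalone/log/server.log under the installation directory. The function name has_dwm_view_timeout is hypothetical. It matches the JGroupsTransport logger category and the "ViewAccepted: org.jgroups.TimeoutException" message prefix from the trace above, so timestamps and thread names do not matter.

```shell
#!/bin/sh
# Hypothetical helper: exit status 0 if the given log contains the
# distributed-workmanager ViewAccepted timeout reported in this issue,
# non-zero otherwise. Assumption: node1 uses the default log location.
has_dwm_view_timeout() {
    log_file="${1:-standalone/log/server.log}"
    # Two fixed-string matches on the same line: the JGroups transport
    # logger category, then the ViewAccepted timeout message.
    grep -F 'remote.jgroups.JGroupsTransport' "$log_file" \
        | grep -qF 'ViewAccepted: org.jgroups.TimeoutException'
}
```

For example, run has_dwm_view_timeout && echo "view not established" from the installation directory once node2 has been up for several seconds.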

Judging by the stack trace, it looks like a view containing both workmanagers is never created, so the two workmanagers never manage to communicate with each other. That would also explain why work never seems to be executed on a node other than the one where it was scheduled (even though the selector and policy are set appropriately). I'll file a separate issue for that and attach a reproducing test.

Since it's a timeout exception, I checked whether the problem was on my end (network issues), but that doesn't seem to be the case: ordinary clustered sessions, which also use the JGroups stack, are replicated properly.

is cloned by
WFLY-8617 Distributed workmanager fails to obtain view (Closed)

is related to
JBEAP-10418 IronJacamar to 1.4.4 from 1.4.3 (Closed)

relates to
JBEAP-9422 Distributed workmanager does not execute work on other nodes than where the work was scheduled (Closed)
