Bug
Resolution: Done
Minor
None
The resource adapter (RA) configuration that avoids continuous reconnections [1] works fine: we end up with one TX Recovery thread at the RA level instead of one per connection.
The only problem I see in a network of brokers is that if the node to which the RA is connected goes down, the Recovery thread remains stuck with an invalid connection, unlike the failover transport, which reconnects to another node. In this situation no recovery is active and we get a continuous stream of errors (every 2 minutes, or at the configured recovery period) until the node is restarted.
2020-01-26 16:35:45,303 WARN [com.arjuna.ats.jta] (Periodic Recovery) ARJUNA016027: Local XARecoveryModule.xaRecovery got XA exception XAException.XAER_RMFAIL: javax.transaction.xa.XAException: The JMS connection has failed: java.io.EOFException
    at org.apache.activemq.TransactionContext.toXAException(TransactionContext.java:817)
    at org.apache.activemq.TransactionContext.recover(TransactionContext.java:710)
    at org.jboss.jca.core.tx.jbossts.XAResourceWrapperImpl.recover(XAResourceWrapperImpl.java:185)
    at com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass(XARecoveryModule.java:634)
    at com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkFirstPass(XARecoveryModule.java:226)
    at com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkFirstPass(XARecoveryModule.java:171)
    at com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal(PeriodicRecovery.java:770)
    at com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run(PeriodicRecovery.java:382)
Caused by: org.apache.activemq.ConnectionFailedException: The JMS connection has failed: java.io.EOFException
    at org.apache.activemq.ActiveMQConnection.checkClosedOrFailed(ActiveMQConnection.java:1443)
    at org.apache.activemq.TransactionContext.recover(TransactionContext.java:698)
    ... 6 more
Caused by: java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:392)
    at org.apache.activemq.openwire.OpenWireFormat.unmarshal(OpenWireFormat.java:259)
    at org.apache.activemq.transport.tcp.TcpTransport.readCommand(TcpTransport.java:221)
    at org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.java:213)
    at org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:196)
    at java.lang.Thread.run(Thread.java:748)
This is not a huge problem in practice, because the broker would be restored eventually, but I was wondering whether we could make the Recovery thread check and refresh its connection in some way.
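
To make the "check and refresh" idea concrete, here is a minimal sketch of one possible shape, not the actual RA or Narayana code. It assumes a hypothetical factory (`resourceFactory`) that can open a fresh connection and return its XAResource; the class name `RefreshingRecoveryScan` is also made up for illustration. The only real API used is `javax.transaction.xa`. When a recovery scan fails with XAER_RMFAIL, as in the log above, the cached resource is dropped and the scan is retried once on a new one.

```java
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;
import java.util.function.Supplier;

/**
 * Hypothetical helper (not part of ActiveMQ, IronJacamar or Narayana):
 * wraps the XAResource used by the recovery thread and replaces it
 * when the underlying connection is reported as failed.
 */
final class RefreshingRecoveryScan {

    // Assumption: the supplier opens a new connection (for example via a
    // failover-capable connection factory) and returns its XAResource.
    private final Supplier<XAResource> resourceFactory;

    // Resource reused across recovery passes until it fails.
    private XAResource current;

    RefreshingRecoveryScan(Supplier<XAResource> resourceFactory) {
        this.resourceFactory = resourceFactory;
    }

    /** Runs a recovery scan, refreshing the resource once on XAER_RMFAIL. */
    synchronized Xid[] recover(int flag) throws XAException {
        if (current == null) {
            current = resourceFactory.get();
        }
        try {
            return current.recover(flag);
        } catch (XAException e) {
            if (e.errorCode != XAException.XAER_RMFAIL) {
                throw e; // other XA errors are not connection failures
            }
            // Drop the stale resource first, so a failing factory call
            // leaves nothing cached, then retry the scan exactly once.
            current = null;
            current = resourceFactory.get();
            return current.recover(flag);
        }
    }
}
```

If the retry also fails, the exception propagates as before and the next periodic pass tries again with another fresh resource, so the worst case is the current behavior.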