JBoss Enterprise Application Platform
JBEAP-26490

[GSS](8.0.z) "AMQ229014: Did not receive data from invm:0 within the -1ms connection TTL" occurs due to a race condition


    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • 8.0 Update 2
    • 7.4.14.GA
    • JMS
    • Known Issue

      A workaround is to set "connection-ttl" on the in-vm <connection-factory> to a value long enough that it effectively never expires. For example, connection-ttl="31536000000" (1 year):

      <connection-factory name="InVmConnectionFactory" entries="java:/ConnectionFactory" connectors="in-vm" connection-ttl="31536000000"/>


      Note: "connection-ttl" accepts a "long" value. However, the timeout checker compares "long >= long + long", as in "if (now >= entry.lastCheck + entry.ttl)", where "now" and "lastCheck" come from System.currentTimeMillis(). If "connection-ttl" is huge (e.g. close to Long.MAX_VALUE), the right-hand side overflows and the connection timeout always fires.
      Therefore, for the workaround, "connection-ttl" should be a reasonably long value: large enough that it never expires for long-running connections, but small enough to be safe from overflow.
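
      As an illustration of this overflow concern (a minimal standalone sketch, not Artemis code), the following Java snippet replays the checker's comparison and shows that a TTL near Long.MAX_VALUE trips it immediately, while the one-year value above stays safe:

      public class ConnectionTtlOverflowDemo {
         // Mirrors the checker's "now >= entry.lastCheck + entry.ttl" comparison.
         static boolean wouldExpire(long now, long lastCheck, long ttl) {
            return now >= lastCheck + ttl;
         }

         public static void main(String[] args) {
            long now = System.currentTimeMillis();
            long lastCheck = now; // data was just received

            // Reasonably large TTL (1 year): no overflow, not expired.
            System.out.println(wouldExpire(now, lastCheck, 31536000000L));   // false

            // TTL near Long.MAX_VALUE: "lastCheck + ttl" overflows to a negative
            // value, so the connection is (wrongly) treated as already expired.
            System.out.println(wouldExpire(now, lastCheck, Long.MAX_VALUE)); // true
         }
      }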


      See README.md in the attached helloworld-jms.zip


      The following WARN "AMQ212037: Connection failure to invm:0 has been detected: AMQ229014: Did not receive data from invm:0 within the -1ms connection TTL. The connection will now be closed. [code=CONNECTION_TIMEDOUT]" can occur due to a race condition between the initialization of the ConnectionEntry (org.apache.activemq.artemis.spi.core.protocol.ConnectionEntry) when the application connects and the periodic check performed by RemotingServiceImpl$FailureCheckAndFlushThread.

      WARN  [org.apache.activemq.artemis.core.client] (Thread-7 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl@2c1c0981)) AMQ212037: Connection failure to invm:0 has been detected: AMQ229014: Did not receive data from invm:0 within the -1ms connection TTL. The connection will now be closed. [code=CONNECTION_TIMEDOUT]
      

      Also, the following ERROR, caused by "ActiveMQNotConnectedException[errorType=NOT_CONNECTED message=AMQ219010: Connection is destroyed]", can occur at the same time:

      ERROR [io.undertow.request] (default task-1) UT005023: Exception handling request to /helloworld-jms/producer: javax.jms.JMSRuntimeException: AMQ219010: Connection is destroyed
        at org.apache.activemq.artemis.jms.client.JmsExceptionUtils.convertToRuntimeException(JmsExceptionUtils.java:88)
        at org.apache.activemq.artemis.jms.client.ActiveMQConnectionFactory.createContext(ActiveMQConnectionFactory.java:326)
        at org.apache.activemq.artemis.jms.client.ActiveMQConnectionFactory.createContext(ActiveMQConnectionFactory.java:309)
        at org.wildfly.extension.messaging.activemq.deployment.injection.AbstractJMSContext.createContext(AbstractJMSContext.java:60)
        at org.wildfly.extension.messaging.activemq.deployment.injection.AbstractJMSContext.getContext(AbstractJMSContext.java:48)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.jboss.weld.bean.proxy.AbstractBeanInstance.invoke(AbstractBeanInstance.java:38)
        at org.jboss.weld.bean.proxy.ProxyMethodHandler.invoke(ProxyMethodHandler.java:106)
        at org.wildfly.extension.messaging.activemq.deployment.injection.RequestedJMSContext$Proxy$_$$_WeldClientProxy.getContext(Unknown Source)
        at org.wildfly.extension.messaging.activemq.deployment.injection.InjectedJMSContext.getDelegate(InjectedJMSContext.java:92)
        at org.wildfly.extension.messaging.activemq.deployment.injection.JMSContextWrapper.createProducer(JMSContextWrapper.java:66)
        at org.jboss.as.quickstarts.servlet.JMSProducerServlet.doGet(JMSProducerServlet.java:102)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:503)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:590)
        at io.undertow.servlet.handlers.ServletHandler.handleRequest(ServletHandler.java:74)
        at io.undertow.servlet.handlers.security.ServletSecurityRoleHandler.handleRequest(ServletSecurityRoleHandler.java:62)
        at io.undertow.servlet.handlers.ServletChain.handleRequest(ServletChain.java:68)
        at io.undertow.servlet.handlers.ServletDispatchingHandler.handleRequest(ServletDispatchingHandler.java:36)
        at org.wildfly.extension.undertow.security.SecurityContextAssociationHandler.handleRequest(SecurityContextAssociationHandler.java:78)
        at io.undertow.server.handlers.PredicateHandler.handleRequest(PredicateHandler.java:43)
        at io.undertow.servlet.handlers.RedirectDirHandler.handleRequest(RedirectDirHandler.java:68)
        at io.undertow.servlet.handlers.security.SSLInformationAssociationHandler.handleRequest(SSLInformationAssociationHandler.java:117)
        at io.undertow.servlet.handlers.security.ServletAuthenticationCallHandler.handleRequest(ServletAuthenticationCallHandler.java:57)
        at io.undertow.server.handlers.PredicateHandler.handleRequest(PredicateHandler.java:43)
        at io.undertow.security.handlers.AbstractConfidentialityHandler.handleRequest(AbstractConfidentialityHandler.java:46)
        at io.undertow.servlet.handlers.security.ServletConfidentialityConstraintHandler.handleRequest(ServletConfidentialityConstraintHandler.java:64)
        at io.undertow.security.handlers.AuthenticationMechanismsHandler.handleRequest(AuthenticationMechanismsHandler.java:60)
        at io.undertow.servlet.handlers.security.CachedAuthenticatedSessionHandler.handleRequest(CachedAuthenticatedSessionHandler.java:77)
        at io.undertow.security.handlers.NotificationReceiverHandler.handleRequest(NotificationReceiverHandler.java:50)
        at io.undertow.security.handlers.AbstractSecurityContextAssociationHandler.handleRequest(AbstractSecurityContextAssociationHandler.java:43)
        at io.undertow.server.handlers.PredicateHandler.handleRequest(PredicateHandler.java:43)
        at org.wildfly.extension.undertow.security.jacc.JACCContextIdHandler.handleRequest(JACCContextIdHandler.java:61)
        at io.undertow.server.handlers.PredicateHandler.handleRequest(PredicateHandler.java:43)
        at org.wildfly.extension.undertow.deployment.GlobalRequestControllerHandler.handleRequest(GlobalRequestControllerHandler.java:68)
        at io.undertow.servlet.handlers.SendErrorPageHandler.handleRequest(SendErrorPageHandler.java:52)
        at io.undertow.server.handlers.PredicateHandler.handleRequest(PredicateHandler.java:43)
        at io.undertow.servlet.handlers.ServletInitialHandler.handleFirstRequest(ServletInitialHandler.java:275)
        at io.undertow.servlet.handlers.ServletInitialHandler.access(ServletInitialHandler.java:79)
        at io.undertow.servlet.handlers.ServletInitialHandler.call(ServletInitialHandler.java:134)
        at io.undertow.servlet.handlers.ServletInitialHandler.call(ServletInitialHandler.java:131)
        at io.undertow.servlet.core.ServletRequestContextThreadSetupAction.call(ServletRequestContextThreadSetupAction.java:48)
        at io.undertow.servlet.core.ContextClassLoaderSetupAction.call(ContextClassLoaderSetupAction.java:43)
        at org.wildfly.extension.undertow.security.SecurityContextThreadSetupAction.lambda$create(SecurityContextThreadSetupAction.java:105)
        at org.wildfly.extension.undertow.deployment.UndertowDeploymentInfoService$UndertowThreadSetupAction.lambda$create(UndertowDeploymentInfoService.java:1555)
        at org.wildfly.extension.undertow.deployment.UndertowDeploymentInfoService$UndertowThreadSetupAction.lambda$create(UndertowDeploymentInfoService.java:1555)
        at org.wildfly.extension.undertow.deployment.UndertowDeploymentInfoService$UndertowThreadSetupAction.lambda$create(UndertowDeploymentInfoService.java:1555)
        at org.wildfly.extension.undertow.deployment.UndertowDeploymentInfoService$UndertowThreadSetupAction.lambda$create(UndertowDeploymentInfoService.java:1555)
        at io.undertow.servlet.handlers.ServletInitialHandler.dispatchRequest(ServletInitialHandler.java:255)
        at io.undertow.servlet.handlers.ServletInitialHandler.access(ServletInitialHandler.java:79)
        at io.undertow.servlet.handlers.ServletInitialHandler.handleRequest(ServletInitialHandler.java:100)
        at io.undertow.server.Connectors.executeRootHandler(Connectors.java:393)
        at io.undertow.server.HttpServerExchange.run(HttpServerExchange.java:852)
        at org.jboss.threads.ContextClassLoaderSavingRunnable.run(ContextClassLoaderSavingRunnable.java:35)
        at org.jboss.threads.EnhancedQueueExecutor.safeRun(EnhancedQueueExecutor.java:1990)
        at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.doRunTask(EnhancedQueueExecutor.java:1486)
        at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1377)
        at org.xnio.XnioWorker$WorkerThreadFactory.run(XnioWorker.java:1282)
        at java.lang.Thread.run(Thread.java:750)
      Caused by: javax.jms.JMSException: AMQ219010: Connection is destroyed
        at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:465)
        at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:439)
        at org.apache.activemq.artemis.core.protocol.core.impl.ActiveMQClientProtocolManager.createSessionContext(ActiveMQClientProtocolManager.java:300)
        at org.apache.activemq.artemis.core.protocol.core.impl.ActiveMQClientProtocolManager.createSessionContext(ActiveMQClientProtocolManager.java:249)
        at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.createSessionChannel(ClientSessionFactoryImpl.java:1415)
        at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.createSessionInternal(ClientSessionFactoryImpl.java:712)
        at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.createSession(ClientSessionFactoryImpl.java:318)
        at org.apache.activemq.artemis.jms.client.ActiveMQConnection.authorize(ActiveMQConnection.java:684)
        at org.apache.activemq.artemis.jms.client.ActiveMQConnectionFactory.createConnectionInternal(ActiveMQConnectionFactory.java:920)
        at org.apache.activemq.artemis.jms.client.ActiveMQConnectionFactory.createContext(ActiveMQConnectionFactory.java:321)
        ... 59 more
      Caused by: ActiveMQNotConnectedException[errorType=NOT_CONNECTED message=AMQ219010: Connection is destroyed]
        ... 69 more
      

      First, the above WARN is output by org.apache.activemq.artemis.core.remoting.server.impl.RemotingServiceImpl$FailureCheckAndFlushThread (activemq-failure-check-thread) at line 780. Note the check "if (entry.ttl != -1)" at line 741: an in-vm connection (ttl = -1) should therefore never be failed here. However, in the problematic case the in-vm connection somehow passes the check and reaches line 744.

      org.apache.activemq.artemis.core.remoting.server.impl.RemotingServiceImpl$FailureCheckAndFlushThread
      728       @Override
      729       public void run() {
      730          while (!closed) {
      731             try {
      732                long now = System.currentTimeMillis();
      733 
      734                Set<Pair<Object, Long>> toRemove = new HashSet<>();
      735 
      736                for (ConnectionEntry entry : connections.values()) {
      737                   final RemotingConnection conn = entry.connection;
      738 
      739                   boolean flush = true;
      740 
      741                   if (entry.ttl != -1) {
      742                      if (!conn.checkDataReceived()) {
      743                         if (now >= entry.lastCheck + entry.ttl) {
      744                            toRemove.add(new Pair<>(conn.getID(), entry.ttl));  // somehow, the in-vm connection reaches this line in the problematic case...
      745 
      746                            flush = false;
      747                         }
      748                      } else {
      749                         entry.lastCheck = now;
      750                      }
      751                   }
       :
      770 
      771                for (final Pair<Object, Long> pair : toRemove) {
      772                   final RemotingConnection conn = getConnection(pair.getA());
      773                   if (conn != null) {
      774                      // In certain cases (replicationManager for instance) calling fail could take some time
      775                      // We can't pause the FailureCheckAndFlushThread as that would lead other clients to fail for
      776                      // missing pings
      777                      flushExecutor.execute(new Runnable() {
      778                         @Override
      779                         public void run() {
      780                            conn.fail(ActiveMQMessageBundle.BUNDLE.clientExited(conn.getRemoteAddress(), pair.getB())); // <-- the WARN is output here
      781                         }
      782                      });
      783                      removeConnection(pair.getA());
      784                   }
      785                }
      

      I've found that the ConnectionEntry (org.apache.activemq.artemis.spi.core.protocol.ConnectionEntry) is created a bit oddly. Its ttl is always initialized to ActiveMQClient.DEFAULT_CONNECTION_TTL (60000 milliseconds) in CoreProtocolManager#createConnectionEntry(), even for an in-vm connection. The ttl is only updated later, from the ConnectionTTL carried in the Ping packet sent by the client.

      org.apache.activemq.artemis.core.protocol.core.impl.CoreProtocolManager#createConnectionEntry()
      125    @Override
      126    public ConnectionEntry createConnectionEntry(final Acceptor acceptorUsed,
      127                                                 final Connection connection) {
      128       final Configuration config = server.getConfiguration();
      129 
      130       Executor connectionExecutor = server.getExecutorFactory().getExecutor();
      131 
      132       final CoreRemotingConnection rc = new RemotingConnectionImpl(new ServerPacketDecoder(server.getStorageManager()),
      133                                                                    connection, incomingInterceptors, outgoingInterceptors, server.getNodeID(),
      134                                                                    connectionExecutor);
      135 
      136       Channel channel1 = rc.getChannel(CHANNEL_ID.SESSION.id, -1);
      137 
      138       ChannelHandler handler = new ActiveMQPacketHandler(this, server, channel1, rc);
      139 
      140       channel1.setHandler(handler);
      141 
      !142       long ttl = ActiveMQClient.DEFAULT_CONNECTION_TTL; // <-- ConnectionEntry ttl is always initialized to 60000 milliseconds even if it's an in-vm connection
      144       if (config.getConnectionTTLOverride() != -1) {
      145          ttl = config.getConnectionTTLOverride();
      146       }
      147 
      148       final ConnectionEntry entry = new ConnectionEntry(rc, connectionExecutor,
      149                                                         System.currentTimeMillis(), ttl);
      150 
      151       final Channel channel0 = rc.getChannel(ChannelImpl.CHANNEL_ID.PING.id, -1);
      152 
      153       channel0.setHandler(new LocalChannelHandler(config, entry, channel0, acceptorUsed, rc));
      154 
      155       server.getClusterManager()
      156          .addClusterChannelHandler(rc.getChannel(CHANNEL_ID.CLUSTER.id, -1), acceptorUsed, rc,
      157                                    server.getActivation());
      158 
      159       final Channel federationChannel =  rc.getChannel(CHANNEL_ID.FEDERATION.id, -1);
      160       federationChannel.setHandler(new LocalChannelHandler(config, entry, channel0, acceptorUsed, rc));
      161 
      162       return entry;
      163    }
       :
       :
      251    private class LocalChannelHandler implements ChannelHandler {
       :
      271       @Override
      272       public void handlePacket(final Packet packet) {
      273          if (packet.getType() == PacketImpl.PING) {
      274             Ping ping = (Ping) packet;
      275 
      276             if (config.getConnectionTTLOverride() == -1) {
      277                // Allow clients to specify connection ttl
      !278                entry.ttl = ping.getConnectionTTL();  // <-- ConnectionEntry ttl is updated from the ConnectionTTL in the client's Ping.
      279             }
      280 
      281             // Just send a ping back
      282             channel0.send(packet);
      

      So, this can cause a race condition in the following order (a simplified simulation of this interleaving is sketched after the list):

      1. ConnectionEntry is initialized with the default ActiveMQClient.DEFAULT_CONNECTION_TTL (60000 milliseconds) in CoreProtocolManager#createConnectionEntry().

      2. (activemq-failure-check-thread) RemotingServiceImpl$FailureCheckAndFlushThread runs every 2000 milliseconds.
      As the ConnectionEntry ttl is 60000 milliseconds, it passes the check "if (entry.ttl != -1)" at line 741.

      3. A Ping is triggered. The ConnectionEntry ttl is then updated to ActiveMQClient.DEFAULT_CONNECTION_TTL_INVM (-1) from the ConnectionTTL in the Ping.

      4. (activemq-failure-check-thread) RemotingServiceImpl$FailureCheckAndFlushThread checks "if (now >= entry.lastCheck + entry.ttl)".
      As "ttl" has been updated to -1, the check passes (= the connection is treated as expired) and the connection is added to "toRemove".

      5. The WARN and ERROR occur.
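
      To make the interleaving concrete, here is a minimal, self-contained Java sketch (not Artemis code; the Entry class and the sequential "steps" are simplified stand-ins for ConnectionEntry and the two threads) that replays steps 1-4 and shows why the comparison succeeds once ttl flips to -1 between the two checks:

      public class InvmTtlRaceSketch {

         // Simplified stand-in for org.apache.activemq.artemis.spi.core.protocol.ConnectionEntry
         static class Entry {
            volatile long ttl;
            volatile long lastCheck;
            Entry(long ttl, long lastCheck) {
               this.ttl = ttl;
               this.lastCheck = lastCheck;
            }
         }

         public static void main(String[] args) {
            // Step 1: the entry starts with the default TTL (60000 ms), even for in-vm.
            Entry entry = new Entry(60000, System.currentTimeMillis());

            // Step 2: the failure-check thread evaluates "entry.ttl != -1" -> true,
            // so it proceeds towards the expiry comparison.
            boolean ttlCheckPassed = entry.ttl != -1;

            // Step 3: concurrently, the Ping handler updates the TTL to the in-vm default (-1).
            entry.ttl = -1;

            // Step 4: the failure-check thread evaluates "now >= entry.lastCheck + entry.ttl".
            // With ttl == -1 the right-hand side is lastCheck - 1, which is always in the past,
            // so the in-vm connection is treated as expired and scheduled for removal.
            long now = System.currentTimeMillis();
            boolean expired = ttlCheckPassed && now >= entry.lastCheck + entry.ttl;

            System.out.println("connection wrongly considered expired: " + expired); // true
         }
      }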

      As this is a race condition, it is very hard to reproduce. However, I was able to reproduce it locally with a Byteman rule that uses "waitFor" and "signalWake" to force the interleaving. I've attached an example application, helloworld-jms.zip, which includes the Byteman rule (byteman-script.btm).

        1. artemis-invm.patch (3 kB, Petr Beran)

              Ivo Studensky (istudens@redhat.com)
              Petr Beran (pberan@redhat.com)
              Votes: 0
              Watchers: 5
