  1. AMQ Broker
  2. ENTMQBR-4931

[lts] Cannot connect if many core bridges for two-way TLS acceptor are defined


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version: AMQ 7.8.0.GA
      • Define 30 core bridges on the source broker that target a two-way TLS acceptor with client-certificate-based authentication.
      • Create a two-way TLS acceptor with client-certificate-based authentication on the destination broker.
      • Use Byteman to simulate a one-second wait in org.apache.activemq.artemis.core.remoting.impl.netty.NettyAcceptor.getSslHandler:
      • RULE delay for creating tls connection
        CLASS org.apache.activemq.artemis.core.remoting.impl.ssl.SSLSupport
        METHOD createContext
        AT ENTRY
        IF true
        DO traceln("[BYTEMAN] SSLSupport.createContext"); delay(1000); 
        ENDRULE
        
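      To reproduce, the destination broker needs an acceptor like the following in broker.xml. This is a sketch only; the port matches the netstat output below (61617), but the store file names and passwords are illustrative placeholders, not taken from the reported environment.

      ```xml
      <!-- broker.xml on the destination broker: two-way TLS acceptor with
           client-certificate-based authentication (needClientAuth=true).
           Store paths and passwords are placeholders. -->
      <acceptors>
         <acceptor name="netty-ssl-acceptor">tcp://0.0.0.0:61617?sslEnabled=true;needClientAuth=true;keyStorePath=server-keystore.jks;keyStorePassword=changeit;trustStorePath=client-truststore.jks;trustStorePassword=changeit</acceptor>
      </acceptors>
      ```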
      • Workaround 1) Configure more TLS acceptors on the destination broker, and partition the core bridges across those acceptors.
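        A sketch of Workaround 1, assuming three acceptors: each acceptor has its own handshake path, so partitioning 30 bridges across three ports keeps each queue short enough to stay under the 10-second handshake timeout. The extra ports match the listeners visible in the netstat output below; store paths and passwords are placeholders.

        ```xml
        <!-- Workaround 1 sketch: several two-way TLS acceptors on the destination
             broker; point roughly 10 bridges at each port. -->
        <acceptors>
           <acceptor name="ssl-1">tcp://0.0.0.0:61617?sslEnabled=true;needClientAuth=true;keyStorePath=server-keystore.jks;keyStorePassword=changeit;trustStorePath=client-truststore.jks;trustStorePassword=changeit</acceptor>
           <acceptor name="ssl-2">tcp://0.0.0.0:61618?sslEnabled=true;needClientAuth=true;keyStorePath=server-keystore.jks;keyStorePassword=changeit;trustStorePath=client-truststore.jks;trustStorePassword=changeit</acceptor>
           <acceptor name="ssl-3">tcp://0.0.0.0:61619?sslEnabled=true;needClientAuth=true;keyStorePath=server-keystore.jks;keyStorePassword=changeit;trustStorePath=client-truststore.jks;trustStorePassword=changeit</acceptor>
        </acceptors>
        ```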

       

      • Workaround 2) Use the OPENSSL provider instead of the JDK provider.
        • The OPENSSL provider may create TLS connections faster.
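        A sketch of Workaround 2: Artemis transports accept an sslProvider parameter, which can be set to OPENSSL on the acceptor URL (this requires the netty-tcnative native library to be available on the broker's classpath). Store paths and passwords are placeholders.

        ```xml
        <!-- Workaround 2 sketch: switch the acceptor from the default JDK
             SSL provider to OpenSSL via sslProvider=OPENSSL. -->
        <acceptors>
           <acceptor name="netty-ssl-acceptor">tcp://0.0.0.0:61617?sslEnabled=true;sslProvider=OPENSSL;needClientAuth=true;keyStorePath=server-keystore.jks;keyStorePassword=changeit;trustStorePath=client-truststore.jks;trustStorePassword=changeit</acceptor>
        </acceptors>
        ```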

       

      • Workaround 3) Set <retry-interval> and <max-retry-interval> to a large value.
        • If <retry-interval> and <max-retry-interval> are set to a large value, the queue of connection-creation requests at the destination broker is emptied repeatedly.
        • This means that each time the core bridges reconnect, several more connections succeed, and eventually all connections are established.
        • Timeout exceptions and CLOSE_WAIT connections occur for a while until all connections succeed, but they become fewer and fewer and finally stop occurring.
        • <retry-interval>45000</retry-interval>
          <max-retry-interval>45000</max-retry-interval>
          

           Say that it takes one second to create a TLS connection, 30 core bridges are defined, and <retry-interval>/<max-retry-interval> are set to 45000: all the connections will be established in about three minutes.
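       The retry settings above belong inside each <bridge> element in broker.xml. A minimal sketch, assuming a static connector; the bridge name, queue, address, and connector name are illustrative, not from the reported configuration.

       ```xml
       <!-- Workaround 3 sketch: a core bridge with large, fixed retry intervals
            so reconnect attempts arrive after the destination's handshake
            queue has drained. -->
       <bridges>
          <bridge name="bridge-1">
             <queue-name>source-queue</queue-name>
             <forwarding-address>dest-address</forwarding-address>
             <retry-interval>45000</retry-interval>
             <retry-interval-multiplier>1.0</retry-interval-multiplier>
             <max-retry-interval>45000</max-retry-interval>
             <reconnect-attempts>-1</reconnect-attempts>
             <static-connectors>
                <connector-ref>remote-ssl-connector</connector-ref>
             </static-connectors>
          </bridge>
       </bridges>
       ```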

       

      • Workaround 4) Use org.apache.activemq.artemis.core.remoting.impl.ssl.CachingSSLContextFactory.
        • It speeds up the connection process by caching the SSL-related information.
        • Open lib/artemis-core-client-2.16.0.redhat-00007.jar on the broker, edit the /META-INF/services/org.apache.activemq.artemis.spi.core.remoting.ssl.SSLContextFactory file, and replace its content with "org.apache.activemq.artemis.core.remoting.impl.ssl.CachingSSLContextFactory".
        • For details, refer to this document: http://activemq.apache.org/components/artemis/documentation/latest/configuring-transports.html#configuring-a-sslcontextfactory
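      After the edit described in Workaround 4, the SPI services file inside the jar should contain exactly one line, the fully qualified class name of the caching factory:

      ```
      org.apache.activemq.artemis.core.remoting.impl.ssl.CachingSSLContextFactory
      ```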

    Description

      • [Problem] When many core bridges targeting a two-way TLS acceptor with client-certificate-based authentication are defined, only some of them can connect.
        • For example:
          • If 30 core bridges are defined, only 10 core bridges can connect.
          • If 10 core bridges are defined, all the core bridges can connect.
      • [Cause]
        • Creating a connection to a two-way TLS acceptor with client-certificate-based authentication can take about one second.
          • AMQ Broker executes a synchronized method to create a TLS connection[2], so TLS connections cannot be created in parallel; they are created sequentially.
          • You can confirm from the thread dump[2] that many threads are blocked in org.apache.activemq.artemis.core.remoting.impl.netty.NettyAcceptor.getSslHandler.
        • A core bridge waits only 10 seconds for a TLS connection to be created.
          • If that 10 seconds is exceeded:
            • The following error occurs on the core bridge on the source broker:
              • AMQ214016: Failed to create netty connection: io.netty.handler.ssl.SslHandshakeTimeoutException: handshake timed out after 10000ms[1]
              • and CLOSE_WAIT connections occur on the destination broker[3].
      • [For example]:
        • Say that it takes one second to create a TLS connection and 30 core bridges are defined.
          ==>
        • Only 10 core bridges can connect; the other 20 cannot.
          • The first 10 core bridges can connect because the broker responds to them within 10 seconds.
        • FYI, the reconnect feature of the core bridge works as expected, but it does not help here.
          • Beyond the first 10, the remaining 20 core bridges repeatedly try to reconnect,
            • but they are always queued on the destination broker, and it repeatedly takes more than 20 seconds for them to receive a response.
          • It may seem somewhat counterintuitive, but if 20 core bridges are defined, all of them can connect,
            • because beyond the first 10, the remaining 10 core bridges can receive a response within 10 seconds by reconnecting after the timeout exception.
      •  [1] io.netty.handler.ssl.SslHandshakeTimeoutException
        • 2021-04-07 08:47:16,912 ERROR [org.apache.activemq.artemis.core.client] AMQ214016: Failed to create netty connection: io.netty.handler.ssl.SslHandshakeTimeoutException: handshake timed out after 10000ms
                  at io.netty.handler.ssl.SslHandler$5.run(SslHandler.java:2062) [netty-all-4.1.51.Final-redhat-00001.jar:4.1.51.Final-redhat-00001]
                  at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98) [netty-all-4.1.51.Final-redhat-00001.jar:4.1.51.Final-redhat-00001]
                  at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170) [netty-all-4.1.51.Final-redhat-00001.jar:4.1.51.Final-redhat-00001]
                  at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) [netty-all-4.1.51.Final-redhat-00001.jar:4.1.51.Final-redhat-00001]
                  at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) [netty-all-4.1.51.Final-redhat-00001.jar:4.1.51.Final-redhat-00001]
                  at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500) [netty-all-4.1.51.Final-redhat-00001.jar:4.1.51.Final-redhat-00001]
                  at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [netty-all-4.1.51.Final-redhat-00001.jar:4.1.51.Final-redhat-00001]
                  at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-all-4.1.51.Final-redhat-00001.jar:4.1.51.Final-redhat-00001]
                  at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118) [artemis-commons-2.16.0.redhat-00007.jar:2.16.0.redhat-00007]
          
          
      • [3] CLOSE_WAIT connections on the destination broker
        • $ netstat -aon -p|grep java|sort
          tcp6       0      0 172.31.32.201:61617     18.179.5.82:43429       ESTABLISHED 21596/java           keepalive (7191.27/0/0)
          tcp6       0      0 172.31.32.201:61617     18.179.5.82:43430       ESTABLISHED 21596/java           keepalive (7191.27/0/0)
          tcp6       0      0 172.31.32.201:61617     18.179.5.82:43431       ESTABLISHED 21596/java           keepalive (7191.27/0/0)
          tcp6       0      0 172.31.32.201:61617     18.179.5.82:43435       ESTABLISHED 21596/java           keepalive (7191.27/0/0)
          tcp6       0      0 172.31.32.201:61617     18.179.5.82:43436       ESTABLISHED 21596/java           keepalive (7191.27/0/0)
          tcp6       0      0 172.31.32.201:61617     18.179.5.82:43437       ESTABLISHED 21596/java           keepalive (7191.27/0/0)
          tcp6       0      0 :::61616                :::*                    LISTEN      21596/java           off (0.00/0/0)
          tcp6       0      0 :::61618                :::*                    LISTEN      21596/java           off (0.00/0/0)
          tcp6       0      0 :::61619                :::*                    LISTEN      21596/java           off (0.00/0/0)
          tcp6       0      0 :::8161                 :::*                    LISTEN      21596/java           off (0.00/0/0)
          tcp6      35      0 :::61617                :::*                    LISTEN      21596/java           off (0.00/0/0)
          tcp6     191      0 172.31.32.201:61617     18.179.5.82:43428       CLOSE_WAIT  21596/java           keepalive (7191.27/0/0)
          tcp6     191      0 172.31.32.201:61617     18.179.5.82:43433       CLOSE_WAIT  21596/java           keepalive (7191.27/0/0)
          tcp6     191      0 172.31.32.201:61617     18.179.5.82:43434       CLOSE_WAIT  21596/java           keepalive (7191.27/0/0)
          tcp6     191      0 172.31.32.201:61617     18.179.5.82:43439       CLOSE_WAIT  21596/java           keepalive (7224.04/0/0)
          tcp6     191      0 172.31.32.201:61617     18.179.5.82:43440       CLOSE_WAIT  21596/java           keepalive (7224.04/0/0)
          tcp6     191      0 172.31.32.201:61617     18.179.5.82:43441       CLOSE_WAIT  21596/java           keepalive (7224.04/0/0)
          tcp6     191      0 172.31.32.201:61617     18.179.5.82:43442       CLOSE_WAIT  21596/java           keepalive (7224.04/0/0)
          tcp6     191      0 172.31.32.201:61617     18.179.5.82:43445       CLOSE_WAIT  21596/java           keepalive (7224.04/0/0)
          unix  2      [ ]         STREAM     CONNECTED     5799590  21596/java
          unix  2      [ ]         STREAM     CONNECTED     5799592  21596/java
          

            People

              gtully@redhat.com Gary Tully
              rhn-support-tyamashi Tomonari Yamashita
              Roman Vais Roman Vais
