When the client is killed and the connection TTL expires, the failure check thread should detect this and release the connection. However, we ran into a scenario where the failure check thread itself throws an NPE. The log first shows the usual connection-failure warning:
2:53:10,976 WARN [RemotingConnectionImpl] Connection failure has been detected: Did not receive data from /172.20.180.25:53289. It is likely the client has exited or crashed without closing its connection, or the network between the server and client has failed. You also might have configured connection-ttl and client-failure-check-period incorrectly.
until eventually you get this:
13:37:44,156 ERROR [STDERR] Exception in thread "hornetq-failure-check-thread"
13:37:44,156 ERROR [STDERR] java.lang.NullPointerException
13:37:44,156 ERROR [STDERR] at org.hornetq.core.remoting.server.impl.RemotingServiceImpl$FailureCheckAndFlushThread.run(RemotingServiceImpl.java:537)
18:25:11,684 ERROR [NettyConnector] Failed to create netty connection
This can happen in a race: the client eventually connects to the server and removes the connection while the failure handling for that same connection is also in progress.
The NPE killed the failure-check thread, so nothing cleaned up dead client connections after that point.
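
To illustrate the race and one possible way to guard against it, here is a minimal sketch. It is not the HornetQ code: the class FailureCheckSketch, the ConnectionEntry type and the checkOnce method are hypothetical names invented for this example. The assumption is that connections live in a concurrent map that another thread may modify while the check runs, so the loop treats a missing entry as "already cleaned up" and never lets one bad entry end the pass.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a failure-check pass that tolerates concurrent removal.
class FailureCheckSketch {

    // Hypothetical stand-in for a server-side connection record.
    static final class ConnectionEntry {
        final long ttlMillis;
        volatile long lastSeenMillis;
        volatile boolean failed;

        ConnectionEntry(long ttlMillis, long lastSeenMillis) {
            this.ttlMillis = ttlMillis;
            this.lastSeenMillis = lastSeenMillis;
        }
    }

    final Map<Object, ConnectionEntry> connections = new ConcurrentHashMap<>();

    // Runs one pass of the check and returns how many connections were failed.
    int checkOnce(long nowMillis) {
        int failedCount = 0;
        for (Object id : connections.keySet()) {
            try {
                // Another thread may have removed the entry since we saw its key.
                ConnectionEntry entry = connections.get(id);
                if (entry == null) {
                    continue; // already removed elsewhere, nothing to do
                }
                if (nowMillis - entry.lastSeenMillis > entry.ttlMillis) {
                    // remove() returns null if someone else removed it first.
                    ConnectionEntry removed = connections.remove(id);
                    if (removed != null) {
                        removed.failed = true;
                        failedCount++;
                    }
                }
            } catch (RuntimeException e) {
                // Report and keep going so the checking thread survives.
                System.err.println("Failure check skipped " + id + ": " + e);
            }
        }
        return failedCount;
    }
}
```

In a real checking thread, checkOnce would be called in a loop with a sleep between passes; the important points are the null checks after each map lookup and the catch inside the loop, so that an unexpected exception affects a single connection rather than terminating the thread.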