Bug
Resolution: Done
Critical
Documentation (Ref Guide, User Guide, etc.)
Is there documentation that describes how idle timeouts and heartbeats are supposed to work with AMQP, and with the other protocols and client libraries?
As far as I can tell, the only coverage is at https://activemq.apache.org/artemis/docs/1.2.0/connection-ttl.html, and it does not go into detail.
Looking at the pull request, it seems to me that Artemis works the same way as RabbitMQ. In RabbitMQ, heartbeats are sent every idle_timeout / 2, and if two consecutive heartbeats are missed, the connection is considered to have failed. https://www.rabbitmq.com/heartbeats.html
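To make the arithmetic concrete, here is a minimal Python sketch of the RabbitMQ-style model as described above: heartbeats every idle_timeout / 2, failure after two consecutive misses. `HeartbeatPlan` and `rabbitmq_style_plan` are hypothetical names for illustration, not part of RabbitMQ or Artemis, and the model ignores network latency and timer jitter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HeartbeatPlan:
    send_interval_s: float  # how often a heartbeat is sent
    failure_after_s: float  # silence after which the connection is considered failed


def rabbitmq_style_plan(idle_timeout_s: float, missed_allowed: int = 2) -> HeartbeatPlan:
    """Simplified model: heartbeats every idle_timeout / 2, failure once
    `missed_allowed` consecutive heartbeats have been missed."""
    if idle_timeout_s <= 0:
        raise ValueError("idle_timeout_s must be positive")
    interval = idle_timeout_s / 2
    return HeartbeatPlan(send_interval_s=interval,
                         failure_after_s=interval * missed_allowed)


print(rabbitmq_style_plan(60))
# HeartbeatPlan(send_interval_s=30.0, failure_after_s=60.0)
```

With the default of two allowed misses, failure is detected after roughly one full idle_timeout in this model.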
In Qpid it worked the other way: heartbeats are sent every idle_timeout seconds, and if two consecutive heartbeats are missed (that is, after idle_timeout * 2), the server terminates the connection. Some qpid-proton clients advertise to the server an idle_timeout of half the value the programmer configured; others advertise the configured value unchanged. https://bugzilla.redhat.com/show_bug.cgi?id=1151446
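A minimal sketch, assuming the simplified Qpid behavior described above, of why the advertised value matters: with the same configured timeout, a client that halves the value before advertising it gets cut off after half as much silence as one that does not. `advertised_value` and `qpid_style_timing` are hypothetical helpers, not qpid-proton APIs.

```python
from typing import Tuple


def advertised_value(configured_s: float, client_halves: bool) -> float:
    # Some qpid-proton clients halve the configured value before advertising it;
    # others send it unchanged.
    return configured_s / 2 if client_halves else configured_s


def qpid_style_timing(advertised_s: float, missed_allowed: int = 2) -> Tuple[float, float]:
    """Return (heartbeat interval, silence before the server terminates) under the
    simplified Qpid model: heartbeats every idle_timeout, termination after
    `missed_allowed` consecutive misses."""
    interval = advertised_s
    return interval, interval * missed_allowed


for halves in (True, False):
    adv = advertised_value(60, halves)
    print(halves, adv, qpid_style_timing(adv))
# True 30.0 (30.0, 60.0)
# False 60 (60, 120)
```

The same configured 60 seconds therefore yields a 60-second or a 120-second cutoff depending on which client library is used.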
According to the AMQP 1.0 specification (2.4.5 Idle Timeout Of A Connection), a peer closes the connection if it receives no frames for longer than its idle timeout, measured in milliseconds. "To avoid spurious timeouts, the value in idle-time-out SHOULD be half the peer’s actual timeout threshold." Empty frames may be sent as heartbeats, but the specification does not say how often to send them.
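One way to read the SHOULD in section 2.4.5, sketched in Python: a peer keeps its real threshold locally, advertises half of it in the idle-time-out field of its open frame, and closes once no frame (including an empty one) has arrived for longer than the real threshold. `IdleTimeoutMonitor` is hypothetical; it takes explicit millisecond timestamps instead of reading a clock so the behavior is deterministic, and it deliberately says nothing about how often to send empty frames, since the specification leaves that open.

```python
class IdleTimeoutMonitor:
    """Receiving side of AMQP 1.0 idle-timeout handling (hypothetical sketch).

    `threshold_ms` is this peer's actual timeout threshold; following the
    SHOULD in section 2.4.5, only half of it is advertised to the remote peer.
    """

    def __init__(self, threshold_ms: int, now_ms: int = 0) -> None:
        if threshold_ms <= 0:
            raise ValueError("threshold_ms must be positive")
        self.threshold_ms = threshold_ms
        self.last_frame_ms = now_ms  # treat connection open as the last activity

    @property
    def advertised_idle_timeout_ms(self) -> int:
        # Value to put in the idle-time-out field of our open frame.
        return self.threshold_ms // 2

    def on_frame(self, now_ms: int) -> None:
        # Any frame resets the timer, including an empty (heartbeat) frame.
        self.last_frame_ms = now_ms

    def should_close(self, now_ms: int) -> bool:
        # Close once the silence exceeds the real threshold, not the advertised value.
        return now_ms - self.last_frame_ms > self.threshold_ms


monitor = IdleTimeoutMonitor(threshold_ms=60_000)
print(monitor.advertised_idle_timeout_ms)  # 30000
monitor.on_frame(now_ms=25_000)
print(monitor.should_close(now_ms=80_000))  # False (55 s of silence)
print(monitor.should_close(now_ms=90_000))  # True (65 s of silence)
```

Under this reading, the value a programmer configures is the real threshold, and the halving happens only in what goes on the wire.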
WebSphere eXtreme Scale has a similar heartbeat feature, configured using IBM_CS_FD_PERIOD and IBM_CS_FD_CONSECUTIVE_MISSED. A heartbeat is sent every IBM_CS_FD_PERIOD, and if more than IBM_CS_FD_CONSECUTIVE_MISSED heartbeats are not received, a failure is declared. https://www.ibm.com/support/knowledgecenter/SSTVLU_8.5.0/com.ibm.websphere.extremescale.doc/txsfailover.html
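The "more than N missed" rule can be sketched as a simple counter. This is a hypothetical Python model, not WebSphere code: `period_s` stands in for IBM_CS_FD_PERIOD and `max_consecutive_missed` for IBM_CS_FD_CONSECUTIVE_MISSED, and no particular unit or default for the real properties is assumed.

```python
class ConsecutiveMissDetector:
    """Hypothetical sketch of the failure rule described above: one heartbeat is
    expected per period, and the peer is declared failed once *more than*
    `max_consecutive_missed` heartbeats in a row have not arrived."""

    def __init__(self, max_consecutive_missed: int) -> None:
        if max_consecutive_missed < 0:
            raise ValueError("max_consecutive_missed must be non-negative")
        self.max_consecutive_missed = max_consecutive_missed
        self.missed = 0

    def end_of_period(self, heartbeat_received: bool) -> bool:
        """Call once per period; returns True when the peer is considered failed."""
        self.missed = 0 if heartbeat_received else self.missed + 1
        return self.missed > self.max_consecutive_missed


def detection_time_s(period_s: float, max_consecutive_missed: int) -> float:
    # Silence since the last received heartbeat before failure is declared,
    # assuming checks happen exactly at period boundaries.
    return period_s * (max_consecutive_missed + 1)


detector = ConsecutiveMissDetector(max_consecutive_missed=2)
print([detector.end_of_period(r) for r in (True, False, False, True, False, False, False)])
# [False, False, False, False, False, False, True]
print(detection_time_s(period_s=30, max_consecutive_missed=2))  # 90
```

Because the rule is "more than" rather than "at least", failure is declared one period later than the missed-count setting alone might suggest.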
Q1: I am confused by all the /2 and *2 factors. Should clients advertise half of their timeout, or the full value?
Q2: How are we (QA) supposed to test the AMQP heartbeat functionality exposed by ARTEMIS-143?
Q3: Are we supposed to call it "idle timeout" or "connection TTL"? The linked Artemis documentation uses the latter, yet it seems to me that TTL is usually a different concept, measured in "hops" (as in TCP/IP), not in seconds.
Q4: More terminology: why doesn't the Artemis documentation use the term "heartbeat"? I guess it could also be called a "keep-alive".
Is related to: AMQDOC-2107 QE feedback: Document details for how AMQ handles timeouts and heartbeats (Closed)
Follows up on: ARTEMIS-143