-
Bug
-
Resolution: Done
-
Major
-
4.0.22
-
None
Apparently there is an issue in the FD protocol. When a cluster nodes is disconnected and the disconnect isn't handled by FD_SOCK, FD stops sending heartbeats after a while. This only happens when the queue of the TrasferQueueBundler fills up before the node is suspected.
The stack trace shows that the FD$Monitor is blocked by the bundler. This is probably the reason why the heartbeat timeouts are not noticed.