Bug
Resolution: Done
Major
jboss-fuse-6.2.1
None
This problem seems to be related to ENTESB-6254. However, that bug was verified as fixed in 6.2.1 R7 (I checked it myself), whilst this problem is reproducible in R7. So either ENTESB-6254 was not fully fixed, or we have found a new way to trigger a very similar-looking failure.
The problem I can reproduce is one in which, after a network outage, the Fabric8 MQ gateway does not detect that brokers are available, even though one broker is master, the other is slave, and both are reachable on the network. It seems that the outage has to be severe enough to bring about a sub-quorum state in the ZooKeeper (ZK) ensemble, so that neither broker is master for some time. However, other ZK events may have a similar effect.
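For reference, "sub-quorum" here means that fewer than a strict majority of the ZK ensemble's servers can still reach each other; in that state the ensemble stops serving clients, which fits the observation above that neither broker is master for a while. The sketch below only illustrates that majority arithmetic, assuming standard majority quorums; the function names are hypothetical and the ensemble sizes are examples, not the customer's actual topology.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers forming a strict majority of the ensemble."""
    if ensemble_size < 1:
        raise ValueError("ensemble must contain at least one server")
    return ensemble_size // 2 + 1


def has_quorum(ensemble_size: int, reachable: int) -> bool:
    """True if the servers that can still talk to each other form a majority."""
    return reachable >= quorum_size(ensemble_size)


if __name__ == "__main__":
    # Example ensemble sizes only: list which partition sizes keep quorum.
    for n in (3, 5):
        for up in range(n, 0, -1):
            print(f"ensemble={n} reachable={up} quorum={has_quorum(n, up)}")
```

For a three-server ensemble, the loop reports that quorum holds with two reachable servers but is lost with one.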
In fact, I rather suspect that there are similar but distinct failure modes, depending on exactly where in the topology the outage occurs and which ZK roles each node is playing at the time. I have logs from the customer showing a situation in which both brokers get stuck as slaves after the outage is resolved, but so far I have not been able to reproduce that.
In all cases, the practical impact is that the gateway/broker system does not recover after an outage, and manual action always seems to be necessary to restore service.
relates to:
ENTESB-6845 if an ensemble server is stopped, the gateway will intermittently send traffic to the wrong broker (Closed)