Bug
Resolution: Done
Blocker
JBoss A-MQ 6.0
None
None
When a ZooKeeper session expires and the container creates a new session (as shown below), the container is listed as "active" in container-list, but cluster-list does not list the broker running within that container.

2013-06-22 00:23:27,246 | INFO | .40.26.207:2181) | ClientCnxn | .zookeeper.ClientCnxn$SendThread 1049 | 58 - org.fusesource.fabric.fabric-linkedin-zookeeper - 7.2.0.redhat-024 | Unable to reconnect to ZooKeeper service, session 0x23f655f62400001 has expired, closing socket connection

... shortly after, a new session is created:

2013-06-22 00:23:27,586 | INFO | .40.26.209:2181) | ClientCnxn | .zookeeper.ClientCnxn$SendThread 1175 | 58 - org.fusesource.fabric.fabric-linkedin-zookeeper - 7.2.0.redhat-024 | Session establishment complete on server <myip_address>:2181, sessionid = 0x23f655f6240001c, negotiated timeout = 30000
I assume this is because the ephemeral node for the broker cluster is not recreated when a new ZooKeeper session is established after expiry. I think this behavior is problematic:

1. Potential loss of slave instances from the cluster group
   - The ZooKeeper session expires on the slave instance.
   - The ephemeral znode is removed, as it is associated with that session.
   - A new ZooKeeper session is created, but the ephemeral node is not recreated in the cluster.
   - The instance will not be "discovered" by the mq-discovery mechanism, as no node is registered in ZooKeeper.

2. Potentially two active brokers in the cluster group (two masters)
   - The ZooKeeper session expires on the master instance.
   - The ephemeral znode is removed, as it is associated with that session.
   - A new ZooKeeper session is created, but the ephemeral node is not recreated in the cluster.
   - The slave broker is promoted to master.
   - The original master broker is still running (but is not listed in the cluster group).

HOW TO REPLICATE
================

(Scenario 1)
--------------------
Issue the following Karaf/Fabric commands:
1. fabric:create
2. fabric:mq-create --group mq_g50 --create-container child_1,child_2 my_mq_profile

Assuming child_1 is the master, pause container child_2 for more than 30 seconds (using "kill -17 PID" to pause and "kill -19 PID" to resume), then run the commands below. Signals 17 and 19 are SIGSTOP and SIGCONT on BSD-derived systems such as macOS; the numbers differ on Linux, so "kill -STOP PID" and "kill -CONT PID" are the portable forms.
3. container-list - shows the child_2 container as active again (as expected)
4. cluster-list - shows no reference to the child_2 broker
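
The pause step can also be scripted. The following is a minimal sketch, assuming a POSIX system; the helper name pause_process and the PID in the usage note are hypothetical and are not part of Fabric or Karaf. It uses the named signals rather than numbers, so it does not depend on the platform's signal numbering.

```python
import os
import signal
import time


def pause_process(pid: int, seconds: float) -> None:
    """Suspend process `pid` for `seconds`, then let it continue."""
    # SIGSTOP freezes the whole JVM, so its ZooKeeper client stops
    # sending heartbeats and the server eventually expires the session.
    os.kill(pid, signal.SIGSTOP)
    try:
        # To reproduce the issue, this must exceed the negotiated
        # session timeout (30000 ms in the log excerpt).
        time.sleep(seconds)
    finally:
        # Always resume the process, even if the sleep is interrupted.
        os.kill(pid, signal.SIGCONT)
```

For example, pause_process(12345, 35) would suspend a container whose JVM has PID 12345 for 35 seconds, after which container-list and cluster-list can be compared.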
(Scenario 2)
--------------------
Same setup as scenario 1, but:
1. Ensure the KahaDB is not sharing the same master/slave lock.
2. Pause the master container rather than the slave.
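
The suspected cause can be illustrated with a toy in-memory model. This is only a sketch of the assumption stated in this report (ephemeral nodes die with their session and are not recreated), not the ZooKeeper or Fabric API; the names ToyZooKeeper, Broker, reregister_on_expiry and the /mq_g50/<name> path layout are hypothetical. Re-registering after a new session is shown as one possible remedy, not necessarily the fix that was applied.

```python
class ToyZooKeeper:
    """In-memory stand-in: an ephemeral node lives only as long as its session."""

    def __init__(self):
        self.ephemeral = {}      # path -> id of the owning session
        self._last_session = 0

    def new_session(self):
        self._last_session += 1
        return self._last_session

    def create_ephemeral(self, session, path):
        self.ephemeral[path] = session

    def expire(self, session):
        # Expiry removes every ephemeral node owned by that session.
        self.ephemeral = {p: s for p, s in self.ephemeral.items() if s != session}

    def members(self):
        return sorted(self.ephemeral)


class Broker:
    def __init__(self, zk, name, reregister_on_expiry):
        self.zk = zk
        self.name = name
        self.reregister_on_expiry = reregister_on_expiry
        self.session = zk.new_session()
        self.register()

    def register(self):
        # Hypothetical path layout for the cluster group.
        self.zk.create_ephemeral(self.session, f"/mq_g50/{self.name}")

    def simulate_expiry(self):
        self.zk.expire(self.session)          # old session and its nodes are gone
        self.session = self.zk.new_session()  # the container reconnects
        if self.reregister_on_expiry:
            self.register()                   # possible remedy: recreate the node


def cluster_after_expiry(expired, reregister):
    """Return the registered group members after one broker's session expires."""
    zk = ToyZooKeeper()
    brokers = {n: Broker(zk, n, reregister) for n in ("child_1", "child_2")}
    brokers[expired].simulate_expiry()
    return zk.members()


print(cluster_after_expiry("child_2", reregister=False))  # ['/mq_g50/child_1']
print(cluster_after_expiry("child_2", reregister=True))   # ['/mq_g50/child_1', '/mq_g50/child_2']
```

Without re-registration the paused broker holds a live session but has no node in the group, which matches a container shown as active in container-list while missing from cluster-list.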
Is related to: ENTMQ-408 Need a solution for ENTMQ-382 on JBoss Fuse 6.0 (Closed)