Bug
Resolution: Obsolete
Major
JBossAS-3.2.6 Final
None
SourceForge Submitter: iankenn
Original posting on JBoss.org Clustering forum:
Hi
I'm currently developing a system which uses JMS
queuing for async processing of messages. I'm looking
at deploying to a cluster of two JBoss 3.2.3 servers to
provide some level of fail-over/resilience.
During testing of the JMS fail-over I've tried killing
one of the JBoss instances (the one running the JMS
server) and see that the JMS queues are migrated to the
other node. But when I tried to simulate a temporary
loss of network connectivity between the two machines
(by removing one of the network cables and then
replacing it) the cluster seems to break and both
machines start to run the JMS queues.
When the network cable is reconnected, neither node
appears to know that there is another node in the same
partition. Effectively, the cluster is not
re-established. The only way to make the two nodes see
each other again is to restart one of the nodes. Is
there something that I have misconfigured or not
configured? I am new to clustering and would appreciate
some advice. I am currently testing on two Windows
machines but intend to deploy to Linux boxes.
Thanks,
Ian
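The behaviour described above (both nodes running the JMS queues while the link is down) can be illustrated with a minimal sketch. It assumes a simplified election rule in which the first member of a node's current cluster view runs the singleton service; the names `elect_singleton`, `masters`, `node1` and `node2` are hypothetical and are not taken from the JBoss code base.

```python
def elect_singleton(view):
    """Return the member that should run the singleton service.

    Assumption: the first member of the view is elected. This is a
    simplified model, not the actual JBoss election code.
    """
    return view[0] if view else None


def masters(views_by_node):
    """Map each node to whether it considers itself the singleton."""
    return {
        node: elect_singleton(view) == node
        for node, view in views_by_node.items()
    }


# Both nodes see the same two-member view: exactly one singleton.
healthy = {"node1": ["node1", "node2"], "node2": ["node1", "node2"]}

# Network link lost: each node sees a view containing only itself.
split = {"node1": ["node1"], "node2": ["node2"]}

print(masters(healthy))  # {'node1': True, 'node2': False}
print(masters(split))    # {'node1': True, 'node2': True}
```

Under this model, each side of a broken link elects itself, which matches the symptom of both machines running the JMS queues. Whether the shared state is reconciled once the views merge again is a separate question that the sketch does not cover.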
See posting
http://www.jboss.org/index.html?module=bb&op=viewtopic&t=45901
Configuration (both machines)
OS: Windows 2000
JDK: 1.4.2_03
JBoss: 3.2.3
The attached zip contains the cluster.log files for
both servers:
Node 'A' - Node_A_cluster.log
Node 'B' - Node_B_cluster.log
Steps
1. Turn on logging for clustering in /conf/log4j.xml
2. Start JBoss on Node 'A'
3. Start JBoss on Node 'B'
4. Deploy the EAR to the farm dir on Node 'A'
This is farmed to Node 'B'
5. Submit Msg to Node 'A' (HTTP request to application)
6. Submit Msg to Node 'B' (HTTP request to application)
7. Look at the HAILSharedState ServerAddress for the
JBoss MQ on the jmx-console - this shows the IP address
of Node 'A' on both nodes.
8. Remove network cable from Node 'A'
9. The following messages are displayed in the console:
Node 'A'
10:40:53,921 INFO [DefaultPartition] New cluster view
(id: 2, delta: -1) : [192.168.0.34:1099]
10:40:53,921 INFO [DefaultPartition:ReplicantManager]
Dead members: 1
10:40:58,015 INFO [DefaultPartition] Suspected member:
wizcom-desk01:4950 (additional data: 17 bytes)
Node 'B'
10:40:53,376 INFO [DefaultPartition] New cluster view
(id: 2, delta: -1) : [192.168.0.46:1099]
10:40:53,376 INFO [DefaultPartition:ReplicantManager]
Dead members: 1
10:40:53,516 INFO [HAILServerILService] Notified to
become singleton
10. The jmx-console on Node 'B' now shows its own IP
address as the HAILSharedState ServerAddress.
11. The jmx-console on Node 'A' still shows its own IP
address as the HAILSharedState ServerAddress.
11. Reconnect the network cable to Node 'A'
12. The following message appears in the console:
Node 'A'
10:45:05,171 INFO [DefaultPartition] New cluster view
(id: 3, delta: 1) : [192.168.0.34:1099, 192.168.0.46:1099]
10:45:05,171 INFO [DefaultPartition:ReplicantManager]
Merging partitions...
10:45:05,171 INFO [DefaultPartition:ReplicantManager]
Dead members: 0
10:45:05,187 INFO [DefaultPartition:ReplicantManager]
Originating groups: [[wizcom-comp2:1277 (additional
data: 17 bytes)|2] [wizcom-comp2:1277 (additional data:
17 bytes)], [wizcom-desk01:4950 (additional data: 17
bytes)|2] [wizcom-desk01:4950 (additional data: 17 bytes)]]
10:45:05,233 INFO [DefaultPartition:ReplicantManager]
Start merging members in DRM service...
10:45:05,655 INFO [DefaultPartition:ReplicantManager]
..Finished merging members in DRM service
Node 'B'
10:45:05,740 INFO [DefaultPartition] New cluster view:
3 ([192.168.0.34:1099, 192.168.0.46:1099] delta: 1)
10:45:05,756 INFO [DefaultPartition:ReplicantManager]
Merging partitions...
10:45:05,756 INFO [DefaultPartition:ReplicantManager]
Dead members: 0
10:45:05,756 INFO [DefaultPartition:ReplicantManager]
Originating groups: [[wizcom-comp2:1277 (additional
data: 17 bytes)|2] [wizcom-comp2:1277 (additional data:
17 bytes)], [WIZCOM-DESK01:4950 (additional data: 17
bytes)|2] [WIZCOM-DESK01:4950 (additional data: 17 bytes)]]
10:45:05,818 INFO [DefaultPartition:ReplicantManager]
Start merging members in DRM service...
10:45:05,943 INFO [HAILServerILService] Notified to
stop acting as singleton.
10:45:05,943 INFO [DefaultPartition:ReplicantManager]
..Finished merging members in DRM service
13. Refresh the HAILSharedState in the jmx-console:
both nodes still show their own IP address as the
ServerAddress.
Thanks
Ian
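To compare the two cluster.log files, the view changes can be pulled out with a short script. The following is a rough sketch that handles only the two "New cluster view" layouts visible in the console output above; the function name `parse_views` and the sample variable are hypothetical, and real log files may contain other layouts.

```python
import re

# The two view-line layouts that appear in the console output above:
#   New cluster view (id: 2, delta: -1) : [192.168.0.34:1099]
#   New cluster view: 3 ([192.168.0.34:1099, 192.168.0.46:1099] delta: 1)
VIEW_PATTERNS = [
    re.compile(
        r"New cluster view \(id: (?P<id>\d+), delta: (?P<delta>-?\d+)\)"
        r" : \[(?P<members>[^\]]*)\]"
    ),
    re.compile(
        r"New cluster view: (?P<id>\d+)"
        r" \(\[(?P<members>[^\]]*)\] delta: (?P<delta>-?\d+)\)"
    ),
]


def parse_views(text):
    """Return (view_id, delta, members) tuples in order of appearance."""
    # Console lines in the report are wrapped, so collapse whitespace first.
    flat = " ".join(text.split())
    found = []
    for pattern in VIEW_PATTERNS:
        for match in pattern.finditer(flat):
            members = [
                m.strip() for m in match.group("members").split(",") if m.strip()
            ]
            found.append(
                (match.start(), int(match.group("id")),
                 int(match.group("delta")), members)
            )
    found.sort()  # order by position in the text
    return [(vid, delta, members) for _, vid, delta, members in found]


# Sample taken from the Node 'A' console output in the steps above.
NODE_A_SAMPLE = """
10:40:53,921 INFO [DefaultPartition] New cluster view
(id: 2, delta: -1) : [192.168.0.34:1099]
10:45:05,171 INFO [DefaultPartition] New cluster view
(id: 3, delta: 1) : [192.168.0.34:1099, 192.168.0.46:1099]
"""

for view_id, delta, members in parse_views(NODE_A_SAMPLE):
    print(view_id, delta, members)
```

A view with a single member marks the period in which the node was partitioned. Running the same function over both log files shows whether each node reported the merged two-member view after the cable was reconnected.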