JBAS-896 (Application Server 3 4 5 and 6)

JMS started on both nodes in cluster after network glitch


    • Type: Bug
    • Resolution: Obsolete
    • Priority: Major
    • Fix Version/s: No Release
    • Affects Version/s: JBossAS-3.2.6 Final
    • Component/s: Clustering
    • Labels: None

      SourceForge Submitter: iankenn
      Original posting on JBoss.org Clustering forum:

      Hi

      I'm currently developing a system that uses JMS
      queuing for asynchronous processing of messages. I'm
      looking at deploying to a cluster of two JBoss 3.2.3
      servers to provide some level of fail-over/resilience.

      During testing of the JMS fail-over I've tried killing
      one of the JBoss instances (the one running the JMS
      server) and seen that the JMS queues migrate to the
      other node. But when I simulated a temporary loss of
      network connectivity between the two machines (by
      removing one of the network cables and then replacing
      it), the cluster seems to break and both machines
      start to run the JMS queues.

      When the network cable is reconnected, neither node
      appears to know that there is another node in the same
      partition; effectively the cluster is not
      re-established. The only way to make the two nodes see
      each other again is to restart one of them. Is there
      something that I have misconfigured or not configured?
      I am new to clustering and would appreciate some
      advice. I am currently testing on two Windows machines
      but intend to deploy to Linux boxes.

      Thanks,

      Ian

      See posting
      http://www.jboss.org/index.html?module=bb&op=viewtopic&t=45901

      Configuration (both machines)
      OS: Windows 2000
      JDK: 1.4.2_03
      JBoss: 3.2.3
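
      In JBoss 3.2.x the DefaultPartition runs on a JGroups protocol
      stack configured in deploy/cluster-service.xml, and healing a
      split partition depends on the MERGE2 protocol being present in
      that stack. The fragment below is only a sketch (attribute values
      are illustrative, and this is not the complete default stack); it
      shows where MERGE2 sits:

```xml
<!-- Sketch of the JGroups stack inside deploy/cluster-service.xml
     (JBoss 3.2.x). Values are illustrative, not the full default
     configuration. MERGE2 is the protocol that detects that two
     subgroups exist after a network glitch and triggers the
     partition merge; without it a split-brain never heals. -->
<Config>
  <UDP mcast_addr="228.1.2.3" mcast_port="45566" ip_ttl="8" />
  <PING timeout="2000" num_initial_members="3" />
  <MERGE2 min_interval="5000" max_interval="10000" />
  <FD shun="true" timeout="2500" max_tries="5" />
  <pbcast.GMS join_timeout="3000" shun="true" print_local_addr="true" />
</Config>
```

      Note that the logs later in this report show the views do merge
      (view id 3 contains both members), so the stack itself appears to
      be working; the open question is how the HA singleton state is
      reconciled after the merge.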

      The attached zip contains the cluster.log files for
      both servers:
      Node 'A' - Node_A_cluster.log
      Node 'B' - Node_B_cluster.log

      Steps


      1. Turn on logging for clustering in /conf/log4j.xml
      2. Start JBoss on Node 'A'
      3. Start JBoss on Node 'B'
      4. Deploy EAR to farm dir on Node 'A'; this is farmed
      to Node 'B'
      5. Submit Msg to Node 'A' (Http request to application)
      6. Submit Msg to Node 'B' (Http request to application)
      7. Look at the HAILSharedState ServerAddress for the
      JBoss MQ on the jmx-console - this shows the IP address
      of Node 'A' on both nodes.
      8. Remove network cable from Node 'A'
      9. The following messages are displayed in the console:
      Node 'A'
      10:40:53,921 INFO [DefaultPartition] New cluster view
      (id: 2, delta: -1) : [192.168.0.34:1099]
      10:40:53,921 INFO [DefaultPartition:ReplicantManager]
      Dead members: 1
      10:40:58,015 INFO [DefaultPartition] Suspected member:
      wizcom-desk01:4950 (additional data: 17 byte
      s)

      Node 'B'
      10:40:53,376 INFO [DefaultPartition] New cluster view
      (id: 2, delta: -1) : [192.168.0.46:1099]
      10:40:53,376 INFO [DefaultPartition:ReplicantManager]
      Dead members: 1
      10:40:53,516 INFO [HAILServerILService] Notified to
      become singleton

      10. The jmx-console on Node 'B' now shows its own IP
      address as the HAILSharedState ServerAddress.
      11. The jmx-console on Node 'A' still shows its own IP
      address as the HAILSharedState ServerAddress.
      12. Reconnect the network cable to Node 'A'
      13. The following messages appear in the console:
      Node 'A'
      10:45:05,171 INFO [DefaultPartition] New cluster view
      (id: 3, delta: 1) : [192.168.0.34:1099, 192.168.0.46:1099]
      10:45:05,171 INFO [DefaultPartition:ReplicantManager]
      Merging partitions...
      10:45:05,171 INFO [DefaultPartition:ReplicantManager]
      Dead members: 0
      10:45:05,187 INFO [DefaultPartition:ReplicantManager]
      Originating groups: [[wizcom-comp2:1277 (additional
      data: 17 bytes)|2] [wizcom-comp2:1277 (additional data:
      17 bytes)], [wizcom-desk01:4950 (additional data: 17
      bytes)|2] [wizcom-desk01:4950 (additional data: 17 bytes)]]
      10:45:05,233 INFO [DefaultPartition:ReplicantManager]
      Start merging members in DRM service...
      10:45:05,655 INFO [DefaultPartition:ReplicantManager]
      ..Finished merging members in DRM service

      Node 'B'
      10:45:05,740 INFO [DefaultPartition] New cluster view:
      3 ([192.168.0.34:1099, 192.168.0.46:1099] delta: 1)
      10:45:05,756 INFO [DefaultPartition:ReplicantManager]
      Merging partitions...
      10:45:05,756 INFO [DefaultPartition:ReplicantManager]
      Dead members: 0
      10:45:05,756 INFO [DefaultPartition:ReplicantManager]
      Originating groups: [[wizcom-comp2:1277 (additional
      data: 17 bytes)|2] [wizcom-comp2:1277 (additional data:
      17 bytes)], [WIZCOM-DESK01:4950 (additional data: 17
      bytes)|2] [WIZCOM-DESK01:4950 (additional data: 17 bytes)]]
      10:45:05,818 INFO [DefaultPartition:ReplicantManager]
      Start merging members in DRM service...
      10:45:05,943 INFO [HAILServerILService] Notified to
      stop acting as singleton.
      10:45:05,943 INFO [DefaultPartition:ReplicantManager]
      ..Finished merging members in DRM service

      14. Refresh the HAILSharedState in the jmx-console;
      both nodes still show their own IP address as the ServerAddress.
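
      The merge output above shows Node 'B' being notified to stop
      acting as singleton, yet afterwards both nodes still publish
      their own address. A correct HA-singleton re-election after a
      merge must be deterministic: both nodes compute the same master
      (typically the first member of the merged, ordered view) and the
      other stands down. A minimal, self-contained sketch of that
      convention (plain Java, hypothetical names, not JBoss code):

```java
import java.util.Arrays;
import java.util.List;

// Minimal sketch of deterministic HA-singleton election over a
// cluster view. Hypothetical illustration, not JBoss code.
public class SingletonElection {

    // Convention sketched here: the first member of the ordered
    // view is elected singleton; every other member must stand down.
    static String electSingleton(List<String> view) {
        if (view.isEmpty()) {
            throw new IllegalStateException("empty cluster view");
        }
        return view.get(0);
    }

    public static void main(String[] args) {
        // During the partition, each side saw only itself, so each
        // side legitimately elected itself:
        List<String> viewA = Arrays.asList("192.168.0.34:1099");
        List<String> viewB = Arrays.asList("192.168.0.46:1099");
        System.out.println("A alone elects: " + electSingleton(viewA));
        System.out.println("B alone elects: " + electSingleton(viewB));

        // After the merge both nodes share one view, so both must
        // agree on a single master; the loser stops its JMS server.
        List<String> merged = Arrays.asList("192.168.0.34:1099",
                                            "192.168.0.46:1099");
        System.out.println("merged view elects: " + electSingleton(merged));
    }
}
```

      Under this convention both nodes would compute 192.168.0.34:1099
      as the post-merge singleton, which appears consistent with the
      "Notified to stop acting as singleton" message on Node 'B'; the
      symptom reported here is that the shared ServerAddress is
      nevertheless never rewritten on either node.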

      Thanks

      Ian

              Assignee: Unassigned
              Reporter: sourceforge-user, SourceForge legacy user (Inactive)
              Votes: 1
              Watchers: 1