Application Server 3 4 5 and 6 / JBAS-9481

Race condition can break HASingleton functionality


Details

    • Bug
    • Resolution: Unresolved
    • Major
    • None
    • JBossAS-5.1.0.GA
    • Clustering
    • None
    • Workaround Exists
    • Workaround:

      Stagger the startup of the instances.
      The race condition can occur only while the HASingleton is being deployed.

    • Steps to Reproduce:

      Deploy the attached test.jar (which contains an HASingleton that logs its start and stop) on both nodes of a 2-node cluster.

      Start one instance of the cluster.

      On the second node, load the attached Byteman script (JBAS-9481.btm).
      Then start the second node.

      While the second node is paused ("[Pausing]" is printed in the log), stop the first node. This triggers a view change and starts the HASingleton on the second node.
      (Note: the script pauses once for each HASingleton, because narrowing it further would make the Byteman script considerably more complicated.
      If you don't see the HASingleton start when you stop the first node, it was the wrong pause.)

      When the pause ends...

      Expected result: the HASingleton is still running, because the second node is the only member of the cluster.
      Actual result: the HASingleton is stopped, so it is not running anywhere in the cluster.


    Description

      HASingletonImpl#registerDRMListener has a race condition with partitionTopologyChanged. This can cause views to be processed
      out of order, so HASingletons can be started when they should be stopped, or stopped when they should be started.

      The problem is that the thread calling registerDRMListener (which calls partitionTopologyChanged) is not synchronized against other threads that call partitionTopologyChanged.

      This was introduced by the fix for https://issues.jboss.org/browse/JBAS-2647.
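The interleaving described above can be reproduced in a small, deterministic Java model. This is a hypothetical sketch, not the JBoss AS code: RaceDemo and SingletonController are illustrative names, and the model assumes for simplicity that the first member of a view is the singleton master. A CountDownLatch stands in for the Byteman pause. The registration thread holds a stale two-member view and applies it only after another thread has already processed the newer single-member view.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Hypothetical, simplified model of the race; not the actual HASingletonImpl code.
public class RaceDemo {
    // Simplified controller: assumes the first member of a view is the master.
    static class SingletonController {
        private final String self;
        private volatile boolean running;

        SingletonController(String self) { this.self = self; }

        // Deliberately unsynchronized, like the registration call path described in the issue.
        void partitionTopologyChanged(List<String> view) {
            boolean master = !view.isEmpty() && view.get(0).equals(self);
            if (master && !running) {
                running = true;
                System.out.println(self + ": starting singleton for view " + view);
            } else if (!master && running) {
                running = false;
                System.out.println(self + ": stopping singleton for view " + view);
            }
        }

        boolean isRunning() { return running; }
    }

    public static void main(String[] args) throws InterruptedException {
        SingletonController node2 = new SingletonController("node2");
        List<String> staleView = List.of("node1", "node2"); // node1 is still master
        List<String> newView = List.of("node2");            // node1 has left the cluster

        CountDownLatch newViewProcessed = new CountDownLatch(1);

        // Registration thread: captures the current (soon stale) view, then "pauses"
        // until the newer view has been handled by the view-change thread.
        Thread registration = new Thread(() -> {
            List<String> snapshot = staleView;
            try {
                newViewProcessed.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            node2.partitionTopologyChanged(snapshot); // stale view applied last
        });
        registration.start();

        // View-change thread (here: main) processes the newer view first.
        node2.partitionTopologyChanged(newView);
        newViewProcessed.countDown();
        registration.join();

        System.out.println("singleton running on node2: " + node2.isRunning());
    }
}
```

Running main prints the start message for the single-member view, then the stop message for the stale view, and finally `singleton running on node2: false`. This matches the Actual result in the reproduction steps.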

      To fix the issue, partitionTopologyChanged must process the view saved in viewReference in the correct order, and registerDRMListener's
      call to partitionTopologyChanged must be synchronized against other threads calling it, without causing a regression of JBAS-2647.
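The following is a minimal sketch of one way to meet both ordering requirements. It assumes that each view carries a monotonically increasing id. Every caller, including the registration path, goes through one synchronized method, and that method discards any view older than the last one applied. View, SingletonController and lastViewId are hypothetical names rather than the HASingletonImpl API. The sketch does not model whatever JBAS-2647 required, so it does not show that the regression is avoided.

```java
import java.util.List;

// Hypothetical sketch of ordered, synchronized view processing; not the actual HASingletonImpl code.
public class OrderedViewDemo {
    // A membership view tagged with a monotonically increasing id (assumed to be available).
    record View(long id, List<String> members) {}

    static class SingletonController {
        private final String self;
        private final Object lock = new Object();
        private long lastViewId = Long.MIN_VALUE; // guarded by lock
        private boolean running;                  // guarded by lock

        SingletonController(String self) { this.self = self; }

        // Both the view-change thread and the registration thread call this method.
        void partitionTopologyChanged(View view) {
            synchronized (lock) {
                if (view.id() <= lastViewId) {
                    // Stale or duplicate view: applying it would undo a newer decision.
                    System.out.println(self + ": ignoring stale view " + view.id());
                    return;
                }
                lastViewId = view.id();
                // Assumption: the first member of the view is the master.
                boolean master = !view.members().isEmpty() && view.members().get(0).equals(self);
                if (master && !running) {
                    running = true;
                    System.out.println(self + ": starting singleton for view " + view.id());
                } else if (!master && running) {
                    running = false;
                    System.out.println(self + ": stopping singleton for view " + view.id());
                }
            }
        }

        boolean isRunning() {
            synchronized (lock) {
                return running;
            }
        }
    }

    public static void main(String[] args) {
        SingletonController node2 = new SingletonController("node2");
        View staleView = new View(1, List.of("node1", "node2"));
        View newView = new View(2, List.of("node2"));

        // Same interleaving as the reproduction: the newer view arrives first,
        // then the registration path delivers its stale snapshot.
        node2.partitionTopologyChanged(newView);
        node2.partitionTopologyChanged(staleView);

        System.out.println("singleton running on node2: " + node2.isRunning());
    }
}
```

With the same interleaving as before, the stale view is now ignored and main ends by printing `singleton running on node2: true`. In a real service, starting or stopping the singleton while holding the lock could be slow. Whether that interacts with JBAS-2647 is not established here.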

      Attachments

        1. JBAS-9481.btm (0.3 kB)
        2. test.jar (2 kB)


            People

              pferraro@redhat.com Paul Ferraro
              rhn-support-dereed Dennis Reed
              Votes: 0
              Watchers: 0
