  1. Application Server 3 4 5 and 6
  2. JBAS-1151 deadlock during cluster failover
  3. JBAS-2647

Remove potential deadlock condition from HASingletonSupport

    • Type: Sub-task
    • Resolution: Done
    • Priority: Major
    • JBossAS-5.0.0.CR2
    • None
    • Clustering
    • None

      The startService() implementation that HASingletonSupport inherits from HAServiceMBeanSupport has a slight potential for deadlock if a cluster topology change occurs while the singleton service itself is being deployed. The only known use case where this would occur is with the HASingletonDeployer service.

      Details:

      In Thread A

      1) The HASingletonDeployer service is being deployed, and Thread A has therefore synchronized on org.jboss.system.ServiceController.
      2) Calls DRM.registerListener()
      3) Calls DRM.add() (this is the next line of code)
      4) As part of add processing, DRM calls back to the HASingleton.
      5) Inside a synchronized block in the callback method, the singleton determines whether it is the master node and, if so, goes on to do its work.

      The problem occurs if a cluster topology change happens between steps 2 and 3. In that case, the following would happen in another thread, Thread B.

      1) Topology changes, so DRM notifies listeners.
      2) Our HASingleton is registered as a listener, so step 5 above runs in Thread B.
      3) Since it's the master, it tries to deploy the contents of deploy-hasingleton.
      4) Deployment can't proceed because Thread A has synchronized on org.jboss.system.ServiceController.
      5) Thread A can't proceed because Thread B is stuck inside the synchronized block in the callback method. Deadlock.
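
      The lock cycle described above can be sketched in isolation. This is not JBoss code: the two ReentrantLock fields are hypothetical stand-ins for the ServiceController monitor and for the synchronized block in the callback, and tryLock with a timeout is used so the sketch reports the cycle instead of hanging the way the real monitors would.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

class LockOrderSketch {
    // Assumption: stand-in for the monitor of org.jboss.system.ServiceController.
    static final ReentrantLock controllerLock = new ReentrantLock();
    // Assumption: stand-in for the synchronized block in the singleton's callback.
    static final ReentrantLock callbackLock = new ReentrantLock();

    /** Returns true if neither thread could take its second lock. */
    static boolean reproduceCycle() {
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        CountDownLatch bothTried = new CountDownLatch(2);
        boolean[] gotSecond = new boolean[2];

        // Thread A: deployment thread, holds the controller, then needs the callback lock.
        Thread a = new Thread(() ->
            attempt(controllerLock, callbackLock, bothHoldFirst, bothTried, gotSecond, 0));
        // Thread B: topology-change thread, holds the callback lock, then needs the controller.
        Thread b = new Thread(() ->
            attempt(callbackLock, controllerLock, bothHoldFirst, bothTried, gotSecond, 1));

        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !gotSecond[0] && !gotSecond[1];
    }

    static void attempt(ReentrantLock first, ReentrantLock second,
                        CountDownLatch bothHoldFirst, CountDownLatch bothTried,
                        boolean[] gotSecond, int slot) {
        first.lock();
        try {
            // Wait until the other thread also holds its first lock.
            bothHoldFirst.countDown();
            bothHoldFirst.await();
            // With plain synchronized blocks this step would block forever.
            boolean acquired = second.tryLock(100, TimeUnit.MILLISECONDS);
            if (acquired) {
                second.unlock();
            }
            gotSecond[slot] = acquired;
            // Keep holding the first lock until both threads have made their attempt.
            bothTried.countDown();
            bothTried.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            first.unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println("cycle reproduced: " + reproduceCycle());
    }
}
```

      The latches only force the unlucky interleaving (the topology change landing between steps 2 and 3); in a running node that timing would be down to chance.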

      This is an unlikely scenario, but I'm marking this issue as major since if it does occur it deadlocks the node.

      A likely fix will involve overriding the startService() implementation so it doesn't rely on the callback to determine whether or not it's the master node. Instead it directly does what the callback code does, and then registers as a listener. We have to be careful not to drop any topology changes that occur in the middle.
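
      One possible shape for that approach is sketched below. It is an illustration under stated assumptions, not the actual HASingletonSupport change: FakeDrm, View and Singleton are hypothetical stand-ins, the master is assumed to be the first node in the replicant list, and a view id is assumed to be available so that a change landing between the direct check and the listener registration can be detected by a second check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class StartWithoutCallbackSketch {
    /** Hypothetical snapshot of the replicant list plus a monotonically increasing id. */
    static class View {
        final List<String> nodes;
        final int id;
        View(List<String> nodes, int id) { this.nodes = nodes; this.id = id; }
    }

    interface Listener { void replicantsChanged(View view); }

    /** Hypothetical, minimal stand-in for the DRM; not the real JBoss interface. */
    static class FakeDrm {
        private final List<String> nodes = new ArrayList<>();
        private final List<Listener> listeners = new CopyOnWriteArrayList<>();
        private int viewId;

        void add(String node) { change(node, true); }
        void remove(String node) { change(node, false); }
        void registerListener(Listener l) { listeners.add(l); }

        synchronized View currentView() { return new View(new ArrayList<>(nodes), viewId); }

        private void change(String node, boolean add) {
            View view;
            synchronized (this) {
                if (add) { nodes.add(node); } else { nodes.remove(node); }
                viewId++;
                view = new View(new ArrayList<>(nodes), viewId);
            }
            // Notify outside the DRM lock.
            for (Listener l : listeners) { l.replicantsChanged(view); }
        }
    }

    static class Singleton implements Listener {
        private final FakeDrm drm;
        private final String self;
        private int lastViewId = -1;
        private boolean master;

        Singleton(FakeDrm drm, String self) { this.drm = drm; this.self = self; }

        void start() {
            drm.add(self);                 // no listener registered yet, so no callback fires
            evaluate(drm.currentView());   // do directly what the callback would have done
            drm.registerListener(this);    // only now start listening for topology changes
            evaluate(drm.currentView());   // pick up a change that slipped in before registration
        }

        @Override
        public void replicantsChanged(View view) { evaluate(view); }

        private synchronized void evaluate(View view) {
            if (view.id <= lastViewId) { return; }  // stale or already-seen view
            lastViewId = view.id;
            master = !view.nodes.isEmpty() && view.nodes.get(0).equals(self);
        }

        synchronized boolean isMaster() { return master; }
    }

    public static void main(String[] args) {
        FakeDrm drm = new FakeDrm();
        drm.add("node0");
        Singleton s = new Singleton(drm, "node1");
        s.start();
        System.out.println("master after start: " + s.isMaster());
        drm.remove("node0");
        System.out.println("master after node0 leaves: " + s.isMaster());
    }
}
```

      The sketch only records the election result under its lock. It does not model the deployment work the master goes on to do; in this arrangement that work would presumably have to run outside the lock to avoid recreating the cycle.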

              pferraro@redhat.com Paul Ferraro
              bstansbe@redhat.com Brian Stansberry