WildFly / WFLY-4748

Singleton service fails to start after repetitive cluster split with "Failed to reach quorum of 1"


    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • Fix Version/s: 10.0.0.Alpha6
    • Affects Version/s: 9.0.0.CR1, 10.0.0.Alpha2
    • Component/s: Clustering
    • Labels: None

      When a cluster of two nodes with a deployed singleton service (e.g. the cluster-ha-singleton quickstart app) splits, merges, and splits again, one of the nodes fails to run the singleton service with the error message "WFLYCLSV0006: Failed to reach quorum of 1 for jboss.quickstart.ha.singleton.default2 service. No singleton master will be elected." Note the "quorum of 1".

      This only happens after the second and subsequent splits. After the first split, both nodes run the service correctly.

      Analysis suggests that nodes are never added back to the service providers cache upon cluster merge, because CacheServiceProviderRegistrationFactory#membershipChanged() is never called with the 'merged' argument set to 'true'.

      I presume that call should come from ChannelCommandDispatcherFactory#viewAccepted():

      public void viewAccepted(View view) {
          // ...
          for (Listener listener: this.listeners) {
              listener.membershipChanged(oldNodes, newNodes, view instanceof MergeView);
          }
      }
      

      This method gets called, but the problem is that the 'listeners' list is empty, so no listener is actually notified.
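
      The effect described above can be illustrated with a minimal, self-contained sketch. This is not the WildFly code: the class MergeNotificationSketch, the nested Listener and Dispatcher types, and the mergeNotifications helper are all hypothetical names introduced for illustration, and a plain boolean stands in for the 'view instanceof MergeView' check. It only shows that a notification loop over an empty listener list delivers nothing, while the same loop reaches a listener that was registered beforehand.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model only; all names here are hypothetical, not WildFly APIs.
class MergeNotificationSketch {

    /** Stand-in for the listener callback quoted in the report. */
    interface Listener {
        void membershipChanged(List<String> oldNodes, List<String> newNodes, boolean merged);
    }

    /** Stand-in dispatcher: notifies whichever listeners are registered right now. */
    static class Dispatcher {
        private final List<Listener> listeners = new CopyOnWriteArrayList<>();

        void addListener(Listener listener) {
            this.listeners.add(listener);
        }

        // 'merged' replaces the 'view instanceof MergeView' check from the report.
        void viewAccepted(List<String> oldNodes, List<String> newNodes, boolean merged) {
            for (Listener listener : this.listeners) {
                listener.membershipChanged(oldNodes, newNodes, merged);
            }
        }
    }

    /** Returns how many merge notifications a listener received for one merge view. */
    static int mergeNotifications(boolean registerListener) {
        Dispatcher dispatcher = new Dispatcher();
        AtomicInteger merges = new AtomicInteger();
        if (registerListener) {
            dispatcher.addListener((oldNodes, newNodes, merged) -> {
                if (merged) {
                    merges.incrementAndGet();
                }
            });
        }
        // Simulate the view change that follows a cluster merge.
        dispatcher.viewAccepted(Arrays.asList("node1"), Arrays.asList("node1", "node2"), true);
        return merges.get();
    }

    public static void main(String[] args) {
        System.out.println("without listener: " + mergeNotifications(false)); // prints 0
        System.out.println("with listener: " + mergeNotifications(true));     // prints 1
    }
}
```

      In this model the dispatcher behaves identically in both runs; the only difference is whether a listener was registered before the merge view arrived, which matches the observation that viewAccepted() is called but notifies no one.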

              Assignee: Paul Ferraro (pferraro@redhat.com)
              Reporter: Tomas Hofman (thofman)