[WFLY-12022] Concurrent singleton service installation can cause service to run simultaneously on 2 members. - Red Hat Issue Tracker

Type: Bug
Resolution: Done
Priority: Major
Fix Version/s: 17.0.0.Alpha1, 17.0.0.Final
Affects Version/s: 16.0.0.Final
Component/s: Clustering
Labels: None

Git Pull Request:
https://github.com/wildfly/wildfly/pull/12249

There is a race condition between concurrent elections triggered by different nodes. In general, only one node actually runs the election for a given set of singleton candidates. During a deployment replace, the candidates change in rapid succession as the deployment is stopped and restarted. While each node processes these changes one at a time, this processing isn't synchronized across members. This is the root of the problem: a new election can be triggered on one node while another node is still completing its own election. Here's the scenario in which the race condition was observed:

Before the deployment replace, server-one is the primary provider of the singleton service.
Each node undeploys its application and restarts it. As each node redeploys and the singleton service is reinstalled, each node registers itself as providing the singleton service. The redeploy happens concurrently, but the registration order appears the same on all nodes.
In this case, the registration order was server-three, server-two, server-one.

1. server-three registers first, elects itself, and starts its service
2. server-two registers next
  A. server-three defers election to server-two
  B. server-two runs the election:
    I. Elects itself
    II. Sends a synchronous service stop message to server-three
    III. Starts its service
3. server-one registers next, while server-two is still in the process of stopping the service on server-three
  A. server-three defers election to server-one
  B. server-two is still in the election process, but will defer election to server-one once complete
  C. server-one runs the election:
    I. Elects itself
    II. Sends service stop message to server-two and server-three
      a. server-three is no longer running its service
      b. server-two hasn't yet started its service, but it will soon (this is the problem)
    III. server-one starts its service
    IV. Meanwhile, server-two receives the response to its service stop message (2.B.II) and commences its own service start (2.B.III)

Now server-one and server-two are both running the service.
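
The interleaving above can be replayed deterministically in a small sketch. This is not WildFly code: the class `SingletonRaceSketch`, its `replay` method, and the two latches are hypothetical stand-ins that only model the ordering of messages described in this issue. The latches force server-one's election to run while server-two is still waiting for the response to its synchronous stop message.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

// Hypothetical model of the scenario; none of these names exist in WildFly.
public class SingletonRaceSketch {

    // Names of the members whose singleton service is currently started.
    private final Set<String> running = ConcurrentHashMap.newKeySet();

    private void start(String member) { running.add(member); }

    private void stop(String member) { running.remove(member); }

    // Replays the observed ordering and returns the members left running.
    public Set<String> replay() {
        CountDownLatch stopSent = new CountDownLatch(1);
        CountDownLatch stopResponse = new CountDownLatch(1);

        // server-three registers first, elects itself and starts its service.
        start("server-three");

        // server-two registers next and runs its own election.
        Thread electionOnTwo = new Thread(() -> {
            stop("server-three");      // synchronous stop message takes effect
            stopSent.countDown();
            try {
                stopResponse.await();  // response is still in flight
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            start("server-two");       // pending start, unaware of the newer election
        });
        electionOnTwo.start();

        try {
            stopSent.await();

            // server-one registers while server-two is still waiting.
            stop("server-two");        // no-op: server-two has not started yet
            stop("server-three");      // no-op: already stopped
            start("server-one");

            // server-two now receives its response and proceeds with its start.
            stopResponse.countDown();
            electionOnTwo.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new TreeSet<>(running);
    }

    public static void main(String[] args) {
        // prints [server-one, server-two]
        System.out.println(new SingletonRaceSketch().replay());
    }
}
```

The stop message that server-one sends to server-two has no effect because server-two has nothing to stop yet, and nothing in this model tells server-two that its election result is stale once its stop request returns.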

is cloned by

JBEAP-16810 [GSS](7.2.z) WFLY-12022 - Concurrent singleton service installation can cause service to run simultaneously on 2 members.

Closed

Assignee: Paul Ferraro
Reporter: Paul Ferraro
Votes: 0
Watchers: 3
Created: 2019/04/26 9:21 PM
Updated: 2019/10/02 6:37 PM
Resolved: 2019/04/29 6:57 AM
