Type: Bug
Resolution: Unresolved
Priority: Major
Fix Version/s: None
Affects Version/s: 1.0.0.Alpha1
We are running WildFly in domain mode with the following configuration:
host A running the domain controller
host B running a host controller with one app server
host C running a host controller with one app server
host D running a host controller with one app server
When we deploy a war using jboss-cli, the web management console is blocked until the deploy completes. I have run jvisualvm, and the domain controller process does not appear to be starved for resources (CPU, memory, threads).
is cloned by:
JBEAP-7943 wildfly web management console hangs during deploy from cli (Closed)

relates to:
WFCORE-586 domain controller does not timeout on bad app deploy (Closed)
1. Avoiding unnecessary 2-phase execution of composite operations in a managed domain (Open, Unassigned)
2. Guard domain topology changes with separate locks from the controller lock (Open, Unassigned)
I briefly considered not holding any long-lasting topology lock and simply getting the set of hosts under a short-lived lock. But that is not reliable (see the sketch after this list):
1) T1 is doing a domain-wide write; on the DC, OperationCoordinatorStepHandler gathers the registered hosts and creates a DomainSlaveHandler to do the HC rollout.
2) A new HC starts, connects, gets the exclusive lock, and starts its registration work.
3) T1 gets to the Stage.MODEL handler that detects a write, tries to get the exclusive lock, and blocks.
4) The new HC's registration completes and the exclusive lock is released.
5) T1 gets the lock and proceeds.
6) T1 gets to DomainSlaveHandler and rolls out the change to the set of slaves gathered in 1) above, which does not include the new HC.
7) The new HC misses the update.
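To make the interleaving concrete, here is a minimal, self-contained sketch in plain Java. HostProxy and the latch choreography are placeholders, not the real WildFly types; the point is only that a snapshot of the host set taken under a short-lived lock goes stale while the writer is blocked, so the racing host misses the rollout:

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

public class StaleSnapshotRace {

    record HostProxy(String name) {}

    static final Map<String, HostProxy> registeredHosts = new ConcurrentHashMap<>();

    public static void main(String[] args) throws InterruptedException {
        registeredHosts.put("host-b", new HostProxy("host-b"));
        registeredHosts.put("host-c", new HostProxy("host-c"));

        CountDownLatch snapshotTaken = new CountDownLatch(1);
        CountDownLatch newHostRegistered = new CountDownLatch(1);

        Thread t1 = new Thread(() -> {
            // Step 1: gather the host set under a short-lived lock.
            Set<String> snapshot = new HashSet<>(registeredHosts.keySet());
            snapshotTaken.countDown();
            try {
                // Steps 3-5: T1 blocks on the exclusive lock while the
                // new HC finishes registering, then proceeds.
                newHostRegistered.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            // Step 6: the rollout uses the stale snapshot.
            System.out.println("Rolling out to:       " + snapshot);
            System.out.println("Currently registered: " + registeredHosts.keySet());
        });

        Thread newHc = new Thread(() -> {
            try {
                snapshotTaken.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            // Steps 2 and 4: the new HC registers after the snapshot was taken.
            registeredHosts.put("host-d", new HostProxy("host-d"));
            newHostRegistered.countDown();
        });

        t1.start();
        newHc.start();
        t1.join();
        newHc.join();
        // Step 7: host-d is registered but never received the rollout.
    }
}
```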
The situation with servers, I believe, is simpler. There the set of host and server proxies is a ref to the complete, dynamically updated set, and which servers get called depends on the rollout plan. The rollout plan is created after Stage.MODEL, so the exclusive lock will be held when it is created. So any "New Server" joining in a race with the change will either a) block in registration acquiring the exclusive lock until after the change is complete, or b) cause the change to block in Stage.MODEL until registration is complete, with the New Server then being picked up by DomainRolloutStepHandler the same as if it had been registered before the change op even began.
The way the server case is handled by DomainRolloutStepHandler suggests a possible easy fix for the host case as well (sketched below). DomainSlaveHandler should be constructed with a ref to the complete, dynamically updated map of host proxies (the way DomainRolloutStepHandler is). It should also be given the set of host names to update, or null if the update is global. If the set of host names is not null, the op only targets particular hosts, with no possibility of that set being added to in the course of execution. So, if the change is global, the write lock in a Stage.MODEL step will ensure that any new host is either registered before DomainSlaveHandler executes or is blocked waiting for the change op to complete. If the change is not global, the registration of a new slave is irrelevant to DomainSlaveHandler; it just works with the set of hosts it knows about.
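A minimal sketch of that shape, with placeholder types rather than the real DomainSlaveHandler signature: the handler holds the live map and resolves its targets only at execution time, after Stage.MODEL, when the exclusive lock already guards a global change.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/**
 * Sketch of the proposed DomainSlaveHandler shape. HostProxy and the
 * method names are placeholders, not the real WildFly API.
 */
final class DomainSlaveHandlerSketch {

    /** Live, dynamically updated registry shared with the registration code. */
    private final Map<String, HostProxy> liveHostProxies;

    /** Explicit targets, or null when the update is global. */
    private final Set<String> targetHostNames;

    DomainSlaveHandlerSketch(Map<String, HostProxy> liveHostProxies,
                             Set<String> targetHostNames) {
        this.liveHostProxies = liveHostProxies;
        this.targetHostNames = targetHostNames;
    }

    /**
     * Runs after Stage.MODEL, i.e. with the exclusive lock held for a
     * global change, so the live map can no longer gain a racing host.
     */
    void execute() {
        Map<String, HostProxy> targets = new HashMap<>();
        if (targetHostNames == null) {
            // Global update: resolve against the live map now. Any host
            // registered before this point is included; any host still
            // registering is blocked on the exclusive lock.
            targets.putAll(liveHostProxies);
        } else {
            // Targeted update: the set of hosts is fixed by the op itself,
            // so a concurrently registering slave is simply irrelevant.
            for (String name : targetHostNames) {
                HostProxy proxy = liveHostProxies.get(name);
                if (proxy != null) {
                    targets.put(name, proxy);
                }
            }
        }
        targets.forEach((name, proxy) -> proxy.rollout());
    }

    interface HostProxy {
        void rollout();
    }
}
```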
Reads still need some thought, though. The current behavior of taking the exclusive lock overly aggressively prevents some plausible scenarios, like a client periodically reading a bunch of metrics and getting a failure because a host or server is removed by another op in the middle of the read. This could be a real scenario now that things like multi-process reads and the query op are supported.
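One possible direction, as a sketch only (ProcessProxy and readMetrics() are hypothetical names, and whether a vanished process should be silently skipped or reported back to the client is exactly the open question): let a multi-process read treat a proxy that disappears mid-iteration as "removed" rather than as a failure of the whole operation.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Sketch of a removal-tolerant multi-process read. A proxy vanishing
 * between listing the names and reading is tolerated instead of
 * failing the whole read.
 */
final class TolerantMetricsReader {

    interface ProcessProxy {
        Map<String, Long> readMetrics();
    }

    private final ConcurrentHashMap<String, ProcessProxy> liveProxies;

    TolerantMetricsReader(ConcurrentHashMap<String, ProcessProxy> liveProxies) {
        this.liveProxies = liveProxies;
    }

    Map<String, Map<String, Long>> readAll() {
        Map<String, Map<String, Long>> results = new LinkedHashMap<>();
        for (String name : liveProxies.keySet()) {
            ProcessProxy proxy = liveProxies.get(name);
            if (proxy == null) {
                // Host or server was removed after we saw its name;
                // skip it rather than failing the whole read.
                continue;
            }
            results.put(name, proxy.readMetrics());
        }
        return results;
    }
}
```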