-
Bug
-
Resolution: Done
-
Major
-
EAP 5.0.0.CR2
-
None
-
Brno QE lab (porting to cluster lab as well)
-
Fixed in JGRP-1282, which was committed to EAP_EWP 5.1.1.x
-
Not Required
Brian:
The tests are of web session replication in a 4-node cluster. Nodes are randomly brought in and out of the cluster.
I'm attaching the server logs from the 4 nodes (jawa01-04) plus the log from the test control node (jawa12). The failure happens at 16:26:26 as node1 (aka jawa02) tries to join the group. The coordinator, node0 (aka jawa01), attempts a flush to install the view, which doesn't succeed, I believe because it doesn't receive a response from node2/jawa03 in time. Some attempt at "reconciling" is made, and thereafter things don't recover.
The jawa12.log just gives you an overall picture of what is going on. Keep in mind that it refers to the nodes it is managing as node0 - node3 (zero-based), while the actual machine names are jawa01 - jawa04 (one-based). I've confused myself that way more than a few times.
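For anyone cross-referencing jawa12.log against the per-node logs, the naming offset described above can be sketched as a small helper. This is only an illustration: the function names node_to_host and host_to_node are hypothetical and not part of the test harness, and the sketch assumes exactly the four nodes mentioned here (node0 - node3, jawa01 - jawa04).

```python
# Hypothetical helpers (not part of the test harness) for translating between
# the control node's zero-based names and the one-based machine names.

NODE_COUNT = 4  # assumption: the 4-node cluster described in this issue


def node_to_host(node_name: str) -> str:
    """Map a control-node name such as 'node1' to a host name such as 'jawa02'."""
    if not node_name.startswith("node"):
        raise ValueError(f"not a node name: {node_name!r}")
    index = int(node_name[len("node"):])
    if not 0 <= index < NODE_COUNT:
        raise ValueError(f"node index out of range: {index}")
    # Machine names are one-based and zero-padded to two digits.
    return f"jawa{index + 1:02d}"


def host_to_node(host_name: str) -> str:
    """Map a host name such as 'jawa03' to a control-node name such as 'node2'."""
    if not host_name.startswith("jawa"):
        raise ValueError(f"not a host name: {host_name!r}")
    number = int(host_name[len("jawa"):])
    if not 1 <= number <= NODE_COUNT:
        raise ValueError(f"host number out of range: {number}")
    return f"node{number - 1}"
```

With this mapping, the joining node1 corresponds to jawa02, the coordinator node0 to jawa01, and node2 to jawa03, matching the names used in the description of the failure.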
The logs for jawa01-04 have a ton of logging from JBC's RpcDispatcher subclass grepped out. Sorta. A bunch of blank lines and misc garbage are left. I'm not cleaning it out, as it doesn't seriously impact readability and gives you a feel for the volume of RPCs that are going on at various points.
If you're interested in the full, unmassaged logs, they are available at the links on the left at
- blocks
-
JBPAPP-1260 EAP5 failover testing issues - Tracker JIRA
- Closed
- is blocked by
-
JGRP-1282 Race condition in FLUSH when master leaves cluster
- Resolved
-
JBPAPP-2522 Upgrade to JGroups 2.6.12
- Closed
-
JBPAPP-4946 Upgrade JGroups to 2.6.19
- Closed