-
Bug
-
Resolution: Obsolete
-
Major
-
JBossAS-5.1.0.GA
-
None
JBC is different from JBC 1.4 in how it handles suspected nodes during data gravitation. JBC would ignore them; w/ JBC 3 they propagate.
These can happen easily with a cluster under load and a node failing. LB detects failure before view changes, node that has failing node as it backup starts gravitating, replication of the gravitated data to the (failed) backup throws a SuspectException.
The clustering integration needs to handle this better. Right now gravitation attempts are wrapped in txs, so the SuspectException fails the tx commit. That's pretty non-recoverable unless we catch the commit failure and retry. A possibility is to not wrap the gravitation in a tx (not really needed except for FIELD) and use JBossCacheWrapper's get() retry logic to redo the gravitation.
We already catch the exception and allow the request to continue if the data was actually retrieved. Actually, that only works because we wrap w/ the tx; JBC wouldn't return from the gravitation read without the tx causing the replication write to wait for tx commit. Hmm...
Problem this causes now is 1) gravitated data doesn't replicate to buddy until a request changes it and causes a normal write 2) DataGravitationCleanupCommand is not issued, so stale data is left in the cache. Some of the changes made for JBCACHE-1530 reduce the likelihood of that stale data being used; it's only used if a request fails over to the node where it's stored, leading to gravitation from (stale) local backup tree.