Bug
Resolution: Done
Major
8.2.0.CR1
None
When a cache stops, it first removes its component registry from the GlobalComponentRegistry's namedComponents map, so the node (call it A) replies with a CacheNotFoundResponse to any remote command for that cache.
Another node, B, that tries to execute a write or transactional command receives the CacheNotFoundResponse, assumes that a new cache topology with id (current topology id + 1) is coming soon, and waits for that topology before retrying.
Normally this is not a problem: StateTransferManagerImpl.stop() sends a CacheTopologyControlCommand(LEAVE) to the coordinator quickly enough that B soon receives topology (current topology id + 1) and retries the command.
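As a rough illustration of the wait described above, here is a minimal Java sketch of how node B might block until topology (current topology id + 1) is installed before retrying. TopologyWaitSketch, TopologyTracker, installTopology and awaitTopology are hypothetical names, not Infinispan APIs; the real retry path in Infinispan is more involved.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch (not Infinispan's API): node B waiting for the next
 * cache topology after receiving a CacheNotFoundResponse from node A.
 */
public class TopologyWaitSketch {

    /** Tracks the latest cache topology id seen by this node (hypothetical helper). */
    static final class TopologyTracker {
        private int topologyId;

        TopologyTracker(int initialId) {
            this.topologyId = initialId;
        }

        synchronized int currentTopologyId() {
            return topologyId;
        }

        /** Called when the coordinator installs a new topology, e.g. after A's LEAVE. */
        synchronized void installTopology(int newId) {
            if (newId > topologyId) {
                topologyId = newId;
                notifyAll(); // wake up any command waiting to retry
            }
        }

        /** Blocks until the topology id is at least minId, or the timeout expires. */
        synchronized boolean awaitTopology(int minId, long timeout, TimeUnit unit)
                throws InterruptedException {
            long deadline = System.nanoTime() + unit.toNanos(timeout);
            while (topologyId < minId) {
                long remainingNanos = deadline - System.nanoTime();
                if (remainingNanos <= 0) {
                    return false; // the expected topology never arrived
                }
                TimeUnit.NANOSECONDS.timedWait(this, remainingNanos);
            }
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        TopologyTracker tracker = new TopologyTracker(5);

        // B got a CacheNotFoundResponse from A while in topology 5.
        int seen = tracker.currentTopologyId();

        // Simulate the coordinator installing topology 6 shortly after A's LEAVE.
        Thread coordinator = new Thread(() -> {
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            tracker.installTopology(6);
        });
        coordinator.start();

        // Wait for topology (seen + 1) before retrying the command.
        boolean arrived = tracker.awaitTopology(seen + 1, 5, TimeUnit.SECONDS);
        coordinator.join();
        System.out.println("arrived=" + arrived + ", topologyId=" + tracker.currentTopologyId());
    }
}
```

Running main prints `arrived=true, topologyId=6`. If the LEAVE were delayed past the timeout, awaitTopology would return false instead, which is the situation the rest of this issue is about.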
But in some cases, the cache components that stop before StateTransferManagerImpl can take a long time to do so. In particular, because of ISPN-5507, TransactionTable can block for cacheStopTimeout if there are remote transactions in progress, even though the cache can no longer process remote commands.
We should give StateTransferManagerImpl.stop() a priority of 0, so that the CacheTopologyControlCommand(LEAVE) command is sent as soon as possible.
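The effect of the proposed priority change can be modeled with a small simulation. This is a hypothetical sketch, not Infinispan code: it assumes, as the proposal implies, that components stop in ascending priority order. The priorities given to TransactionTable (5) and to the pre-fix StateTransferManager (10), and the 30-second cacheStopTimeout, are illustrative values only. Time is simulated rather than measured, so the result is deterministic. Records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical model (not Infinispan code): components are stopped in ascending
 * priority order, and each stop() takes some simulated time. We compute how long
 * after the cache starts stopping the StateTransferManager would send LEAVE.
 */
public class StopOrderSketch {

    /** A cache component with a stop priority and a simulated stop duration in ms. */
    record Component(String name, int stopPriority, long stopMillis) {}

    /** Returns the simulated time (ms since stop began) at which target's stop() starts. */
    static long stopStartTime(List<Component> components, String target) {
        List<Component> ordered = new ArrayList<>(components);
        // Lower priority value stops first; List.sort is stable, so ties keep list order.
        ordered.sort(Comparator.comparingInt(Component::stopPriority));
        long clock = 0;
        for (Component c : ordered) {
            if (c.name().equals(target)) {
                return clock;
            }
            clock += c.stopMillis(); // earlier components delay everything after them
        }
        throw new IllegalArgumentException("No component named " + target);
    }

    public static void main(String[] args) {
        long cacheStopTimeout = 30_000; // hypothetical value for illustration

        // Before: TransactionTable stops first and blocks for cacheStopTimeout.
        List<Component> before = List.of(
                new Component("TransactionTable", 5, cacheStopTimeout),
                new Component("StateTransferManager", 10, 1));

        // After: StateTransferManager gets priority 0, so LEAVE goes out immediately.
        List<Component> after = List.of(
                new Component("TransactionTable", 5, cacheStopTimeout),
                new Component("StateTransferManager", 0, 1));

        System.out.println("LEAVE sent at (before): "
                + stopStartTime(before, "StateTransferManager") + " ms");
        System.out.println("LEAVE sent at (after): "
                + stopStartTime(after, "StateTransferManager") + " ms");
    }
}
```

With these illustrative numbers the model reports 30000 ms before the change and 0 ms after it: moving StateTransferManagerImpl to the front of the stop order removes the blocked TransactionTable from the critical path of the LEAVE request, regardless of how long TransactionTable itself takes.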
- blocks: JDG-83 StateTransferManager should be the first component to stop (Closed)
- is cloned by: JBEAP-4620 StateTransferManager should be the first component to stop (Closed)
- is incorporated by: ISPN-6433 Backport to 8.1.x branch (Closed)
- is related to:
  - ISPN-5507 Transactions committed immediately before cache stop can block shutdown (Closed)
  - JBEAP-4626 Transactions committed immediately before cache stop can block shutdown (Closed)