Bug
Resolution: Done
Blocker
8.2.0.CR1
None
The issue was reliably reproduced on a cluster of 64 nodes with the default clustered.xml configuration. The nodes receive a view and then hang. The coordinator's log is cluttered with CacheNotFoundResponse messages and JGroups NPEs (linked below). The other nodes only get replication timeouts while fetching the rebalancing status.
Thread dump (coordinator): https://gist.github.com/mcimbora/b65914bcd8141427fbf4
Exceptions (coordinator): https://gist.github.com/mcimbora/0dd9a5b53344c1cdb18b
Exceptions (other nodes): https://gist.github.com/mcimbora/49d8b9f1b983d17ca5b1