Details
Bug
Resolution: Done
Major
9.4.16.Final, 10.1.0.Beta1
None
DataGrid Sprint #37, DataGrid Sprint #38, DataGrid Sprint #39
Description
Sometimes a node is excluded from the cluster view but it can still receive multicast messages like FD_ALL heartbeats and topology updates from the coordinator.
Because it is still receiving heartbeats, the excluded node does not become coordinator itself and does not install a new view. If MERGE3 doesn't merge the partitions, the node can keep the outdated view for a long time, and LocalTopologyManagerImpl will block many transport threads waiting for the right view before processing the topology updates that keep coming from the coordinator:
11:31:01,052 INFO [org.radargun.service.InfinispanRestAPI] (pool-2-thread-1) CacheManagerInfo{clusterMembers=[edg-perf01-21541, edg-perf02-54831, edg-perf05-28640, edg-perf03-47882, edg-perf06-47720, edg-perf04-19840, edg-perf07-34498, edg-perf08-52975], clusterSize=8}
11:31:05,281 WARN [org.jgroups.protocols.pbcast.GMS] (jgroups-79,edg-perf03-47882) edg-perf03-47882: not member of view [edg-perf01-21541|6]; discarding it
11:31:11,041 INFO [org.radargun.service.InfinispanRestAPI] (pool-2-thread-1) CacheManagerInfo{clusterMembers=[edg-perf01-21541, edg-perf02-54831, edg-perf05-28640, edg-perf03-47882, edg-perf06-47720, edg-perf04-19840, edg-perf07-34498, edg-perf08-52975], clusterSize=8}
11:31:16,267 WARN [org.jgroups.protocols.pbcast.GMS] (jgroups-80,edg-perf03-47882) edg-perf03-47882: failed to create view from delta-view; dropping view: java.lang.IllegalStateException: the view-id of the delta view ([edg-perf01-21541|6]) doesn't match the current view-id ([edg-perf01-21541|5]); discarding delta view [edg-perf01-21541|7], ref-view=[edg-perf01-21541|6], left=[edg-perf06-47720]
11:31:16,274 WARN [org.jgroups.protocols.pbcast.GMS] (jgroups-80,edg-perf03-47882) edg-perf03-47882: not member of view [edg-perf01-21541|7]; discarding it
11:31:21,035 INFO [org.radargun.service.InfinispanRestAPI] (pool-2-thread-1) CacheManagerInfo{clusterMembers=[edg-perf01-21541, edg-perf02-54831, edg-perf05-28640, edg-perf03-47882, edg-perf06-47720, edg-perf04-19840, edg-perf07-34498, edg-perf08-52975], clusterSize=8}
11:31:31,040 INFO [org.radargun.service.InfinispanRestAPI] (pool-2-thread-1) CacheManagerInfo{clusterMembers=[edg-perf01-21541, edg-perf02-54831, edg-perf05-28640, edg-perf03-47882, edg-perf06-47720, edg-perf04-19840, edg-perf07-34498, edg-perf08-52975], clusterSize=8}
11:31:41,047 INFO [org.radargun.service.InfinispanRestAPI] (pool-2-thread-1) CacheManagerInfo{clusterMembers=[edg-perf01-21541, edg-perf02-54831, edg-perf05-28640, edg-perf03-47882, edg-perf06-47720, edg-perf04-19840, edg-perf07-34498, edg-perf08-52975], clusterSize=8}
11:31:51,033 INFO [org.radargun.service.InfinispanRestAPI] (pool-2-thread-1) CacheManagerInfo{clusterMembers=[edg-perf01-21541, edg-perf02-54831, edg-perf05-28640, edg-perf03-47882, edg-perf06-47720, edg-perf04-19840, edg-perf07-34498, edg-perf08-52975], clusterSize=8}
11:32:01,035 INFO [org.radargun.service.InfinispanRestAPI] (pool-2-thread-1) CacheManagerInfo{clusterMembers=[edg-perf01-21541, edg-perf02-54831, edg-perf05-28640, edg-perf03-47882, edg-perf06-47720, edg-perf04-19840, edg-perf07-34498, edg-perf08-52975], clusterSize=8}
11:32:03,051 WARN [org.jgroups.protocols.pbcast.GMS] (jgroups-80,edg-perf03-47882) edg-perf03-47882: failed to create view from delta-view; dropping view: java.lang.IllegalStateException: the view-id of the delta view ([edg-perf01-21541|7]) doesn't match the current view-id ([edg-perf01-21541|5]); discarding delta view [edg-perf01-21541|8], ref-view=[edg-perf01-21541|7], left=[edg-perf04-19840]
11:32:03,063 WARN [org.jgroups.protocols.pbcast.GMS] (jgroups-80,edg-perf03-47882) edg-perf03-47882: not member of view [edg-perf01-21541|8]; discarding it
11:32:05,321 ERROR [org.infinispan.topology.LocalTopologyManagerImpl] (transport-thread--p5-t5) ISPN000452: Failed to update topology for cache memcachedCache: org.infinispan.util.concurrent.TimeoutException: ISPN000451: Timed out waiting for view 6, current view is 5
    at org.infinispan.topology.LocalTopologyManagerImpl.waitForView(LocalTopologyManagerImpl.java:571)
    at org.infinispan.topology.LocalTopologyManagerImpl.doHandleTopologyUpdate(LocalTopologyManagerImpl.java:302)
    at org.infinispan.topology.LocalTopologyManagerImpl.lambda$handleTopologyUpdate$1(LocalTopologyManagerImpl.java:286)
    at org.infinispan.executors.LimitedExecutor.runTasks(LimitedExecutor.java:175)
    at org.infinispan.executors.LimitedExecutor.access$100(LimitedExecutor.java:37)
    at org.infinispan.executors.LimitedExecutor$Runner.run(LimitedExecutor.java:227)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)
LocalTopologyManagerImpl.doHandleTopologyUpdate() could first check whether the local node is a member of the new topology. If it is not, the method could skip the update, which avoids both blocking a transport thread and logging an error message.
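The proposed check could look roughly like the sketch below. This is not the actual Infinispan code: the class TopologyUpdateGuard, the ViewWaiter interface, the Outcome enum, and all parameter names are hypothetical stand-ins, and members are modelled as plain strings. It only illustrates the ordering suggested above, where the membership test runs before any blocking wait for the view.

```java
import java.util.List;

// Hypothetical sketch; none of these types exist in Infinispan under these names.
class TopologyUpdateGuard {

    /** Result of handling one topology update. */
    enum Outcome { APPLIED, IGNORED_NOT_MEMBER, TIMED_OUT }

    /** Stand-in for the blocking wait on a cluster view (assumed shape). */
    interface ViewWaiter {
        /** Returns true if the requested view was installed before the timeout. */
        boolean waitForView(int viewId, long timeoutMillis);
    }

    static Outcome handleTopologyUpdate(String localAddress,
                                        List<String> topologyMembers,
                                        int requiredViewId,
                                        long timeoutMillis,
                                        ViewWaiter waiter) {
        // Check membership first: an excluded node will never install the
        // required view, so waiting would only tie up a transport thread.
        if (!topologyMembers.contains(localAddress)) {
            return Outcome.IGNORED_NOT_MEMBER;
        }
        // Only members of the new topology pay for the blocking wait.
        if (!waiter.waitForView(requiredViewId, timeoutMillis)) {
            return Outcome.TIMED_OUT;
        }
        // The real topology update would be applied at this point.
        return Outcome.APPLIED;
    }
}
```

With this ordering, a node in the situation shown in the log (still on view 5 and absent from the new topology) would return immediately instead of timing out while waiting for view 6.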