Infinispan / ISPN-11000

LocalTopologyManager should not wait for view if the local node is not a member

Details

    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • Fix Version/s: 10.1.0.Final
    • Affects Version/s: 9.4.16.Final, 10.1.0.Beta1
    • Component/s: Core
    • Labels: None
    • Sprint: DataGrid Sprint #37, DataGrid Sprint #38, DataGrid Sprint #39

    Description

      Sometimes a node is excluded from the cluster view but it can still receive multicast messages like FD_ALL heartbeats and topology updates from the coordinator.

      Because it is still receiving heartbeats, the excluded node does not become coordinator itself and install a new view. If MERGE3 doesn't merge the partitions, the node can keep the outdated view for a long time, and LocalTopologyManagerImpl will block many transport threads waiting for the right view to process the topology updates that keep coming from the coordinator:

      11:31:01,052 INFO  [org.radargun.service.InfinispanRestAPI] (pool-2-thread-1) CacheManagerInfo{clusterMembers=[edg-perf01-21541, edg-perf02-54831, edg-perf05-28640, edg-perf03-47882, edg-perf06-47720, edg-perf04-19840, edg-perf07-34498, edg-perf08-52975], clusterSize=8}
      11:31:05,281 WARN  [org.jgroups.protocols.pbcast.GMS] (jgroups-79,edg-perf03-47882) edg-perf03-47882: not member of view [edg-perf01-21541|6]; discarding it
      11:31:11,041 INFO  [org.radargun.service.InfinispanRestAPI] (pool-2-thread-1) CacheManagerInfo{clusterMembers=[edg-perf01-21541, edg-perf02-54831, edg-perf05-28640, edg-perf03-47882, edg-perf06-47720, edg-perf04-19840, edg-perf07-34498, edg-perf08-52975], clusterSize=8}
      11:31:16,267 WARN  [org.jgroups.protocols.pbcast.GMS] (jgroups-80,edg-perf03-47882) edg-perf03-47882: failed to create view from delta-view; dropping view: java.lang.IllegalStateException: the view-id of the delta view ([edg-perf01-21541|6]) doesn't match the current view-id ([edg-perf01-21541|5]); discarding delta view [edg-perf01-21541|7], ref-view=[edg-perf01-21541|6], left=[edg-perf06-47720]
      11:31:16,274 WARN  [org.jgroups.protocols.pbcast.GMS] (jgroups-80,edg-perf03-47882) edg-perf03-47882: not member of view [edg-perf01-21541|7]; discarding it
      11:31:21,035 INFO  [org.radargun.service.InfinispanRestAPI] (pool-2-thread-1) CacheManagerInfo{clusterMembers=[edg-perf01-21541, edg-perf02-54831, edg-perf05-28640, edg-perf03-47882, edg-perf06-47720, edg-perf04-19840, edg-perf07-34498, edg-perf08-52975], clusterSize=8}
      11:31:31,040 INFO  [org.radargun.service.InfinispanRestAPI] (pool-2-thread-1) CacheManagerInfo{clusterMembers=[edg-perf01-21541, edg-perf02-54831, edg-perf05-28640, edg-perf03-47882, edg-perf06-47720, edg-perf04-19840, edg-perf07-34498, edg-perf08-52975], clusterSize=8}
      11:31:41,047 INFO  [org.radargun.service.InfinispanRestAPI] (pool-2-thread-1) CacheManagerInfo{clusterMembers=[edg-perf01-21541, edg-perf02-54831, edg-perf05-28640, edg-perf03-47882, edg-perf06-47720, edg-perf04-19840, edg-perf07-34498, edg-perf08-52975], clusterSize=8}
      11:31:51,033 INFO  [org.radargun.service.InfinispanRestAPI] (pool-2-thread-1) CacheManagerInfo{clusterMembers=[edg-perf01-21541, edg-perf02-54831, edg-perf05-28640, edg-perf03-47882, edg-perf06-47720, edg-perf04-19840, edg-perf07-34498, edg-perf08-52975], clusterSize=8}
      11:32:01,035 INFO  [org.radargun.service.InfinispanRestAPI] (pool-2-thread-1) CacheManagerInfo{clusterMembers=[edg-perf01-21541, edg-perf02-54831, edg-perf05-28640, edg-perf03-47882, edg-perf06-47720, edg-perf04-19840, edg-perf07-34498, edg-perf08-52975], clusterSize=8}
      11:32:03,051 WARN  [org.jgroups.protocols.pbcast.GMS] (jgroups-80,edg-perf03-47882) edg-perf03-47882: failed to create view from delta-view; dropping view: java.lang.IllegalStateException: the view-id of the delta view ([edg-perf01-21541|7]) doesn't match the current view-id ([edg-perf01-21541|5]); discarding delta view [edg-perf01-21541|8], ref-view=[edg-perf01-21541|7], left=[edg-perf04-19840]
      11:32:03,063 WARN  [org.jgroups.protocols.pbcast.GMS] (jgroups-80,edg-perf03-47882) edg-perf03-47882: not member of view [edg-perf01-21541|8]; discarding it
      11:32:05,321 ERROR [org.infinispan.topology.LocalTopologyManagerImpl] (transport-thread--p5-t5) ISPN000452: Failed to update topology for cache memcachedCache: org.infinispan.util.concurrent.TimeoutException: ISPN000451: Timed out waiting for view 6, current view is 5
      	at org.infinispan.topology.LocalTopologyManagerImpl.waitForView(LocalTopologyManagerImpl.java:571)
      	at org.infinispan.topology.LocalTopologyManagerImpl.doHandleTopologyUpdate(LocalTopologyManagerImpl.java:302)
      	at org.infinispan.topology.LocalTopologyManagerImpl.lambda$handleTopologyUpdate$1(LocalTopologyManagerImpl.java:286)
      	at org.infinispan.executors.LimitedExecutor.runTasks(LimitedExecutor.java:175)
      	at org.infinispan.executors.LimitedExecutor.access$100(LimitedExecutor.java:37)
      	at org.infinispan.executors.LimitedExecutor$Runner.run(LimitedExecutor.java:227)
      	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
      	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
      	at java.base/java.lang.Thread.run(Thread.java:834)
      

      LocalTopologyManagerImpl.doHandleTopologyUpdate() could first check whether the local node is a member of the new topology. If it is not, it could skip waiting for the view, which avoids blocking a transport thread and avoids logging an error message.
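
The membership check suggested above might look roughly like the following sketch. It is not the Infinispan implementation: the class TopologyUpdateSketch, the Outcome enum, the handleTopologyUpdate() method and all of its parameters are hypothetical names chosen for illustration, and the view installation is modelled with a plain CompletableFuture. The only point it tries to show is the ordering: test membership first, and wait for the view only if the local node is actually part of the new topology.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: none of these names come from Infinispan.
public class TopologyUpdateSketch {

    /** Possible results of handling a single topology update. */
    public enum Outcome { APPLIED, SKIPPED_NOT_MEMBER, VIEW_WAIT_FAILED }

    /**
     * Handles one topology update.
     *
     * @param localNode       address of the local node (assumed to be a plain string here)
     * @param topologyMembers members listed in the new topology
     * @param currentViewId   view id currently installed on the local node
     * @param requiredViewId  view id the coordinator sent the update for
     * @param viewInstalled   completes when the required view is installed locally
     * @param timeoutMillis   how long to wait for the required view
     */
    public static Outcome handleTopologyUpdate(String localNode,
                                               List<String> topologyMembers,
                                               int currentViewId,
                                               int requiredViewId,
                                               CompletableFuture<Void> viewInstalled,
                                               long timeoutMillis) {
        // Check membership first: a node that is not in the new topology
        // has nothing to apply, so it should not block waiting for a view.
        if (!topologyMembers.contains(localNode)) {
            return Outcome.SKIPPED_NOT_MEMBER;
        }
        // The required view is already installed, no waiting needed.
        if (currentViewId >= requiredViewId) {
            return Outcome.APPLIED;
        }
        try {
            // Only members wait (with a timeout) for the matching view.
            viewInstalled.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return Outcome.APPLIED;
        } catch (TimeoutException | ExecutionException e) {
            return Outcome.VIEW_WAIT_FAILED;
        } catch (InterruptedException e) {
            // Preserve the interrupt flag for the calling thread.
            Thread.currentThread().interrupt();
            return Outcome.VIEW_WAIT_FAILED;
        }
    }

    public static void main(String[] args) {
        List<String> members = List.of("edg-perf01", "edg-perf02");
        CompletableFuture<Void> neverInstalled = new CompletableFuture<>();

        // The excluded node returns immediately instead of blocking.
        System.out.println(handleTopologyUpdate("edg-perf03", members, 5, 6, neverInstalled, 100));
        // A member with an outdated view still waits and eventually gives up.
        System.out.println(handleTopologyUpdate("edg-perf01", members, 5, 6, neverInstalled, 100));
    }
}
```

In this model the excluded node returns without touching the future, so no transport thread is parked and no timeout error is produced for it; a real fix would also need to decide what, if anything, to log in that case.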

          People

            Assignee: dberinde@redhat.com Dan Berindei (Inactive)
            Reporter: dberinde@redhat.com Dan Berindei (Inactive)