Infinispan / ISPN-4480

Messages sent to leavers can clog the JGroups bundler thread

Details

    • Bug
    • Resolution: Done
    • Major
    • 7.0.0.Beta1
    • 6.0.2.Final
    • Core
    • None

    Description

      In a stress test that repeatedly kills nodes while performing read/write operations, the TransferQueueBundler thread seems to spend a lot of time waiting for physical addresses:

      06:40:10,316 WARN  [org.radargun.utils.Utils] (pool-5-thread-1) Stack for thread TransferQueueBundler,default,apex953-14666:
      java.lang.Thread.sleep(Native Method)
      org.jgroups.util.Util.sleep(Util.java:1504)
      org.jgroups.util.Util.sleepRandom(Util.java:1574)
      org.jgroups.protocols.TP.sendToSingleMember(TP.java:1685)
      org.jgroups.protocols.TP.doSend(TP.java:1670)
      org.jgroups.protocols.TP$TransferQueueBundler.sendBundledMessages(TP.java:2476)
      org.jgroups.protocols.TP$TransferQueueBundler.sendMessages(TP.java:2392)
      org.jgroups.protocols.TP$TransferQueueBundler.run(TP.java:2383)
      java.lang.Thread.run(Thread.java:744)
      

      Two related bugs are already fixed in JGroups 3.5.0.Beta2 and later: JGRP-1814 and JGRP-1815.

      There is also a special case where the physical address could be removed from the cache too soon, exacerbating the effect of JGRP-1815: JGRP-1858

      We can work around the problem by changing the JGroups configuration:

      • TP.logical_addr_cache_expiration=86400000
        • Only expire addresses after 1 day
      • TP.physical_addr_max_fetch_attempts=1
        • Sleep for only 20ms waiting for the physical address (default: 3 attempts, 1500ms)
      • UNICAST3.conn_close_timeout=10000
        • Drop the pending messages to leavers sooner
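
      As a rough sketch, the three settings above might look like this in a JGroups XML stack. This is a fragment, not a complete configuration: it assumes UDP is the transport (the TP attributes are set on whichever concrete transport the stack uses, for example UDP or TCP), and every other protocol and attribute of the real stack is left out.

      ```xml
      <config xmlns="urn:org:jgroups">
          <!-- Transport (assumed to be UDP here; TP itself is abstract).
               Expire logical-to-physical address mappings only after 1 day,
               and try only once to fetch a missing physical address. -->
          <UDP logical_addr_cache_expiration="86400000"
               physical_addr_max_fetch_attempts="1" />

          <!-- The discovery, failure detection and other protocols of the
               existing stack go here, unchanged. -->

          <!-- Close connections to members that left after 10 seconds, so
               pending messages to leavers are dropped sooner. -->
          <UNICAST3 conn_close_timeout="10000" />
      </config>
      ```

      The values are the ones listed in the workaround; they would be merged into the stack file that Infinispan already uses rather than replacing it.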

          People

            dberinde@redhat.com Dan Berindei (Inactive)