JGroups / JGRP-1957

S3_PING: Nodes never removed from .list file


    • Type: Bug
    • Resolution: Won't Do
    • Priority: Minor
    • 3.6.6
    • 3.6.4
    • None
    • Workaround Exists
      • reduce the logical_addr_cache_expiration to 1 second
      • reduce the logical_addr_cache_reaper_interval to 10 seconds
      • increase the min_interval and max_interval for MERGE3 to 30 and 60 seconds, respectively
      • set remove_all_files_on_view_change to true

      With these settings, expired nodes do seem to get removed from the file as expected. The drawback is that the backend store is updated more frequently.

      The effect of these settings is to ensure that cache entries expire and are reaped more quickly, and that the info writer thread kicks in to write the updated cache before the MERGE3 protocol is able to send a new FIND_MBRS event.
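The timing argument above can be checked with a little arithmetic. The sketch below is a hypothetical helper, not JGroups API: the class and method names are made up for illustration. It assumes an entry is reaped at most one expiration period plus one reaper interval after it is marked, and it ignores any delay added by the info writer thread.

```java
/** Hypothetical helper for the timing argument; none of these names are JGroups API. */
class WorkaroundTiming {

    /** Assumed worst case: the entry expires, then waits one full reaper interval. */
    static long worstCaseReapDelayMs(long expirationMs, long reaperIntervalMs) {
        return expirationMs + reaperIntervalMs;
    }

    /** True if reaping finishes before the earliest possible next MERGE3 discovery round. */
    static boolean reapedBeforeRediscovery(long expirationMs,
                                           long reaperIntervalMs,
                                           long merge3MinIntervalMs) {
        return worstCaseReapDelayMs(expirationMs, reaperIntervalMs) < merge3MinIntervalMs;
    }

    public static void main(String[] args) {
        // Workaround values from this issue: 1 s expiration, 10 s reaper, 30 s MERGE3 min_interval.
        System.out.println(reapedBeforeRediscovery(1_000, 10_000, 30_000)); // true (11 s < 30 s)
    }
}
```

Under these assumptions the workaround leaves a margin of roughly 19 seconds between the latest reap and the earliest new FIND_MBRS event.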


      I'm not 100% sure, but it seems like there might be a defect here.

      I'm using TCP, S3_PING, and MERGE3.

      I've set logical_addr_cache_max_size to 2 for testing purposes, although I don't think the value of this setting affects my test results.

      I start a single node, node A. Then I start a second node, node B.

      I then repeatedly shut down and restart node B.

      Each time node B starts, a new row is added to the .list file stored in S3.

      But even if I continue this process for 15 minutes, old rows are never removed from the .list file, so it continues to grow in size.

      I've read the docs and mailing list threads, so I'm aware that the list is not updated immediately when a member leaves. But I was expecting that when a view change occurs, nodes no longer in the view would be marked for removal (line 2193 of TP.java). Then, after logical_addr_cache_expiration has elapsed and the reaper has run, the expired cache entries would be purged from the file the next time a new node joins.

      I dug into the code a bit, and what seems to be happening is that the MERGE3 protocol periodically generates a FIND_MBRS event. S3_PING then retrieves the membership from the .list file, which includes the expired nodes, and all of these members are re-added to the logical address cache (line 157 of S3_PING.java, line 533 of Discovery.java, line 2263 of TP.java).

      So expired nodes are continually re-added to the logical address cache, preventing them from ever being reaped.
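The suspected cycle can be reproduced in a toy model. This is a simplified sketch, not the JGroups implementation: `ToyAddressCache` and `ReaperRaceDemo` are hypothetical names, time advances in one-second ticks, re-adding an entry is assumed to reset its removal clock, and a reaped entry is assumed to disappear from the file as well. The interval values in `main` are illustrative and are not claimed to be JGroups defaults.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy expiring cache; a simplified model, NOT the JGroups logical address cache. */
class ToyAddressCache {
    /** Member -> time it was marked removable, or -1 if it is not marked. */
    private final Map<String, Long> markedAt = new HashMap<>();
    private final long expirationMs;

    ToyAddressCache(long expirationMs) { this.expirationMs = expirationMs; }

    /** Adding or re-adding a member clears any removal mark (assumption of this model). */
    void add(String member) { markedAt.put(member, -1L); }

    /** Marks an unmarked member as removable starting at time now. */
    void markRemovable(String member, long now) {
        Long t = markedAt.get(member);
        if (t != null && t < 0) markedAt.put(member, now);
    }

    /** Reaper pass: drops members that have been marked for at least expirationMs. */
    void reap(long now) {
        markedAt.values().removeIf(t -> t >= 0 && now - t >= expirationMs);
    }

    boolean contains(String member) { return markedAt.containsKey(member); }
}

class ReaperRaceDemo {
    /** Returns true if a departed member is still cached when the simulated run ends. */
    static boolean staleEntrySurvives(long expirationMs, long reaperIntervalMs,
                                      long rediscoveryIntervalMs, long durationMs) {
        ToyAddressCache cache = new ToyAddressCache(expirationMs);
        cache.add("stale");
        cache.markRemovable("stale", 0);          // member left the view at t = 0
        for (long now = 1_000; now <= durationMs; now += 1_000) {
            if (now % reaperIntervalMs == 0) {
                cache.reap(now);
                if (!cache.contains("stale")) return false; // assumed: file is rewritten without it
            }
            if (now % rediscoveryIntervalMs == 0) {
                cache.add("stale");                // discovery re-adds everything listed in the file
                cache.markRemovable("stale", now); // still not in the view, so the clock restarts
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long fifteenMinutes = 15 * 60 * 1_000;
        // Rediscovery more frequent than expiration: the entry is never reaped.
        System.out.println(staleEntrySurvives(60_000, 30_000, 10_000, fifteenMinutes)); // true
        // Workaround-style values from this issue: the entry is reaped first.
        System.out.println(staleEntrySurvives(1_000, 10_000, 30_000, fifteenMinutes));  // false
    }
}
```

In this model the stale entry survives whenever rediscovery resets the removal clock more often than the expiration allows the reaper to act, which matches the behaviour described above.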

              rhn-engineering-bban Bela Ban
              nsawadsky Nick Sawadsky (Inactive)