JGroups / JGRP-1448

FILE_PING: Fail to read node file


Details

    • Type: Patch
    • Resolution: Done
    • Priority: Major
    • Fix Version/s: 3.0.11, 3.1
    • Affects Version/s: 2.12.3

    Description

      When using the FILE_PING protocol, the following warning is periodically printed in the log:
      2012-03-19 16:20:41,057 [ Timer-5,<ADDR>] WARN [org.jgroups.protocols.FILE_PING] failed reading 83dc9dfe-8dd4-eff2-4474-d57dbaa96143.node: removing it

      This is most likely because all members write to the same directory at arbitrary times, and reads are not synchronized with those writes.
      Given enough running time, a member will eventually read a file that is only partly written and therefore appears corrupt.
      This happens more often the slower the shared file system is (e.g. a slow NFS mount).

      I will upload a patch that makes two modifications to the FILE_PING class.
      1) Files are written in two steps.
      First, the data is written to a temporary file, so that the "readAll" methods cannot pick up a half-written file.
      Then the temporary file is moved, semi-atomically, to the proper node file.

      2) When reading all node files, a file that fails to be read is retried a few times.
      This provides a simple retry mechanism in case the file is half-written and therefore not yet readable.
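      The two changes above can be sketched roughly as follows. This is a minimal, hypothetical illustration built on java.nio.file, not the code from the attached patch: the class name NodeFileSketch, the "*.node" / ".tmp" naming, the isComplete check used to detect a half-written file, and the retry count and delay parameters are all assumptions made for the example.

```java
import java.io.IOException;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

/**
 * Hypothetical sketch (NOT the JGroups FILE_PING implementation) of:
 * 1) write-to-temp-then-move, and 2) retried reads of node files.
 */
public class NodeFileSketch {

    /** Writes data to dir/name without exposing a half-written file under that name. */
    public static void writeNodeFile(Path dir, String name, byte[] data) throws IOException {
        // Step 1: write to a temp file in the same directory (same file system,
        // so the move below can be atomic). Its ".tmp" suffix keeps it out of
        // the "*.node" listing used by readAll().
        Path tmp = Files.createTempFile(dir, name + ".", ".tmp");
        try {
            Files.write(tmp, data);
            // Step 2: move the complete temp file onto the real node file.
            Path target = dir.resolve(name);
            try {
                // Whether an existing target is replaced is implementation specific;
                // on POSIX systems OpenJDK maps this to rename(2), which replaces it.
                Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
            } catch (AtomicMoveNotSupportedException e) {
                // Fallback when an atomic move is unavailable: only "semi-atomic".
                Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
            }
        } finally {
            // Cleans up the temp file if anything above failed (no-op after a successful move).
            Files.deleteIfExists(tmp);
        }
    }

    /**
     * Reads every "*.node" file in dir. A file that cannot be read, or whose
     * contents fail isComplete, is retried up to maxAttempts times; files that
     * still fail are skipped.
     */
    public static Map<String, byte[]> readAll(Path dir, Predicate<byte[]> isComplete,
                                              int maxAttempts, long delayMillis)
            throws IOException, InterruptedException {
        Map<String, byte[]> result = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "*.node")) {
            for (Path file : files) {
                byte[] data = readWithRetry(file, isComplete, maxAttempts, delayMillis);
                if (data != null) {
                    result.put(file.getFileName().toString(), data);
                }
            }
        }
        return result;
    }

    private static byte[] readWithRetry(Path file, Predicate<byte[]> isComplete,
                                        int maxAttempts, long delayMillis)
            throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                byte[] data = Files.readAllBytes(file);
                if (isComplete.test(data)) {
                    return data;
                }
            } catch (IOException e) {
                // e.g. the file was replaced or removed between listing and reading
            }
            if (attempt < maxAttempts) {
                Thread.sleep(delayMillis); // give a concurrent writer time to finish
            }
        }
        return null; // still unreadable: skip it (FILE_PING itself logs and removes such files)
    }
}
```

      For example, readAll(dir, d -> d.length > 0, 3, 100) would try each node file up to three times, 100 ms apart. When the move is atomic, readers only see complete files under the final name, so the retry mainly guards against file systems where the move falls back to the non-atomic path.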

      Attachments

        1. FILE_PING.java (12 kB)
        2. FILE_PING.java (12 kB)
        3. JGRP-1448.patch (9 kB)


          People

            Assignee: Bela Ban (rhn-engineering-bban)
            Reporter: Peter Nerg (peter.nerg, inactive)
            Votes: 0
            Watchers: 2
