JGroups / JGRP-1246

FILE_PING: NullPointerException on empty/incorrect file, and the communication is dead


    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • Fix Version/s: 2.10.1, 2.11
    • Affects Version/s: 2.10
    • None
    • Workaround Exists
      remove those bad (or all) files from the directory


      If the directory contains an empty or corrupt file (for example, because a node crashed while writing it), the following exception is thrown:

      java.lang.NullPointerException
      at org.jgroups.protocols.FILE_PING.handleView(FILE_PING.java:146)
      at org.jgroups.protocols.FILE_PING.down(FILE_PING.java:116)
      at org.jgroups.protocols.MERGE2.down(MERGE2.java:155)
      at org.jgroups.protocols.FD_SOCK.down(FD_SOCK.java:332)
      at org.jgroups.protocols.FD.down(FD.java:276)
      at org.jgroups.protocols.VERIFY_SUSPECT.down(VERIFY_SUSPECT.java:69)
      at org.jgroups.protocols.BARRIER.down(BARRIER.java:91)
      at org.jgroups.protocols.pbcast.NAKACK.down(NAKACK.java:639)
      at org.jgroups.protocols.UNICAST.down(UNICAST.java:444)
      at org.jgroups.protocols.pbcast.STABLE.down(STABLE.java:297)
      at org.jgroups.protocols.pbcast.GMS.installView(GMS.java:596)
      at org.jgroups.protocols.pbcast.GMS.installView(GMS.java:516)
      at org.jgroups.protocols.pbcast.ClientGmsImpl.becomeSingletonMember(ClientGmsImpl.java:344)
      at org.jgroups.protocols.pbcast.ClientGmsImpl.joinInternal(ClientGmsImpl.java:93)
      at org.jgroups.protocols.pbcast.ClientGmsImpl.join(ClientGmsImpl.java:38)
      at org.jgroups.protocols.pbcast.GMS.down(GMS.java:922)
      at org.jgroups.protocols.FC.down(FC.java:431)
      at org.jgroups.protocols.FRAG2.down(FRAG2.java:154)
      at org.jgroups.stack.ProtocolStack.down(ProtocolStack.java:894)
      at org.jgroups.JChannel.downcall(JChannel.java:1649)
      at org.jgroups.JChannel.connect(JChannel.java:420)
      ... 136 more

      This occurs on EVERY node, after which all communication stops; I could not even find any JGroups threads afterwards.
      New nodes cannot connect either: JChannel.connect() fails for the same reason.
      The problem was reproduced today in our production system.

      Workaround:
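Until a fix is available, the bad files can be removed from the directory by hand or with a small utility. The following is a minimal sketch of such a cleanup, not part of JGroups: the class and method names are hypothetical, and it only detects zero-length files. A non-empty but corrupt file is not recognised here, so removing all files from the directory remains the blunt alternative.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical cleanup utility (not a JGroups class).
class EmptyFileCleaner {

    // Deletes zero-length regular files in dir and returns the names that were removed.
    static List<String> removeEmptyFiles(File dir) {
        List<String> removed = new ArrayList<>();
        File[] files = dir.listFiles();
        if (files == null) {
            return removed; // directory does not exist or cannot be listed
        }
        for (File file : files) {
            if (file.isFile() && file.length() == 0 && file.delete()) {
                removed.add(file.getName());
            }
        }
        return removed;
    }
}
```

Calling `EmptyFileCleaner.removeEmptyFiles(new File(path))` with the directory FILE_PING is configured to use would drop the empty files and leave the rest untouched.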

      I would propose the following two fixes:
      1) When reading files, do not add null, empty, or bad entries.
      2) [For better reliability] Wrap the whole of FILE_PING.handleView() in a try/catch (perhaps for every Discovery protocol?), so that even if Discovery fails, the other parts do NOT fail.
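The two proposals above could look roughly like the sketch below. It is an illustration only and does not reproduce the FILE_PING source: the class `SafeDiscoveryReader`, its methods, and the on-disk format (a single string written with `DataOutputStream.writeUTF`) are all assumptions made for the example.

```java
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a JGroups class) illustrating both proposed fixes.
class SafeDiscoveryReader {

    // Fix 1: collect only entries that could actually be read,
    // so callers never see a null element for an empty or corrupt file.
    static List<String> readAll(File dir) {
        List<String> entries = new ArrayList<>();
        File[] files = dir.listFiles();
        if (files == null) {
            return entries; // directory does not exist or cannot be listed
        }
        for (File file : files) {
            String entry = readEntry(file);
            if (entry != null) {
                entries.add(entry);
            }
        }
        return entries;
    }

    // Returns the stored entry, or null if the file is empty, truncated, or unreadable.
    // Assumed format for this sketch: one string written with writeUTF.
    static String readEntry(File file) {
        if (!file.isFile() || file.length() == 0) {
            return null;
        }
        try (DataInputStream in = new DataInputStream(new FileInputStream(file))) {
            String value = in.readUTF();
            return value.isEmpty() ? null : value;
        } catch (IOException e) {
            return null; // truncated or malformed content
        }
    }

    // Fix 2: run the view handling inside a guard so that a discovery failure
    // is logged and reported instead of propagating up the call stack.
    static boolean runGuarded(Runnable viewHandler) {
        try {
            viewHandler.run();
            return true;
        } catch (RuntimeException e) {
            System.err.println("discovery view handling failed: " + e);
            return false;
        }
    }
}
```

With this shape, an empty file left behind by a crashed node is simply skipped by `readAll`, and an unexpected exception in the view handling makes `runGuarded` return `false` rather than aborting the caller, which in the reported trace would be `JChannel.connect()`.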

              Assignee: Bela Ban
              Reporter: Victor N