  1. JGroups
  2. JGRP-2199

JDBC_PING cluster doesn't handle shutdown members


    • Bug
    • Resolution: Done
    • Major
    • 4.0.5
    • 4.0
    • None

      Add this to JDBC_PING:

      @Override
      public void stop() {
          super.stop();
          if (is_coord)
              removeAll(cluster_name);
      }

      Using the attached jdbc-ping.xml file:
      1. Start up a cluster of 3 nodes.
      2. kill -9 the coordinator.
      3. Attempt to start a new node (it never finishes joining; see stuck_starting_up.log).

      Using the attached file-ping.xml file:
      1. Start up a cluster of 3 nodes.
      2. kill -9 the coordinator.
      3. Start a new node; it joins successfully.


      FILE_PING and JDBC_PING have different behavior when a cluster's coordinator stops.

      With FILE_PING, the coordinator deletes the cluster's entire file when it shuts down.

      JDBC_PING does not do this, which reveals a flaw in how nodes are handled on shutdown.

      When I added my own logging to the source of both classes, I observed that each one continuously writes all of the members to the database/file, because write() is called very frequently.
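To make the effect of these repeated writes concrete, here is a minimal sketch in plain Java. It is not JGroups code: the in-memory map only stands in for the JDBC_PING table, and the class and method names (PingTableModel, write, remove, contains) are hypothetical. It assumes write() upserts one row per member in the list it receives, which is what the logging above suggests.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy stand-in for the JDBC_PING table: member address -> is-coordinator flag.
// This is NOT JGroups code; every name here is hypothetical.
class PingTableModel {
    private final Map<String, Boolean> rows = new LinkedHashMap<>();

    // Models write(List<PingData>): upserts one row per member in the list.
    void write(List<String> members, String coordinator) {
        for (String m : members) {
            rows.put(m, m.equals(coordinator));
        }
    }

    // Models a member deleting its own row on shutdown.
    void remove(String member) {
        rows.remove(member);
    }

    boolean contains(String member) {
        return rows.containsKey(member);
    }

    public static void main(String[] args) {
        PingTableModel table = new PingTableModel();
        List<String> view = List.of("A", "B", "C"); // A is the coordinator
        table.write(view, "A");

        table.remove("C");                       // C shuts down and cleans up
        System.out.println(table.contains("C")); // false

        table.write(view, "A");                  // coordinator writes its stale list
        System.out.println(table.contains("C")); // true: C is back in the table
    }
}
```

In this model, any write that is issued from a member list older than the removal undoes the removal, so a member's own cleanup is only as durable as the next write.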

      Current behavior:

      GIVEN a cluster of JDBC_PING registered nodes
      WHEN a node shuts down
      THEN it removes itself from the database table, AND the coordinator almost immediately re-adds the shut-down member to the table because the List<PingData> sent to write() still contains it

      GIVEN a cluster of JDBC_PING registered nodes has only the coordinator left
      WHEN the coordinator shuts down
      THEN the coordinator removes itself from the database and, because there's no coordinator left, the database shows a list of only the 'members' with no coordinator

      GIVEN a cluster of JDBC_PING registered nodes
      WHEN the coordinator shuts down or crashes and does not have time to remove itself from the database
      THEN the next node to start will never finish negotiating membership with the cluster because a phantom coordinator still exists (see attachment: stuck_starting_up.log)
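The last scenario can be sketched in the same toy style. This is an illustration under stated assumptions, not JGroups code: the map stands in for the discovery table, the set of running nodes is supplied by hand, and the names (PhantomCoordinatorModel, findCoordinator, hasPhantomCoordinator) are hypothetical. It only shows what "phantom coordinator" means here: a row flagged as coordinator for a node that is no longer running.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Toy model of a joiner reading discovery rows. NOT JGroups code; names are hypothetical.
class PhantomCoordinatorModel {
    // Returns the address of the first row flagged as coordinator, if any.
    static Optional<String> findCoordinator(Map<String, Boolean> rows) {
        return rows.entrySet().stream()
                .filter(e -> e.getValue())
                .map(e -> e.getKey())
                .findFirst();
    }

    // True when the table advertises a coordinator that is not actually running.
    static boolean hasPhantomCoordinator(Map<String, Boolean> rows, Set<String> alive) {
        return findCoordinator(rows).filter(c -> !alive.contains(c)).isPresent();
    }

    public static void main(String[] args) {
        Map<String, Boolean> rows = new LinkedHashMap<>();
        rows.put("A", true);  // coordinator
        rows.put("B", false);
        rows.put("C", false);

        // A was killed with kill -9, so its row was never removed.
        Set<String> alive = Set.of("B", "C");
        System.out.println(hasPhantomCoordinator(rows, alive)); // true

        // What a graceful shutdown would have done: delete A's row.
        rows.remove("A");
        System.out.println(hasPhantomCoordinator(rows, alive)); // false
    }
}
```

A joiner that trusts the first case keeps addressing a node that cannot answer, which matches the hang recorded in stuck_starting_up.log.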

      I expected the behavior of JDBC_PING and FILE_PING to be consistent.

        1. stuck_starting_up.log
          28 kB
        2. jdbc-ping.xml
          3 kB
        3. file-ping.xml
          2 kB

              rhn-engineering-bban Bela Ban
              douglasryanadams Douglas Adams (Inactive)