JGroups / JGRP-2633

Inconsistent view with TUNNEL and multiple GossipRouters

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major
    • 5.2.5
    • 4.2.21

      In OpenShift, I simulate a GossipRouter crash by running oc delete <pod>.

      After the crash, the relay connection drops and never comes back. I think the following events are happening:

      • A VERIFY_SUSPECT message is forwarded to the crashed GossipRouter.
      • A new view containing only cluster1 is installed: installing view [_my-cluster-0-24934:pruivo1|2] (1) [_my-cluster-0-24934:pruivo1] (_my-cluster-0-50778:pruivo3 left)
      • On cluster2, the view from cluster1 is discarded: [org.jgroups.protocols.pbcast.GMS] _my-cluster-0-50778:pruivo3: not member of view [_my-cluster-0-24934:pruivo1|2]; discarding it
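
      The discard in the last bullet can be illustrated with a minimal sketch. It assumes, based only on the log message above, that a member drops any view whose member list does not contain itself. The class and method names (ViewMembershipSketch, shouldInstall) are hypothetical and are not part of the JGroups API; this is not the GMS implementation.

```java
import java.util.List;

public class ViewMembershipSketch {

    // Hypothetical helper: a member installs a view only if the view lists it
    // as a member; otherwise the view is discarded (assumption drawn from the
    // "not member of view ...; discarding it" log line).
    static boolean shouldInstall(String localMember, List<String> viewMembers) {
        return viewMembers.contains(localMember);
    }

    public static void main(String[] args) {
        String cluster1 = "_my-cluster-0-24934:pruivo1";
        String cluster2 = "_my-cluster-0-50778:pruivo3";

        // Members of view [_my-cluster-0-24934:pruivo1|2] from the report.
        List<String> view2 = List.of(cluster1);

        System.out.println(shouldInstall(cluster1, view2)); // true: cluster1 installs it
        System.out.println(shouldInstall(cluster2, view2)); // false: cluster2 discards it
    }
}
```

      Under this assumption, cluster2 cannot learn about its exclusion from the view itself, which matches it later installing a singleton view of its own.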

      Eventually, cluster2 installs a view containing only itself (installing view [_my-cluster-0-50778:pruivo3|3] (1) [_my-cluster-0-50778:pruivo3] (_my-cluster-0-24934:pruivo1 left)), but the relay connection never recovers.

      The weird part is that MERGE3 never fixes the views.
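
      To make the inconsistency concrete, here is a minimal sketch that groups members by the view each one reports and flags the cluster as needing a merge when more than one distinct view exists. It only illustrates the condition using the two views from the logs above; it is not how MERGE3 is implemented, and the names ViewDivergenceSketch, groupByView and needsMerge are hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class ViewDivergenceSketch {

    // Groups members by the view ID they currently report.
    static Map<String, Set<String>> groupByView(Map<String, String> viewByMember) {
        Map<String, Set<String>> groups = new TreeMap<>();
        for (Map.Entry<String, String> e : viewByMember.entrySet()) {
            groups.computeIfAbsent(e.getValue(), k -> new TreeSet<>()).add(e.getKey());
        }
        return groups;
    }

    // More than one distinct view among the members means the views diverge.
    static boolean needsMerge(Map<String, String> viewByMember) {
        return groupByView(viewByMember).size() > 1;
    }

    public static void main(String[] args) {
        // View IDs taken from the log lines in this report.
        Map<String, String> reported = Map.of(
            "_my-cluster-0-24934:pruivo1", "[_my-cluster-0-24934:pruivo1|2]",
            "_my-cluster-0-50778:pruivo3", "[_my-cluster-0-50778:pruivo3|3]");

        System.out.println(needsMerge(reported)); // true: two singleton views
    }
}
```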

      After the GossipRouter is back online, it logs both site masters, but no merge ever happens and both clusters remain isolated:

      added _my-cluster-0-24934 (10.129.0.31:34348) to group xsite
      added _my-cluster-0-50778 (10.129.0.25:45384) to group xsite
      

              rhn-engineering-bban Bela Ban
              pruivo@redhat.com Pedro Ruivo