  JGroups / JGRP-1182

GET_MBRS_RSP are not all processed, Discovery step ends prematurely.

    • Type: Bug
    • Resolution: Won't Do
    • Priority: Major
    • Fix Version/s: 2.10
    • Affects Version/s: 2.6.9, 2.6.14, 2.10

      I launch 5 nodes A, B, C, D and E in quick succession (nearly simultaneously) on 5 hosts. They all use the same protocol stack, and each node opens one channel to communicate with the others.

      UDP(mcast_addr=231.8.8.8;mcast_port=45578):PING(num_initial_members=5;timeout=800):MERGE2:FD:VERIFY_SUSPECT:pbcast.NAKACK:pbcast.STABLE:FRAG2:pbcast.GMS:pbcast.FLUSH
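      For reference, each node builds its channel from that stack string roughly as follows (a sketch; the cluster name and the surrounding application code are assumptions, not part of the report):

      import org.jgroups.JChannel;

      // Sketch: create and connect a channel from the stack string above.
      // The cluster name "demo" is illustrative.
      public class Node {
          public static void main(String[] args) throws Exception {
              String props = "UDP(mcast_addr=231.8.8.8;mcast_port=45578):"
                           + "PING(num_initial_members=5;timeout=800):"
                           + "MERGE2:FD:VERIFY_SUSPECT:pbcast.NAKACK:"
                           + "pbcast.STABLE:FRAG2:pbcast.GMS:pbcast.FLUSH";
              JChannel ch = new JChannel(props);
              ch.connect("demo"); // all 5 nodes join the same cluster
          }
      }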

      Discovery sends up to n GET_MBRS_REQ messages to discover the members. Each GET_MBRS_REQ triggers a round of GET_MBRS_RSP responses, which increase the member count towards the num_initial_members limit in the Promise blocking the discovery. One GET_MBRS_RSP round may not be sufficient to discover all the members; a second RSP round then completes the Promise's count. But depending on the order in which the RSPs are received, the Promise condition may be signalled before all of them are processed, and the unprocessed RSPs may include one from a coordinator elected between the two REQs. That is where the trouble starts.

      example:
      A, B, C, D and E are launched
      ...
      D sends a GET_MBRS_REQ
      D receives 4 GET_MBRS_RSP, from D, A, B and C
      A becomes coordinator
      D sends a second GET_MBRS_REQ, 400 ms after the first
      D receives B's GET_MBRS_RSP
      D receives E's GET_MBRS_RSP and reaches num_initial_members; discovery ends after 428 ms
      D receives A's GET_MBRS_RSP; A is now coordinator, but it is too late and this response is not counted in the set of responses
      D becomes coordinator.

      We have two coordinators.

      It can also happen if E is quicker and is part of the first RSP round.
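      For illustration, here is a minimal sketch of the kind of bounded wait involved (a hypothetical class, not the actual JGroups Promise/Discovery code): once the expected count is reached, the waiter returns with a snapshot, and any response arriving after that point is never seen by the joiner.

      import java.util.HashSet;
      import java.util.Set;

      // Hypothetical sketch of a discovery-style bounded wait: waitFor() returns as
      // soon as 'expected' responses have arrived, so responses added afterwards
      // (e.g. the coordinator's second GET_MBRS_RSP) are ignored by the caller.
      public class ResponseCollector {
          private final Set<String> rsps = new HashSet<String>();

          public synchronized void add(String sender) {
              rsps.add(sender);
              notifyAll(); // wake up the waiting joiner
          }

          public synchronized Set<String> waitFor(int expected, long timeoutMs) throws InterruptedException {
              long deadline = System.currentTimeMillis() + timeoutMs;
              while(rsps.size() < expected) {
                  long remaining = deadline - System.currentTimeMillis();
                  if(remaining <= 0)
                      break;
                  wait(remaining);
              }
              return new HashSet<String>(rsps); // snapshot: later add() calls are not reflected
          }
      }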

      I am not sure yet how to solve this problem. Obviously D should have been warned that A was becoming coordinator, or at least that A was trying to.
      Perhaps if all the GET_MBRS traffic were multicast, each new member could listen in on it and, from the different REQ and RSP messages, work out who is doing what.

      I could see discovery split into two phases: one where a new member "silently" listens to the network, followed by one where it actively tries to discover the other members with several GET_MBRS_REQ.

            [JGRP-1182] GET_MBRS_RSP are not all processed, Discovery step ends prematurely.

            Renaud Devarieux (Inactive) added a comment -

            You're right, I had overlooked UNICAST. I wasn't aware that part of the service traffic is unicast rather than all multicast. I should put it back.

            I have thought about it a bit and came up with a few ideas.

            1. We could wipe the set of GET_MBRS_RSP before sending a new GET_MBRS_REQ. That would help a lot but would not completely solve the problem.

            2. GET_MBRS_RSP could track the number of GET_MBRS_REQ sent, so in the example D would know that A, B and C have already sent GET_MBRS_REQ and are perhaps trying to elect themselves. Therefore D should wait.

            3. Increase the frequency of GET_MBRS_REQ so that it acts as a heartbeat with period p for the duration of the discovery. Every node waits up to 2p before starting to send its own GET_MBRS_REQ, but responds to incoming GET_MBRS_REQ. Obviously, if you reply to one, you don't send any yourself and instead wait for the other node to do the job and install the view.
            Some kind of priority must be assigned to each node to break ties in case, following the same logic, several nodes start to speak at the same time after the 2p silence.

            Thanks for the advice, I agree it may help because of break_on_coord_rsp. I'll reopen the issue if I manage to come up with some Java code for idea 3.


            Bela Ban added a comment -

            Hmm, in ClientGmsImpl.joinInternal(), we get the list of responses (e.g. A D B C E) and

            • sort it: A B C D E
            • get the first element: A
            • compare A to our own address
            • if it's a match: become coordinator, else continue the loop in joinInternal()
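
            Schematically, that check amounts to something like the following (a simplified sketch with string addresses, not the actual ClientGmsImpl source):

            import java.util.ArrayList;
            import java.util.Arrays;
            import java.util.Collections;
            import java.util.List;

            // Simplified sketch of the coordinator determination described above.
            public class CoordCheck {
                static boolean shouldBecomeCoord(List<String> rsps, String localAddr) {
                    List<String> sorted = new ArrayList<String>(rsps);
                    Collections.sort(sorted);               // e.g. [A, D, B, C, E] -> [A, B, C, D, E]
                    return sorted.get(0).equals(localAddr); // first element after sorting is the coord
                }

                public static void main(String[] args) {
                    List<String> rsps = Arrays.asList("A", "D", "B", "C", "E");
                    System.out.println(shouldBecomeCoord(rsps, "A")); // true:  A becomes coordinator
                    System.out.println(shouldBecomeCoord(rsps, "D")); // false: D sends a JOIN to A instead
                }
            }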

            So I think my advice above is nevertheless correct: just set num_initial_members to 6 and the client handling code in ClientGmsImpl.joinInternal() [lines 99f] should determine the correct coordinator.

            I'm closing this issue, feel free to re-open.


            Bela Ban added a comment -

            I guess the only real workaround currently is to stagger startup, or at least start the coord first, then all subsequent members can be started in parallel.


            Bela Ban added a comment -

            I don't think this can be fixed; it's in the nature of concurrent startup without an existing coordinator that the first responses are all non-coord responses.

            A workaround is to set break_on_coord_rsp to true (the default anyway) and num_initial_members to a value greater than the maximum initial membership, so in your example above:

            break_on_coord_rsp="true" num_initial_members="6" timeout="3000"

            This way, concurrent startup without a pre-existing coordinator will wait for 6 members. D will get responses from D A B C and E, so it'll continue waiting. When it receives the 2nd GET_MBRS_RSP from A (this time as coord), break_on_coord_rsp will terminate the discovery phase.
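
            Applied to the stack string from the description, the PING section would then look roughly like this (a sketch; it uses the same old-style properties syntax as above, and the rest of the stack is unchanged):

            PING(num_initial_members=6;timeout=3000;break_on_coord_rsp=true)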

            Actually, this may not work, as A runs the same logic, and A and D could become coordinators at exactly the same time...


            Bela Ban added a comment -

            I noticed you're missing UNICAST in your config. Since you use UDP which is unreliable, unicast messages can get lost without UNICAST. This is not the cause of the issue here, but I wanted you to be aware of it.
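
            For example, the stack from the description with UNICAST added (a sketch; in the stock 2.x configurations UNICAST usually sits between pbcast.NAKACK and pbcast.STABLE):

            UDP(mcast_addr=231.8.8.8;mcast_port=45578):PING(num_initial_members=5;timeout=800):MERGE2:FD:VERIFY_SUSPECT:pbcast.NAKACK:UNICAST:pbcast.STABLE:FRAG2:pbcast.GMS:pbcast.FLUSH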

            I'll look at your case now...


            Bela Ban added a comment -

            I'll take a look and see whether we can fix this in 2.10, or whether it needs to be pushed out to 2.11.

            There may not be a solution at all though...


              Assignee: Bela Ban (rhn-engineering-bban)
              Reporter: Renaud Devarieux (Inactive) (rddx_jira)
              Votes: 0
              Watchers: 0
