JGroups / JGRP-614

FC: deadlock with synchronous RPCs and exhausted credits

    • Type: Feature Request
    • Resolution: Won't Do
    • Priority: Major
    • 2.4.2
    • None
    • None

      1. Members A and B
      2. A makes a synchronous clustered RPC
      3. B handles the RPC by placing the message into a queue where it is processed by the one and only incoming handler thread.
      4. The handler thread dequeues the message and passes it up the stack
      5. In NAKACK, a lock(A) is acquired and the message is passed up
      6. The app receives the message, invokes the method and - since it is synchronous - passes the response down
      7. In FC.down(), we happen to run out of credits and block
      8. FC.down() periodically requests credits from A.
      9. A receives B's credit request and sends new credits to B
      10. B receives the credits and places them into the incoming handler's queue
      11. HOWEVER, B never processes the credits: its one and only incoming handler thread is still blocked in FC.down(), so the credit message sits in the queue and is never dequeued
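
The steps above can be modelled in a few lines of plain Java. This is only a sketch of the threading pattern, not JGroups code: the class and method names are hypothetical, a Semaphore stands in for FC's credits, and the wait in the simulated FC.down() is bounded by a timeout so that the demo terminates (the real call would block indefinitely).

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical model of member B; none of these names are JGroups classes.
class SingleHandlerDeadlockSketch {

    /** Returns true if B managed to send its response within timeoutMs. */
    static boolean responseSent(long timeoutMs) {
        BlockingQueue<Runnable> incoming = new LinkedBlockingQueue<>();
        Semaphore credits = new Semaphore(0);          // step 7: credits are exhausted
        AtomicBoolean sent = new AtomicBoolean(false);
        CountDownLatch done = new CountDownLatch(1);

        // Steps 3-4: the one and only incoming handler thread drains the queue.
        Thread handler = new Thread(() -> {
            try {
                while (true) incoming.take().run();
            } catch (InterruptedException e) {
                // interrupted by the caller: stop the handler
            }
        });
        handler.setDaemon(true);

        // Steps 5-7: handling the RPC ends in a blocking wait for credits.
        Runnable rpc = () -> {
            // Steps 8-10: A's replenishment arrives and is queued behind us.
            incoming.add(() -> credits.release());
            try {
                // Bounded wait so the sketch terminates; FC.down() would keep waiting.
                sent.set(credits.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            done.countDown();
        };

        incoming.add(rpc);
        handler.start();
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        handler.interrupt();
        return sent.get();
    }

    public static void main(String[] args) {
        // Step 11: the credits sit in the queue while the handler is blocked.
        System.out.println("response sent: " + responseSent(500));
    }
}
```

In this model the credit message is only processed after the blocked call gives up, which is the starvation described in step 11.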

      We currently suggest not using FC with synchronous RPCs on JGroups 2.4.1 (note that 2.5 and higher don't have this issue).

      However, I think a lot of customers don't know this and will run into this issue. If you think about it, it only takes one occasion on which the credits sent by the receiver(s) don't reach you: you block in FC.down(), and you block forever!

      Solution: introduce a 'minimal' OOB (out-of-band) flag into messages, and when an OOB message is received, use a separate thread to pass it up the stack. Since we don't have many OOB messages, there shouldn't be too many threads spawned.
      We may also have to add special handling for OOB messages in UNICAST.up() and NAKACK.up(), as we acquire per-sender locks there.
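
A rough sketch of the proposed dispatch, again in plain Java rather than JGroups code. The Msg class, its oob field and the receive() helper are assumptions made for illustration; the per-sender locking in UNICAST.up() and NAKACK.up() is not modelled. Messages flagged OOB bypass the incoming queue and run on a separate thread, so a credit message can get through while the handler thread is blocked.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical model of the proposal; none of these names are JGroups classes.
class OobDispatchSketch {

    /** A message: some work to pass up the stack, plus the proposed OOB flag. */
    static final class Msg {
        final boolean oob;
        final Runnable action;
        Msg(boolean oob, Runnable action) { this.oob = oob; this.action = action; }
    }

    /** OOB messages get a separate thread; everything else is queued for the handler. */
    static void receive(Msg m, BlockingQueue<Msg> incoming, ExecutorService oobThreads) {
        if (m.oob) oobThreads.execute(m.action);
        else incoming.add(m);
    }

    /** Returns true if the response could be sent within timeoutMs. */
    static boolean responseSent(boolean creditsAreOob, long timeoutMs) {
        BlockingQueue<Msg> incoming = new LinkedBlockingQueue<>();
        ExecutorService oobThreads = Executors.newCachedThreadPool();
        Semaphore credits = new Semaphore(0);          // credits are exhausted
        AtomicBoolean sent = new AtomicBoolean(false);
        CountDownLatch done = new CountDownLatch(1);

        // The single incoming handler thread, as in the original scenario.
        Thread handler = new Thread(() -> {
            try {
                while (true) incoming.take().action.run();
            } catch (InterruptedException e) {
                // interrupted by the caller: stop the handler
            }
        });
        handler.setDaemon(true);

        Msg creditMsg = new Msg(creditsAreOob, () -> credits.release());
        Msg rpc = new Msg(false, () -> {
            // The credit replenishment arrives while the response is pending.
            receive(creditMsg, incoming, oobThreads);
            try {
                // Bounded wait so the sketch terminates in the non-OOB case.
                sent.set(credits.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            done.countDown();
        });

        receive(rpc, incoming, oobThreads);
        handler.start();
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        handler.interrupt();
        oobThreads.shutdown();
        return sent.get();
    }

    public static void main(String[] args) {
        System.out.println("credits flagged OOB: sent=" + responseSent(true, 2000));
        System.out.println("credits not OOB:     sent=" + responseSent(false, 500));
    }
}
```

A cached thread pool is used here only because it is the shortest way to get "a separate thread"; given that few OOB messages are expected, a small bounded pool would presumably serve as well.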

              rhn-engineering-bban Bela Ban