Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: 3.0.9
    • Fix Version/s: 3.0.10, 3.1
    • Labels: None

      Description

      The symptom I sometimes see is that broadcast messages are not delivered to a member.

      I've tracked this down to NAKACK2 having gaps in its record of sequence numbers while its RetransmitTask is not running. I've confirmed that the task is not running by calling stack.getTransport().dumpTimerTasks() and seeing that it is not among the scheduled tasks.

      So far, so definite. I also have a theory about how this happens.

      Suppose thread 1 is in TimeScheduler2._run(), and has got as far as executing some tasks but has not yet reached the line tasks.keySet().removeAll(keys).

      Meanwhile, suppose thread 2 is in TimeScheduler2.schedule(), adding a task that has the same key as the just-executed task. It can reach the branch tasks.remove(key) ("// entry has completed; remove it"), go round the loop again, and successfully call tasks.putIfAbsent(key, task).

      Now thread 1 picks up again, calls removeAll(keys), and removes the task that has just been scheduled. Oops.
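
      The interleaving above can be replayed deterministically on a single thread. This is only a minimal sketch of the theory, not the TimeScheduler2 source: the Entry class, its completed flag, the key value and the use of a ConcurrentSkipListMap are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListMap;

public class LostTaskRace {
    // Hypothetical stand-in for a scheduler entry; not the JGroups class.
    static final class Entry {
        final String name;
        volatile boolean completed;
        Entry(String name) { this.name = name; }
    }

    /** Replays the suspected interleaving; returns true if the new task survives. */
    static boolean replay() {
        ConcurrentSkipListMap<Long, Entry> tasks = new ConcurrentSkipListMap<>();
        long key = 1000L; // assumed: execution time used as the map key
        Entry old = new Entry("old");
        tasks.put(key, old);

        // Thread 1 (_run): snapshots the due keys and executes the entry,
        // but has not yet reached tasks.keySet().removeAll(keys).
        List<Long> keys = new ArrayList<>(tasks.headMap(key, true).keySet());
        old.completed = true;

        // Thread 2 (schedule): finds a completed entry under the same key,
        // removes it, loops, and then adds its own entry successfully.
        Entry fresh = new Entry("retransmit");
        Entry existing = tasks.putIfAbsent(key, fresh);
        if (existing != null && existing.completed) {
            tasks.remove(key);             // "entry has completed; remove it"
            tasks.putIfAbsent(key, fresh); // second pass of the loop succeeds
        }

        // Thread 1 resumes and removes by key, taking the fresh entry with it.
        tasks.keySet().removeAll(keys);

        return tasks.get(key) == fresh;
    }

    public static void main(String[] args) {
        // prints "new task survives: false"
        System.out.println("new task survives: " + replay());
    }
}
```

      Because removeAll(keys) removes by key rather than by entry identity, it cannot tell the completed entry from the one thread 2 has just added.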

      I suggest that a likely fix is to delete the "else tasks.remove(key)" branch from schedule() altogether. (If we're in that branch then we're blocked by a completed entry. That entry will be removed shortly by the run() thread, and then we'll be able to progress).
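
      A minimal sketch of how the suggested change would behave, under the same kind of simplified model (the Entry class, the tryAdd helper and the key value are hypothetical, and the real schedule() loop does more than this): the scheduling side never removes a completed entry itself, it just fails the attempt and retries once the _run() thread has cleaned up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListMap;

public class NoRemoveBranch {
    // Hypothetical stand-in for a scheduler entry; not the JGroups class.
    static final class Entry {
        final String name;
        volatile boolean completed;
        Entry(String name) { this.name = name; }
    }

    /** One pass of a simplified schedule() loop with the remove branch deleted. */
    static boolean tryAdd(ConcurrentSkipListMap<Long, Entry> tasks, long key, Entry task) {
        // If a (completed) entry still occupies the key, do nothing and let the caller retry.
        return tasks.putIfAbsent(key, task) == null;
    }

    /** Replays the same interleaving; returns true if the new task survives. */
    static boolean replay() {
        ConcurrentSkipListMap<Long, Entry> tasks = new ConcurrentSkipListMap<>();
        long key = 1000L; // assumed: execution time used as the map key
        Entry old = new Entry("old");
        tasks.put(key, old);

        // Thread 1 (_run): snapshots the due keys and executes the entry.
        List<Long> keys = new ArrayList<>(tasks.headMap(key, true).keySet());
        old.completed = true;

        // Thread 2 (schedule): blocked by the completed entry, so the first pass fails.
        Entry fresh = new Entry("retransmit");
        boolean firstPass = tryAdd(tasks, key, fresh);

        // Thread 1 resumes: only the completed entry is under that key, so only it is removed.
        tasks.keySet().removeAll(keys);

        // Thread 2 goes round the loop again and now succeeds.
        boolean secondPass = tryAdd(tasks, key, fresh);

        return !firstPass && secondPass && tasks.get(key) == fresh;
    }

    public static void main(String[] args) {
        // prints "new task survives: true"
        System.out.println("new task survives: " + replay());
    }
}
```

      In this model the cost of the change is that schedule() may spin briefly until _run() reaches removeAll(keys).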

              People

              • Assignee: Bela Ban (belaban)
              • Reporter: David Hotham (dimbleby)
