Message bundling seems to add latency well beyond max_bundle_timeout

    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • Fix Version/s: 2.4.1 SP4, 2.5
    • Affects Version/s: 2.4.1 SP3
    • None

      Short synopsis: with bundling enabled and max_bundle_timeout=30 ms, I sometimes see a 700 ms delay before the receiver gets a message, leading to transient AS testsuite failures. Disabling bundling makes the transient failures go away.

      Long discussion:

      The JBoss AS testsuite has been seeing intermittent failures of the asynchronous web session replication tests, particularly the FIELD granularity tests. Basically, the test modifies a session on one node, waits 500 ms, then fails over to the other node, expecting consistent state. The test fails if the session state is not as expected.

      Whenever I investigate the intermittent failure, it's always a case of the asynchronous replication message arriving after the failover request. TRACE logging of JBoss Cache sometimes shows a 700 ms delay between the sender cache sending the replication and the receiver receiving it. That's just too long!

      Causes I could think of:

      1) Some up_thread/down_thread set to true, leaving a message sitting in a queue for a while until the OS schedules the thread. We used to see this problem. Nope: all up_thread/down_thread settings are false.

      2) Bad luck; a full gc happens at the wrong time. Possible but IMO unlikely; the failures occur too often, and it's not like these tests are generating a ton of garbage that's forcing a lot of full gc runs.

      3) System is under some other load during the relevant period. Unlikely. The client is sleeping and the servers have nothing else going on.

      4) Message bundling. It's turned on, but max_bundle_timeout is 30 ms, so the latency it adds to an async RPC should be minimal. But, I just disabled bundling and have now run the async FIELD tests about 10 times with no failures. With it enabled I'd get a failure in some test on average nearly once per run.
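
      To make the expectation in 4) concrete, here is a minimal sketch of an idealized timeout-based bundler. It is not the JGroups implementation: the class `BundlingModel`, the method `addedLatencies`, and the count-based limit `maxBundleCount` (standing in for a byte-based size limit) are assumptions made up for illustration. Under this model a message waits until the first queued message has been pending for the timeout, or until the bundle fills, so the added latency should never exceed the timeout.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, not JGroups code: computes how long each message
// would wait in an idealized bundler whose timer always fires on time.
final class BundlingModel {

    /**
     * @param sendTimesMs        enqueue times in ms, sorted ascending
     * @param maxBundleTimeoutMs max time the first queued message may wait
     * @param maxBundleCount     assumed limit on messages per bundle
     * @return per-message delay in ms between enqueue and flush
     */
    static long[] addedLatencies(long[] sendTimesMs, long maxBundleTimeoutMs, int maxBundleCount) {
        long[] latencies = new long[sendTimesMs.length];
        List<Integer> pending = new ArrayList<>();
        long deadline = 0;
        for (int i = 0; i < sendTimesMs.length; i++) {
            long now = sendTimesMs[i];
            // The timer expired before this message arrived: flush at the deadline.
            if (!pending.isEmpty() && now >= deadline) {
                flush(pending, sendTimesMs, latencies, deadline);
            }
            // The first message of a new bundle starts the timer.
            if (pending.isEmpty()) {
                deadline = now + maxBundleTimeoutMs;
            }
            pending.add(i);
            // A full bundle is sent immediately.
            if (pending.size() >= maxBundleCount) {
                flush(pending, sendTimesMs, latencies, now);
            }
        }
        // Whatever is left goes out when its timer expires.
        if (!pending.isEmpty()) {
            flush(pending, sendTimesMs, latencies, deadline);
        }
        return latencies;
    }

    private static void flush(List<Integer> pending, long[] sendTimesMs, long[] latencies, long flushTimeMs) {
        for (int idx : pending) {
            latencies[idx] = flushTimeMs - sendTimesMs[idx];
        }
        pending.clear();
    }
}
```

      For example, `BundlingModel.addedLatencies(new long[] {0, 10, 25, 100}, 30, 100)` yields delays of 30, 20, 5 and 30 ms. If this model is roughly right, a 700 ms delay cannot come from the bundling rule itself and would have to come from the flush running late.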

      Perhaps there is something that's preventing the Bundler task from executing on the expected schedule?
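
      One way to check that suspicion in isolation is to measure how late a timer task fires relative to its requested delay. The sketch below is only a stand-in: it uses the JDK's `ScheduledExecutorService` rather than the timer JGroups uses internally, and the class `TimerLatenessProbe` and method `worstLatenessMs` are hypothetical names. Running something like it on the test machines under the same load would show whether timer tasks in general are being delayed by hundreds of milliseconds.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical probe, not JGroups code: measures how much later than
// requested a one-shot timer task actually runs.
final class TimerLatenessProbe {

    /** Schedules {@code runs} one-shot tasks and returns the worst lateness in ms. */
    static long worstLatenessMs(long delayMs, int runs) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        long worstNanos = 0;
        try {
            for (int i = 0; i < runs; i++) {
                CountDownLatch fired = new CountDownLatch(1);
                AtomicLong firedAt = new AtomicLong();
                long scheduledAt = System.nanoTime();
                timer.schedule(() -> {
                    firedAt.set(System.nanoTime());
                    fired.countDown();
                }, delayMs, TimeUnit.MILLISECONDS);
                fired.await();
                // Lateness = actual wait minus requested delay.
                long lateness = (firedAt.get() - scheduledAt) - TimeUnit.MILLISECONDS.toNanos(delayMs);
                worstNanos = Math.max(worstNanos, lateness);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for timer task", e);
        } finally {
            timer.shutdownNow();
        }
        return TimeUnit.NANOSECONDS.toMillis(worstNanos);
    }
}
```

      Calling `TimerLatenessProbe.worstLatenessMs(30, 100)` mirrors the 30 ms timeout from this report. A result in the low single digits would point away from general timer scheduling and toward something specific to the Bundler task.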

              Assignee: Bela Ban (rhn-engineering-bban)
              Reporter: Brian Stansberry (bstansbe@redhat.com)