  1. AMQ Broker
  2. ENTMQBR-1555

[AMQ 7, openwire] Flow Control, (slow) fast consumer (not) killed - possible race condition


Details

    • Bug
    • Resolution: Won't Do
    • Minor
    • None
    • AMQ 7.1.0.GA, AMQ 7.1.1.GA, AMQ 7.2.0.GA
    • None
    • None

    Description

      Two of our flow control tests (test_fast_consumer_is_not_killed and test_slow_consumer_is_killed) are failing. The issue mostly concerns OpenWire, but there are one or two cases for Core. It is very difficult to reproduce and occurs only rarely when run locally. With the OpenWire client, test_fast_consumer_is_not_killed fails often enough to notice (once or twice per 10 to 15 repetitions of the test), and in our CI it fails almost always.

      Description of test_fast_consumer_is_not_killed:
        1. Create a queue
        2. Send 20 messages to that queue
        3. Start sending 20 messages at a rate of 2 msgs/second in the background
        4. Start receiving from that queue at a rate of 20 msgs/second
        5. See if the receiver gets killed
      

      In rare cases and under very specific conditions, even a fast consumer is considered slow. At a fixed interval, the broker checks whether each consumer is slow. Our guess is that a check sometimes runs right after new messages arrive and before the consumer gets a chance to consume them, so those messages count towards the "throughput rate" while the consumption rate stays the same. This is enough to tip the consumer over the threshold, and it is killed.
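To make the suspected race concrete, here is a minimal toy model in Python. It is only a sketch of the guess above, not the broker's actual algorithm: the rule in `is_flagged_slow`, the 750 ms check window, and the 50 ms consumption delay are all assumptions chosen for illustration.

```python
# Toy model only: the rule below is an assumption for illustration,
# not the broker's actual slow-consumer algorithm.

def count_in_window(event_times_ms, start_ms, end_ms):
    """Count events with start_ms <= t < end_ms."""
    return sum(1 for t in event_times_ms if start_ms <= t < end_ms)

def is_flagged_slow(arrivals_ms, consumptions_ms, check_ms, window_ms):
    """Hypothetical rule: slow if fewer messages were consumed than arrived in the window."""
    start_ms = check_ms - window_ms
    arrived = count_in_window(arrivals_ms, start_ms, check_ms)
    consumed = count_in_window(consumptions_ms, start_ms, check_ms)
    return consumed < arrived

# Producer sends one message every 500 ms (2 msgs/second).
arrivals_ms = [k * 500 for k in range(1, 11)]
# A fast consumer (20 msgs/second) takes each message 50 ms after it arrives.
consumptions_ms = [t + 50 for t in arrivals_ms]

# Check lands between an arrival (2000 ms) and its consumption (2050 ms).
unlucky = is_flagged_slow(arrivals_ms, consumptions_ms, check_ms=2020, window_ms=750)
# Check lands after that message has been consumed.
lucky = is_flagged_slow(arrivals_ms, consumptions_ms, check_ms=2100, window_ms=750)

print(unlucky, lucky)  # prints: True False
```

In this model the consumer keeps up perfectly, yet a check that lands inside the 50 ms gap between an arrival and its consumption sees one more arrival than consumption and flags it.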

      Description of test_slow_consumer_is_killed:
        1. Create a queue
        2. Send 20 messages to that queue
        3. Start sending 20 messages at a rate of 2 msgs/second in the background
        4. Start receiving from that queue at a rate of 0.5 msgs/second
        5. See if the receiver gets killed
      

      This is similar to the fast consumer case, but the other way around: on rare occasions a slow consumer is not killed. Here the check runs right after a message is consumed and before new messages arrive in the queue, so that message counts towards the consumption rate while the "throughput rate" stays the same. A slow consumer surviving is very rare; even in our CI there are only a handful of occurrences.
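The mirrored case can be sketched with a toy sweep over possible check instants. Again, this is not the broker's real logic: the rule (flag the consumer when fewer messages were consumed than arrived in the check window), the 600 ms window, and the exact event times are hypothetical, and the share of "escaping" instants in this toy says nothing about how often the real test fails.

```python
# Toy model only: hypothetical rule and timings, not the broker's implementation.

def is_flagged_slow(arrivals_ms, consumptions_ms, check_ms, window_ms):
    """Hypothetical rule: slow if fewer messages were consumed than arrived in the window."""
    start_ms = check_ms - window_ms
    arrived = sum(1 for t in arrivals_ms if start_ms <= t < check_ms)
    consumed = sum(1 for t in consumptions_ms if start_ms <= t < check_ms)
    return consumed < arrived

# Producer sends one message every 500 ms (2 msgs/second).
arrivals_ms = list(range(500, 20001, 500))
# Slow consumer takes one message every 2000 ms (0.5 msgs/second).
consumptions_ms = list(range(2100, 20001, 2000))

window_ms = 600  # assumed check window, slightly longer than the producer period

# Try a check every 50 ms across one consumer period and record
# the instants at which the slow consumer is NOT flagged.
escapes = [
    check_ms
    for check_ms in range(4000, 6000, 50)
    if not is_flagged_slow(arrivals_ms, consumptions_ms, check_ms, window_ms)
]

print(escapes)
```

In this model the consumer escapes only when the window happens to contain its single consumption (at 4100 ms) and just one arrival; every other check instant in the sweep flags it as slow.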

      I have to stress that this is our best guess, based on jdanek@redhat.com's knowledge of what is going on in the broker, and it is not necessarily a bug per se. I have therefore set the priority to Minor, since the test passes most of the time when I try to reproduce the issue.


          People

            rh-ee-ataylor Andy Taylor
            rvais Roman Vais
            Jiri Daněk, Roman Vais
