  1. AMQ Broker
  2. ENTMQBR-9783

Replication start will deadlock / starve if PageTimedWriter is flow controlled


    • Type: Bug
    • Resolution: Done-Errata
    • ARTEMIS-5441

      ReplicationFlowControlTest can reproduce this, but only rarely.

      The following combination is needed for this to occur:

      • The address is in page mode
      • The message ingestion rate is faster than the sync speed of the storage
      • Messages have accumulated in the PageTimedBuffer to the point that the Netty acceptors are flow controlled (blocked waiting for credits)
      • Replication is started
      • The PageCounter snapshot update is being persisted during that window
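
The combination above can be illustrated with a generic sketch. It is not the broker's code: the class, the method, the single credit, and the read/write lock are all hypothetical stand-ins, and the ticket does not state the actual lock ordering. The sketch only shows the general shape of the problem. One thread holds an exclusive lock while waiting for a flow-control credit, and the only thread that can return the credit needs that same lock first.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical illustration only; none of these names come from the broker code.
class FlowControlStarvationSketch {

    // Returns true if the simulated "replication start" obtained a credit in time.
    static boolean replicationStartCompletes(long timeoutMillis) {
        // Assumption: a single credit models a fully flow-controlled writer.
        Semaphore credits = new Semaphore(1);
        // Assumption: one lock shared by the writer and replication start.
        ReentrantReadWriteLock storageLock = new ReentrantReadWriteLock();
        ExecutorService writer = Executors.newSingleThreadExecutor();

        // A producer has taken the only credit; its write is still pending.
        credits.acquireUninterruptibly();

        // Replication start takes the exclusive lock.
        storageLock.writeLock().lock();
        try {
            // The writer must take the shared lock before it can finish
            // the pending write and hand the credit back.
            writer.execute(() -> {
                storageLock.readLock().lock();
                try {
                    credits.release();
                } finally {
                    storageLock.readLock().unlock();
                }
            });

            // While still holding the exclusive lock, replication start needs
            // a credit of its own (for example to persist a snapshot update).
            // A timeout is used so the sketch reports the stall instead of hanging.
            boolean acquired = credits.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
            if (acquired) {
                credits.release();
            }
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            // Releasing the lock lets the blocked writer task finish.
            storageLock.writeLock().unlock();
            writer.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("replication start completed: "
                + replicationStartCompletes(200));
    }
}
```

In this sketch the call returns false: the writer cannot take the shared lock while the exclusive lock is held, so the credit is never returned within the timeout. With a blocking acquire and no timeout, both threads would wait on each other indefinitely.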

       

      In the test, I saw indications that the deadlock happened during a shutdown, so replication was starting while the shutdown was also in progress.

      We are fixing this, and PageTimedWriterUnitTest reproduces the conditions under which this can happen.

      If this happens, the broker needs to be restarted. It is a rare race.

              Clebert Suconic (csuconic@redhat.com)
              Messaging QE