AMQ Broker: ENTMQBR-4100

AMQ 7: OOM when durable topic consumers disconnect from downstream broker in a mesh


    • Type: Bug
    • Resolution: Not a Bug
    • Priority: Major
    • Affects Version/s: AMQ 7.7.0.GA

      Set up two AMQ 7.7 brokers in a non-replicated, fully-connected mesh. To make the problem easier to reproduce, I set -Xmx for both brokers to 512 MB – just so they run out of heap quickly.
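      One way to cap the heap on an AMQ 7 broker instance is to edit JAVA_ARGS in the instance's etc/artemis.profile; a minimal sketch, with all other JVM flags omitted (not necessarily the exact settings used in this test):

        # <instance-dir>/etc/artemis.profile (sketch; other flags omitted)
        JAVA_ARGS="-Xms512M -Xmx512M"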

      Here is the test topology:

                         +-----------+    +------------+
      publisher client-->| upstream  |--> | downstream | --> temporary subscriber 
                         |  broker   |    |  broker    |
                         +-----------+    +------------+

      I'm using the term "downstream" for the broker that the subscriber will connect to, and "upstream" for the broker to which the publisher will connect. Of course, this is a symmetric cluster, so the roles are interchangeable.
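      For reference, the mesh here is the usual broker.xml cluster-connection setup; a minimal sketch of the relevant section is below (connector names, hosts, and ports are placeholders, not the exact test configuration):

        <connectors>
           <!-- this broker's own connector, plus one per peer; ports are placeholders -->
           <connector name="netty-connector">tcp://localhost:61616</connector>
           <connector name="peer-connector">tcp://localhost:61617</connector>
        </connectors>

        <cluster-connections>
           <cluster-connection name="my-cluster">
              <connector-ref>netty-connector</connector-ref>
              <message-load-balancing>ON_DEMAND</message-load-balancing>
              <max-hops>1</max-hops>
              <static-connectors>
                 <connector-ref>peer-connector</connector-ref>
              </static-connectors>
           </cluster-connection>
        </cluster-connections>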

      Use a JMS client to connect a durable consumer to downstream, on a specific topic, with a specific client ID.
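      A minimal sketch of such a subscriber using the Core JMS client; it also performs the receive and disconnect described in the next two steps (broker URL, client ID, topic, and subscription name are placeholders):

        import javax.jms.*;
        import org.apache.activemq.artemis.jms.client.ActiveMQConnectionFactory;

        public class DurableSubscriber {
            public static void main(String[] args) throws Exception {
                // Connect to the downstream broker; the URL is a placeholder
                ConnectionFactory cf = new ActiveMQConnectionFactory("tcp://downstream:61616");
                Connection conn = cf.createConnection();
                conn.setClientID("client-1");        // specific client ID
                Session session = conn.createSession(false, Session.AUTO_ACKNOWLEDGE);
                Topic topic = session.createTopic("test.topic");
                // The durable subscription outlives the connection
                MessageConsumer consumer = session.createDurableSubscriber(topic, "sub-1");
                conn.start();
                Message m = consumer.receive(5000);  // verify the first message arrives
                System.out.println("received: " + m);
                conn.close();                        // disconnect; the subscription remains
            }
        }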

      Use a JMS publisher to publish a message to the same topic, on upstream. Verify that the consumer receives the message.

      Disconnect the consumer.

      Use a JMS publisher to publish an unlimited stream of, say, 50 kB messages to the same topic. The larger the messages, the quicker the problem reproduces.
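      A sketch of the kind of publisher loop that triggers the failure (again, URL, topic, and payload size are placeholders):

        import javax.jms.*;
        import org.apache.activemq.artemis.jms.client.ActiveMQConnectionFactory;

        public class FloodPublisher {
            public static void main(String[] args) throws Exception {
                // Connect to the upstream broker; the URL is a placeholder
                ConnectionFactory cf = new ActiveMQConnectionFactory("tcp://upstream:61616");
                Connection conn = cf.createConnection();
                Session session = conn.createSession(false, Session.AUTO_ACKNOWLEDGE);
                MessageProducer producer = session.createProducer(session.createTopic("test.topic"));
                byte[] payload = new byte[50 * 1024];    // ~50 kB per message
                while (true) {                           // publish until the broker fails
                    BytesMessage msg = session.createBytesMessage();
                    msg.writeBytes(payload);
                    producer.send(msg);
                }
            }
        }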

      After a few hundred kB of messages, downstream fails with an OOM error with no stack backtrace. Thereafter, it is effectively dead.

      Upstream starts logging messages saying it can't connect to downstream. This is expected.

      Restart downstream – this requires a 'kill -9' in my tests.

      Almost immediately, upstream fails with an OOM – again no backtrace. Upstream is now effectively dead, and has to be restarted.

      Note that paging does start on downstream – this can be seen in the log. A heap dump from downstream after failure shows the heap completely full of message-related objects – the exact class depends on the wire protocol in use.
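      Heap dumps and class histograms like those attached can be captured with the standard JDK tools, for example (the PID and file name are placeholders):

        jmap -dump:format=b,file=downstream.hprof <broker-pid>
        jcmd <broker-pid> GC.class_histogram > heap_histogram.txt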

      I can reproduce this problem with AMQP and Core clients. I have observed other problems with durable consumers and AMQP, but this one does not seem to depend on protocol.


      In a fully-connected, symmetric mesh, connecting a durable topic subscriber to a particular broker causes messages to be sent to that broker thereafter, even if no client with the same client ID is connected. That in itself may be expected, but this behaviour causes message-related objects to build up in the heap of the broker that originally hosted the client. The build-up continues until either a client connects and consumes the messages, or the broker runs out of heap memory.

       

      Attachments:
        1. start.hprof.gz (13.57 MB)
        2. heap_histogram.txt (226 kB)

      Assignee: Gary Tully (gtully@redhat.com)
      Reporter: Kevin Boone (rhn-support-kboone)
