AMQ Broker / ENTMQBR-8956

Iterator Leak with Core Producers and Network Interruptions

    • Type: Bug
    • Resolution: Done
    • Affects Version: AMQ 7.11.6.GA
    • Workaround:

      Disable redistribution by setting redistribution-delay to -1 for all addresses.
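
      In broker.xml terms, the suggested mitigation might look like the following fragment (illustrative only; the catch-all match and the placement inside address-settings mirror the configuration later in this report):

```xml
<address-settings>
   <!-- Illustrative: a redistribution-delay of -1 disables message
        redistribution for all addresses matched by this setting. -->
   <address-setting match="#">
      <redistribution-delay>-1</redistribution-delay>
   </address-setting>
</address-settings>
```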
    • Steps to Reproduce:

      1. Configure a broker (I used a single shared-nothing HA pair for this)

      <?xml version='1.0'?>
      <!--
      Licensed to the Apache Software Foundation (ASF) under one
      or more contributor license agreements.  See the NOTICE file
      distributed with this work for additional information
      regarding copyright ownership.  The ASF licenses this file
      to you under the Apache License, Version 2.0 (the
      "License"); you may not use this file except in compliance
      with the License.  You may obtain a copy of the License at
      
        http://www.apache.org/licenses/LICENSE-2.0
      
      Unless required by applicable law or agreed to in writing,
      software distributed under the License is distributed on an
      "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
      KIND, either express or implied.  See the License for the
      specific language governing permissions and limitations
      under the License.
      -->
      
      <configuration xmlns="urn:activemq"
                     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                     xmlns:xi="http://www.w3.org/2001/XInclude"
                     xsi:schemaLocation="urn:activemq /schema/artemis-configuration.xsd">
      
         <core xmlns="urn:activemq:core" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xsi:schemaLocation="urn:activemq:core ">
      
            <name>0.0.0.0</name>
            <persistence-enabled>true</persistence-enabled>
            <max-redelivery-records>1</max-redelivery-records>
            <journal-type>ASYNCIO</journal-type>
            <paging-directory>data/paging</paging-directory>
            <bindings-directory>data/bindings</bindings-directory>
            <journal-directory>data/journal</journal-directory>
            <large-messages-directory>data/large-messages</large-messages-directory>
            <journal-datasync>true</journal-datasync>
            <journal-min-files>2</journal-min-files>
            <journal-pool-files>10</journal-pool-files>
            <journal-device-block-size>4096</journal-device-block-size>
            <journal-file-size>10M</journal-file-size>
            <journal-buffer-timeout>24000</journal-buffer-timeout>
            <journal-max-io>4096</journal-max-io>
            <disk-scan-period>5000</disk-scan-period>
            <max-disk-usage>90</max-disk-usage>
            <critical-analyzer>true</critical-analyzer>
            <critical-analyzer-timeout>120000</critical-analyzer-timeout>
            <critical-analyzer-check-period>60000</critical-analyzer-check-period>
            <critical-analyzer-policy>HALT</critical-analyzer-policy>
            <page-sync-timeout>1844000</page-sync-timeout>
            <global-max-messages>-1</global-max-messages>
      
            <acceptors>
      
               <!-- Acceptor for every supported protocol -->
               <acceptor name="artemis">tcp://0.0.0.0:61616?tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;amqpMinLargeMessageSize=102400;protocols=CORE,AMQP,STOMP,HORNETQ,MQTT,OPENWIRE;useEpoll=true;amqpCredits=1000;amqpLowCredits=300;amqpDuplicateDetection=true;supportAdvisory=false;suppressInternalManagementObjects=false</acceptor>
      
               <!-- AMQP Acceptor.  Listens on default AMQP port for AMQP traffic.-->
               <acceptor name="amqp">tcp://0.0.0.0:5672?tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;protocols=AMQP;useEpoll=true;amqpCredits=1000;amqpLowCredits=300;amqpMinLargeMessageSize=102400;amqpDuplicateDetection=true</acceptor>
      
               <!-- STOMP Acceptor. -->
               <acceptor name="stomp">tcp://0.0.0.0:61613?tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;protocols=STOMP;useEpoll=true</acceptor>
      
               <!-- HornetQ Compatibility Acceptor.  Enables HornetQ Core and STOMP for legacy HornetQ clients. -->
               <acceptor name="hornetq">tcp://0.0.0.0:5445?anycastPrefix=jms.queue.;multicastPrefix=jms.topic.;protocols=HORNETQ,STOMP;useEpoll=true</acceptor>
      
               <!-- MQTT Acceptor -->
               <acceptor name="mqtt">tcp://0.0.0.0:1883?tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;protocols=MQTT;useEpoll=true</acceptor>
      
            </acceptors>
      
            <connectors>
               <connector name="broker1a-connector">tcp://node1.test.redhat.com:61616</connector>
               <connector name="broker1b-connector">tcp://node2.test.redhat.com:61616</connector>
            </connectors>
      
            <ha-policy>
               <replication>
                 <master>
                    <group-name>cluster1</group-name>
                    <check-for-live-server>true</check-for-live-server>
                 </master>
               </replication>
            </ha-policy>
      
            <cluster-user>admin</cluster-user>
            <cluster-password>admin</cluster-password>
            <cluster-connections>
              <cluster-connection name="static-cluster">
                <connector-ref>broker1a-connector</connector-ref>
                <message-load-balancing>ON_DEMAND</message-load-balancing>
                <static-connectors>
                    <connector-ref>broker1b-connector</connector-ref>
                </static-connectors>
              </cluster-connection>
            </cluster-connections>	
      
            <security-settings>
               <security-setting match="#">
                  <permission type="createNonDurableQueue" roles="amq"/>
                  <permission type="deleteNonDurableQueue" roles="amq"/>
                  <permission type="createDurableQueue" roles="amq"/>
                  <permission type="deleteDurableQueue" roles="amq"/>
                  <permission type="createAddress" roles="amq"/>
                  <permission type="deleteAddress" roles="amq"/>
                  <permission type="consume" roles="amq"/>
                  <permission type="browse" roles="amq"/>
                  <permission type="send" roles="amq"/>
                  <!-- we need this otherwise ./artemis data imp wouldn't work -->
                  <permission type="manage" roles="amq"/>
               </security-setting>
            </security-settings>
      
            <address-settings>
               <address-setting match="activemq.management#">
                  <dead-letter-address>DLQ</dead-letter-address>
                  <expiry-address>ExpiryQueue</expiry-address>
                  <redelivery-delay>0</redelivery-delay>
                  <max-size-bytes>-1</max-size-bytes>
                  <message-counter-history-day-limit>10</message-counter-history-day-limit>
                  <address-full-policy>PAGE</address-full-policy>
                  <auto-create-queues>true</auto-create-queues>
                  <auto-create-addresses>true</auto-create-addresses>
               </address-setting>
               <!--default for catch all-->
               <address-setting match="#">
                  <dead-letter-address>DLQ</dead-letter-address>
                  <expiry-address>ExpiryQueue</expiry-address>
                  <redelivery-delay>0</redelivery-delay>
                  <max-size-bytes>-1</max-size-bytes>
                  <max-size-messages>-1</max-size-messages>
                  <page-size-bytes>10M</page-size-bytes>
                  <max-read-page-messages>-1</max-read-page-messages>
                  <max-read-page-bytes>20M</max-read-page-bytes>
                  <message-counter-history-day-limit>10</message-counter-history-day-limit>
                  <address-full-policy>PAGE</address-full-policy>
                  <auto-create-queues>true</auto-create-queues>
                  <auto-create-addresses>true</auto-create-addresses>
                  <auto-delete-queues>false</auto-delete-queues>
                  <auto-delete-addresses>false</auto-delete-addresses>
                  <auto-delete-created-queues>false</auto-delete-created-queues>
               </address-setting>
            </address-settings>
      
            <addresses>
               <address name="DLQ">
                  <anycast>
                     <queue name="DLQ" />
                  </anycast>
               </address>
               <address name="ExpiryQueue">
                  <anycast>
                     <queue name="ExpiryQueue" />
                  </anycast>
               </address>
            </addresses>
      
         </core>
      </configuration>
      

      2. Start several remote consumers (I started 20 consumers, each on a separate anycast address, using the Java Core JMS client)
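
      A minimal consumer along these lines might look like the sketch below. This is an illustration, not the client used for the original reproduction; the address name, credentials, and host are placeholders (the credentials match the cluster-user in the config above only by assumption), and it requires the Artemis Core JMS client (artemis-jms-client) on the classpath and a running broker.

```java
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.Message;
import javax.jms.MessageConsumer;
import javax.jms.Session;

import org.apache.activemq.artemis.jms.client.ActiveMQConnectionFactory;

// Illustrative Core JMS consumer: receives from one anycast address
// until the process is killed.
public class LeakTestConsumer {
    public static void main(String[] args) throws Exception {
        // Placeholder address name; one consumer per address in the test.
        String address = args.length > 0 ? args[0] : "test.addr.0";
        ConnectionFactory cf =
                new ActiveMQConnectionFactory("tcp://node1.test.redhat.com:61616");
        try (Connection connection = cf.createConnection("admin", "admin")) {
            connection.start();
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageConsumer consumer = session.createConsumer(session.createQueue(address));
            while (true) {
                // Block briefly, then loop; keeps the consumer alive across
                // the simulated network interruptions.
                Message m = consumer.receive(1000);
            }
        }
    }
}
```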

      3. Start remote producers for each of the addresses used in step 2. I started 10 producers per address for a total of 200, each producing one message every 200 ms, also using Core and a failover URI (in case this is relevant) with a callTimeout of 10 seconds:

      (tcp://node1.test.redhat.com:61616,tcp://node2.test.redhat.com:61616)?ha=true&callTimeout=10000
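
      A producer sketch using that failover URI might look like this. Again an illustration rather than the original test client: the address name and credentials are placeholders, and it assumes artemis-jms-client on the classpath and a running broker pair.

```java
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageProducer;
import javax.jms.Session;
import javax.jms.TextMessage;

import org.apache.activemq.artemis.jms.client.ActiveMQConnectionFactory;

// Illustrative Core JMS producer: one message every 200 ms to a single
// anycast address, connecting with the failover URI from the report.
public class LeakTestProducer {
    public static void main(String[] args) throws Exception {
        // Placeholder address name; 10 such producers per address in the test.
        String address = args.length > 0 ? args[0] : "test.addr.0";
        String url = "(tcp://node1.test.redhat.com:61616,tcp://node2.test.redhat.com:61616)"
                + "?ha=true&callTimeout=10000";
        ConnectionFactory cf = new ActiveMQConnectionFactory(url);
        try (Connection connection = cf.createConnection("admin", "admin")) {
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer = session.createProducer(session.createQueue(address));
            for (long i = 0; ; i++) {
                TextMessage message = session.createTextMessage("msg-" + i);
                producer.send(message);
                Thread.sleep(200); // 1 message per 200 ms, as in the report
            }
        }
    }
}
```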

      4. Periodically interrupt the network interface to simulate a VM pause / VM move. I used a script like this one (note the bash shebang; the C-style for loop is a bashism and fails under a plain POSIX sh):

      #!/bin/bash
      
      # Number of network-interruption cycles to run.
      LOOP=2000
      
      for ((i=1; i <= $LOOP; i++))
      do
         echo "Sleeping..."
         ifdown eth0
         sleep 20
         ifup eth0
         echo "Waking..."
         sleep 120
      done
      

      Let this run for some time, periodically gathering heap dumps of the broker process. Note the steady increase in LinkedListImpl$Iterator objects over time. These only seem to get cleaned up after the consumer exits.

    • Priority: Critical
    • Labels: Customer Escalated, Customer Facing, Customer Reported

      In a single-node cluster (I did have a cluster connection configured with a replicated backup, but this did not seem to matter; I saw similar behavior with the cluster connections commented out), I see a slow but steady increase in org.apache.activemq.artemis.utils.collections.LinkedListImpl$Iterator objects on the heap when simulating network interruptions by pausing the network while remote Core producers and consumers are running. Taking periodic heap dumps, I see the iterator count increasing, even though the queues used for the test were not growing in depth.
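
      The growth pattern is consistent with iterators that register themselves with their owning list and are only deregistered by an explicit close. The following self-contained toy model illustrates that pattern; it is not the Artemis implementation, and all class and method names here are invented for illustration:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Toy model of a list whose iterators must be explicitly closed to be
// deregistered. Forgetting close() keeps every iterator reachable from
// the list, so repeated iteration without closing grows the heap.
class TrackingList<E> {
    private final List<E> elements = new ArrayList<>();
    private final List<TrackingIterator> liveIterators = new ArrayList<>();

    void add(E e) { elements.add(e); }

    int liveIteratorCount() { return liveIterators.size(); }

    TrackingIterator iterator() {
        TrackingIterator it = new TrackingIterator();
        liveIterators.add(it); // registered with the list until closed
        return it;
    }

    class TrackingIterator implements Iterator<E>, AutoCloseable {
        private final Iterator<E> delegate = elements.iterator();
        @Override public boolean hasNext() { return delegate.hasNext(); }
        @Override public E next() { return delegate.next(); }
        @Override public void close() { liveIterators.remove(this); } // deregister
    }
}

public class IteratorLeakDemo {
    public static void main(String[] args) {
        TrackingList<String> list = new TrackingList<>();
        list.add("m1");

        // Leaky usage: iterators are never closed, so they stay registered.
        for (int i = 0; i < 100; i++) {
            list.iterator();
        }
        System.out.println("leaked: " + list.liveIteratorCount()); // prints "leaked: 100"

        // Correct usage: try-with-resources deregisters the iterator.
        try (TrackingList<String>.TrackingIterator it = list.iterator()) {
            while (it.hasNext()) { it.next(); }
        }
        System.out.println("after close: " + list.liveIteratorCount());
    }
}
```

      In a heap dump, the leaky usage shows up exactly as described above: the iterator count rises with activity and only drops when the owning structure goes away.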

              Assignee: Clebert Suconic (csuconic@redhat.com)
              Reporter: Duane Hawkins (rhn-support-dhawkins)
              Votes: 1
              Watchers: 9