  1. AMQ Broker
  2. ENTMQBR-1448

When killing and stopping the slave in an HA (replication) scenario, the slave fails to start and becomes inaccessible


Details

    • Bug
    • Resolution: Done
    • Blocker
    • AMQ 7.2.0.GA
    • AMQ 7.1.1.GA
    • high-availability
    • None
    • Documentation (Ref Guide, User Guide, etc.), Release Notes, Compatibility/Configuration, User Experience
      1. Create a single HA replication pair.
      2. Repeatedly take the slave broker down in the following fashion:
        1. kill -9 slave (start, wait for it to be accessible)
        2. artemis-service stop (start, wait for it to be accessible)
      3. After some time, the slave broker is "unable to respond" to the accessibility check, which is done via Jolokia, as the whole messaging server is somehow corrupted.

      Once the slave server is killed again and restarted, it works fine until the failure occurs again.
      Reproducibility is about 20% with a test that runs the scenario above.
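
The loop in the steps above could be sketched roughly as follows. This is only an illustration, not the actual test: `run_cycles` and all of its parameters are hypothetical names, the take-down, start, and accessibility-check operations are passed in as callables because the ticket does not show how the test implements them, and cycling through the take-down actions in order is an assumption.

```python
import time
from typing import Callable, Optional, Sequence


def run_cycles(
    take_down_actions: Sequence[Callable[[], None]],
    start: Callable[[], None],
    is_accessible: Callable[[], bool],
    cycles: int,
    timeout_s: float = 60.0,
    poll_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[int]:
    """Repeat take-down / start / wait-for-accessible.

    Returns the zero-based index of the first cycle in which the broker
    did not become accessible within timeout_s, or None if every cycle
    succeeded.
    """
    for i in range(cycles):
        # Cycle through the take-down actions in order,
        # e.g. [kill -9, artemis-service stop] (assumption).
        take_down_actions[i % len(take_down_actions)]()
        start()
        deadline = clock() + timeout_s
        while not is_accessible():
            if clock() >= deadline:
                return i  # broker never answered the check in this cycle
            sleep(poll_s)
    return None
```

In a real run the callables would wrap `kill -9`, `artemis-service stop`, the broker start command, and the Jolokia check. With a reproducibility of about 20%, a run of a few dozen cycles would be expected to hit the failure at least once.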


    Description

      The slave becomes inaccessible: all MBeans return the error below, despite the fact that hawtio is up.

      ERROR: java.lang.IllegalStateException: Broker is not started. It can not be managed yet (class javax.management.RuntimeMBeanException)
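
An accessibility check could tell this specific failure apart from other errors by inspecting the Jolokia reply. The sketch below assumes the check receives Jolokia's JSON response body, which carries a `status` field and, on failure, an `error` field; the function name and the three return labels are hypothetical, and the ticket does not show how the actual check is written.

```python
import json

# Text taken from the error message reported in this ticket.
NOT_STARTED_MARKER = "Broker is not started"


def classify_jolokia_response(body: str) -> str:
    """Return 'ok', 'not-started', or 'error' for a Jolokia response body."""
    try:
        reply = json.loads(body)
    except ValueError:
        # Not JSON at all, e.g. an HTML error page.
        return "error"
    if not isinstance(reply, dict):
        return "error"
    if reply.get("status") == 200:
        return "ok"
    if NOT_STARTED_MARKER in str(reply.get("error", "")):
        # The web console answers, but the broker behind the MBeans is down.
        return "not-started"
    return "error"
```

Treating "not-started" as its own outcome lets a test report this bug separately from, say, a connection refusal while the slave is still booting.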
      

      Slave broker log file

      2018-04-08 10:58:00,396 INFO  [org.apache.activemq.artemis.core.server] AMQ221109: Apache ActiveMQ Artemis Backup Server version 2.5.0.amq-710002-redhat-1 [null] started, waiting live to fail before it gets active
      2018-04-08 10:58:03,744 ERROR [org.apache.activemq.artemis.core.server] AMQ224000: Failure in initialisation: java.lang.NullPointerException
              at org.apache.activemq.artemis.core.server.impl.SharedNothingBackupActivation.run(SharedNothingBackupActivation.java:223) [artemis-server-2.5.0.amq-710002-redhat-1.jar:2.5.0.amq-710002-redhat-1]
              at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$ActivationThread.run(ActiveMQServerImpl.java:3106) [artemis-server-2.5.0.amq-710002-redhat-1.jar:2.5.0.amq-710002-redhat-1]
      
      2018-04-08 10:58:04,024 INFO  [io.hawt.web.AuthenticationFilter] Destroying hawtio authentication filter
      2018-04-08 10:58:04,158 INFO  [org.apache.activemq.hawtio.plugin.PluginContextListener] Destroyed artemis-plugin plugin
      2018-04-08 10:58:04,161 INFO  [io.hawt.branding.plugin.PluginContextListener] Destroyed hawtio-redhat-fuse-branding plugin
      2018-04-08 10:58:04,171 INFO  [org.apache.activemq.artemis.core.server] AMQ221002: Apache ActiveMQ Artemis Message Broker version 2.5.0.amq-710002-redhat-1 [ca47a2a0-3aa4-11e8-a733-52540045bbb8] stopped, uptime 8.397 seconds
      2018-04-08 10:58:07,786 INFO  [org.apache.activemq.artemis.integration.bootstrap] AMQ101000: Starting ActiveMQ Artemis Server
      2018-04-08 10:58:07,810 INFO  [org.apache.activemq.artemis.core.server] AMQ221000: backup Message Broker is starting with configuration Broker Configuration (clustered=true,journalDirectory=data/journal,bindingsDirectory=data/bindings,largeMessagesDirectory=data/large-messages,pagingDirectory=data/paging)
      2018-04-08 10:58:07,840 INFO  [org.apache.activemq.artemis.core.server] AMQ221055: There were too many old replicated folders upon startup, removing /home/jamq/ha-replication/data/journal/oldreplica.5
      2018-04-08 10:58:07,841 INFO  [org.apache.activemq.artemis.core.server] AMQ222162: Moving data directory /home/jamq/ha-replication/data/journal to /home/jamq/ha-replication/data/journal/oldreplica.7
      2018-04-08 10:58:07,951 INFO  [org.apache.activemq.artemis.core.server] AMQ221012: Using AIO Journal
      2018-04-08 10:58:08,087 INFO  [org.apache.activemq.artemis.core.server] AMQ221057: Global Max Size is being adjusted to 1/2 of the JVM max size (-Xmx). being defined as 1,073,741,824
      2018-04-08 10:58:08,174 INFO  [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-server]. Adding protocol support for: CORE
      2018-04-08 10:58:08,175 INFO  [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-amqp-protocol]. Adding protocol support for: AMQP
      2018-04-08 10:58:08,175 INFO  [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-hornetq-protocol]. Adding protocol support for: HORNETQ
      2018-04-08 10:58:08,176 INFO  [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-mqtt-protocol]. Adding protocol support for: MQTT
      2018-04-08 10:58:08,176 INFO  [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-openwire-protocol]. Adding protocol support for: OPENWIRE
      2018-04-08 10:58:08,176 INFO  [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-stomp-protocol]. Adding protocol support for: STOMP
      2018-04-08 10:58:08,792 INFO  [io.hawt.branding.plugin.PluginContextListener] Initialized hawtio-redhat-fuse-branding plugin
      2018-04-08 10:58:08,874 INFO  [org.apache.activemq.hawtio.plugin.PluginContextListener] Initialized artemis-plugin plugin
      2018-04-08 10:58:10,075 INFO  [io.hawt.system.ConfigManager] Configuration will be discovered via system properties
      2018-04-08 10:58:10,077 INFO  [io.hawt.jmx.JmxTreeWatcher] Welcome to hawtio 1.4.0.redhat-630329 : http://hawt.io/ : Don't cha wish your console was hawt like me? ;-)
      2018-04-08 10:58:10,078 INFO  [io.hawt.jmx.UploadManager] Using file upload directory: /home/jamq/ha-replication/tmp/uploads
      2018-04-08 10:58:10,114 INFO  [io.hawt.web.AuthenticationFilter] Starting hawtio authentication filter, JAAS realm: "activemq" authorized role(s): "amq" role principal classes: "org.apache.activemq.artemis.spi.core.security.jaas.RolePrincipal"
      2018-04-08 10:58:10,194 INFO  [io.hawt.web.JolokiaConfiguredAgentServlet] Jolokia overridden property: [key=policyLocation, value=file:/home/jamq/ha-replication/etc/jolokia-access.xml]
      2018-04-08 10:58:10,248 INFO  [io.hawt.web.RBACMBeanInvoker] Using MBean [hawtio:type=security,area=jmx,rank=0,name=HawtioDummyJMXSecurity] for role based access control
      2018-04-08 10:58:10,484 INFO  [org.apache.activemq.artemis.core.server] AMQ221109: Apache ActiveMQ Artemis Backup Server version 2.5.0.amq-710002-redhat-1 [null] started, waiting live to fail before it gets active
      2018-04-08 10:58:10,544 INFO  [io.hawt.system.ProxyWhitelist] Initial proxy whitelist: [localhost, 127.0.0.1, 10.37.145.205, dhcp-145-205.lab.eng.brq.redhat.com]
      2018-04-08 10:58:11,017 INFO  [org.apache.activemq.artemis] AMQ241001: HTTP Server started at http://0.0.0.0:8161
      2018-04-08 10:58:11,017 INFO  [org.apache.activemq.artemis] AMQ241002: Artemis Jolokia REST API available at http://0.0.0.0:8161/console/jolokia
      2018-04-08 10:58:11,018 INFO  [org.apache.activemq.artemis] AMQ241004: Artemis Console available at http://0.0.0.0:8161/console
      2018-04-08 10:58:12,084 WARN  [org.apache.activemq.artemis.core.server] AMQ222040: Server is stopped
      2018-04-08 10:58:13,792 WARN  [org.apache.activemq.artemis.core.server] AMQ222040: Server is stopped
      2018-04-08 10:58:15,842 WARN  [org.apache.activemq.artemis.core.server] AMQ222040: Server is stopped
      2018-04-08 10:58:17,831 WARN  [org.apache.activemq.artemis.core.server] AMQ222040: Server is stopped
      2018-04-08 10:58:19,876 WARN  [org.apache.activemq.artemis.core.server] AMQ222040: Server is stopped
      2018-04-08 10:58:21,889 WARN  [org.apache.activemq.artemis.core.server] AMQ222040: Server is stopped
      2018-04-08 10:58:23,924 WARN  [org.apache.activemq.artemis.core.server] AMQ222040: Server is stopped
      2018-04-08 10:58:25,951 WARN  [org.apache.activemq.artemis.core.server] AMQ222040: Server is stopped
      2018-04-08 10:58:27,986 WARN  [org.apache.activemq.artemis.core.server] AMQ222040: Server is stopped
      2018-04-08 10:58:30,024 WARN  [org.apache.activemq.artemis.core.server] AMQ222040: Server is stopped
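
When going through the attached debug logs, it may help to tally the WARN and ERROR message codes, so that entries such as AMQ224000 and the repeated AMQ222040 stand out. The following is a minimal sketch that assumes the line layout seen in the excerpt above (timestamp, level, logger in brackets, AMQ code); `LOG_LINE` and `count_codes` are hypothetical names.

```python
import re
from collections import Counter

# Matches lines such as:
# 2018-04-08 10:58:12,084 WARN  [org.apache...server] AMQ222040: Server is stopped
LOG_LINE = re.compile(
    r"^\s*(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+\[(?P<logger>[^\]]+)\]\s+"
    r"(?P<code>AMQ\d{6}):\s*(?P<msg>.*)$"
)


def count_codes(lines, levels=("WARN", "ERROR")):
    """Count AMQ message codes on lines whose level is in `levels`."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        # Stack-trace lines and lines without an AMQ code do not match.
        if m and m.group("level") in levels:
            counts[m.group("code")] += 1
    return counts
```

For example, `count_codes(open("slave_debug.log"))` would tally the codes in the attached slave log, assuming it uses the same layout.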
      

      Attachments

        1. master_debug.log
          1.05 MB
        2. slave_debug.log
          382 kB

            People

              ataylor@redhat.com Andy Taylor
              mtoth@redhat.com Michal Toth
              Sean Davey Sean Davey (Inactive)