Red Hat Fuse / ENTESB-1801

ActiveMQ stays disconnected from the group after ZooKeeper connection SUSPENDED / RECONNECTED


    Steps to reproduce:
      1. Create two Fuse gears (fuse, fuse2) and join them into a fabric.
      2. Add the mq-amq profile to fuse2 and note the default group in registry/clusters/fusemq/default/00000.../
      3. ssh to the fuse2 gear and watch the karaf logs.
      4. ssh to the fuse gear and use jps to get the Java process PID.
      5. On the fuse gear, run kill -STOP <pid>.
      6. Wait until the fuse2 gear karaf logs contain "ConnectionStateManager State change: SUSPENDED".
      7. On the fuse gear, run kill -CONT <pid>.
      8. Note that the connection is back in RECONNECTED, but ActiveMQ stays disconnected from the default group (see the empty "services" property in registry/clusters/fusemq/default/00000.../ ).
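
Steps 5 to 7 can also be scripted. The following is a minimal sketch, assuming a POSIX system and that it runs on the fuse gear with the PID obtained from jps; the function name `pause_and_resume` and its parameters are illustrative and not part of Fuse.

```python
import os
import signal
import time


def pause_and_resume(pid: int, pause_seconds: float) -> None:
    """Freeze a process with SIGSTOP, wait, then resume it with SIGCONT."""
    os.kill(pid, signal.SIGSTOP)      # equivalent to: kill -STOP <pid>
    try:
        # Keep the process frozen; the pause must be long enough for the
        # other gear to log "State change: SUSPENDED".
        time.sleep(pause_seconds)
    finally:
        os.kill(pid, signal.SIGCONT)  # equivalent to: kill -CONT <pid>
```

For example, `pause_and_resume(pid, 30)` would freeze the process for 30 seconds; the duration needed to trigger SUSPENDED depends on the ZooKeeper timeouts in use and is an assumption here.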

      After leaving a simple Fuse setup (1 mq-amq gear, 1 CXF service connecting to the broker via "discovery:(fabric:default)") running for a few hours, it seems to eventually get into a broken state where the CXF service fails to connect to the broker:

      (org.apache.activemq.transport.failover.FailoverTransport Failed to connect to [] after: 60 attempt(s) continuing to retry. )

      I can see the following suspicious events in the mq container log:

          2014-08-11 11:03:51,479 | WARN  | pool-88-thread-1 | GitDataStore                     | abric8.git.internal.GitDataStore 1045 | 85 - io.fabric8.fabric-git - 1.0.0.redhat-387 | Fetch failed because of: http://fuse-bobland.apps.sample.com:80/git/fabric/: cannot open git-upload-pack
          2014-08-11 14:57:47,375 | INFO  | ad-1-EventThread | ConnectionStateManager           | ork.state.ConnectionStateManager  161 | 52 - io.fabric8.fabric-zookeeper - 1.0.0.redhat-387 | State change: SUSPENDED
          2014-08-11 14:57:49,300 | INFO  | ad-1-EventThread | ConnectionStateManager           | ork.state.ConnectionStateManager  161 | 52 - io.fabric8.fabric-zookeeper - 1.0.0.redhat-387 | State change: RECONNECTED
          2014-08-11 14:57:58,655 | INFO  | ZooKeeperGroup-0 | ActiveMQServiceFactory           | q.fabric.ActiveMQServiceFactory$   65 | 295 - org.jboss.amq.mq-fabric - 6.1.0.redhat-387 | Disconnected from the group
          2014-08-11 14:58:08,672 | WARN  | pool-88-thread-1 | GitDataStore                     | abric8.git.internal.GitDataStore 1045 | 85 - io.fabric8.fabric-git - 1.0.0.redhat-387 | Fetch failed because of: http://fuse-bobland.apps.sample.com:80/git/fabric/: cannot open git-upload-pack
      

      Looking at the registry, the "services" property of the "default" mq group was empty.
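
The suspicious sequence in the log (a "Disconnected from the group" event logged while the last reported ZooKeeper state is RECONNECTED) can be searched for mechanically. This is a rough sketch that assumes the pipe-separated karaf log layout shown above; the function name `find_stale_disconnects` is hypothetical.

```python
def find_stale_disconnects(lines):
    """Return timestamps of 'Disconnected from the group' events that were
    logged while the most recent connection state was RECONNECTED."""
    prefix = "State change: "
    last_state = None
    hits = []
    for line in lines:
        # Layout assumed from the excerpt: 7 fields separated by " | ",
        # timestamp first and message last.
        fields = [f.strip() for f in line.split(" | ", 6)]
        if len(fields) != 7:
            continue  # not a regular log line (e.g. a stack trace)
        timestamp, message = fields[0], fields[6]
        if message.startswith(prefix):
            last_state = message[len(prefix):]
        elif message == "Disconnected from the group" and last_state == "RECONNECTED":
            hits.append(timestamp)
    return hits
```

Run against the excerpt above, this would flag the 14:57:58,655 entry, which is the point at which the broker left the group and did not return.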

      This suggests that the zookeeper connection was temporarily suspended, probably due to an intermittent connection error, which caused the broker to disconnect itself from the group. I believe the broker should be able to recover from such an event and connect itself back again.
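
To make the expected behaviour concrete, here is a toy model of it. It is not the mq-fabric implementation: `GroupMember` and its `registry` set are hypothetical stand-ins for the broker and the group's "services" property, and only the state names mirror the ZooKeeper client's connection states.

```python
from enum import Enum


class ConnectionState(Enum):
    CONNECTED = "CONNECTED"
    SUSPENDED = "SUSPENDED"
    RECONNECTED = "RECONNECTED"
    LOST = "LOST"


class GroupMember:
    """Toy model of a broker's membership in a registry group (hypothetical)."""

    def __init__(self, member_id, registry):
        self.member_id = member_id
        self.registry = registry  # stands in for the group's "services" property

    def state_changed(self, new_state):
        if new_state in (ConnectionState.SUSPENDED, ConnectionState.LOST):
            # Connection is unreliable: leave the group.
            self.registry.discard(self.member_id)
        else:
            # CONNECTED or RECONNECTED: (re)join the group. This is the
            # step that appears to be missing in the reported behaviour.
            self.registry.add(self.member_id)
```

In this model, a SUSPENDED followed by a RECONNECTED event leaves the member registered again, whereas the report describes the "services" property staying empty.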

      See also the "steps to reproduce".

              dejanbosanac Dejan Bosanac
              maschmid@redhat.com Marek Schmidt