JBoss Messaging / JBMESSAGING-1456

Messages stuck in being-delivered state in cluster

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Done
    • Affects Version/s: 1.4.0.SP3_CP03, 1.4.0.SP3.CP07
    • Fix Version/s: 1.4.0.SP3.CP08, 1.4.4.GA
    • Component/s: None
    • Labels: None

      Description

      Messages become "stuck" in being-delivered state when clients use a clustered XA connection factory in a cluster of at least 2 nodes.

      JBoss setup:
      -2 nodes of JBoss EAP 4.3 CP02
      -commented out "ClusterPullConnectionFactory" in messaging-service.xml to prevent message redistribution and eliminate the "message suckers" as the potential culprit
      -MySQL backend using the default mysql-persistence-service.xml (from <JBOSS_HOME>/docs/examples/jms)

      Client setup:
      -both nodes have a client which is a separate process (i.e. not inside JBoss)
      -clients are Spring based
      -one client produces and consumes, the other client just consumes
      -both clients use the ClusteredXAConnectionFactory from the default connection-factories-service.xml
      -both clients publish to and consume from "queue/testDistributedQueue"
      -clients are configured to send persistent messages and to use AUTO_ACKNOWLEDGE with transacted sessions

      Symptoms of the issue:
      -when running the clients I watch the JMX-Console for the "queue/testDistributedQueue"
      -as the consumers pull messages off the queue I can see the MessageCount and DeliveringCount go to 0 every so often
      -after a period of time (usually a few hours) the MessageCount and DeliveringCount never go back to 0
      -I "kill" the clients and wait for the DeliveringCount to go to 0, but it never does
      -after the clients are killed the ConsumerCount for the queue will drop, but never to 0 when messages are "stuck"
      -a thread dump reveals at least one JBM server session that is apparently stuck (it never goes away) - ostensibly this is the consumer that is showing in the JMX-Console for "queue/testDistributedQueue"
      -a "killall -3 java" doesn't produce anything from the clients, so I know they're dead
      -nothing is in any DLQ or expiry queue
      -the database contains as many rows in the JBM_MSG and JBM_MSG_REF tables as the DeliveringCount in the JMX-Console
      -rebooting the node with the stuck messages frees the messages to be consumed (i.e. un-sticks them)
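
The symptom above (counts that used to return to 0 periodically and then never do) can be checked mechanically from periodic readings of MessageCount and DeliveringCount. The following is a minimal sketch only: the class name, the Sample type, and the length of the window are hypothetical and are not part of JBoss Messaging. How the readings are collected (for example, by polling the queue MBean) is left out.

```java
// Hypothetical helper, not part of JBoss Messaging: flags a queue whose
// counts have not returned to 0 for at least a chosen window of time.
final class StuckDeliveryDetector {

    /** One reading of the queue attributes, e.g. copied from the JMX-Console. */
    static final class Sample {
        final long timestampMillis;
        final int messageCount;
        final int deliveringCount;

        Sample(long timestampMillis, int messageCount, int deliveringCount) {
            this.timestampMillis = timestampMillis;
            this.messageCount = messageCount;
            this.deliveringCount = deliveringCount;
        }

        /** True when both counts are 0, i.e. the queue has fully drained. */
        boolean isDrained() {
            return messageCount == 0 && deliveringCount == 0;
        }
    }

    /**
     * Returns true when the time between the last drained reading and the
     * newest reading is at least windowMillis. Samples must be in time order.
     * If no reading was ever drained, the first sample's time is used as the
     * starting point (an assumption made for this sketch).
     */
    static boolean looksStuck(Sample[] samples, long windowMillis) {
        if (windowMillis <= 0) {
            throw new IllegalArgumentException("windowMillis must be positive");
        }
        if (samples.length == 0) {
            return false; // nothing observed, nothing to report
        }
        long lastDrained = samples[0].timestampMillis;
        for (Sample s : samples) {
            if (s.isDrained()) {
                lastDrained = s.timestampMillis;
            }
        }
        long newest = samples[samples.length - 1].timestampMillis;
        return newest - lastDrained >= windowMillis;
    }
}
```

The window should be chosen to be longer than the normal interval between drains for the workload. A "stuck" result here is only a hint to take a thread dump and compare the JBM_MSG_REF row count with the DeliveringCount.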

      Other notes:
      -nothing else is happening on either node but running the client and running JBoss
      -this only appears to happen when a clustered connection factory is used. I tested using a normal connection factory and after 24 hours couldn't reproduce a stuck message.

          Attachments

          1. DeliveringCount.png (58 kB)
          2. kill3_thread_dump.txt (61 kB)
          3. logs-and-config.zip (2.56 MB)
          4. MessageStucked.png (78 kB)
          5. RemoveAllMessagesException.png (81 kB)
          6. test-1456-jars.zip (2.95 MB)
          7. thread_dump.txt (61 kB)

                People

                • Assignee: Howard Gao (gaohoward)
                • Reporter: Justin Bertram (jbertram)
                • Votes: 5
                • Watchers: 17