WildFly / WFLY-11030

Transaction remained in prepared state


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Component: JMS

      Scenario

      • Start group A of two servers (node-1 and node-3). The servers are not in a cluster.
      • Send messages to a queue on node-1.
      • Start group B of two servers (node-2 and node-4). The servers are not in a cluster.
      • Deploy an MDB on both servers in group A. This MDB reads messages from the local queue, performs an insert into an Oracle 11g R2 database, and also sends messages to a remote queue on group B.
      • The MDB deployed on the nodes in group B inserts messages from the local queue into the Oracle 11g R2 database.
      • Kill node-1, then restart it. Process all messages and verify that both MDBs performed their database inserts.

      After node-1 is killed and restarted, some transactions are still in the prepared state and are not cleared within 10 minutes. These transactions remain on the servers in group B.
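
      The 10-minute check described above can be sketched as a polling loop. This is only an illustration, not the code used by the test: the class name, the method name, and the `preparedCount` supplier are hypothetical, and how the number of prepared transactions is actually obtained (for example, through a broker management operation) is left to the caller.

```java
import java.time.Duration;
import java.util.function.IntSupplier;

final class PreparedTxWait {
    /**
     * Polls preparedCount (a hypothetical caller-supplied source of the number
     * of prepared transactions) until it reports 0 or the timeout elapses.
     * Returns true if the count reached 0 in time, false otherwise.
     */
    static boolean waitUntilCleared(IntSupplier preparedCount, Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (preparedCount.getAsInt() == 0) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out with transactions still prepared
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
    }
}
```

      With a timeout of `Duration.ofMinutes(10)`, a `false` result would correspond to the failure reported here.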

      Logs from the test can be found in this build: https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/job/mnovak-verifier-artemis2x/12/

      Investigation of the issue is in progress. It is not yet known which component causes it.

      Clebert looked at the logs, but they did not contain Arjuna traces. The build above also contains Arjuna traces. Clebert's analysis is copied below.

      I looked at the output from the clients, and filtered the processing
      for one XA transaction that is never committed.

      I looked for the XID on
      base64:AAAAAAAAAAAAAP_ChBkGmqipH1benkDAAAJdQAAAAMAAAAAAAAAAAAAAAAAAP_ChBkGmqipH1benkDAAAJaDUyODcHAgIA

      and filtered the onMessage that generated it (attached the log as
      processingOneMessage.txt):

      This XID was never committed simply because the TM never did it.
      Probably because of some issue on the JDBC side, but there are no errors
      and no exceptions. It is totally clean on the Artemis side.

      There is no logging for the TM or the MDB itself to correlate other
      failures. With the information I have, I am certain there was nothing
      on the Artemis side that would have caused it.

      In any case, this is not replicated at all, and not related to 87.
      This is simply the TM getting confused by some issue on the JDBC side.

      As far as I am concerned, I have fixed the replicated shutdown case,
      and I will send PRs upstream now.
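
      To correlate the base64 XID quoted in the analysis with logs that print XID bytes in hexadecimal, a small helper along these lines may be useful. It is a sketch under assumptions: the `_` in the string suggests the URL-safe base64 alphabet, which is why `Base64.getUrlDecoder()` is used, and the class and method names are hypothetical. Whether the transaction manager logs use a comparable hex form has not been verified here.

```java
import java.util.Base64;

final class XidHex {
    /** Decodes a URL-safe base64 string (alphabet with '-' and '_') into lowercase hex. */
    static String toHex(String urlSafeBase64) {
        // Assumption: the XID is encoded with the URL-safe alphabet; padding is optional.
        byte[] raw = Base64.getUrlDecoder().decode(urlSafeBase64);
        StringBuilder hex = new StringBuilder(raw.length * 2);
        for (byte b : raw) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        // Pass the base64 string taken from the log as the first argument.
        System.out.println(toHex(args[0]));
    }
}
```

      The decoder throws `IllegalArgumentException` if the input is not valid base64, which can also serve as a quick check that the string was copied from the log intact.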

              ehugonne1@redhat.com Emmanuel Hugonnet
              eduda_jira Erich Duda (Inactive)