Uploaded image for project: 'JBoss Transaction Manager'
  1. JBoss Transaction Manager
  2. JBTM-3015

Tomcat JTA: Race condition in tx recovery makes PeriodicRecovery run forever

    Details

    • Type: Bug
    • Status: Closed (View Workflow)
    • Priority: Critical
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: 5.8.2.Final
    • Component/s: Tomcat
    • Labels:
      None
    • Steps to Reproduce:
      Hide
      -        IntStream.range(0, 3).forEach(i -> test(EXECUTOR_URL + RECOVERY_TEST));
      +        IntStream.range(0, 10).forEach(i -> test(EXECUTOR_URL + RECOVERY_TEST));
      

      How to configure Tomcat.

      export CATALINA_HOME=/tmp/apache-tomcat-9.0.7
      mvn clean integration-test -Parq-tomcat -Dtest.db.type=container;
      
      Show
      - IntStream.range(0, 3).forEach(i -> test(EXECUTOR_URL + RECOVERY_TEST)); + IntStream.range(0, 10).forEach(i -> test(EXECUTOR_URL + RECOVERY_TEST)); How to configure Tomcat . export CATALINA_HOME=/tmp/apache-tomcat-9.0.7 mvn clean integration-test -Parq-tomcat -Dtest.db.type=container;
    • Workaround:
      Workaround Exists

      Description

      There is a simple recovery test in the test suite that runs 3 times in a row. If you change that number to 4, there is a good change you will get the TS stuck forever. When I attached debugger and used breakpoints on the iteration, e.g. i==3 or i==10, the error was not shown, the TS run and completed, so it is clearly a time related race condition.

      EDIT: When I use 30 iterations and a conditional breakpoint on i -> test(EXECUTOR_URL + RECOVERY_TEST) set to i == 28 the execution loops forever even with debugger attached.

      In this JIRA's example, to reproduce, simply:

      -        IntStream.range(0, 3).forEach(i -> test(EXECUTOR_URL + RECOVERY_TEST));
      +        IntStream.range(0, 10).forEach(i -> test(EXECUTOR_URL + RECOVERY_TEST));
      

      When the Tomcat gets into this unfortunate state and TS hangs and Tomcat runs forever, this is what keeps showing in the database trace (and nothing else):

      ...
      2018-04-24 12:37:50.406 UTC transaction_id: 0 LOG:  execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
      2018-04-24 12:37:54.410 UTC transaction_id: 0 LOG:  execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
      2018-04-24 12:37:58.415 UTC transaction_id: 0 LOG:  execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
      2018-04-24 12:38:02.419 UTC transaction_id: 0 LOG:  execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
      2018-04-24 12:38:06.421 UTC transaction_id: 0 LOG:  execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
      2018-04-24 12:38:10.423 UTC transaction_id: 0 LOG:  execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
      2018-04-24 12:38:14.426 UTC transaction_id: 0 LOG:  execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
      2018-04-24 12:38:18.429 UTC transaction_id: 0 LOG:  execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
      ...
      

      I found this Atomikos related reference online.

      Tomcat log keeps showing again and again:

      24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background thread Status <== SCANNING
      24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background thread scanning
      24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal Periodic recovery first pass at Tue, 24 Apr 2018 14:41:02
      24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.periodicWorkFirstPass AtomicActionRecoveryModule first pass
      24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.processTransactions processing /StateManager/BasicAction/TwoPhaseCoordinator/AtomicAction transactions
      24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal  
      24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState XARecoveryModule state change IDLE->FIRST_PASS
      
      24-Apr-2018 14:41:02.538 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkFirstPass Local XARecoveryModule - first pass
      24-Apr-2018 14:41:02.538 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass xarecovery of org.postgresql.xa.PGXAConnection@2c8cb5bd
      24-Apr-2018 14:41:02.538 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass Found 0 xids in doubt
      24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass xarecovery of org.jboss.narayana.tomcat.jta.integration.app.TestXAResource@4094781e
      24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass Found 0 xids in doubt
      24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState XARecoveryModule state change FIRST_PASS->BETWEEN_PASSES
      
      24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal  
      
      
      24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal Periodic recovery second pass at Tue, 24 Apr 2018 14:41:04
      24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.periodicWorkSecondPass AtomicActionRecoveryModule second pass
      24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal  
      24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState XARecoveryModule state change BETWEEN_PASSES->SECOND_PASS
      
      24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkSecondPass Local XARecoveryModule - second pass
      24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkSecondPass Local XARecoveryModule.transactionInitiatedRecovery completed
      24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass xarecovery second pass of org.postgresql.xa.PGXAConnection@2c8cb5bd
      24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass Have 0 Xids to recover on this pass.
      24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass xarecovery second pass of org.jboss.narayana.tomcat.jta.integration.app.TestXAResource@4094781e
      24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass Have 0 Xids to recover on this pass.
      24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkSecondPass Local XARecoveryModule.resourceInitiatedRecovery completed
      24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState XARecoveryModule state change SECOND_PASS->IDLE
      
      24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal  
      24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background thread Status <== INACTIVE
      24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background thread backing off
      

      Logs and thread dumps

      • postgresql_trace_log.zip – PostgreSQL trace, note the originally expected course of recoveries and then the repeated stuck on the aforementioned message.
      • test.war – test app used; the TS builds it for you though and stores it in tomcat/tomcat-deployment/DEFAULT_Basic-app_test.war
      • thread_dumps_in_sequence_over_time.zip – thread dumps taken over time with minutes in between snaps.
      • tomcat_logs.zip – Tomcat logs
      • good_threaddumps.zip – a dozen of snapshots couple of seconds from each other as an example of a good TS run that terminates as expected, i.e. with IntStream.range(0, 3).forEach(i -> test(EXECUTOR_URL + RECOVERY_TEST));.
      • This is what good, i.e. 3 iterations long DB trace looks like: good db trace.
      • This is what good, i.e. 3 iterations long Tomcat catalina log looks like: catalina log

      Might interest also Ondrej Chaloupka. WDYT guys? What should I try out?

        Gliffy Diagrams

          Attachments

          1. good_threaddumps.zip
            68 kB
          2. postgresql_trace_log.zip
            6 kB
          3. test.war
            5.58 MB
          4. thread_dumps_in_sequence_over_time.zip
            33 kB
          5. tomcat_logs.zip
            156 kB

            Issue Links

              Activity

                People

                • Assignee:
                  zhfeng Zheng Feng
                  Reporter:
                  mbabacek Michal Karm
                  Tester:
                  Michal Karm
                • Votes:
                  0 Vote for this issue
                  Watchers:
                  4 Start watching this issue

                  Dates

                  • Created:
                    Updated:
                    Resolved: