-
Bug
-
Resolution: Done
-
Critical
-
None
-
None
-
-
Workaround Exists
There is a simple recovery test in the test suite that runs 3 times in a row. If you change that number to 4, there is a good change you will get the TS stuck forever. When I attached debugger and used breakpoints on the iteration, e.g. i==3 or i==10, the error was not shown, the TS run and completed, so it is clearly a time related race condition.
EDIT: When I use 30 iterations and a conditional breakpoint on i -> test(EXECUTOR_URL + RECOVERY_TEST) set to i == 28 the execution loops forever even with debugger attached.
In this JIRA's example, to reproduce, simply:
- IntStream.range(0, 3).forEach(i -> test(EXECUTOR_URL + RECOVERY_TEST)); + IntStream.range(0, 10).forEach(i -> test(EXECUTOR_URL + RECOVERY_TEST));
When the Tomcat gets into this unfortunate state and TS hangs and Tomcat runs forever, this is what keeps showing in the database trace (and nothing else):
... 2018-04-24 12:37:50.406 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database() 2018-04-24 12:37:54.410 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database() 2018-04-24 12:37:58.415 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database() 2018-04-24 12:38:02.419 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database() 2018-04-24 12:38:06.421 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database() 2018-04-24 12:38:10.423 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database() 2018-04-24 12:38:14.426 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database() 2018-04-24 12:38:18.429 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database() ...
I found this Atomikos related reference online.
Tomcat log keeps showing again and again:
24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background thread Status <== SCANNING 24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background thread scanning 24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal Periodic recovery first pass at Tue, 24 Apr 2018 14:41:02 24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.periodicWorkFirstPass AtomicActionRecoveryModule first pass 24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.processTransactions processing /StateManager/BasicAction/TwoPhaseCoordinator/AtomicAction transactions 24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal 24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState XARecoveryModule state change IDLE->FIRST_PASS 24-Apr-2018 14:41:02.538 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkFirstPass Local XARecoveryModule - first pass 24-Apr-2018 14:41:02.538 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass xarecovery of org.postgresql.xa.PGXAConnection@2c8cb5bd 24-Apr-2018 14:41:02.538 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass Found 0 xids in doubt 24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass xarecovery of org.jboss.narayana.tomcat.jta.integration.app.TestXAResource@4094781e 24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass Found 0 xids in doubt 24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState XARecoveryModule state change FIRST_PASS->BETWEEN_PASSES 24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal Periodic recovery second pass at Tue, 24 Apr 2018 14:41:04 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.periodicWorkSecondPass AtomicActionRecoveryModule second pass 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState XARecoveryModule state change BETWEEN_PASSES->SECOND_PASS 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkSecondPass Local XARecoveryModule - second pass 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkSecondPass Local XARecoveryModule.transactionInitiatedRecovery completed 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass xarecovery second pass of org.postgresql.xa.PGXAConnection@2c8cb5bd 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass Have 0 Xids to recover on this pass. 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass xarecovery second pass of org.jboss.narayana.tomcat.jta.integration.app.TestXAResource@4094781e 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass Have 0 Xids to recover on this pass. 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkSecondPass Local XARecoveryModule.resourceInitiatedRecovery completed 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState XARecoveryModule state change SECOND_PASS->IDLE 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background thread Status <== INACTIVE 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background thread backing off
Logs and thread dumps
- postgresql_trace_log.zip – PostgreSQL trace, note the originally expected course of recoveries and then the repeated stuck on the aforementioned message.
- test.war – test app used; the TS builds it for you though and stores it in tomcat/tomcat-deployment/DEFAULT_Basic-app_test.war
- thread_dumps_in_sequence_over_time.zip – thread dumps taken over time with minutes in between snaps.
- tomcat_logs.zip – Tomcat logs
- good_threaddumps.zip – a dozen of snapshots couple of seconds from each other as an example of a good TS run that terminates as expected, i.e. with IntStream.range(0, 3).forEach(i -> test(EXECUTOR_URL + RECOVERY_TEST));.
- This is what good, i.e. 3 iterations long DB trace looks like: good db trace.
- This is what good, i.e. 3 iterations long Tomcat catalina log looks like: catalina log
Might interest also ochaloup@redhat.com. WDYT guys? What should I try out?
- relates to
-
JBTM-3016 NoSuchFieldException throws when setting the pool properties
- Closed