-
Bug
-
Resolution: Done
-
Major
-
5.9.0.Final
-
None
If PeriodicMonitor has active unfinished request to start recovery scan
INFO [com.arjuna.ats.arjuna] (Server.Connection:127.0.0.1:36326) ARJUNA012340: RecoveryManager scan scheduled to begin.
and meanwhile the RecoveryManager is suspended then the WorkerService is waiting in deadlock forever and JVM can't be cleanly stopped.
The whole stopping EAP server stacktrace could be seen at https://issues.jboss.org/browse/CLOUD-2767 the related Narayana related threads are[10].
It seems to me as the recovery was put to suspended[1] the processing thread waits for the state change[2]. The state change to set to terminated could be done by periodic recovery shutdown[3] but it's stuck on waiting for listners to be stopped[4]. Stopping the listeners could be done by the thread waiting for the state change[5].
[1] https://github.com/wildfly/wildfly/blob/master/transactions/src/main/java/org/jboss/as/txn/suspend/RecoverySuspendController.java#L41
[2] https://github.com/jbosstm/narayana/blob/master/ArjunaCore/arjuna/classes/com/arjuna/ats/internal/arjuna/recovery/PeriodicRecovery.java#L348
[3] https://github.com/jbosstm/narayana/blob/master/ArjunaCore/arjuna/classes/com/arjuna/ats/internal/arjuna/recovery/PeriodicRecovery.java#L177
[4] https://github.com/jbosstm/narayana/blob/master/ArjunaCore/arjuna/classes/com/arjuna/ats/internal/arjuna/recovery/PeriodicRecovery.java#L169
[5] https://github.com/jbosstm/narayana/blob/master/ArjunaCore/arjuna/classes/com/arjuna/ats/internal/arjuna/recovery/PeriodicRecovery.java#L387
[10]
"Transaction Reaper Worker 0" #121 daemon prio=5 os_prio=0 tid=0x000000000197a000 nid=0x2b6 in Object.wait() [0x00007f3661d2b000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x00000000b289c310> (a java.util.LinkedList) at java.lang.Object.wait(Object.java:502) at com.arjuna.ats.arjuna.coordinator.TransactionReaper.waitForCancellations(TransactionReaper.java:328) - locked <0x00000000b289c310> (a java.util.LinkedList) at com.arjuna.ats.internal.arjuna.coordinator.ReaperWorkerThread.run(ReaperWorkerThread.java:65) Locked ownable synchronizers: - None "Transaction Reaper" #120 daemon prio=5 os_prio=0 tid=0x0000000001b4b800 nid=0x2b5 in Object.wait() [0x00007f3661e2c000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x00000000b289c330> (a com.arjuna.ats.arjuna.coordinator.TransactionReaper) at com.arjuna.ats.internal.arjuna.coordinator.ReaperThread.run(ReaperThread.java:90) - locked <0x00000000b289c330> (a com.arjuna.ats.arjuna.coordinator.TransactionReaper) Locked ownable synchronizers: - None "Server.Connection:10.130.0.162:46200" #335 daemon prio=5 os_prio=0 tid=0x0000000003853800 nid=0x449 in Object.wait() [0x00007f365ec2c000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.lang.Object.wait(Object.java:502) at com.arjuna.ats.internal.arjuna.recovery.WorkerService.doWork(WorkerService.java:95) - locked <0x00000000b2916608> (a com.arjuna.ats.internal.arjuna.recovery.WorkerService) at com.arjuna.ats.internal.arjuna.recovery.Connection.run(Connection.java:88) Locked ownable synchronizers: - None "Server.Connection:10.130.0.162:46090" #225 daemon prio=5 os_prio=0 tid=0x0000000003a4c800 nid=0x3ca in Object.wait() [0x00007f365c2ef000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.lang.Object.wait(Object.java:502) at com.arjuna.ats.internal.arjuna.recovery.WorkerService.doWork(WorkerService.java:95) - locked <0x00000000b2916608> (a com.arjuna.ats.internal.arjuna.recovery.WorkerService) at com.arjuna.ats.internal.arjuna.recovery.Connection.run(Connection.java:88) Locked ownable synchronizers: - None "MSC service thread 1-5" #19 prio=5 os_prio=0 tid=0x0000000001903000 nid=0x251 in Object.wait() [0x00007f366aebe000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.lang.Object.wait(Object.java:502) at com.arjuna.ats.internal.arjuna.recovery.Listener.stopListener(Listener.java:222) - locked <0x00000000b29161d0> (a com.arjuna.ats.internal.arjuna.recovery.Listener) at com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.shutdown(PeriodicRecovery.java:169) at com.arjuna.ats.internal.arjuna.recovery.RecoveryManagerImple.stop(RecoveryManagerImple.java:179) at com.arjuna.ats.arjuna.recovery.RecoveryManager.terminate(RecoveryManager.java:210) - locked <0x00000000b0badbb8> (a com.arjuna.ats.arjuna.recovery.RecoveryManager) at com.arjuna.ats.arjuna.recovery.RecoveryManager.terminate(RecoveryManager.java:194) at com.arjuna.ats.jbossatx.jta.RecoveryManagerService.stop(RecoveryManagerService.java:73) at org.jboss.as.txn.service.ArjunaRecoveryManagerService.stop(ArjunaRecoveryManagerService.java:167) - locked <0x00000000b313ddb0> (a org.jboss.as.txn.service.ArjunaRecoveryManagerService) at org.jboss.msc.service.ServiceControllerImpl$StopTask.stopService(ServiceControllerImpl.java:2150) at org.jboss.msc.service.ServiceControllerImpl$StopTask.run(ServiceControllerImpl.java:2101) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Locked ownable synchronizers: - <0x00000000b351b500> (a java.util.concurrent.ThreadPoolExecutor$Worker) "Periodic Recovery" #116 prio=5 os_prio=0 tid=0x0000000002357000 nid=0x2b4 in Object.wait() [0x00007f3661f2d000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.lang.Object.wait(Object.java:502) at com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doSuspendedWait(PeriodicRecovery.java:709) at com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run(PeriodicRecovery.java:346) - locked <0x00000000b289c898> (a java.lang.Object) Locked ownable synchronizers: - None
NOTE: there is interesting point there are two WorkerService instances in the stack trace. It's causes two sequential requests hits the RecoveryManager while the periodic recovery was in progress before the server was shutdown. The code invoking the two calls and then immediately running the shutdown was https://github.com/jboss-openshift/cct_module/blob/7f891cca882f86f983b298b288fe0b5c12bc51bf/os-eap-migration/added/launch/openshift-migrate-common.sh#L50
- causes
-
CLOUD-2767 [EAP][XA] EAP in migration pod hangs during shutdown on GlusterFS
- Verified
- is cloned by
-
JBEAP-16195 (7.2.z) JBTM-3047 - Suspending recovery manager causes deadlock when acive RecoveryMonitor scan request exists
- Closed