Uploaded image for project: 'JBoss Transaction Manager'
  1. JBoss Transaction Manager
  2. JBTM-3782

Race condition in Periodic Recovery using HornetqJournalStore

    XMLWordPrintable

Details

    • Bug
    • Resolution: Unresolved
    • Major
    • None
    • 5.11.4.Final
    • JTA
    • None

    Description

      We observed the following errors in our logs. 

      2023-03-14T07:00:46,556Z [/] WARN  [com.arjuna.ats.arjuna] (Periodic Recovery) ARJUNA012087: Activate of atomic action with id 0:ffffac18fa20:16e3c88c:64071ef9:371f5ede and type /StateManager/BasicAction/TwoPhaseCoordinator/AtomicAction/
      SubordinateAtomicAction/JCA unexpectedly failed, could not load state.
      2023-03-14T07:00:46,556Z [/] FATAL [com.arjuna.ats.jta] (Periodic Recovery) ARJUNA016131: Subordinate transaction was not recovered successfully 0:ffffac18fa20:16e3c88c:64071ef9:371f5ede: java.lang.IllegalArgumentException: ARJUNA016147: Cannot recover imported transaction of UID '0:ffffac18fa20:16e3c88c:64071ef9:371f5ede' of transaction 'TransactionImple < jca-subordinate, BasicAction: 0:ffffac18fa20:16e3c88c:64071ef9:371f5ede status: ActionStatus.CREATED >' as transaction base Xid is null.
              at org.jboss.jts//com.arjuna.ats.internal.jta.transaction.arjunacore.jca.TransactionImporterImple.recoverTransaction(TransactionImporterImple.java:129)
              at org.jboss.jts//com.arjuna.ats.internal.jta.transaction.arjunacore.jca.TransactionImporterImple.recoverTransaction(TransactionImporterImple.java:52)
              at org.jboss.jts//com.arjuna.ats.internal.jta.recovery.arjunacore.SubordinateAtomicActionRecoveryModule.periodicWorkFirstPass(SubordinateAtomicActionRecoveryModule.java:74)
              at org.jboss.jts//com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal(PeriodicRecovery.java:770)
              at org.jboss.jts//com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run(PeriodicRecovery.java:382)

      To make long story short, it happens when Periodic Recovery attempts to recover a transaction which is commited during the loop in Periodic Recovery. We've noticed it only when using HornetqJournalStore (jounral store enabled in TM settings in JBoss EAP7)

      More details below. 
      The broader context of logs contains: 

      2023-03-14T07:00:46,493Z [/] TRACE [com.arjuna.ats.arjuna] (default task-21831) BasicAction::doCommit() result for action-id (0:ffffac18fa20:16e3c88c:64071ef9:371f5ede) on record id: (0:ffffac18fa20:16e3c88c:64071ef9:371f5ee5) is (TwoPha
      seOutcome.FINISH_OK) node id: (c56dfce3359fe9c3a1eef4)
      
      2023-03-14T07:00:46,493Z [/] TRACE [com.arjuna.ats.arjuna] (default task-21831) BasicAction::updateState() for action-id 0:ffffac18fa20:16e3c88c:64071ef9:371f5ede
      2023-03-14T07:00:46,493Z [/] TRACE [com.arjuna.ats.arjuna] (default task-21831) HornetqObjectStore.remove_committed(0:ffffac18fa20:16e3c88c:64071ef9:371f5ede, /StateManager/BasicAction/TwoPhaseCoordinator/AtomicAction/SubordinateAtomicAc
      tion/JCA)
      2023-03-14T07:00:46,494Z [/] TRACE [com.arjuna.ats.arjuna] (default task-21831) BasicAction::addChildThread () action 0:ffffac18fa20:16e3c88c:64071ef9:371f5ede adding Thread[default task-21831,5,main]
      2023-03-14T07:00:46,494Z [/] TRACE [com.arjuna.ats.arjuna] (default task-21831) BasicAction::addChildThread () action 0:ffffac18fa20:16e3c88c:64071ef9:371f5ede adding Thread[default task-21831,5,main] result = false
      2023-03-14T07:00:46,494Z [/] TRACE [com.arjuna.ats.arjuna] (default task-21831) BasicAction::removeChildThread () action 0:ffffac18fa20:16e3c88c:64071ef9:371f5ede removing 13996
      2023-03-14T07:00:46,494Z [/] TRACE [com.arjuna.ats.arjuna] (default task-21831) BasicAction::removeChildThread () action 0:ffffac18fa20:16e3c88c:64071ef9:371f5ede removing 13996 result = true
      2023-03-14T07:00:46,494Z [/] TRACE [com.arjuna.ats.arjuna] (default task-21831) BasicAction::addChildThread () action 0:ffffac18fa20:16e3c88c:64071ef9:371f5ede adding Thread[default task-21831,5,main]
      2023-03-14T07:00:46,494Z [/] TRACE [com.arjuna.ats.arjuna] (default task-21831) BasicAction::addChildThread () action 0:ffffac18fa20:16e3c88c:64071ef9:371f5ede adding Thread[default task-21831,5,main] result = false
      2023-03-14T07:00:46,494Z [/] TRACE [com.arjuna.ats.arjuna] (default task-21831) BasicAction::removeChildThread () action 0:ffffac18fa20:16e3c88c:64071ef9:371f5ede removing 13996
      2023-03-14T07:00:46,494Z [/] INFO  [org.apache.activemq.transport.failover.FailoverTransport] (ActiveMQ Task-1) Successfully connected to ssl://cert1-amq1.iaws.s4a.aero:61617?soTimeout=10000&soWriteTimeout=10000
      2023-03-14T07:00:46,494Z [/] TRACE [com.arjuna.ats.arjuna] (default task-21831) BasicAction::removeChildThread () action 0:ffffac18fa20:16e3c88c:64071ef9:371f5ede removing 13996 result = true
      2023-03-14T07:00:46,494Z [/] TRACE [com.arjuna.ats.arjuna] (default task-21831) BasicAction::addChildThread () action 0:ffffac18fa20:16e3c88c:64071ef9:371f5ede adding Thread[default task-21831,5,main]
      2023-03-14T07:00:46,494Z [/] TRACE [com.arjuna.ats.arjuna] (default task-21831) BasicAction::addChildThread () action 0:ffffac18fa20:16e3c88c:64071ef9:371f5ede adding Thread[default task-21831,5,main] result = false
      2023-03-14T07:00:46,494Z [/] TRACE [com.arjuna.ats.arjuna] (default task-21831) BasicAction::removeChildThread () action 0:ffffac18fa20:16e3c88c:64071ef9:371f5ede removing 13996
      2023-03-14T07:00:46,494Z [/] TRACE [com.arjuna.ats.arjuna] (default task-21831) BasicAction::removeChildThread () action 0:ffffac18fa20:16e3c88c:64071ef9:371f5ede removing 13996 result = true
      2023-03-14T07:00:46,494Z [/] TRACE [com.arjuna.ats.arjuna] (default task-21831) BasicAction::addChildThread () action 0:ffffac18fa20:16e3c88c:64071ef9:371f5ede adding Thread[default task-21831,5,main]
      2023-03-14T07:00:46,494Z [/] TRACE [com.arjuna.ats.arjuna] (default task-21831) BasicAction::addChildThread () action 0:ffffac18fa20:16e3c88c:64071ef9:371f5ede adding Thread[default task-21831,5,main] result = false
      2023-03-14T07:00:46,494Z [/] TRACE [com.arjuna.ats.arjuna] (default task-21831) BasicAction::removeChildThread () action 0:ffffac18fa20:16e3c88c:64071ef9:371f5ede removing 13996
      2023-03-14T07:00:46,494Z [/] TRACE [com.arjuna.ats.arjuna] (default task-21831) BasicAction::removeChildThread () action 0:ffffac18fa20:16e3c88c:64071ef9:371f5ede removing 13996 result = true
      2023-03-14T07:00:46,494Z [/] TRACE [com.arjuna.ats.arjuna] (default task-21831) BasicAction::addChildThread () action 0:ffffac18fa20:16e3c88c:64071ef9:371f5ede adding Thread[default task-21831,5,main]
      2023-03-14T07:00:46,494Z [/] TRACE [com.arjuna.ats.arjuna] (default task-21831) BasicAction::addChildThread () action 0:ffffac18fa20:16e3c88c:64071ef9:371f5ede adding Thread[default task-21831,5,main] result = false
      2023-03-14T07:00:46,494Z [/] TRACE [com.arjuna.ats.arjuna] (default task-21831) BasicAction::removeChildThread () action 0:ffffac18fa20:16e3c88c:64071ef9:371f5ede removing 13996
      2023-03-14T07:00:46,494Z [/] TRACE [com.arjuna.ats.arjuna] (default task-21831) BasicAction::removeChildThread () action 0:ffffac18fa20:16e3c88c:64071ef9:371f5ede removing 13996 result = true
      2023-03-14T07:00:46,494Z [/] TRACE [com.arjuna.ats.arjuna] (default task-21831) BasicAction::addChildThread () action 0:ffffac18fa20:16e3c88c:64071ef9:371f5ede adding Thread[default task-21831,5,main]
      2023-03-14T07:00:46,494Z [/] TRACE [com.arjuna.ats.arjuna] (default task-21831) BasicAction::addChildThread () action 0:ffffac18fa20:16e3c88c:64071ef9:371f5ede adding Thread[default task-21831,5,main] result = false
      2023-03-14T07:00:46,494Z [/] TRACE [com.arjuna.ats.arjuna] (default task-21831) BasicAction::removeChildThread () action 0:ffffac18fa20:16e3c88c:64071ef9:371f5ede removing 13996
      2023-03-14T07:00:46,494Z [/] TRACE [com.arjuna.ats.arjuna] (default task-21831) BasicAction::removeChildThread () action 0:ffffac18fa20:16e3c88c:64071ef9:371f5ede removing 13996 result = true
      2023-03-14T07:00:46,494Z [/] TRACE [com.arjuna.ats.arjuna] (default task-21831) BasicAction::addChildThread () action 0:ffffac18fa20:16e3c88c:64071ef9:371f5ede adding Thread[default task-21831,5,main]
      2023-03-14T07:00:46,495Z [/] TRACE [com.arjuna.ats.arjuna] (default task-21831) BasicAction::addChildThread () action 0:ffffac18fa20:16e3c88c:64071ef9:371f5ede adding Thread[default task-21831,5,main] result = false
      2023-03-14T07:00:46,495Z [/] TRACE [com.arjuna.ats.arjuna] (default task-21831) BasicAction::removeChildThread () action 0:ffffac18fa20:16e3c88c:64071ef9:371f5ede removing 13996
      2023-03-14T07:00:46,495Z [/] TRACE [com.arjuna.ats.arjuna] (default task-21831) BasicAction::removeChildThread () action 0:ffffac18fa20:16e3c88c:64071ef9:371f5ede removing 13996 result = true
      2023-03-14T07:00:46,495Z [/] TRACE [com.arjuna.ats.arjuna] (default task-21831) TransactionReaper::remove ( BasicAction: 0:ffffac18fa20:16e3c88c:64071ef9:371f5ede status: ActionStatus.COMMITTED )
      
      2023-03-14T07:00:46,556Z [/] TRACE [com.arjuna.ats.arjuna] (Periodic Recovery) HornetqObjectStore.read_committed(0:ffffac18fa20:16e3c88c:64071ef9:371f5ede, /StateManager/BasicAction/TwoPhaseCoordinator/AtomicAction/SubordinateAtomicActio
      n/JCA)
      2023-03-14T07:00:46,556Z [/] WARN  [com.arjuna.ats.arjuna] (Periodic Recovery) ARJUNA012087: Activate of atomic action with id 0:ffffac18fa20:16e3c88c:64071ef9:371f5ede and type /StateManager/BasicAction/TwoPhaseCoordinator/AtomicAction/
      SubordinateAtomicAction/JCA unexpectedly failed, could not load state.
      2023-03-14T07:00:46,556Z [/] FATAL [com.arjuna.ats.jta] (Periodic Recovery) ARJUNA016131: Subordinate transaction was not recovered successfully 0:ffffac18fa20:16e3c88c:64071ef9:371f5ede: java.lang.IllegalArgumentException: ARJUNA016147: Cannot recover imported transaction of UID '0:ffffac18fa20:16e3c88c:64071ef9:371f5ede' of transaction 'TransactionImple < jca-subordinate, BasicAction: 0:ffffac18fa20:16e3c88c:64071ef9:371f5ede status: ActionStatus.CREATED >' as transaction base Xid is null.
              at org.jboss.jts//com.arjuna.ats.internal.jta.transaction.arjunacore.jca.TransactionImporterImple.recoverTransaction(TransactionImporterImple.java:129)
              at org.jboss.jts//com.arjuna.ats.internal.jta.transaction.arjunacore.jca.TransactionImporterImple.recoverTransaction(TransactionImporterImple.java:52)
              at org.jboss.jts//com.arjuna.ats.internal.jta.recovery.arjunacore.SubordinateAtomicActionRecoveryModule.periodicWorkFirstPass(SubordinateAtomicActionRecoveryModule.java:74)
              at org.jboss.jts//com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal(PeriodicRecovery.java:770)
              at org.jboss.jts//com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run(PeriodicRecovery.java:382)

      In my opinion the reason is as follows
      com.arjuna.ats.internal.jta.recovery.arjunacore.SubordinateAtomicActionRecoveryModule#periodicWorkFirstPass
      calls
       

      recoveryStore.allObjUids

      In our case HornetqObjectStoreAdaptor implementation is used. So it fetches all UIDs from the store with type: StateManager/BasicAction/TwoPhaseCoordinator/AtomicAction/ SubordinateAtomicAction/JCA 
      (when you dive deeper data is taken from the map com.arjuna.ats.internal.arjuna.objectstore.hornetq.HornetqJournalStore#content) 

      After all UIDs are fetched the while loop starts iterating over all retrieved uids. 

      if (recoveryStore.allObjUids(SubordinateAtomicAction.getType(), states)
              && (states.notempty())) {
          while (true) {
              uid = UidHelper.unpackFrom(states);
              if (uid.notEquals(Uid.nullUid())) {
                  SubordinationManager
                          .getTransactionImporter().recoverTransaction(
                          uid);
              } else {
                  break;
              }
          }
      }

      Then  HornetqObjectStore.remove_committed is executed for one of the UID because transaction ends in another thread (as shown in logs above).
      And of course remove_committed method removes UID from store (HornetqJournalStore:156) (uid is removed from the map com.arjuna.ats.internal.arjuna.objectstore.hornetq.HornetqJournalStore#content )

      RecordInfo record = getContentForType(typeName).remove(uid); 

      As the result 

      com.arjuna.ats.internal.jta.transaction.arjunacore.jca.TransactionImporterImple#recoverTransaction

      is invoked for actId which is removed from the store and 
      IllegalArgumentException is thrown in line 129  of com.arjuna.ats.internal.jta.transaction.arjunacore.jca.TransactionImporterImple
       because the constructor of 
      TransactionImple calls 

      com.arjuna.ats.internal.arjuna.objectstore.hornetq.HornetqJournalStore#read_committed

      which return null for given uid:

      RecordInfo record = getContentForType(typeName).get(uid);
      if(record == null) {
         return null;
      }

      This is also confirmed by another log which can be observed:

       Activate of atomic action with id 0:ffffac18fa20:16e3c88c:64071ef9:371f5ede and type /StateManager/BasicAction/TwoPhaseCoordinator/AtomicAction/
      SubordinateAtomicAction/JCA unexpectedly failed, could not load state.

      So in another words
      1) Periodic Recovery fails in the loop which calls 'recoverTransaction' for each UIDs (taken from the store/map)
      2) this fails becasue 
      recovered.baseXid() is null
      3) baseXid is null because it is set in 
      activate method
      4) active method fails because 
      read_committed retuns null
      5) read_commtted returns null beacue 
      getContentForType(typeName).get(uid) return null
      5) 
      getContentForType(typeName).get(uid) returns null because the store/map doesn't contain the uid because it was removed after transaction commit in another thread. 

      Attachments

        Activity

          People

            Unassigned Unassigned
            jacek.jarzab Jacek Jarzab
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated: