  1. JBoss Enterprise Application Platform
  2. JBEAP-19435

(7.3.z) WFTC registry records may not be removed during the OpenShift scale down processing when transaction recovery commits


      There is a test in the OpenShift testsuite (see the steps to reproduce at WFLY-12922):

      mvn clean test -P72 -Dtest=EjbTxnRemotingScaleDownTest#testTxStatelessServerSecondCommitThrowRmFail -Dconsole-log-level=DEBUG
      

      There is also an integration test in the EAP QE crashrec testsuite:

      git clone git@gitlab.mw.lab.eng.bos.redhat.com:jbossqe-eap/tests-transactions.git
      mvn clean verify -am -pl jbossts -DfailIfNoTests=false -Djbossts.noJTS -Djboss.dist=$JBOSS_HOME -Dtest=TxPropagationJMSCrashRecoveryTestCase#injectRmfailAtServerCommit
      

      OpenShift scale-down processing may fail to remove a WFTC registry record.
      This can happen when EAP makes an EJB remote call with transaction context propagation (EAP1 calls EAP2). The transaction processing fails (e.g. the JVM crashes or an intermittent network failure occurs) after the prepare phase has finished, so a commit of the resource is expected, and recovery processing later tries to commit the transaction. If scale-down recovery processing is launched by the OpenShift WildFly operator at the same time, the commit succeeds and all transaction participants are committed on EAP1 and EAP2, but EAP1 does not remove the WFTC XAResourceRegistry record (a file saved on the EAP1 file system that is required for successful recovery processing). The record may then never be removed, which can block the OpenShift scale-down process, because a smooth scale-down requires that no XAResourceRegistry record remains. EAP1 can therefore get stuck while being scaled down.
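
      Because the symptom is a record file left behind on the EAP1 file system, a quick way to confirm it after a test run is to count the files remaining in the registry directory. The following is only a minimal sketch: the helper name count_registry_records is hypothetical, and the directory holding the WFTC XAResourceRegistry records is an assumption that must be supplied for the installation under test.

```shell
#!/bin/sh
# Hypothetical helper (not part of EAP or WFTC): counts the regular files
# left under a directory. The caller passes the directory where the
# WFTC XAResourceRegistry records are stored; that location is an
# assumption and depends on the installation.
count_registry_records() {
    dir="$1"
    # A missing directory is treated as "no records left".
    [ -d "$dir" ] || { echo 0; return 0; }
    # Count regular files; strip the padding some wc implementations add.
    find "$dir" -type f | wc -l | tr -d ' '
}
```

      For example, count_registry_records /path/to/registry-dir printing anything other than 0 after recovery has committed the transaction would be consistent with the behavior described here.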

      This is a follow-up (a kind of clone) of WFLY-12922, which describes exactly this issue. WFLY-12922 was fixed by the change WFTC-77, but that change was later found to cause the regression JBEAP-19408, so the WFTC-77 fix was reverted by WFTC-82.

      A way needs to be found to remove the WFTC XAResourceRegistry record immediately during the recovery commit.

            ochaloup@redhat.com Ondrej Chaloupka (Inactive)