WildFly WIP / WFWIP-203

Transaction recovery may hit a wrong server when remote side works with multiple pods

    • Type: Bug
    • Resolution: Done
    • Priority: Blocker
    • OpenShift

      Run the EAP QE xPaaS test suite:
      `mvn clean test -P72 -Dtest=StatefulSetCrashRecTest#testTxStatelessServerSecondCommitJvmHalt -Dcheckstyle.skip -Dconsole-log-level=DEBUG`

      When server-to-server remote EJB calls propagate a transaction context, the EJB call can be routed to one pod while the recovery call may be directed to a different pod.

      Such a situation causes a consistency issue.

      Consider this scenario: the first server (let's call it `tx-client`) makes a remote EJB call to a remote server, which is one of the servers joined in a cluster, named `tx-server-0` and `tx-server-1`. The `tx-client` calls `tx-server-1`. Processing continues up to the start of 2PC, and then `tx-server-1` crashes (or the host goes down, a network issue happens, and so on).
      `tx-client` understands that the processing was not successful and asks the recovery manager to retry and finish the transaction.
      The recovery manager starts calling the remote server based on the data saved in the object store of `tx-client`.
      Unfortunately, the recovery call goes not to `tx-server-1` but to `tx-server-0`. The `tx-client` gets the error code `XAException.XAER_NOTA` (`-4`), removes the data from its object store (`/opt/eap/standalone/data/tx-object-store/`, `/opt/eap/standalone/data/ejb-xa-recovery`), and then never finishes the in-doubt transactions at `tx-server-1`.
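
      The failure mode can be illustrated with a small, self-contained toy model. This is only a sketch of the scenario described above, not WildFly, WFTC, or Narayana code: the `Pod` and `Client` classes, the `recover` method, and the `tx-1` identifier are hypothetical names invented for illustration. The only real API used is the `XAException.XAER_NOTA` and `XAResource.XA_OK` constants from `javax.transaction.xa`.

      ```java
      import java.util.HashMap;
      import java.util.HashSet;
      import java.util.Map;
      import java.util.Set;
      import javax.transaction.xa.XAException;
      import javax.transaction.xa.XAResource;

      /** Toy model of the misrouted recovery call; none of these types are WildFly APIs. */
      public class RecoveryRoutingSketch {

          /** Hypothetical pod that remembers the transaction branches it has prepared. */
          static final class Pod {
              final String name;
              final Set<String> inDoubt = new HashSet<>();

              Pod(String name) {
                  this.name = name;
              }

              /** Commits a known branch, or reports XAER_NOTA when the branch is unknown here. */
              int commit(String txId) {
                  return inDoubt.remove(txId) ? XAResource.XA_OK : XAException.XAER_NOTA;
              }
          }

          /** Hypothetical client whose object store maps a transaction id to the pod it called. */
          static final class Client {
              final Map<String, String> objectStore = new HashMap<>();

              /** One recovery pass against whichever pod the call happened to be routed to. */
              int recover(String txId, Pod routedTo) {
                  int rc = routedTo.commit(txId);
                  // Assumption taken from the report: XAER_NOTA is treated as
                  // "nothing left to do", so the client drops its record.
                  if (rc == XAResource.XA_OK || rc == XAException.XAER_NOTA) {
                      objectStore.remove(txId);
                  }
                  return rc;
              }
          }

          public static void main(String[] args) {
              Pod server0 = new Pod("tx-server-0");
              Pod server1 = new Pod("tx-server-1");
              Client client = new Client();

              // tx-server-1 prepared the branch and then crashed before commit.
              server1.inDoubt.add("tx-1");
              client.objectStore.put("tx-1", server1.name);

              // The recovery call is routed to the wrong pod.
              int rc = client.recover("tx-1", server0);

              System.out.println("return code: " + rc);
              System.out.println("client record left: " + client.objectStore.containsKey("tx-1"));
              System.out.println("tx-server-1 still in doubt: " + server1.inDoubt.contains("tx-1"));
          }
      }
      ```

      Running `main` prints a return code of `-4`, shows that the client record is gone, and shows that `tx-server-1` still holds the in-doubt branch. In this model, passing `server1` to `recover` instead would commit the branch, which is why routing the recovery call to the pod that took part in the transaction matters.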

      It is unclear whether this is an issue with the OpenShift configuration or a problem in the WFTC/EJB/remoting layer in WildFly.

      This was tested with the WFLY Operator from 2019-09-26 `@90a2b3b`.

        1. tx-client-0.log
          298 kB
          Martin Simka
        2. tx-server-0.log
          208 kB
          Martin Simka
        3. tx-server-1.log
          295 kB
          Martin Simka

              ochaloup@redhat.com Ondrej Chaloupka (Inactive)
