Type: Bug
Resolution: Done
Priority: Blocker
Component/s: OpenShift
Labels:
- operator

Steps to Reproduce:

Run the EAP QE xPaaS test suite:
`mvn clean test -P72 -Dtest=StatefulSetCrashRecTest#testTxStatelessServerSecondCommitJvmHalt -Dcheckstyle.skip -Dconsole-log-level=DEBUG`

When one server makes remote EJB calls to another server and the transaction context is propagated, the EJB call can be routed to one pod while the later recovery call may be directed to a different pod.

Such a situation causes a consistency issue.

Consider this scenario: the first server (let's call it `tx-client`) makes a remote EJB call to one of the servers joined in a cluster, named `tx-server-0` and `tx-server-1`. The `tx-client` calls `tx-server-1`. Processing continues up to the start of the 2PC, and then `tx-server-1` crashes (or the host goes down, a network issue happens...).
`tx-client` understands that the processing was not successful and asks the recovery manager to retry and finish.
The recovery manager starts calling the remote server based on the data saved in the object store of `tx-client`.
Unfortunately, the remote recovery call goes not to `tx-server-1` but to `tx-server-0`. The `tx-client` gets the error code `XAException.XAER_NOTA` (`-4`), removes the data from its object store (`/opt/eap/standalone/data/tx-object-store/`, `/opt/eap/standalone/data/ejb-xa-recovery`), and then never finishes the in-doubt transactions at `tx-server-1`.
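
The failure sequence above can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions, not the WildFly or Narayana code path: the classes `ServerPod` and `TxClient`, the method names, and the string `xid-1` are hypothetical stand-ins, and the constant `XAER_NOTA` merely mirrors the value `-4` of `XAException.XAER_NOTA` mentioned in the description.

```java
import java.util.HashSet;
import java.util.Set;

public class RecoveryRoutingSketch {
    static final int XA_OK = 0;
    // Same numeric value as XAException.XAER_NOTA (-4), as quoted in the report.
    static final int XAER_NOTA = -4;

    /** Hypothetical stand-in for one server pod (tx-server-0 or tx-server-1). */
    static final class ServerPod {
        final String name;
        // Transactions this pod still holds in doubt after a crash.
        final Set<String> inDoubt = new HashSet<>();

        ServerPod(String name) {
            this.name = name;
        }

        /** Finishes the transaction if this pod knows it, otherwise reports XAER_NOTA. */
        int commit(String xid) {
            return inDoubt.remove(xid) ? XA_OK : XAER_NOTA;
        }
    }

    /** Hypothetical stand-in for tx-client and its object store. */
    static final class TxClient {
        final Set<String> objectStore = new HashSet<>();

        /** One recovery attempt, sent to whichever pod the call happens to be routed to. */
        int recover(String xid, ServerPod routedTo) {
            int rc = routedTo.commit(xid);
            // Assumption taken from the report: on XAER_NOTA the client
            // removes its record as if nothing were left to finish.
            if (rc == XA_OK || rc == XAER_NOTA) {
                objectStore.remove(xid);
            }
            return rc;
        }
    }

    public static void main(String[] args) {
        ServerPod server0 = new ServerPod("tx-server-0");
        ServerPod server1 = new ServerPod("tx-server-1");
        TxClient client = new TxClient();

        String xid = "xid-1"; // hypothetical transaction identifier
        server1.inDoubt.add(xid);    // tx-server-1 crashed holding the transaction
        client.objectStore.add(xid); // tx-client recorded it for recovery

        // The recovery call is routed to the wrong pod.
        int rc = client.recover(xid, server0);

        System.out.println("recovery return code: " + rc);
        System.out.println("client still tracks xid: " + client.objectStore.contains(xid));
        System.out.println("tx-server-1 still in doubt: " + server1.inDoubt.contains(xid));
    }
}
```

In this sketch the client ends up with an empty object store while `tx-server-1` still holds the in-doubt transaction, which is the inconsistency described above. Routing the same recovery call to `server1` instead would clear both sides.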

It is unclear whether this is an issue of OpenShift configuration or a problem in the WFTC/EJB/remoting layer in WildFly.

This is tested with WFLY Operator from 2019-09-26 `@90a2b3b`.

Attachments:
- tx-client-0.log (298 kB, 2019/09/17 8:54 AM)
- tx-server-0.log (208 kB, 2019/09/17 8:54 AM)
- tx-server-1.log (295 kB, 2019/09/17 8:54 AM)

Relates to: WFWIP-201 incomplete tx recovery on openshift (Resolved)

Assignee: Ondrej Chaloupka (Inactive)
Reporter: Ondrej Chaloupka (Inactive)
Votes: 0
Watchers: 2
Created: 2019/09/16 9:24 AM
Updated: 2022/09/09 7:10 AM
Resolved: 2019/09/18 8:41 AM
