Type: Bug
Resolution: Obsolete
Priority: Critical
Fix Version/s: None
Affects Version/s: 9.1.0.Final
Component/s: Transactions
Labels:
- consistency

Sprint:
DataGrid Sprint #40
Git Pull Request:
https://github.com/infinispan/infinispan/pull/6364

In the scenario where the originator ends up in the minority partition (in our test suite, the originator-isolated tests), a transaction can end up both committed and rolled back in the majority partition: committed on some nodes and rolled back on others.

With Pessimistic Locking, the transaction is committed in one phase using the PrepareCommand. If the partition happens while the originator is sending the PrepareCommand, the nodes in the majority partition may or may not receive it. Some nodes can receive the PrepareCommand and apply it while others never receive it.

When the topology is updated in the majority partition, the TransactionTable rolls back all transactions whose originator is no longer present. So, on the nodes that did not receive the PrepareCommand, the transaction is rolled back.

The originator in the minority partition detects the partition and marks the transaction as partially completed. When the merge occurs, it tries to commit the transaction again. On the nodes where the transaction was rolled back, the transaction is marked as completed, so when the PrepareCommand arrives it throws an IllegalStateException (TransactionTable:386, getOrCreateRemoteTransaction()). In this case, the transaction isn't removed from the PartitionHandlingManager and our test suite fails with "there are pending tx".
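
The sequence described so far can be reproduced with a small toy model. This is only a sketch under stated assumptions: ToyNode, State, Outcome and simulate() are hypothetical names invented for illustration and are not Infinispan classes, and the model assumes that a retried PrepareCommand is harmless on a node that already committed.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the failure sequence; none of these types exist in Infinispan.
class PartitionTxSketch {
    enum State { ACTIVE, COMMITTED, ROLLED_BACK }

    /** Hypothetical stand-in for the per-node transaction bookkeeping. */
    static final class ToyNode {
        final Map<String, State> states = new HashMap<>();
        final Map<String, String> originators = new HashMap<>();

        // Pessimistic locking: the remote transaction exists (locks held) before the prepare.
        void registerRemoteTx(String txId, String originator) {
            states.put(txId, State.ACTIVE);
            originators.put(txId, originator);
        }

        // One-phase commit: the prepare applies the transaction directly.
        void onPrepare(String txId) {
            if (states.get(txId) == State.ROLLED_BACK) {
                throw new IllegalStateException("Transaction " + txId + " is already completed");
            }
            states.put(txId, State.COMMITTED); // assumption: a duplicate prepare is a no-op
        }

        // Roll back every still-active transaction whose originator left the view.
        void onTopologyUpdate(List<String> members) {
            for (Map.Entry<String, String> tx : originators.entrySet()) {
                boolean originatorGone = !members.contains(tx.getValue());
                if (originatorGone && states.get(tx.getKey()) == State.ACTIVE) {
                    states.put(tx.getKey(), State.ROLLED_BACK);
                }
            }
        }
    }

    static final class Outcome {
        final State onB;
        final State onC;
        final boolean stillPendingOnOriginator;

        Outcome(State onB, State onC, boolean stillPendingOnOriginator) {
            this.onB = onB;
            this.onC = onC;
            this.stillPendingOnOriginator = stillPendingOnOriginator;
        }
    }

    static Outcome simulate() {
        ToyNode b = new ToyNode();
        ToyNode c = new ToyNode();
        b.registerRemoteTx("tx1", "A");
        c.registerRemoteTx("tx1", "A");

        // The partition hits while originator A sends the prepare: only B receives it.
        b.onPrepare("tx1");

        // The majority partition {B, C} installs a topology without A.
        List<String> majority = Arrays.asList("B", "C");
        b.onTopologyUpdate(majority);
        c.onTopologyUpdate(majority);

        // After the merge, A retries the partially completed transaction.
        boolean pending = true;
        try {
            b.onPrepare("tx1");
            c.onPrepare("tx1"); // throws: C already rolled the transaction back
            pending = false;
        } catch (IllegalStateException e) {
            // The retry failed, so nothing clears the pending entry on A.
        }
        return new Outcome(b.states.get("tx1"), c.states.get("tx1"), pending);
    }

    public static void main(String[] args) {
        Outcome o = simulate();
        System.out.println("B=" + o.onB + " C=" + o.onC + " pending=" + o.stillPendingOnOriginator);
    }
}
```

In this model the transaction ends up committed on B, rolled back on C, and still pending on the originator, which matches the inconsistency and the "there are pending tx" failure described above.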

Another theoretical scenario is the PrepareCommand being executed when no locks are acquired.

The same issue can happen with Optimistic Locking for the CommitCommand.

The problem is that the transaction table can't tell whether the node left gracefully or not. A solution would be to have an "expected members" list, ideally separate from the CacheTopology to avoid sending it every time. It would also need some sysadmin tools for the case where a node crashes and won't be back online for a while (or, for some reason, doesn't need to come back online).
A sysadmin could remove the node from this list (the CacheTopology is already updated and there is no need to increase it) and decide what to do with the pending transactions (alternatively, an automatic mechanism could commit or roll back the transactions).
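
One way the "expected members" idea could look is sketched below. Everything here is an assumption made for illustration: ExpectedMembers, Action and all method names are hypothetical, no such API exists in Infinispan, and the default of rolling back after a graceful leave is a guess at a reasonable policy rather than something the proposal specifies.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of an "expected members" list kept apart from the cache topology.
class ExpectedMembersSketch {
    enum Action { KEEP_PENDING, ROLL_BACK, COMMIT }

    static final class ExpectedMembers {
        private final Set<String> expected = new HashSet<>();
        private final Map<String, Action> adminDecisions = new HashMap<>();

        void join(String node) {
            expected.add(node);
            adminDecisions.remove(node);
        }

        // A graceful leave removes the node from the list immediately.
        void leaveGracefully(String node) {
            expected.remove(node);
        }

        // Sysadmin tool: drop a crashed node and decide the fate of its pending transactions.
        void removeByAdmin(String node, Action forPendingTransactions) {
            expected.remove(node);
            adminDecisions.put(node, forPendingTransactions);
        }

        // Called when a topology update no longer contains the originator of a transaction.
        Action onOriginatorMissing(String originator) {
            if (expected.contains(originator)) {
                // Still expected: the node crashed or is partitioned and may retry later.
                return Action.KEEP_PENDING;
            }
            // Assumed default: roll back when the node left gracefully.
            return adminDecisions.getOrDefault(originator, Action.ROLL_BACK);
        }
    }

    public static void main(String[] args) {
        ExpectedMembers members = new ExpectedMembers();
        members.join("A");
        System.out.println(members.onOriginatorMissing("A"));
        members.removeByAdmin("A", Action.COMMIT);
        System.out.println(members.onOriginatorMissing("A"));
    }
}
```

The point of the design is that a missing originator that is still on the list keeps its transactions pending instead of having them rolled back on the next topology update, so a later retry after the merge does not hit an already completed transaction.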

Issue Links:
- is cloned by: JDG-3935 Transaction inconsistency during network partitions (Reopened)
- is related to: ISPN-8305 PessimisticTxPartitionAndMergeDuringPrepareTest.testPrimaryOwnerIsolatedPartitionWithDiscard[DIST_SYNC] randomly failing (Closed)
- relates to: ISPN-6456 OptimisticTxPartitionAndMergeDuringPrepareTest.testOriginatorIsolatedPartition fails randomly (Closed)
- relates to: ISPN-8453 Commit should fail if cache is in degraded mode (Closed)
- relates to: ISPN-9291 BasePartitionHandlingTest.Partition.installMergeView() doesn't compute the merge digest (Closed)

Assignee: Pedro Ruivo
Reporter: Pedro Ruivo
Archiver: Amol Dongare
Created: 2017/08/28 5:57 AM
Updated: 2024/11/27 2:35 PM
Resolved: 2023/05/25 1:40 PM
Archived: 2024/11/28 6:21 AM
