Type: Bug
Resolution: Duplicate
Priority: Blocker
Fix Version/s: None
Affects Version/s: 4.2.0.Final
Component/s: Core, Transactions
Labels: None
Estimated Difficulty: High

After much testing and analysis (and reopening and fixing ~~ISPN-865~~), the remaining issue is that certain transactions throw an IllegalStateException in commit(), which cascades into a series of problems.

See http://lists.jboss.org/pipermail/infinispan-dev/2011-January/007320.html for a more detailed discussion.

Original request:

There are two scenarios we're seeing on rehashing, both of which are critical.

1. When a node leaves a running cluster, we see an inordinate number of timeout errors, such as the one below. As a result, the cluster loses data.

org.infinispan.util.concurrent.TimeoutException: Timed out waiting for valid responses!
at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:417)
at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:101)
at org.infinispan.distribution.DistributionManagerImpl.retrieveFromRemoteSource(DistributionManagerImpl.java:341)
at org.infinispan.interceptors.DistributionInterceptor.realRemoteGet(DistributionInterceptor.java:143)
at org.infinispan.interceptors.DistributionInterceptor.remoteGetAndStoreInL1(DistributionInterceptor.java:131)
06:07:44,097 WARN [GMS] cms-node-20192: merge leader did not get data from all partition coordinators [cms-node-20192, mydht1-18445], merge is cancelled
at org.infinispan.commands.read.GetKeyValueCommand.acceptVisitor(GetKeyValueCommand.java:59)

2. Joining a node into a running cluster causes transactional failures on the other nodes. Most of the time, depending on the load, a node can take upwards of 8 minutes to join.

I've attached a unit test that can reproduce these issues.

Attachment: cacheTest.zip (19 kB, 2011/01/27 4:52 PM)

blocks: ISPN-493 Harden rehash leave process (Closed)
relates to: ISPN-493 Harden rehash leave process (Closed)

Assignee: Manik Surtani (Inactive)
Reporter: Erik Salter (Inactive)
Votes: 0
Watchers: 1
Created: 2011/01/27 4:52 PM
Updated: 2020/09/14 5:34 AM
Resolved: 2011/02/16 10:07 AM
