Type: Bug
Resolution: Done
Priority: Major
Fix Version/s: 7.0.1.CR1, 7.0.1.GA
Affects Version/s: 7.0.0.GA
Component/s: None
Labels: None

Target Release: 7.0.z.GA
Git Pull Request: https://github.com/infinispan/infinispan/pull/4133
Sprint: EAP 7.0.1

Normally, the JGroupsTransport startup sequence goes like this:

1. Create the Channel.
2. Create the CommandAwareRpcDispatcher and install it as an UpHandler.
3. Connect the channel.

This way, every RequestCorrelator message received by the channel is passed up to CommandAwareRpcDispatcher, which executes the appropriate command.

When using a JGroupsChannelLookup, the lookup implementation is allowed to return a Channel instance that is already connected (shouldConnect() == false). That opens a window, between the channel connecting and JGroupsTransport installing the UpHandler, during which the channel has no UpHandler and messages sent to this node are discarded.
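The window described above can be illustrated with a toy model. This is only a sketch: ToyChannel and the method names are hypothetical stand-ins, not the JGroups or Infinispan API, and the model assumes a message arriving with no handler installed is silently dropped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Toy model only: these classes are hypothetical, not the JGroups or Infinispan API.
public class UpHandlerWindowSketch {

    /** A minimal channel that discards messages arriving before a handler is installed. */
    static class ToyChannel {
        private Consumer<String> upHandler; // null until a dispatcher is installed
        private boolean connected;
        private int discarded;

        void setUpHandler(Consumer<String> handler) { this.upHandler = handler; }
        void connect() { this.connected = true; }
        int discardedCount() { return discarded; }

        /** Simulates a message arriving from another node. */
        void receive(String message) {
            if (!connected) {
                return; // not a cluster member yet, so nothing is addressed to us
            }
            if (upHandler == null) {
                discarded++; // connected but no handler: the message is lost
                return;
            }
            upHandler.accept(message);
        }
    }

    /** Normal sequence: create the channel, install the handler, then connect. */
    static List<String> normalStartup() {
        List<String> executed = new ArrayList<>();
        ToyChannel channel = new ToyChannel();
        channel.setUpHandler(executed::add);
        channel.connect();
        channel.receive("JOIN");
        return executed;
    }

    /** The lookup hands back an already connected channel; the handler comes later. */
    static int preConnectedStartup(List<String> executed) {
        ToyChannel channel = new ToyChannel();
        channel.connect();                   // done by the lookup (shouldConnect() == false)
        channel.receive("JOIN");             // arrives inside the window and is discarded
        channel.setUpHandler(executed::add); // the transport installs the handler
        channel.receive("GET_STATUS");       // arrives after the window and is executed
        return channel.discardedCount();
    }

    public static void main(String[] args) {
        System.out.println("normal: executed " + normalStartup());
        List<String> executed = new ArrayList<>();
        int discarded = preConnectedStartup(executed);
        System.out.println("pre-connected: discarded " + discarded + ", executed " + executed);
    }
}
```

In the normal ordering nothing can be lost, because the handler exists before the node becomes reachable. In the pre-connected ordering, anything that arrives before setUpHandler is dropped without any error on the receiving side.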

Normally a node only receives commands after it has sent a join request to the coordinator. There are, however, a few exceptions:

- On startup, LocalTopologyManagerImpl sends the join request to the JGroups coordinator, which may not have the UpHandler yet. This seems to be responsible for the recent hanging in ConcurrentStartTest. We have a workaround here: use a smaller timeout on the CacheTopologyControlCommand(JOIN) command, and retry it on TimeoutException.
- When a node becomes coordinator, ClusterTopologyManagerImpl broadcasts a GET_STATUS request to all cluster members, and expects a response from each of them. The same workaround with a smaller timeout and retries might work here.
- In replicated mode, write commands are broadcast to all cluster members. There is some commented-out code in RpcManagerImpl.invokeRemotelyAsync() that might fix it by only waiting for responses from the cache topology members.
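The "smaller timeout and retry on TimeoutException" workaround can be sketched as follows. This is not the code from the pull request: invokeWithRetry, flakyTarget, the timeout value, and the attempt count are all hypothetical, and the remote call is modelled as a CompletableFuture that never completes when the target discards the request.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical sketch of the retry workaround; not Infinispan's implementation.
public class RetryOnTimeoutSketch {

    /**
     * Sends a request up to maxAttempts times, waiting attemptTimeoutMillis for each
     * response. Only a timeout triggers a retry; any other failure propagates at once.
     */
    static <T> T invokeWithRetry(Supplier<CompletableFuture<T>> send,
                                 long attemptTimeoutMillis,
                                 int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        TimeoutException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            CompletableFuture<T> response = send.get();
            try {
                return response.get(attemptTimeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                last = e;              // the target may not have its handler yet
                response.cancel(true); // give up on this attempt and resend
            }
        }
        throw last;
    }

    /** Simulated target that silently discards the first discardFirst requests. */
    static Supplier<CompletableFuture<String>> flakyTarget(AtomicInteger sent, int discardFirst) {
        return () -> {
            if (sent.incrementAndGet() <= discardFirst) {
                return new CompletableFuture<String>(); // never completes: request was dropped
            }
            return CompletableFuture.completedFuture("JOINED");
        };
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger sent = new AtomicInteger();
        String result = invokeWithRetry(flakyTarget(sent, 2), 50, 5);
        System.out.println(result + " after " + sent.get() + " attempts");
    }
}
```

The short per-attempt timeout matters because a discarded request produces no error on the sender's side. With a single long timeout, the caller would wait the full duration before learning that the request was lost.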

We should consider deprecating JGroupsChannelLookup.shouldConnect() and requiring that the channel is only connected by JGroupsTransport. Assuming that works with ForkChannel, of course.

causes: ISPN-5495 ConcurrentStartTest.testConcurrentStart random failures (Closed)

clones: ISPN-6322 Infinispan can miss incoming commands with JGroupsChannelLookup (Closed)

is incorporated by: ISPN-6433 Backport to 8.1.x branch (Closed)

JBEAP-4124 Upgrade to Infinispan 8.1.4.Final (Closed)

Assignee: Dan Berindei (Inactive)

Reporter: Brad Maxwell

Votes: 0

Watchers: 2

Created: 2016/05/18 8:16 PM

Updated: 2020/09/30 2:55 PM

Resolved: 2016/05/18 8:26 PM
