Type: Bug
Resolution: Done
Priority: Major
Fix Version/s: 8.2.1.Final, 8.1.4.Final, 9.0.0.Final
Affects Version/s: 8.2.0.CR1, 8.1.2.Final
Component/s: Core
Labels: None

Git Pull Request:
https://github.com/infinispan/infinispan/pull/4133

Normally, the JGroupsTransport startup sequence goes like this:

1. Create the Channel
2. Create the CommandAwareRpcDispatcher and install it as an UpHandler
3. Connect the channel

This way, every RequestCorrelator message received by the channel is passed up to CommandAwareRpcDispatcher, which executes the appropriate command.

When using a JGroupsChannelLookup, the lookup implementation is allowed to return a Channel instance that is already connected (shouldConnect() == false). In that case the channel is connected before JGroupsTransport installs the UpHandler, so there is a window in which the channel has no UpHandler and messages sent to this node are discarded.
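The effect of the ordering can be illustrated with a toy model. This is only a sketch: ToyChannel, deliver() and run() are hypothetical names invented for the illustration and are not the JGroups or Infinispan API. It assumes a channel that silently drops any message arriving while no up-handler is installed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class UpHandlerWindowDemo {
    /** Toy stand-in for a channel (hypothetical, not the JGroups API). */
    static final class ToyChannel {
        private boolean connected;
        private Consumer<String> upHandler;
        private int discarded;

        void connect() { connected = true; }

        void setUpHandler(Consumer<String> handler) { upHandler = handler; }

        /** Simulates a message arriving from the network. */
        void deliver(String message) {
            if (!connected) {
                return; // nothing can arrive before the channel is connected
            }
            if (upHandler == null) {
                discarded++; // no handler installed yet: the message is lost
                return;
            }
            upHandler.accept(message);
        }

        int discarded() { return discarded; }
    }

    /** Runs one startup sequence and returns how many messages were discarded. */
    static int run(boolean connectedByLookup) {
        ToyChannel channel = new ToyChannel();
        List<String> executed = new ArrayList<>();
        if (connectedByLookup) {
            // The lookup returned an already-connected channel.
            channel.connect();
            channel.deliver("JOIN"); // arrives inside the window
            channel.setUpHandler(executed::add);
        } else {
            // Normal order: handler first, then connect.
            channel.setUpHandler(executed::add);
            channel.connect();
            channel.deliver("JOIN");
        }
        return channel.discarded();
    }

    public static void main(String[] args) {
        System.out.println("normal order, discarded: " + run(false));
        System.out.println("pre-connected channel, discarded: " + run(true));
    }
}
```

In the model, the normal order loses nothing, while the pre-connected channel loses the one message that arrives before the handler is installed.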

Normally a node only receives commands after it has sent a join request to the coordinator. There are, however, a few exceptions:

- On startup, LocalTopologyManagerImpl sends the join request to the JGroups coordinator, which may not have the UpHandler yet. This seems to be responsible for the recent hangs in ConcurrentStartTest. We have a workaround here: use a smaller timeout on the CacheTopologyControlCommand(JOIN) command, and retry it on TimeoutException.
- When a node becomes coordinator, ClusterTopologyManagerImpl broadcasts a GET_STATUS request to all cluster members, and expects a response from each of them. The same workaround with a smaller timeout and retries might work here.
- In replicated mode, write commands are broadcast to all cluster members. There is some commented-out code in RpcManagerImpl.invokeRemotelyAsync() that might fix it by only waiting for responses from the cache topology members.
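The "smaller timeout plus retry on TimeoutException" workaround from the first exception above can be sketched as follows. This is a minimal illustration, not the actual Infinispan change: TimedCall, retryOnTimeout() and simulateJoin() are hypothetical names, the 60,000 ms total is an arbitrary value, and the remote call is simulated rather than sent over a real transport. Splitting the total timeout evenly across attempts is also an assumption.

```java
import java.util.concurrent.TimeoutException;

public class RetryOnTimeoutDemo {
    /** A remote call that takes a timeout (hypothetical interface). */
    @FunctionalInterface
    interface TimedCall<T> {
        T call(long timeoutMillis) throws TimeoutException;
    }

    /**
     * Splits the total timeout evenly across the attempts and retries
     * whenever one attempt times out. Rethrows the last TimeoutException
     * if every attempt fails.
     */
    static <T> T retryOnTimeout(TimedCall<T> call, long totalTimeoutMillis, int attempts)
            throws TimeoutException {
        if (attempts < 1) {
            throw new IllegalArgumentException("attempts must be >= 1");
        }
        long perAttempt = Math.max(1, totalTimeoutMillis / attempts);
        TimeoutException last = null;
        for (int i = 0; i < attempts; i++) {
            try {
                return call.call(perAttempt);
            } catch (TimeoutException e) {
                last = e; // the request may have been discarded; try again
            }
        }
        throw last;
    }

    /** Simulated join: the first `failures` requests are dropped by the receiver. */
    static String simulateJoin(int failures, int attempts) {
        int[] calls = {0};
        try {
            return retryOnTimeout(timeout -> {
                if (calls[0]++ < failures) {
                    throw new TimeoutException("no response within " + timeout + " ms");
                }
                return "joined after " + calls[0] + " attempt(s)";
            }, 60_000, attempts);
        } catch (TimeoutException e) {
            return "gave up after " + calls[0] + " attempt(s)";
        }
    }

    public static void main(String[] args) {
        System.out.println(simulateJoin(2, 3));
        System.out.println(simulateJoin(5, 3));
    }
}
```

With a short per-attempt timeout, a request discarded during the window costs only one attempt instead of the whole timeout, and a later attempt can succeed once the coordinator has its UpHandler installed.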

We should consider deprecating JGroupsChannelLookup.shouldConnect() and requiring that the channel is only connected by JGroupsTransport. Assuming that works with ForkChannel, of course.

Issue Links:

causes: ISPN-5495 ConcurrentStartTest.testConcurrentStart random failures (Closed)
is cloned by: JBEAP-4621 Infinispan can miss incoming commands with JGroupsChannelLookup (Closed)
is incorporated by: ISPN-6433 Backport to 8.1.x branch (Closed)

Assignee: Dan Berindei (Inactive)
Reporter: Dan Berindei (Inactive)
Archiver: Amol Dongare
Created: 2016/03/04 3:31 AM
Updated: 2020/09/30 2:54 PM
Resolved: 2016/04/30 1:25 AM
Archived: 2024/11/28 6:21 AM
