Type: Bug
Resolution: Done
Priority: Major
Fix Version/s: 8.2.1.Final, 9.0.0.Final
Affects Version/s: 8.2.0.Final
Component/s: Test Suite
Labels: None
Git Pull Request: https://github.com/infinispan/infinispan/pull/4146, https://github.com/infinispan/infinispan/pull/4199

GMS can sometimes delay the processing of a join/leave request because of JGRP-2028.

Joiners retry automatically after GMS.join_timeout, so it's not that bad. Leavers, however, don't resend their leave requests, so the delay can be worse.

Normally, the FD/FD_ALL/FD_SOCK protocols would wake up the ViewHandler thread. But we remove the FD* protocols from the stack in most of our tests, unless the test uses DISCARD. That means the leave request can be delayed until another node leaves:

16:35:56,247 TRACE (testng-ClusterListenerDistAddListenerTest:) [GMS] NodeB-8309: sending LEAVE request to NodeA-45395
16:35:56,268 TRACE (OOB-1,NodeA-45395:) [TCP_NIO2] NodeA-45395: received [dst: NodeA-45395, src: NodeB-8309 (3 headers), size=0 bytes, flags=OOB], headers are GMS: GmsHeader[LEAVE_REQ]: mbr=NodeB-8309, UNICAST3: DATA, seqno=22, TP: [cluster_name=ISPN]
16:35:56,268 TRACE (OOB-1,NodeA-45395:) [UNICAST3] NodeA-45395: delivering NodeB-8309#22

16:36:07,263 ERROR (testng-ClusterListenerDistAddListenerTest:) [UnitTestTestNGListener] Test testMemberJoinsAndRetrievesClusterListenersButMainListenerNodeDiesBeforeInstalled(org.infinispan.notifications.cachelistener.cluster.ClusterListenerDistAddListenerTest) failed.
org.infinispan.util.concurrent.TimeoutException: Timed out before caches had complete views.  Expected 3 members in each view.  Views are as follows: [[NodeA-45395|3] (4) [NodeA-45395, NodeB-8309, NodeC-53222, NodeD-55165], [NodeA-45395|3] (4) [NodeA-45395, NodeB-8309, NodeC-53222, NodeD-55165], [NodeA-45395|3] (4) [NodeA-45395, NodeB-8309, NodeC-53222, NodeD-55165]]

16:37:07,341 TRACE (testng-ClusterListenerDistAddListenerTest:) [GMS] NodeD-55165: sending LEAVE request to NodeA-45395
16:37:07,361 TRACE (OOB-4,NodeA-45395:) [TCP_NIO2] NodeA-45395: received [dst: NodeA-45395, src: NodeD-55165 (3 headers), size=0 bytes, flags=OOB], headers are GMS: GmsHeader[LEAVE_REQ]: mbr=NodeD-55165, UNICAST3: DATA, seqno=21, TP: [cluster_name=ISPN]
16:37:07,361 TRACE (OOB-4,NodeA-45395:) [UNICAST3] NodeA-45395: delivering NodeD-55165#21
16:37:07,361 TRACE (ViewHandler,NodeA-45395:) [GMS] NodeA-45395: joiners=[], suspected=[], leaving=[NodeB-8309], new view: [NodeA-45395|4] (3) [NodeA-45395, NodeC-53222, NodeD-55165]
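To put a number on the delay in the log above, here is a minimal sketch that subtracts the timestamp of NodeB's LEAVE request from the timestamp of the view that finally removes it. The class and method names are made up for illustration; only the two timestamps come from the log.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Infinispan or JGroups.
public class LeaveDelaySketch {
    // The log timestamps use a comma before the milliseconds, e.g. 16:35:56,247
    private static final DateTimeFormatter LOG_TIME =
            DateTimeFormatter.ofPattern("HH:mm:ss,SSS");

    // Elapsed time between two log timestamps taken on the same day
    static Duration between(String start, String end) {
        return Duration.between(LocalTime.parse(start, LOG_TIME),
                                LocalTime.parse(end, LOG_TIME));
    }

    public static void main(String[] args) {
        // NodeB sends its LEAVE request; NodeA installs the view without NodeB
        Duration delay = between("16:35:56,247", "16:37:07,361");
        System.out.println("LEAVE handled after " + delay.toMillis() + " ms");
    }
}
```

That is roughly 71 seconds between the request and the new view, while the test had already timed out at 16:36:07.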

FD_ALL is pretty cheap: it just sends a message every second, without opening any new sockets. So I think we should enable it by default, and only enable FD_SOCK with TransportFlags.withFD(true).
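The proposed default could look roughly like the sketch below. It is not the actual test-suite code: the class, the method, and the idea of representing the stack as a list of protocol names are assumptions made for illustration, and the `withFD` parameter only stands in for the value set through TransportFlags.withFD(true).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the proposed default, not Infinispan's real helper.
public class TestStackSketch {
    // Returns the protocols a test stack would keep under the proposal:
    // FD_ALL always stays, FD_SOCK stays only when the test asks for it.
    static List<String> protocolsFor(List<String> stack, boolean withFD) {
        List<String> kept = new ArrayList<>();
        for (String protocol : stack) {
            // FD_SOCK opens extra sockets, so drop it unless requested
            if (protocol.equals("FD_SOCK") && !withFD) {
                continue;
            }
            kept.add(protocol);
        }
        return kept;
    }

    public static void main(String[] args) {
        // Illustrative stack only; a real configuration has more protocols
        List<String> stack =
                Arrays.asList("TCP_NIO2", "FD_SOCK", "FD_ALL", "UNICAST3", "GMS");
        System.out.println(protocolsFor(stack, false));
        System.out.println(protocolsFor(stack, true));
    }
}
```

With FD_ALL always present, something still wakes up the ViewHandler thread after a delayed leave request, and tests that need FD_SOCK opt in explicitly.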

Is related to: JGRP-2028 GMS sometimes ignores view bundling timeout (Closed)

Assignee: Dan Berindei (Inactive)
Reporter: Dan Berindei (Inactive)
Archiver: Amol Dongare
Created: 2016/03/18 3:44 AM
Updated: 2016/04/04 12:34 PM
Resolved: 2016/03/31 6:09 AM
Archived: 2024/11/28 6:21 AM
