Type: Story
Resolution: Can't Do
Priority: Major
Fix Version/s: RH436 - RHEL 8.3 1
Affects Version/s: RH436 - RHEL 7.1 1
Component/s: RH436
Labels: None
Pool Team: ILT, ROLE, VT
Language: en-US (English)

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

URL:
Reporter RHNID:
Section: -
Workaround:

Description: A discussion on gls-contractor-list between Alpaslan and Phil requested an expanded and improved treatment of RRP passive and active modes, along with a more detailed clarification of the support relationship between DLM, GFS2, and RRP. The key message is below. The accuracy of Phil's information should be confirmed.

On 10/2/15 3:22 PM, Alpaslan Kaplan - Perception Training and Consulting
wrote:
> Hi,
>
> I need help about understanding active vs passive mode of RRP.

Yes, this information should be added to the coursebook.

> I thought in passive mode only one ring is used (like activebackup mode
> of teaming) and in active mode two rings are used simultaneously. But it
> seems that's not the case. Anyone who can explain how they work shortly?

Shortly?  Uuuummmm, no.  I'm the talkative one, remember?  Just read
every third or fourth paragraph only.

In passive mode, both are configured and available, but corosync
*alternates* between the two rings.  In active mode, corosync uses both
rings *simultaneously*, i.e., replicated.

Active offers slightly lower latency from transmit to delivery in faulty
network environments but with less performance.

Passive may nearly double the speed of the totem protocol if the
protocol doesn't become cpu-bound.
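
As a rough illustration of alternating versus replicating, here is a toy model in Python. It is only a sketch: the function names are made up for this example, and corosync's real RRP logic is considerably more involved. It simply counts how many transmissions each ring carries in each mode.

```python
# Toy model only; the function names are hypothetical, not corosync APIs.

def passive_schedule(num_messages, num_rings=2):
    """Passive: each message goes out on one ring, chosen round-robin."""
    return [[msg % num_rings] for msg in range(num_messages)]

def active_schedule(num_messages, num_rings=2):
    """Active: each message is replicated on every ring."""
    return [list(range(num_rings)) for _ in range(num_messages)]

def per_ring_load(schedule, num_rings=2):
    """Count how many transmissions each ring carries."""
    load = [0] * num_rings
    for rings in schedule:
        for ring in rings:
            load[ring] += 1
    return load

if __name__ == "__main__":
    print("passive:", per_ring_load(passive_schedule(10)))  # passive: [5, 5]
    print("active: ", per_ring_load(active_schedule(10)))   # active:  [10, 10]
```

In this model each ring carries half the traffic in passive mode, which is the intuition behind the "nearly double" figure, while active mode puts every message on both rings.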

When rrp_mode=none only one network interface will be used to operate
the totem protocol.
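
For reference, a small Python sketch that renders a corosync 2.x style totem stanza for a given rrp_mode. The multicast addresses and ports are taken from the netstat example in this message; the bindnetaddr values assume /24 networks and are illustrative only, and render_totem is a hypothetical helper, not part of any corosync tooling.

```python
# Sketch: render a totem stanza for one or more rings.
# bindnetaddr values below are assumptions (/24 networks), not from the thread.

def render_totem(rrp_mode, rings):
    """Return a totem { ... } stanza as text for the given rrp_mode and rings."""
    if rrp_mode not in ("none", "passive", "active"):
        raise ValueError("rrp_mode must be none, passive, or active")
    if rrp_mode == "none" and len(rings) != 1:
        # With rrp_mode none, only one interface operates the totem protocol.
        raise ValueError("rrp_mode none expects exactly one interface")
    lines = ["totem {", "    version: 2", f"    rrp_mode: {rrp_mode}"]
    for number, ring in enumerate(rings):
        lines += [
            "    interface {",
            f"        ringnumber: {number}",
            f"        bindnetaddr: {ring['bindnetaddr']}",
            f"        mcastaddr: {ring['mcastaddr']}",
            f"        mcastport: {ring['mcastport']}",
            "    }",
        ]
    lines.append("}")
    return "\n".join(lines)

RINGS = [
    {"bindnetaddr": "10.0.0.0", "mcastaddr": "226.94.1.1", "mcastport": 5405},
    {"bindnetaddr": "172.16.0.0", "mcastaddr": "226.94.1.2", "mcastport": 5407},
]

if __name__ == "__main__":
    print(render_totem("passive", RINGS))
```

Switching between passive and active changes only the rrp_mode line; the interface blocks stay the same.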

A problem with active is that it doesn't make progress if one of its
links fails, until the link is marked as failed.  Passive always makes
progress even if a link has failed and the failure is not yet detected.

The following netstat output shows a typical configuration, with UDP
listeners for both mcast send and receive on each of two rings.  This
example was passive, demonstrating that the configuration is the same
for all modes of RRP.

udp    0    0 10.0.0.1:5404      0.0.0.0:*          2016/corosync
udp    0    0 10.0.0.1:5405      0.0.0.0:*          2016/corosync
udp    0    0 226.94.1.1:5405    0.0.0.0:*          2016/corosync
udp    0    0 172.16.0.1:5406    0.0.0.0:*          2016/corosync
udp    0    0 172.16.0.1:5407    0.0.0.0:*          2016/corosync
udp    0    0 226.94.1.2:5407    0.0.0.0:*          2016/corosync
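
To make the pairing of sockets to rings explicit, here is a short Python sketch that parses the output above and groups the listeners by ring. It relies on corosync using mcastport for multicast receives and mcastport - 1 for sends; the function names and the dictionary layout are made up for this example.

```python
import ipaddress

NETSTAT = """\
udp    0    0 10.0.0.1:5404      0.0.0.0:*          2016/corosync
udp    0    0 10.0.0.1:5405      0.0.0.0:*          2016/corosync
udp    0    0 226.94.1.1:5405    0.0.0.0:*          2016/corosync
udp    0    0 172.16.0.1:5406    0.0.0.0:*          2016/corosync
udp    0    0 172.16.0.1:5407    0.0.0.0:*          2016/corosync
udp    0    0 226.94.1.2:5407    0.0.0.0:*          2016/corosync
"""

def parse_listeners(text):
    """Yield (address, port) for each corosync UDP socket in netstat output."""
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 6 or fields[0] != "udp":
            continue
        if not fields[-1].endswith("/corosync"):
            continue
        host, _, port = fields[3].rpartition(":")
        yield ipaddress.ip_address(host), int(port)

def group_by_ring(listeners):
    """Group sockets by ring, keyed on each multicast listener's port."""
    listeners = list(listeners)
    rings = {}
    # First pass: every multicast listener defines one ring.
    for addr, port in listeners:
        if addr.is_multicast:
            rings[port] = {"mcast": str(addr), "local": set(), "ports": set()}
    # Second pass: attach local sockets on mcastport or mcastport - 1.
    for addr, port in listeners:
        if addr.is_multicast:
            continue
        for mcastport, ring in rings.items():
            if port in (mcastport, mcastport - 1):
                ring["local"].add(str(addr))
                ring["ports"].add(port)
    return rings

if __name__ == "__main__":
    for mcastport, ring in sorted(group_by_ring(parse_listeners(NETSTAT)).items()):
        print(mcastport, ring["mcast"], sorted(ring["local"]), sorted(ring["ports"]))
```

The grouping also shows why the two rings' mcastport values are spaced two apart: each ring occupies a pair of adjacent ports.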

> Also in the book it's stated that there is a constraint for using RRP
> such as services requiring DLM are not 'supported'. Does this mean it's
> OK to use RRP in a cluster with GFS & CLVM but these services will not take
> advantage of RRP and remaining services in the cluster will?

No, it means that using GFS and CLVM in a cluster that has RRP
configured will cause failures.  When configured, RRP is not just
another choice, it *is* the heartbeat network.

> Or does it
> mean that we can never use RRP in a cluster even if any part of the
> cluster uses DLM.

If *any* part of the cluster needs DLM, you cannot use RRP.  See below.

> I understand it is not likely that GFS will be used
> only by 'some' nodes in a cluster but I'm trying to understand the subject.

For redundancy across multiple addresses, DLM uses Stream Control
Transmission Protocol (SCTP). SCTP is loaded as a kernel
module. There are outstanding bugs related to SCTP not correctly
handling the creation and use of sockets required for DLM communication.
For example, the SCTP module attempts to invoke userland file
descriptor creation and ends up spoiling the internal FD table for
kernel processes.  And then things fail.

Even though DLM offers the SCTP protocol as an option, it is not
entirely functional and is therefore not supported by Red Hat. The
functionality exists because DLM does not work with the multi-homing
provided by RRP with normal TCP, so SCTP was added to DLM as a possible
solution. However, the implementation is incomplete and has bugs, which
means that DLM is currently unusable with RRP altogether.

The cpglockd daemon can provide an alternative lock manager for
rgmanager in RHEL 6 Update 4 or later, if rgmanager is still being used
by a client.  But otherwise, all daemons and components that require DLM
will not function with RRP and must have it disabled.

Summary: Use RRP only if using *neither* GFS nor CLVM (both require DLM).
There is no such thing as using it one way on one node and another way
on other nodes.  Whole clusters work as a consistently configured set of
nodes using the same protocols.
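
The rule in the summary can be written down as a tiny cluster-wide check. This is only a sketch of the rule as stated in this message: the component names and the rrp_usable function are made up for the example, and the DLM-dependent set lists only the two services named above.

```python
# Components that this message says require DLM (names are illustrative).
DLM_DEPENDENT = {"gfs2", "clvm"}

def rrp_usable(cluster_components):
    """Return (ok, blockers) for the whole cluster, not per node.

    ok is False if any component anywhere in the cluster requires DLM.
    """
    wanted = {name.lower() for name in cluster_components}
    blockers = sorted(DLM_DEPENDENT & wanted)
    return (not blockers, blockers)

if __name__ == "__main__":
    print(rrp_usable(["apache", "nfs"]))   # (True, [])
    print(rrp_usable(["apache", "GFS2"]))  # (False, ['gfs2'])
```

The check takes the component list for the entire cluster on purpose, since RRP cannot be enabled for some nodes and not others.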

I would expect that work continues on fixing these protocols.

Assignee: Herve Quatremain
Reporter: Philip Sweany (Inactive)
Votes: 0
Watchers: 5
Created: 2015/10/03 3:59 PM
Updated: 2023/09/24 7:38 PM
Resolved: 2021/03/11 2:04 PM
