-
Story
-
Resolution: Can't Do
-
Major
-
RH436 - RHEL 7.1 1
-
None
-
ILT, ROLE, VT
-
en-US (English)
URL:
Reporter RHNID:
Section: -
Language: en-US (English)
Workaround:
Description: A discussion on gls-contractor-list between Alpaslan and Phil requested an expanded and improved discussion of RRP passive and active modes, and a more detailed clarification of the support relationship between DLM, GFS2, and RRP. The key message is below. The accuracy of Phil's information should be confirmed.
On 10/2/15 3:22 PM, Alpaslan Kaplan - Perception Training and Consulting wrote:
> Hi,
>
> I need help about understanding active vs passive mode of RRP.

Yes, this information should be added to the coursebook.

> I thought in passive mode only one ring is used (like activebackup mode
> of teaming) and in active mode two rings are used simultaneously. But it
> seems that's not the case. Anyone who can explain how they work shortly?

Shortly? Uuuummmm, no. I'm the talkative one, remember? Just read every third or fourth paragraph only.

In passive mode, both rings are configured and available, but corosync *alternates* between the two rings. In active mode, corosync uses both rings *simultaneously*, i.e., replicated.

Active offers slightly lower latency from transmit to delivery in faulty network environments but with less performance. Passive may nearly double the speed of the totem protocol if the protocol doesn't become cpu-bound. When rrp_mode=none, only one network interface will be used to operate the totem protocol.

A problem with active is that it doesn't make progress if one of its links fails, until the link is marked as failed. Passive always makes progress even if a link has failed and the failure is not yet detected.

The following netstat output shows a typical configuration, with UDP listeners for both mcast send and receive on each of two rings. This example was passive, demonstrating that the configuration is the same for all modes of RRP.

    udp  0  0  10.0.0.1:5404     0.0.0.0:*  2016/corosync
    udp  0  0  10.0.0.1:5405     0.0.0.0:*  2016/corosync
    udp  0  0  226.94.1.1:5405   0.0.0.0:*  2016/corosync
    udp  0  0  172.16.0.1:5406   0.0.0.0:*  2016/corosync
    udp  0  0  172.16.0.1:5407   0.0.0.0:*  2016/corosync
    udp  0  0  226.94.1.2:5407   0.0.0.0:*  2016/corosync

> Also in the book it's stated that there is a constraint for using RRP
> such as services requiring DLM are not 'supported'.
> Does this mean it's
> OK use RRP in a cluster with GFS & CLVM but these services will not take
> advantage of RRP and remaining services in the cluster will?

No, it means that using GFS and CLVM in a cluster that has RRP configured will cause failures. When configured, RRP is not just another choice, it *is* the heartbeat network.

> Or does it
> mean that we can never use RRP in a cluster even if any part of the
> cluster uses DLM.

If *any* part of the cluster needs DLM, you cannot use RRP. See below.

> I understand it is not likely that GFS will be used
> only by 'some' nodes in a cluster but I'm trying to understand the subject.

To handle redundancy, DLM uses Stream Control Transmission Protocol (SCTP) for multiple address redundancy. SCTP is loaded as a kernel module. There are outstanding bugs related to SCTP not correctly handling the creation and use of sockets required for DLM communication. For example, the SCTP module attempts to invoke userland file descriptor creation and ends up spoiling the internal FD table for kernel processes. And then things fail.

Even though DLM offers the SCTP protocol as an option, it is not entirely functional and is therefore not supported by Red Hat. The functionality exists because DLM over normal TCP does not work with the multi-homing provided by RRP, so SCTP was added to DLM as a possible solution. However, the implementation is incomplete and has bugs, which means that DLM is currently unusable with RRP altogether.

The cpglockd daemon can provide an alternative lock manager for rgmanager in RHEL 6 Update 4 or later, if rgmanager is still being used by a client. Otherwise, all daemons and components that require DLM will not function with RRP, and RRP must be disabled for them.

Summary: Use RRP only if *not* using GFS and CLVM (which require DLM). There is no such thing as using it one way on one node and another way on other nodes. Whole clusters work as a consistently configured set of nodes using the same protocols.
I would expect that work continues on fixing these protocols.
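
For the coursebook, the netstat output quoted above could be paired with the totem section that would produce it. The fragment below is only a sketch and is not taken from the list discussion: it assumes corosync 2.x syntax as shipped with RHEL 7, multicast UDP transport, and /24 networks for the two bindnetaddr values (the netmasks are not stated anywhere in the thread). The addresses and ports are the ones visible in the netstat output. Corosync listens on mcastport for multicast receives and uses mcastport - 1 for sends, which would account for the 5404/5405 and 5406/5407 pairs.

```
totem {
    version: 2

    # One of: none, active, passive. The ring definitions below
    # stay the same whichever RRP mode is chosen.
    rrp_mode: passive

    interface {
        ringnumber: 0
        bindnetaddr: 10.0.0.0     # assumed /24 network of 10.0.0.1
        mcastaddr: 226.94.1.1
        mcastport: 5405           # 5404 is then used for sends
    }

    interface {
        ringnumber: 1
        bindnetaddr: 172.16.0.0   # assumed /24 network of 172.16.0.1
        mcastaddr: 226.94.1.2
        mcastport: 5407           # 5406 is then used for sends
    }
}
```

Switching between passive and active would only change the rrp_mode line, which matches the point that the listener layout looks the same in every RRP mode. As with the rest of this message, the fragment should be verified on a lab cluster before it goes into the coursebook.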