Product Technical Learning / PTL-8044

RH436-71: Better discussion of RRP passive vs active mode requested


    • Type: Story
    • Resolution: Can't Do
    • Priority: Major
    • RH436 - RHEL 8.3 1
    • RH436 - RHEL 7.1 1
    • RH436
    • None
    • ILT, ROLE, VT
    • en-US (English)


      Description: A discussion on gls-contractor-list between Alpaslan and Phil requested an expanded and improved treatment of RRP passive and active modes, and a more detailed explanation of the support relationship between DLM, GFS2, and RRP. The key message is below. The accuracy of Phil's information should be confirmed.

      On 10/2/15 3:22 PM, Alpaslan Kaplan - Perception Training and Consulting
      wrote:
      > Hi,
      >
      > I need help about understanding active vs passive mode of RRP.
      
      Yes, this information should be added to the coursebook.
      
      > I thought in passive mode only one ring is used (like activebackup mode
      > of teaming) and in active mode two rings are used simultaneously. But it
      > seems that's not the case. Anyone who can explain how they work shortly?
      
      Shortly?  Uuuummmm, no.  I'm the talkative one, remember?  Just read
      every third or fourth paragraph only.
      
      In passive mode, both rings are configured and available, but corosync
      *alternates* between the two rings.  In active mode, corosync uses both
      rings *simultaneously*, i.e., traffic is replicated on each ring.
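
      As a rough illustration of the difference, the following toy Python
      model lists which ring(s) each message would be sent on in each mode.
      It is only a sketch of the description above, not corosync's actual
      algorithm; the function name rings_for_messages and its parameters are
      made up for this example.

```python
from itertools import cycle


def rings_for_messages(rrp_mode, ring_count, message_count):
    """Return, for each message, the tuple of ring numbers it is sent on.

    Toy model only: the real totem protocol is far more involved.
    """
    all_rings = tuple(range(ring_count))
    if rrp_mode == "none":
        # Only one interface (ring 0) carries the totem protocol.
        return [(0,)] * message_count
    if rrp_mode == "active":
        # Every message is replicated on every ring.
        return [all_rings] * message_count
    if rrp_mode == "passive":
        # Messages alternate between the rings in turn.
        next_ring = cycle(all_rings)
        return [(next(next_ring),) for _ in range(message_count)]
    raise ValueError(f"unknown rrp_mode: {rrp_mode!r}")


print(rings_for_messages("passive", 2, 4))  # [(0,), (1,), (0,), (1,)]
print(rings_for_messages("active", 2, 2))   # [(0, 1), (0, 1)]
```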
      
      Active offers slightly lower latency from transmit to delivery in faulty
      network environments, but at the cost of lower overall performance.
      
      Passive may nearly double the speed of the totem protocol, provided the
      protocol doesn't become CPU-bound.
      
      When rrp_mode=none only one network interface will be used to operate
      the totem protocol.
      
      A problem with active is that it doesn't make progress if one of its
      links fails, until that link is marked as failed.  Passive always makes
      progress, even if a link has failed and the failure is not yet detected.
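
      The failure behaviour just described can be captured in a very small
      Python sketch.  This is a simplification that assumes "progress" depends
      only on which links are really up and which have been marked faulty; the
      function makes_progress and its arguments are hypothetical and do not
      correspond to anything in corosync itself.

```python
def makes_progress(rrp_mode, link_up, marked_faulty):
    """Toy model of whether the protocol advances.

    link_up[i]       -- True if ring i is actually working
    marked_faulty[i] -- True if ring i has been detected and marked as failed
    """
    # Rings that the protocol still believes are usable.
    in_use = [ring for ring, faulty in enumerate(marked_faulty) if not faulty]
    if not in_use:
        return False
    if rrp_mode == "active":
        # Active waits on every ring it still considers healthy, so one
        # failed-but-undetected link stalls it.
        return all(link_up[ring] for ring in in_use)
    if rrp_mode == "passive":
        # Passive alternates, so any working ring lets it continue.
        return any(link_up[ring] for ring in in_use)
    raise ValueError(f"unknown rrp_mode: {rrp_mode!r}")


# Ring 1 has failed but has not been marked faulty yet.
print(makes_progress("active", [True, False], [False, False]))   # False
print(makes_progress("passive", [True, False], [False, False]))  # True
# Once ring 1 is marked faulty, active can continue on ring 0.
print(makes_progress("active", [True, False], [False, True]))    # True
```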
      
      The following netstat output shows a typical configuration, with UDP
      listeners for both mcast send and receive on each of two rings.  This
      example was captured in passive mode, but the listener configuration is
      the same for all modes of RRP.
      
      udp    0    0 10.0.0.1:5404      0.0.0.0:*          2016/corosync
      udp    0    0 10.0.0.1:5405      0.0.0.0:*          2016/corosync
      udp    0    0 226.94.1.1:5405    0.0.0.0:*          2016/corosync
      udp    0    0 172.16.0.1:5406    0.0.0.0:*          2016/corosync
      udp    0    0 172.16.0.1:5407    0.0.0.0:*          2016/corosync
      udp    0    0 226.94.1.2:5407    0.0.0.0:*          2016/corosync
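
      For reference, a totem section along the following lines would be
      expected to produce listeners like those above.  The Python sketch
      below renders such a section and derives the port pairs.  The
      bindnetaddr values are an assumption (the network addresses for
      10.0.0.1 and 172.16.0.1 with a /24 netmask, which the output does not
      show), and the Ring class and helper functions are made up for this
      example.  The port pairs follow the corosync.conf(5) note that corosync
      uses mcastport for receives and mcastport - 1 for sends.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ring:
    number: int
    bindnetaddr: str  # assumed network address, not shown in the netstat output
    mcastaddr: str
    mcastport: int


def render_totem(rrp_mode, rings):
    """Render a corosync.conf totem section for the given rings."""
    lines = ["totem {", "    version: 2", f"    rrp_mode: {rrp_mode}"]
    for ring in rings:
        lines += [
            "    interface {",
            f"        ringnumber: {ring.number}",
            f"        bindnetaddr: {ring.bindnetaddr}",
            f"        mcastaddr: {ring.mcastaddr}",
            f"        mcastport: {ring.mcastport}",
            "    }",
        ]
    lines.append("}")
    return "\n".join(lines)


def udp_ports(ring):
    """Return (send_port, receive_port): mcastport - 1 and mcastport."""
    return (ring.mcastport - 1, ring.mcastport)


RINGS = [
    Ring(0, "10.0.0.0", "226.94.1.1", 5405),
    Ring(1, "172.16.0.0", "226.94.1.2", 5407),
]

print(render_totem("passive", RINGS))
for ring in RINGS:
    print(ring.number, udp_ports(ring))
```

      Changing only the rrp_mode argument leaves the interface blocks, and
      therefore the listeners, unchanged.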
      
      > Also in the book it's stated that there is a constraint for using RRP
      > such as services requiring DLM are not 'supported'. Does this mean it's
      > OK to use RRP in a cluster with GFS & CLVM but these services will not take
      > advantage of RRP and remaining services in the cluster will?
      
      No, it means that using GFS and CLVM in a cluster that has RRP
      configured will cause failures.  When configured, RRP is not just
      another choice, it *is* the heartbeat network.
      
      > Or does it
      > mean that we can never use RRP in a cluster even if any part of the
      > cluster uses DLM.
      
      If *any* part of the cluster needs DLM, you cannot use RRP.  See below.
      
      > I understand it is not likely that GFS will be used
      > only by 'some' nodes in a cluster but I'm trying to understand the subject.
      
      To handle redundancy, DLM uses Stream Control Transmission Protocol
      (SCTP) for multiple address redundancy. SCTP is loaded as a kernel
      module. There are outstanding bugs related to SCTP not correctly
      handling the creation and use of sockets required for DLM communication.
      For example, the SCTP module attempts to invoke userland file
      descriptor creation and ends up spoiling the internal FD table for
      kernel processes.  And then things fail.
      
      Even though DLM offers the SCTP protocol as an option, it is not
      entirely functional and is therefore not supported by Red Hat. The
      functionality exists because DLM does not work with the multi-homing
      provided by RRP with normal TCP, so SCTP was added to DLM as a possible
      solution. However, the implementation is incomplete and has bugs, which
      means that DLM is currently unusable with RRP altogether.
      
      The cpglockd daemon can provide an alternative lock manager for
      rgmanager in RHEL 6 Update 4 or later, if rgmanager is still being used
      by a client.  But otherwise, all daemons and components that require DLM
      will not function with RRP and must have it disabled.
      
      Summary: Use RRP only if *not* using GFS and CLVM (which require DLM).
      There is no such thing as using it one way on one node and another way
      on other nodes.  Whole clusters work as a consistently configured set of
      nodes using the same protocols.
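
      The rule in this summary is a cluster-wide yes/no check, which the
      following Python sketch spells out.  The component names and the
      rrp_usable function are illustrative only, and treating rgmanager as
      DLM-dependent unless cpglockd is in use is an inference from the
      paragraph above rather than a confirmed support statement.

```python
# Components that, per the discussion above, require DLM.
# The names are illustrative labels, not package or service names.
DLM_DEPENDENT = frozenset({"gfs", "gfs2", "clvm"})


def rrp_usable(components, rgmanager_uses_cpglockd=False):
    """Return True if no component anywhere in the cluster needs DLM.

    The check is for the whole cluster: there is no per-node answer.
    """
    wanted = {name.lower() for name in components}
    needs_dlm = wanted & DLM_DEPENDENT
    # Assumption: rgmanager needs DLM unless cpglockd replaces it
    # (RHEL 6 Update 4 or later, per the text above).
    if "rgmanager" in wanted and not rgmanager_uses_cpglockd:
        needs_dlm = needs_dlm | {"rgmanager"}
    return not needs_dlm


print(rrp_usable(["apache", "rgmanager"], rgmanager_uses_cpglockd=True))  # True
print(rrp_usable(["apache", "GFS2", "CLVM"]))                             # False
```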
      
      I would expect that work continues on fixing these protocols.
      

              rht-hquatrem Herve Quatremain
              rht-psweany Philip Sweany (Inactive)