RHEL-7593

Avoiding "Retransmit List" on cluster startup and the influence of send_join


    • Type: Bug
    • Resolution: Won't Do
    • Fix Version: rhel-8.4.0
    • Component: corosync
    • Label: rhel-ha

      This is intended to track possible ways to deal with "Retransmit List" messages logged by corosync during the process of forming a cluster. We're occasionally seeing the following on larger clusters with 8-16 nodes:

      Aug 17 09:54:42 [12155] east-09.lab.bos.redhat.com corosync notice [TOTEM ] Retransmit List: 1
      Aug 17 09:54:42 [12899] east-10.lab.bos.redhat.com corosync notice [TOTEM ] Failed to receive the leave message. failed: 1
      Aug 17 09:54:42 [17878] east-11.lab.bos.redhat.com corosync notice [TOTEM ] Retransmit List: 1
      Aug 17 09:54:42 [17878] east-11.lab.bos.redhat.com corosync notice [TOTEM ] Failed to receive the leave message. failed: 1
      Aug 17 09:54:42 [12346] east-13.lab.bos.redhat.com corosync notice [TOTEM ] Retransmit List: 1
      Aug 17 09:54:42 [12346] east-13.lab.bos.redhat.com corosync notice [TOTEM ] Failed to receive the leave message. failed: 1
      Aug 17 09:54:42 [11647] east-14.lab.bos.redhat.com corosync notice [TOTEM ] Retransmit List: 1
      Aug 17 09:54:42 [20347] east-15.lab.bos.redhat.com corosync notice [TOTEM ] Retransmit List: 1
      Aug 17 09:54:42 [10881] east-16.lab.bos.redhat.com corosync notice [TOTEM ] Retransmit List: 1

      Eventually the communication deteriorates and the list grows like this:

      Aug 13 11:41:23 [1979] host-027.virt.lab.msp.redhat.com corosync notice [TOTEM ] Retransmit List: 7 8 9 a b c d e f 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f 20 21 22 23 24
      Aug 13 11:41:23 [1979] host-027.virt.lab.msp.redhat.com corosync notice [TOTEM ] Retransmit List: 25 26 27 28 29 2a 2b 2c 2d 2e 2f 30 31 32

      I have been able to reproduce this on 16-node clusters of KVM-based virtual machines in multiple labs in BRQ and MSP, but also on a 7-node physical cluster in the BOS lab.

      The RHEL base is 7.6 with the following package versions currently in use:
      corosync-2.4.3-4.el7.x86_64
      pacemaker-1.1.19-6.el7.x86_64
      pcs-0.9.165-3.el7.x86_64

      By experimenting with various totem options suggested to me by Jan Friesse, I have found the 'send_join' option to be the most influential in mitigating this behaviour. On my 16-node virtual machine clusters the reproducibility is very high, and running as many as 500 startup iterations reveals a send_join threshold at about 50, at which the retransmits become very scarce (~15 occurrences out of 500). Setting send_join to 100 seems to be safe enough to eliminate the retransmits altogether (for now; my tests are still running).
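
      A minimal sketch of the kind of startup loop behind these numbers (the iteration count and log path match my setup; the exact harness differs):

      # Repeatedly bounce the cluster and count retransmit occurrences.
      for i in $(seq 1 500); do
          pcs cluster start --all
          sleep 60    # give the membership time to settle
          grep -c 'Retransmit List' /var/log/cluster/corosync.log
          pcs cluster stop --all
      done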

      The corosync.conf manual page states the following for send_join: "For configurations with less than 32 nodes, this parameter is not necessary". Since this apparently does not hold in practice, I am wondering if we could:

      • reconsider the defaults
      • update documentation to suggest tuning certain totem options
      • let pcs determine optimal values somehow (a manual workaround is sketched after this list)
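
      As an illustration of the last point's current manual equivalent: pcs 0.9 has no dedicated option for send_join, so the value has to be edited into corosync.conf on one node and distributed from there. A rough sketch (the sed expression assumes the totem section opens at the beginning of a line, as in the config below):

      # Add send_join to the totem section, then push the file to all nodes.
      sed -i '/^totem {/a\    send_join: 100' /etc/corosync/corosync.conf
      pcs cluster sync                          # distribute corosync.conf
      pcs cluster stop --all && pcs cluster start --all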

      That said, as long as the root cause of the retransmits keeps evading me, I am not inclined toward any specific solution right now.

      To avoid performance issues with pcs starting the whole cluster from a single node (which itself seems to cause other issues, such as "Process pause detected for..." messages), I have been running 'pcs cluster start' on all nodes in parallel.
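
      A minimal sketch of that parallel start, assuming passwordless ssh to the nodes (hostnames are the ones from the config below):

      # Kick off 'pcs cluster start' on every node at (nearly) the same time.
      for node in host-0{26..41}; do
          ssh "$node" 'pcs cluster start' &
      done
      wait    # block until all startups have returned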

      My corosync.conf looks like this (tested with both UDP and UDPU with no noticeable difference):

      totem {
          version: 2
          cluster_name: STSRHTS8926
          secauth: off
          transport: udp
          send_join: 100
      }

      nodelist {
          node {
              ring0_addr: host-026
              nodeid: 1
          }
          ....
          node {
              ring0_addr: host-041
              nodeid: 16
          }
      }

      quorum {
          provider: corosync_votequorum
      }

      logging {
          to_logfile: yes
          logfile: /var/log/cluster/corosync.log
          to_syslog: yes
          timestamp: on
      }
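
      To double-check which totem values a running node actually picked up, the runtime configuration map can be inspected (the effective, resolved values appear under the runtime.config.totem prefix):

      # Show both the configured and the effective totem settings.
      corosync-cmapctl | grep totem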

      (Since this issue is not necessarily a bug in corosync itself, we can change the assigned component as needed.)
