  1. RHEL
  2. RHEL-5036

link init race condition leads to nodes being fenced

    • Bug
    • Resolution: Done-Errata
    • Normal
    • rhel-8.8.0, rhel-8.8.0.z, rhel-9.2.0, rhel-9.2.0.z
    • kronosnet
    •  kronosnet-1.28-1.el8
    • Medium
    • Patch, Upstream, TestCaseProvided
    • sst_high_availability
    • ssg_filesystems_storage_and_HA
    • QE ack
    • False
    • Red Hat Enterprise Linux

      What were you trying to do that didn't work?

      pcs cluster start --all
      on a 5-node cluster, where one node is an older hardware generation than the others.

      The cluster formed 2 partitions:

      • one with a single node (older hardware, yet it started faster)
      • one with 4 nodes (they took longer to start corosync, but they started in sync with each other)

      The single node attempted to send membership packets to the other nodes, but those packets were rejected because the 4 nodes were still initializing their knet links. By the time the 4 nodes had completed link initialization with the single node, they had already decided that node1 was not part of the membership, and they fenced it.

      Please provide the package NVR for which bug is seen:

      All of them. This is a design decision in knet that has always existed.

      How reproducible:

      always

      Steps to reproduce

      This is a complex race condition to reproduce on normal clusters. So far I have seen this problem only on one BM cluster that is currently used for SAS workload calibration.

      It can be reproduced manually every time; it's just a bit inconvenient.

      Create a 2-node cluster. To simulate the failure, we need a different corosync.conf on each node:

      node1:

      totem {
          version: 2
          secauth: on
          cluster_name: demo
          crypto_cipher: aes256
          crypto_hash: sha256
          config_version: 1
      }
      
      nodelist {
          node {
              name: rhel8-node1
              ring0_addr: 192.168.9.41
              nodeid: 1
          }
      
          node {
              name: rhel8-node2
              ring0_addr: 192.168.9.42
              nodeid: 2
          }
      }
      
      quorum {
          provider: corosync_votequorum
          two_node: 1
      }
      
      logging {
          debug: on
          to_logfile: yes
          logfile: /var/log/cluster/corosync.log
          to_syslog: yes
          timestamp: on
      }
      

      This is a pretty much standard corosync.conf.

      For the second node, we need to tweak token and pong_count to delay the knet link initialization code.

      node2:

      totem {
          version: 2
          secauth: on
          cluster_name: demo
          crypto_cipher: aes256
          crypto_hash: sha256
          config_version: 1
          token: 30000
          interface {
              linknumber: 0
              knet_pong_count: 30
          }
      }
      
      nodelist {
          node {
              name: rhel8-node1
              ring0_addr: 192.168.9.41
              nodeid: 1
          }
      
          node {
              name: rhel8-node2
              ring0_addr: 192.168.9.42
              nodeid: 2
          }
      }
      
      quorum {
          provider: corosync_votequorum
          two_node: 1
      }
      
      logging {
          debug: on
          to_logfile: yes
          logfile: /var/log/cluster/corosync.log
          to_syslog: yes
          timestamp: on
      }
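
      The effect of the token and knet_pong_count tweak on node2 can be estimated with a little arithmetic. The sketch below assumes that, with knet_ping_interval left unset, corosync derives it as token / (knet_pong_count * 2), and that a link is only marked up after knet_pong_count valid pongs. The function names are hypothetical and used only for illustration.

```python
# Rough estimate of how long node2 waits before marking a knet link up.
# Assumptions (not stated in this report):
#   - knet_ping_interval is unset, so it defaults to
#     token / (knet_pong_count * 2)
#   - the link is marked up after knet_pong_count valid pongs

def default_ping_interval_ms(token_ms: int, pong_count: int) -> float:
    """Ping interval derived from the token timeout (assumed default)."""
    return token_ms / (pong_count * 2)


def link_up_delay_ms(token_ms: int, pong_count: int) -> float:
    """Approximate time needed to collect pong_count pongs."""
    return default_ping_interval_ms(token_ms, pong_count) * pong_count


if __name__ == "__main__":
    # Values taken from the node2 configuration above.
    interval = default_ping_interval_ms(token_ms=30000, pong_count=30)
    delay = link_up_delay_ms(token_ms=30000, pong_count=30)
    print(f"ping interval: {interval:.0f} ms")
    print(f"link-up delay: ~{delay / 1000:.1f} s")
```

      Under these assumptions node2 pings every 500 ms and needs about 15 seconds before it considers link 0 up, which leaves a wide window for node1 to send membership packets that node2 discards.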
      

      Actual results

      With the current version of knet, node2 rejects membership packets with:

      Sep 19 04:59:15 debug [KNET ] rx: host: 1 link: 0 received pong: 5
      Sep 19 04:59:16 debug [KNET ] rx: Source host 1 not reachable yet. Discarding packet.
      Sep 19 04:59:16 debug [KNET ] rx: host: 1 link: 0 received pong: 6
      Sep 19 04:59:16 debug [KNET ] rx: Source host 1 not reachable yet. Discarding packet.
      Sep 19 04:59:17 debug [KNET ] rx: host: 1 link: 0 received pong: 7
      Sep 19 04:59:17 debug [KNET ] rx: Source host 1 not reachable yet. Discarding packet.
      Sep 19 04:59:17 debug [KNET ] rx: host: 1 link: 0 received pong: 8
      Sep 19 04:59:17 debug [KNET ] rx: Source host 1 not reachable yet. Discarding packet.
      Sep 19 04:59:18 debug [KNET ] rx: host: 1 link: 0 received pong: 9
      Sep 19 04:59:18 debug [KNET ] rx: Source host 1 not reachable yet. Discarding packet.
      Sep 19 04:59:18 debug [KNET ] rx: host: 1 link: 0 received pong: 10
      Sep 19 04:59:19 debug [KNET ] rx: Source host 1 not reachable yet. Discarding packet.
      Sep 19 04:59:19 debug [KNET ] rx: host: 1 link: 0 received pong: 11
      Sep 19 04:59:19 debug [KNET ] rx: Source host 1 not reachable yet. Discarding packet.
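
      To check whether a node hit this condition, the discard messages can be counted per source host. The following is a minimal sketch that matches only the message text shown in the log above; the function name is hypothetical.

```python
import re
from collections import Counter

# Matches the discard message exactly as it appears in the log excerpt above.
DISCARD = re.compile(
    r"rx: Source host (\d+) not reachable yet\. Discarding packet\."
)


def count_discards(lines):
    """Return a Counter mapping source host id -> discarded packet count."""
    counts = Counter()
    for line in lines:
        match = DISCARD.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Sep 19 04:59:15 debug [KNET ] rx: host: 1 link: 0 received pong: 5",
        "Sep 19 04:59:16 debug [KNET ] rx: Source host 1 not reachable yet. Discarding packet.",
        "Sep 19 04:59:16 debug [KNET ] rx: host: 1 link: 0 received pong: 6",
        "Sep 19 04:59:16 debug [KNET ] rx: Source host 1 not reachable yet. Discarding packet.",
    ]
    for host, count in count_discards(sample).items():
        print(f"host {host}: {count} packets discarded during link init")
```

      The same function accepts an open file object, for example the /var/log/cluster/corosync.log file configured in the logging section, provided debug logging is on.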

      This causes the creation of the 2 separate memberships described above.

      The new code handles this situation better: it immediately brings the link up and forms the membership:

      Sep 19 05:00:58 debug [KNET ] rx: host: 1 link: 0 received pong: 1
      Sep 19 05:00:58 debug [KNET ] rx: host: 1 link: 0 received pong: 2
      Sep 19 05:00:59 debug [KNET ] rx: host: 1 link: 0 received pong: 3
      Sep 19 05:00:59 debug [TOTEM ] Knet pMTU change: 421
      Sep 19 05:00:59 debug [KNET ] rx: host: 1 link: 0 received data during valid ping/pong activity. Force link up.

            rhn-support-ccaulfie Christine Caulfield
            rhn-engineering-fdinitto Fabio Massimo Di Nitto
            Barry Marson, Christine Caulfield, Jan Friesse, Patrik Hagara
            Christine Caulfield
            Patrik Hagara