RHEL-7625

Stonith action times out if fenced node is still in CPG membership when the stonith action timeout expires


      Description of problem:

      If a fenced node is still in the corosync CPG membership when the stonith action's timeout expires, the node gets rebooted successfully but the stonith action fails due to a timeout. This results in a fenced node getting rebooted twice in a row.

      The stonith action timeout seems to default to 72s == TIMEOUT_MULTIPLY_FACTOR * peer (or device?) timeout. So in practice, this issue tends to happen if:

      ((time for stonith op to complete) + token + consensus) > 72s
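
      A back-of-the-envelope check with the numbers from this report makes the race concrete. This is only a sketch: the 1.2 multiplier is inferred from the 60s device timeout and 72s action timeout in the logs below, and the consensus figure is corosync's documented default of 1.2 * token, not a value read out of the code.

      # all values in seconds, taken from the timeline below
      stonith_op=2        # reboot returns OK ~2s after the request (17:53:44 -> 17:53:46)
      token=40            # totem.token = 40000 ms
      consensus=48        # corosync default: 1.2 * token
      action_timeout=72   # 1.2 * 60s per-device/peer timeout
      echo "$(( stonith_op + token + consensus ))s until node2 leaves CPG vs ${action_timeout}s allowed"
      # -> "90s until node2 leaves CPG vs 72s allowed", so the action timer
      #    expires while node2 is still a CPG member and the completed reboot
      #    ends up reported as "Timer expired".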

      A simple way to reproduce is:
      1. set the token timeout to 40000 (a nice round number)
      2. disable cluster services from starting on boot on the node to be fenced
      3. cause a resource operation with on-fail=fence to fail on one node

      With debug enabled for pacemaker-fenced and trace enabled for fenced_remote.c and fenced_commands.c, you should get logs like the following when node2 gets fenced. Note that the operation returns "OK" but the stonith action still times out. (I could be misusing operation/action terms but you get the idea.)
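
      (For reference, that extra logging can be enabled with the standard Pacemaker environment options in /etc/sysconfig/pacemaker; the lines below are an illustration rather than a copy of the test setup, and pacemaker has to be restarted for them to take effect.)

      # /etc/sysconfig/pacemaker
      PCMK_debug=pacemaker-fenced
      PCMK_trace_files=fenced_remote.c,fenced_commands.c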

      [root@fastvm-rhel-8-0-23 pacemaker]# date; corosync-cmapctl | grep 'totem\.token '; pcs resource config | grep dummy1-monitor-interval; pcs stonith config | grep -A2 xvm2
      Fri Mar 19 17:53:15 PDT 2021
      runtime.config.totem.token (u32) = 40000
      totem.token (u32) = 40000
      monitor interval=10s on-fail=fence timeout=20s (dummy1-monitor-interval-10s)
      Resource: xvm2 (class=stonith type=fence_xvm)
      Attributes: pcmk_host_map=node2:fastvm-rhel-8.0-24
      Operations: monitor interval=60s (xvm2-monitor-interval-60s)
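
      (A configuration equivalent to the one shown above could be created with something like the following; the values come from the output, but the pcs invocations themselves are an approximation rather than the commands originally used.)

      pcs stonith create xvm2 fence_xvm pcmk_host_map=node2:fastvm-rhel-8.0-24 \
          op monitor interval=60s
      pcs resource create dummy1 ocf:pacemaker:Dummy \
          op monitor interval=10s on-fail=fence timeout=20s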

      1. # Request and initiate stonith action against node2
        Mar 19 17:53:44 fastvm-rhel-8-0-23 pacemaker-schedulerd[333020] (unpack_rsc_op_failure@unpack.c:3027) warning: Unexpected result (error) was recorded for monitor of dummy1 on node2 at Mar 19 17:53:44 2021 | rc=1 id=dummy1_last_failure_0
        Mar 19 17:53:44 fastvm-rhel-8-0-23 pacemaker-schedulerd[333020] (pe_fence_node@unpack.c:162) warning: Cluster node node2 will be fenced: dummy1 failed there
        Mar 19 17:53:44 fastvm-rhel-8-0-23 pacemaker-schedulerd[333020] (stage6@pcmk_sched_allocate.c:1652) warning: Scheduling Node node2 for STONITH
        Mar 19 17:53:44 fastvm-rhel-8-0-23 pacemaker-controld [333021] (te_fence_node@controld_fencing.c:859) notice: Requesting fencing (reboot) of node node2 | action=6 timeout=60000
        Mar 19 17:53:44 fastvm-rhel-8-0-23 pacemaker-fenced [333017] (can_fence_host_with_device@fenced_commands.c:1738) notice: xvm2 is eligible to fence (reboot) node2 (aka. 'fastvm-rhel-8.0-24'): static-list
        Mar 19 17:53:44 fastvm-rhel-8-0-23 pacemaker-fenced [333017] (call_remote_stonith@fenced_remote.c:1544) info: Total timeout set to 60 for peer's fencing targeting node2 for pacemaker-controld.333021|id=0db9abc2
        Mar 19 17:53:44 fastvm-rhel-8-0-23 pacemaker-fenced [333017] (call_remote_stonith@fenced_remote.c:1586) notice: Requesting that node1 perform 'reboot' action targeting node2 | for client pacemaker-controld.333021 (72s, 0s)
        Mar 19 17:53:44 fastvm-rhel-8-0-23 pacemaker-fenced [333017] (schedule_stonith_command@fenced_commands.c:506) debug: Scheduling 'reboot' action targeting node2 using xvm2 for remote peer node1 with op id 0db9abc2 and timeout 60s
        Mar 19 17:53:44 fastvm-rhel-8-0-23 pacemaker-fenced [333017] (fork_cb@fenced_commands.c:321) debug: Operation 'reboot' [333087] targeting node2 using xvm2 now running with 60s timeout
      2. # Reboot succeeds
        Mar 19 17:53:46 fastvm-rhel-8-0-23 pacemaker-fenced [333017] (log_operation@fenced_commands.c:2076) notice: Operation 'reboot' [333087] (call 6 from pacemaker-controld.333021) targeting node2 using xvm2 returned 0 (OK)
        Mar 19 17:53:46 fastvm-rhel-8-0-23 pacemaker-fenced [333017] (stonith_send_async_reply@fenced_commands.c:2128) trace: Directed reply to node1
        Mar 19 17:53:46 fastvm-rhel-8-0-23 pacemaker-fenced [333017] (stonith_device_execute@fenced_commands.c:373) trace: No actions using xvm2 are needed
      3. # Token loss occurs (corosync.log)
        Mar 19 17:54:00 [1729] fastvm-rhel-8-0-23 corosync info [KNET ] link: host: 2 link: 0 is down
        Mar 19 17:54:00 [1729] fastvm-rhel-8-0-23 corosync info [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
        Mar 19 17:54:00 [1729] fastvm-rhel-8-0-23 corosync warning [KNET ] host: host: 2 has no active links
        Mar 19 17:54:14 [1729] fastvm-rhel-8-0-23 corosync notice [TOTEM ] Token has not been received in 30000 ms
        Mar 19 17:54:24 [1729] fastvm-rhel-8-0-23 corosync notice [TOTEM ] A processor failed, forming new configuration.
      4. # Action times out 72 seconds after it began
        Mar 19 17:54:56 fastvm-rhel-8-0-23 pacemaker-fenced [333017] (remote_op_query_timeout@fenced_remote.c:647) debug: Operation 0db9abc2 targeting node2 already in progress
        Mar 19 17:54:56 fastvm-rhel-8-0-23 pacemaker-fenced [333017] (remote_op_timeout@fenced_remote.c:618) debug: Action 'reboot' targeting node2 for client pacemaker-controld.333021 timed out | id=0db9abc2
        Mar 19 17:54:56 fastvm-rhel-8-0-23 pacemaker-fenced [333017] (stonith_bcast_result_to_peers@fenced_remote.c:383) trace: Broadcasting result to peers
      5. # node2 leaves the CPG membership and the action finishes failing
        # corosync.log
        Mar 19 17:55:12 [1729] fastvm-rhel-8-0-23 corosync notice [TOTEM ] A new membership (1.115e2) was formed. Members left: 2
        Mar 19 17:55:12 [1729] fastvm-rhel-8-0-23 corosync notice [TOTEM ] Failed to receive the leave message. failed: 2
        Mar 19 17:55:12 [1729] fastvm-rhel-8-0-23 corosync warning [CPG ] downlist left_list: 1 received
        Mar 19 17:55:12 [1729] fastvm-rhel-8-0-23 corosync notice [QUORUM] Members[1]: 1
        Mar 19 17:55:12 [1729] fastvm-rhel-8-0-23 corosync notice [MAIN ] Completed service synchronization, ready to provide service.
        # pacemaker.log
        Mar 19 17:55:12 fastvm-rhel-8-0-23 pacemaker-attrd [333019] (pcmk_cpg_membership@cpg.c:673) info: Group attrd event 2: node2 (node 2 pid 1690) left via cluster exit
        Mar 19 17:55:12 fastvm-rhel-8-0-23 pacemaker-fenced [333017] (remote_op_done@fenced_remote.c:549) error: Operation 'reboot' targeting node2 by node1 for pacemaker-controld.333021@node1: Timer expired | id=0db9abc2
        Mar 19 17:55:12 fastvm-rhel-8-0-23 pacemaker-fenced [333017] (remote_op_done@fenced_remote.c:503) error: Already sent notifications for 'reboot' targeting node2 by node1 for client pacemaker-controld.333021@node1: OK | rc=0 state=completed id=0db9abc2
        Mar 19 17:55:12 fastvm-rhel-8-0-23 pacemaker-controld [333021] (tengine_stonith_notify@controld_fencing.c:511) notice: Peer node2 was not terminated (reboot) by node1 on behalf of pacemaker-controld.333021: Timer expired | initiator=node1 ref=0db9abc2-2b07-4048-9746-4bceee5528e3
      6. # node2 gets fenced again
        Mar 19 17:55:13 fastvm-rhel-8-0-23 pacemaker-schedulerd[333020] (pe_fence_node@unpack.c:162) warning: Cluster node node2 will be fenced: peer is no longer part of the cluster
        Mar 19 17:55:13 fastvm-rhel-8-0-23 pacemaker-schedulerd[333020] (unpack_rsc_op_failure@unpack.c:3027) warning: Unexpected result (error) was recorded for monitor of dummy1 on node2 at Mar 19 17:53:44 2021 | rc=1 id=dummy1_last_failure_0
        Mar 19 17:55:13 fastvm-rhel-8-0-23 pacemaker-schedulerd[333020] (stage6@pcmk_sched_allocate.c:1652) warning: Scheduling Node node2 for STONITH

      Version-Release number of selected component (if applicable):

      Using upstream master

      RHEL release would be pacemaker-2.0.4-6.el8_3.1.


      How reproducible:

      Always


      Steps to Reproduce:
      1. Set the Corosync token timeout to 40000.
      • This is a nice round number and testing should be consistent this way, but I was able to reproduce the issue with values at least as low as 34000.
      2. Disable cluster services from starting on boot on the node to be fenced (`pcs cluster disable`).
      3. Cause a resource operation with on-fail=fence to fail on one node (see the command sketch below).
      • This was the first approach that came to mind without relying on stonith_admin, which can sometimes behave differently from automatic fencing.
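
      A command-level sketch of those steps, assuming a two-node cluster (node1/node2) with the xvm2 and dummy1 configuration shown in the Description; the exact commands are an approximation, not the ones originally used:

      # 1. Raise the totem token timeout: set "token: 40000" in the totem section of
      #    /etc/corosync/corosync.conf on every node, then restart the cluster so the
      #    new value takes effect.
      pcs cluster stop --all && pcs cluster start --all

      # 2. Keep cluster services from starting at boot on the node to be fenced.
      pcs cluster disable node2

      # 3. Trigger a monitor failure for dummy1 (its monitor has on-fail=fence) on node2,
      #    e.g. by moving the resource there and then making its monitor fail (deleting
      #    the Dummy agent's state file on node2 is one way; the file location depends
      #    on the agent's "state" parameter).
      pcs resource move dummy1 node2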

      Actual results:

      The stonith action times out despite the node getting fenced. The node gets rebooted twice, as pacemaker tries to fence it a second time.


      Expected results:

      The stonith action succeeds the first time, provided that the fenced node gets rebooted successfully.


      Additional info:

      This is also reproducible on RHEL 7 (which is where it was observed in a customer environment), but it doesn't seem worthwhile to backport a fix unless it's fairly trivial to do so. This is a bit of a corner case.
