RHEL-7625

Stonith action times out if fenced node is still in CPG membership when the stonith action timeout expires


      Description of problem:

      If a fenced node is still in the corosync CPG membership when the stonith action's timeout expires, the node gets rebooted successfully but the stonith action fails due to a timeout. This results in a fenced node getting rebooted twice in a row.

      The stonith action timeout seems to default to 72s == TIMEOUT_MULTIPLY_FACTOR * peer (or device?) timeout. So in practice, this issue tends to happen if:

      ((time for stonith op to complete) + token + consensus) > 72s
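
      A back-of-the-envelope check with the numbers from this report makes the race concrete. This is only a sketch: the 1.2 multiplier is inferred from the 60s device timeout and 72s action timeout in the logs below, and the consensus figure is corosync's documented default of 1.2 * token, not a value read out of the code.

      # all values in seconds, taken from the timeline below
      stonith_op=2        # reboot returns OK ~2s after the request (17:53:44 -> 17:53:46)
      token=40            # totem.token = 40000 ms
      consensus=48        # corosync default: 1.2 * token
      action_timeout=72   # 1.2 * 60s per-device/peer timeout
      echo "$(( stonith_op + token + consensus ))s until node2 leaves CPG vs ${action_timeout}s allowed"
      # -> "90s until node2 leaves CPG vs 72s allowed", so the action timer
      #    expires while node2 is still a CPG member and the completed reboot
      #    ends up reported as "Timer expired".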

      A simple way to reproduce is:
      1. set the token timeout to 40000 (a nice round number)
      2. disable cluster services from starting on boot on the node to be fenced
      3. cause a resource operation with on-fail=fence to fail on one node

      With debug enabled for pacemaker-fenced and trace enabled for fenced_remote.c and fenced_commands.c, you should get logs like the following when node2 gets fenced. Note that the operation returns "OK" but the stonith action still times out. (I could be misusing operation/action terms but you get the idea.)
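
      (For reference, that extra logging can be enabled with the standard Pacemaker environment options in /etc/sysconfig/pacemaker; the lines below are an illustration rather than a copy of the test setup, and pacemaker has to be restarted for them to take effect.)

      # /etc/sysconfig/pacemaker
      PCMK_debug=pacemaker-fenced
      PCMK_trace_files=fenced_remote.c,fenced_commands.c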

      [root@fastvm-rhel-8-0-23 pacemaker]# date; corosync-cmapctl | grep 'totem\.token '; pcs resource config | grep dummy1-monitor-interval; pcs stonith config | grep -A2 xvm2
      Fri Mar 19 17:53:15 PDT 2021
      runtime.config.totem.token (u32) = 40000
      totem.token (u32) = 40000
      monitor interval=10s on-fail=fence timeout=20s (dummy1-monitor-interval-10s)
      Resource: xvm2 (class=stonith type=fence_xvm)
      Attributes: pcmk_host_map=node2:fastvm-rhel-8.0-24
      Operations: monitor interval=60s (xvm2-monitor-interval-60s)
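
      (A configuration equivalent to the one shown above could be created with something like the following; the values come from the output, but the pcs invocations themselves are an approximation rather than the commands originally used.)

      pcs stonith create xvm2 fence_xvm pcmk_host_map=node2:fastvm-rhel-8.0-24 \
          op monitor interval=60s
      pcs resource create dummy1 ocf:pacemaker:Dummy \
          op monitor interval=10s on-fail=fence timeout=20s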

      1. # Request and initiate stonith action against node2
        Mar 19 17:53:44 fastvm-rhel-8-0-23 pacemaker-schedulerd[333020] (unpack_rsc_op_failure@unpack.c:3027) warning: Unexpected result (error) was recorded for monitor of dummy1 on node2 at Mar 19 17:53:44 2021 | rc=1 id=dummy1_last_failure_0
        Mar 19 17:53:44 fastvm-rhel-8-0-23 pacemaker-schedulerd[333020] (pe_fence_node@unpack.c:162) warning: Cluster node node2 will be fenced: dummy1 failed there
        Mar 19 17:53:44 fastvm-rhel-8-0-23 pacemaker-schedulerd[333020] (stage6@pcmk_sched_allocate.c:1652) warning: Scheduling Node node2 for STONITH
        Mar 19 17:53:44 fastvm-rhel-8-0-23 pacemaker-controld [333021] (te_fence_node@controld_fencing.c:859) notice: Requesting fencing (reboot) of node node2 | action=6 timeout=60000
        Mar 19 17:53:44 fastvm-rhel-8-0-23 pacemaker-fenced [333017] (can_fence_host_with_device@fenced_commands.c:1738) notice: xvm2 is eligible to fence (reboot) node2 (aka. 'fastvm-rhel-8.0-24'): static-list
        Mar 19 17:53:44 fastvm-rhel-8-0-23 pacemaker-fenced [333017] (call_remote_stonith@fenced_remote.c:1544) info: Total timeout set to 60 for peer's fencing targeting node2 for pacemaker-controld.333021|id=0db9abc2
        Mar 19 17:53:44 fastvm-rhel-8-0-23 pacemaker-fenced [333017] (call_remote_stonith@fenced_remote.c:1586) notice: Requesting that node1 perform 'reboot' action targeting node2 | for client pacemaker-controld.333021 (72s, 0s)
        Mar 19 17:53:44 fastvm-rhel-8-0-23 pacemaker-fenced [333017] (schedule_stonith_command@fenced_commands.c:506) debug: Scheduling 'reboot' action targeting node2 using xvm2 for remote peer node1 with op id 0db9abc2 and timeout 60s
        Mar 19 17:53:44 fastvm-rhel-8-0-23 pacemaker-fenced [333017] (fork_cb@fenced_commands.c:321) debug: Operation 'reboot' [333087] targeting node2 using xvm2 now running with 60s timeout
      2. # Reboot succeeds
        Mar 19 17:53:46 fastvm-rhel-8-0-23 pacemaker-fenced [333017] (log_operation@fenced_commands.c:2076) notice: Operation 'reboot' [333087] (call 6 from pacemaker-controld.333021) targeting node2 using xvm2 returned 0 (OK)
        Mar 19 17:53:46 fastvm-rhel-8-0-23 pacemaker-fenced [333017] (stonith_send_async_reply@fenced_commands.c:2128) trace: Directed reply to node1
        Mar 19 17:53:46 fastvm-rhel-8-0-23 pacemaker-fenced [333017] (stonith_device_execute@fenced_commands.c:373) trace: No actions using xvm2 are needed
      3. # Token loss occurs (corosync.log)
        Mar 19 17:54:00 [1729] fastvm-rhel-8-0-23 corosync info [KNET ] link: host: 2 link: 0 is down
        Mar 19 17:54:00 [1729] fastvm-rhel-8-0-23 corosync info [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
        Mar 19 17:54:00 [1729] fastvm-rhel-8-0-23 corosync warning [KNET ] host: host: 2 has no active links
        Mar 19 17:54:14 [1729] fastvm-rhel-8-0-23 corosync notice [TOTEM ] Token has not been received in 30000 ms
        Mar 19 17:54:24 [1729] fastvm-rhel-8-0-23 corosync notice [TOTEM ] A processor failed, forming new configuration.
      4. # Action times out 72 seconds after it began
        Mar 19 17:54:56 fastvm-rhel-8-0-23 pacemaker-fenced [333017] (remote_op_query_timeout@fenced_remote.c:647) debug: Operation 0db9abc2 targeting node2 already in progress
        Mar 19 17:54:56 fastvm-rhel-8-0-23 pacemaker-fenced [333017] (remote_op_timeout@fenced_remote.c:618) debug: Action 'reboot' targeting node2 for client pacemaker-controld.333021 timed out | id=0db9abc2
        Mar 19 17:54:56 fastvm-rhel-8-0-23 pacemaker-fenced [333017] (stonith_bcast_result_to_peers@fenced_remote.c:383) trace: Broadcasting result to peers
      5. # node2 leaves the CPG membership and the action finishes failing
        # corosync.log
        Mar 19 17:55:12 [1729] fastvm-rhel-8-0-23 corosync notice [TOTEM ] A new membership (1.115e2) was formed. Members left: 2
        Mar 19 17:55:12 [1729] fastvm-rhel-8-0-23 corosync notice [TOTEM ] Failed to receive the leave message. failed: 2
        Mar 19 17:55:12 [1729] fastvm-rhel-8-0-23 corosync warning [CPG ] downlist left_list: 1 received
        Mar 19 17:55:12 [1729] fastvm-rhel-8-0-23 corosync notice [QUORUM] Members[1]: 1
        Mar 19 17:55:12 [1729] fastvm-rhel-8-0-23 corosync notice [MAIN ] Completed service synchronization, ready to provide service.
        # pacemaker.log
        Mar 19 17:55:12 fastvm-rhel-8-0-23 pacemaker-attrd [333019] (pcmk_cpg_membership@cpg.c:673) info: Group attrd event 2: node2 (node 2 pid 1690) left via cluster exit
        Mar 19 17:55:12 fastvm-rhel-8-0-23 pacemaker-fenced [333017] (remote_op_done@fenced_remote.c:549) error: Operation 'reboot' targeting node2 by node1 for pacemaker-controld.333021@node1: Timer expired | id=0db9abc2
        Mar 19 17:55:12 fastvm-rhel-8-0-23 pacemaker-fenced [333017] (remote_op_done@fenced_remote.c:503) error: Already sent notifications for 'reboot' targeting node2 by node1 for client pacemaker-controld.333021@node1: OK | rc=0 state=completed id=0db9abc2
        Mar 19 17:55:12 fastvm-rhel-8-0-23 pacemaker-controld [333021] (tengine_stonith_notify@controld_fencing.c:511) notice: Peer node2 was not terminated (reboot) by node1 on behalf of pacemaker-controld.333021: Timer expired | initiator=node1 ref=0db9abc2-2b07-4048-9746-4bceee5528e3
      6. # node2 gets fenced again
        Mar 19 17:55:13 fastvm-rhel-8-0-23 pacemaker-schedulerd[333020] (pe_fence_node@unpack.c:162) warning: Cluster node node2 will be fenced: peer is no longer part of the cluster
        Mar 19 17:55:13 fastvm-rhel-8-0-23 pacemaker-schedulerd[333020] (unpack_rsc_op_failure@unpack.c:3027) warning: Unexpected result (error) was recorded for monitor of dummy1 on node2 at Mar 19 17:53:44 2021 | rc=1 id=dummy1_last_failure_0
        Mar 19 17:55:13 fastvm-rhel-8-0-23 pacemaker-schedulerd[333020] (stage6@pcmk_sched_allocate.c:1652) warning: Scheduling Node node2 for STONITH

      Version-Release number of selected component (if applicable):

      Using upstream master

      RHEL release would be pacemaker-2.0.4-6.el8_3.1.


      How reproducible:

      Always


      Steps to Reproduce:
      1. Set the Corosync token timeout to 40000.
      • This is a nice round number and testing should be consistent this way, but I was able to reproduce the issue with values at least as low as 34000.
      2. Disable cluster services from starting on boot on the node to be fenced (`pcs cluster disable`).
      3. Cause a resource operation with on-fail=fence to fail on one node (see the command sketch below).
      • This was the first approach that came to mind without relying on stonith_admin, which can sometimes behave differently from automatic fencing.
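
      A command-level sketch of those steps, assuming a two-node cluster (node1/node2) with the xvm2 and dummy1 configuration shown in the Description; the exact commands are an approximation, not the ones originally used:

      # 1. Raise the totem token timeout: set "token: 40000" in the totem section of
      #    /etc/corosync/corosync.conf on every node, then restart the cluster so the
      #    new value takes effect.
      pcs cluster stop --all && pcs cluster start --all

      # 2. Keep cluster services from starting at boot on the node to be fenced.
      pcs cluster disable node2

      # 3. Trigger a monitor failure for dummy1 (its monitor has on-fail=fence) on node2,
      #    e.g. by moving the resource there and then making its monitor fail (deleting
      #    the Dummy agent's state file on node2 is one way; the file location depends
      #    on the agent's "state" parameter).
      pcs resource move dummy1 node2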

      Actual results:

      The stonith action times out despite the node getting fenced. The node gets rebooted twice, as pacemaker tries to fence it a second time.


      Expected results:

      The stonith action succeeds the first time, provided that the fenced node gets rebooted successfully.


      Additional info:

      This is also reproducible on RHEL 7 (which is where it was observed in a customer environment), but it doesn't seem worthwhile to backport a fix unless it's fairly trivial to do so. This is a bit of a corner case.
