Bug
Resolution: Unresolved
rhel-9.4.z
rhel-sst-high-availability
Moderate
What were you trying to do that didn't work?
In the customer's environment, nothing special was being done when the issue was observed; the problem occurs with standard sbd watchdog-type fencing (no poison-pill / storage devices). I have additionally been able to reproduce the issue in some cases with a standard configuration:
- Administrative Procedures for RHEL High Availability clusters - Enabling sbd fencing in RHEL 7 and 8
The issue does not manifest in all cases though; it seems that in some environments a standard watchdog sbd config doesn't trigger the leak. I have found some additional steps that seem to make a similar issue occur in any environment (discussed further below).
What is the impact of this issue to you?
The impact to the customer is very low, as the leak is very slow. For my own reproducer environment, I wrote a script to track the change over time (a sketch of such a loop follows the output below). My reproducer shows a 128 kB increase every ~95s (around 100 MB a day), and the customer shows a similar rate of growth:
rss updated from 2491128 to 2491256 ( diff of 128 ) at 10/07/24-13:34:58 ( diff of 95 sec )
rss updated from 2491256 to 2491384 ( diff of 128 ) at 10/07/24-13:36:37 ( diff of 99 sec )
rss updated from 2491384 to 2491512 ( diff of 128 ) at 10/07/24-13:38:12 ( diff of 95 sec )
rss updated from 2491512 to 2491768 ( diff of 256 ) at 10/07/24-13:39:50 ( diff of 98 sec )
rss updated from 2491768 to 2491896 ( diff of 128 ) at 10/07/24-13:41:25 ( diff of 95 sec )
rss updated from 2491896 to 2492024 ( diff of 128 ) at 10/07/24-13:43:00 ( diff of 95 sec )
rss updated from 2492024 to 2492152 ( diff of 128 ) at 10/07/24-13:44:35 ( diff of 95 sec )
-----------------------------------------8<-----------------------------------------
rss updated from 2511864 to 2511992 ( diff of 128 ) at 10/07/24-17:43:55 ( diff of 95 sec )
rss updated from 2511992 to 2512120 ( diff of 128 ) at 10/07/24-17:45:30 ( diff of 95 sec )
rss updated from 2512120 to 2512248 ( diff of 128 ) at 10/07/24-17:47:09 ( diff of 99 sec )
rss updated from 2512248 to 2512376 ( diff of 128 ) at 10/07/24-17:48:44 ( diff of 95 sec )
rss updated from 2512376 to 2512504 ( diff of 128 ) at 10/07/24-17:50:19 ( diff of 95 sec )
rss updated from 2512504 to 2512632 ( diff of 128 ) at 10/07/24-17:51:54 ( diff of 95 sec )
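A minimal sketch of such a tracking loop, assuming a single "sbd: watcher: Pacemaker" process and reading RSS (in kB) from ps; this is a hypothetical reconstruction, not the exact script used:

#!/bin/bash
# Poll the RSS of the sbd Pacemaker watcher and log every change with a timestamp.
pid=$(pgrep -f 'sbd: watcher: Pacemaker')   # assumes exactly one match
prev=$(ps -o rss= -p "$pid" | tr -d ' ')
prev_ts=$(date +%s)
while sleep 1; do
    cur=$(ps -o rss= -p "$pid" | tr -d ' ')
    [ -n "$cur" ] || break                  # stop if the process goes away
    if [ "$cur" -ne "$prev" ]; then
        now=$(date +%s)
        echo "rss updated from $prev to $cur ( diff of $((cur - prev)) ) at $(date '+%d/%m/%y-%H:%M:%S') ( diff of $((now - prev_ts)) sec )"
        prev=$cur
        prev_ts=$now
    fi
done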
Another view of the "ps" output, we can see the RSS for the "sbd watcher: Pacemaker" grow very large eventually and continue to grow:
From "ps" output:
Mon Jun 17 12:01:01 PM EEST 2024 VVV
USER PID PPID %CPU %MEM VSZ    RSS    TT STAT STARTED TIME     WCHAN               COMMAND
root 914 1    0.0  0.2  20808  20456  ?  SL   Jun 14  00:01:13 do_sigtimedwait     sbd: inquisitor
root 915 914  0.2  5.5  450088 449472 ?  SL   Jun 14  00:07:50 do_poll.constprop.0 sbd: watcher: Pacemaker <----
root 916 914  0.0  0.5  43288  42884  ?  SL   Jun 14  00:00:50 do_poll.constprop.0 sbd: watcher: Cluster

Mon Jun 17 12:01:21 PM EEST 2024
USER PID PPID %CPU %MEM VSZ    RSS    TT STAT STARTED TIME     WCHAN               COMMAND
root 914 1    0.0  0.2  20808  20456  ?  SL   Jun 14  00:01:13 do_sigtimedwait     sbd: inquisitor
root 915 914  0.2  5.5  450236 449728 ?  SL   Jun 14  00:07:50 do_poll.constprop.0 sbd: watcher: Pacemaker <<<===
root 916 914  0.0  0.5  43288  42884  ?  SL   Jun 14  00:00:50 do_poll.constprop.0 sbd: watcher: Cluster
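For reference, snapshots like the above can be collected with a loop along these lines (the column list is an assumption based on the header shown, not the exact command used):

# while true; do date; ps -o user,pid,ppid,pcpu,pmem,vsz,rss,tty,stat,start,time,wchan:24,args -C sbd; sleep 20; done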
This memory is never reclaimed and just grows continuously, so given enough time it could potentially lead to low-memory conditions or OOM kills.
Please provide the package NVR for which the bug is seen:
$ grep -e kernel-[0-9] -e pacemaker-[0-9] -e sbd-[0-9] */installed-rpms
fence-agents-sbd-4.10.0-62.el9.noarch    Tue May 28 12:22:08 2024
kernel-5.14.0-427.18.1.el9_4.x86_64      Tue May 28 12:04:02 2024
pacemaker-2.1.7-5.el9_4.x86_64           Tue May 28 12:22:09 2024
sbd-1.5.1-3.el9.x86_64                   Wed Jun 19 09:27:52 2024
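On a running node the same versions can be checked directly with rpm, for example:

# rpm -q sbd pacemaker fence-agents-sbd kernel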
How reproducible is this bug?:
The issue happens in some cases with a standard watchdog config (but not all). From what I can tell, a similar leak is 100% reproducible in any environment by configuring the server to use a `softdog` watchdog timer and then blacklisting the `softdog` module so the device cannot actually be used.
Steps to reproduce
1. Configure a pacemaker cluster with sbd watchdog-type fencing, and configure it to use a watchdog-type stonith device (a configuration sketch follows the wdctl output below):
# wdctl
Device:        /dev/watchdog0                <--- use this device
Identity:      Software Watchdog [version 0] <--- module used
Timeout:       60 seconds
Pre-timeout:    0 seconds
FLAG           DESCRIPTION               STATUS BOOT-STATUS
KEEPALIVEPING  Keep alive ping reply          1           0
MAGICCLOSE     Supports magic close char      0           0
SETTIMEOUT     Set timeout (in seconds)       0           0
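A minimal configuration sketch for this step, assuming pcs and watchdog-only sbd (no shared devices); exact option syntax can differ between releases, so treat this as an outline rather than a verified procedure. sbd changes generally require a full cluster restart to take effect:

# pcs stonith sbd enable --watchdog=/dev/watchdog0 SBD_WATCHDOG_TIMEOUT=60
# pcs property set stonith-watchdog-timeout=120s
# pcs cluster stop --all; pcs cluster start --all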
2. After configuration, additionally blacklist the "softdog" kernel module on one or both nodes, and reboot the node so the watchdog doesn't start on boot.
# cat /proc/cmdline
BOOT_IMAGE=(hd0,gpt2)/vmlinuz-5.14.0-427.16.1.el9_4.x86_64 root=/dev/mapper/rhel-root ro crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M resume=/dev/mapper/rhel-swap rd.lvm.lv=rhel/root rd.lvm.lv=rhel/swap rhgb quiet softdog.blacklist=1 rd.driver.blacklist=softdog
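The same effect can also be achieved with a modprobe configuration file instead of kernel command-line options (sketch; the file name is arbitrary):

# echo 'blacklist softdog' > /etc/modprobe.d/softdog-blacklist.conf
# echo 'install softdog /bin/false' >> /etc/modprobe.d/softdog-blacklist.conf
# reboot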
The nodes with the module blacklisted will see a steady growth of RSS over time, which is never reclaimed.
Expected results
For the sake of this Jira / issue, the expected result is that all memory allocated by the sbd threads should eventually be reclaimed.
The identified reproducer does bring up a possible secondary issue, though: sbd fencing probably shouldn't come up at all if the watchdog device is not available.
Actual results
Memory grows over time for the "sbd: watcher: Pacemaker" thread.
Additional Notes:
- Memory growth is reported as heap ( observed from `/proc/<pid>/smaps` ):
55d4928c0000-55d4b0b97000 rw-p 00000000 00:00 0 [heap]                                    Size: 494428 kB
7f0a9b636000-7f0a9b7be000 r-xp 00036000 fd:02 8758135 /usr/lib64/libgnutls.so.30.37.1     Size: 1568 kB
7f0a9ba28000-7f0a9bb9d000 r-xp 00028000 fd:02 8444269 /usr/lib64/libc.so.6                Size: 1492 kB
7f0a9b02e000-7f0a9b165000 r--p 00049000 fd:02 8444767 /usr/lib64/libunistring.so.2.1.0    Size: 1244 kB
7f0a9b4a6000-7f0a9b5ab000 r-xp 0002f000 fd:02 8444703 /usr/lib64/libxml2.so.2.9.13        Size: 1044 kB
7f0a9ac59000-7f0a9ad3c000 r-xp 0000e000 fd:02 8445317 /usr/lib64/libgcrypt.so.20.4.0      Size: 908 kB
7f0a9b1c0000-7f0a9b293000 r-xp 00035000 fd:02 8445361 /usr/lib64/libp11-kit.so.0.3.1      Size: 844 kB
7f0a9ab79000-7f0a9ac38000 r-xp 00005000 fd:02 8444606 /usr/lib64/libzstd.so.1.5.1         Size: 764 kB
7f0a9bc7c000-7f0a9bd0d000 r-xp 0001d000 fd:02 8758148 /usr/lib64/libglib-2.0.so.0.6800.4  Size: 580 kB
7f0a9bd0d000-7f0a9bd96000 r--p 000ae000 fd:02 8758148 /usr/lib64/libglib-2.0.so.0.6800.4  Size: 548 kB
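The heap mapping can be tracked directly with a one-liner along these lines (sketch; assumes a single "sbd: watcher: Pacemaker" process):

# awk '/\[heap\]/ {inheap=1} inheap && /^Size:/ {print; exit}' /proc/$(pgrep -f 'sbd: watcher: Pacemaker')/smaps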
- Checking ABRT core dumps around the growing thread, the problem looks more like stack growth and allocations. In captured core dumps of the "sbd: watcher: Pacemaker" thread, the stack is just filled with "xmlStringText" objects. Below is a brief example, but there are many more of these observed (just reading values saved at the stack address and up through the listed heap size):
0x7ffe69788a68: 0x7fee216df1fc <set_unescape_error.constprop.0+300> 0x55712f3a6600
0x7ffe69788a98: 0x7fee21699030 <g_option_context_parse+2432> 0x55712f36a7b0
0x7ffe69788ab8: 0x2eeab37a 0x7fee2176e4c0 <glib_mem_vtable>
0x7ffe69788ae8: 0x55712ebe1160 <__func__.16> 0x0
0x7ffe69788b08: 0x7fee216895a3 <g_main_context_unref+163> 0x0
0x7ffe69788b18: 0x55712ebd7f80 <mon_shutdown> 0x55712ebe1160 <__func__.16>
0x7ffe69788b28: 0x55712ebda8e7 <servant_start+1943> 0x3ec
0x7ffe69788b38: 0xc8c974f62767a900 0x55712ebe1270 <__func__.5.lto_priv.0>
0x7ffe69788b58: 0x1 0x55712ebe1270 <__func__.5.lto_priv.0>
0x7ffe69788b78: 0x7fee21326319 <qb_log_from_external_source+169> 0x3000000030
0x55712f603cc0: 0x7fee20f9ceb0 <xmlStringText> 0x0
0x55712f603cc0: 0x7fee20f9ceb0 <xmlStringText> 0x0
0x55712f6041c0: 0x7fee20f9ceb0 <xmlStringText> 0x0
0x55712f6043c0: 0x7fee20f9ceb0 <xmlStringText> 0x0
0x55712f604b20: 0x7fee20f9ceb0 <xmlStringText> 0x0
0x55712f604be0: 0x7fee20f9ceb0 <xmlStringText> 0x0
0x55712f604c60: 0x7fee20f9ceb0 <xmlStringText> 0x0
0x55712f604d20: 0x7fee20f9ceb0 <xmlStringText> 0x0
0x55712f604eb0: 0x632e6b6361706e75 0x7fee215fac00 <_IO_wide_data_0+128>
0x55712f605350: 0x632e6b6361706e75 0x7fee215faf00 <main_arena+640>
0x55712f605450: 0x632e6b6361706e75 0x7fee215fac00 <_IO_wide_data_0+128>
0x55712f6054c0: 0x632e6b6361706e75 0x7fee215fac00 <_IO_wide_data_0+128>
0x55712f605520: 0x7fee20f9ceb0 <xmlStringText> 0x0
0x55712f605610: 0x632e6b6361706e75 0x7fee215fac00 <_IO_wide_data_0+128>
0x55712f605880: 0x632e6b6361706e75 0x7fee215fad00 <main_arena+128>
0x55712f605ca0: 0x632e6b6361706e75 0x7fee215fac00 <_IO_wide_data_0+128>
0x55712f605d00: 0x632e6b6361706e75 0x7fee215fac00 <_IO_wide_data_0+128>
0x55712f605d70: 0x632e6b6361706e75 0x7fee215fac00 <_IO_wide_data_0+128>
0x55712f605de0: 0x632e6b6361706e75 0x7fee215fac00 <_IO_wide_data_0+128>
0x55712f605e40: 0x632e6b6361706e75 0x7fee215fac00 <_IO_wide_data_0+128>
0x55712f605e80: 0x632e6b6361706e75 0x7fee215fac00 <_IO_wide_data_0+128>
0x55712f606190: 0x7fee20f9ceb0 <xmlStringText> 0x0
0x55712f606210: 0x7fee20f9ceb0 <xmlStringText> 0x0
0x55712f606290: 0x7fee20f9ceb0 <xmlStringText> 0x0
0x55712f606310: 0x7fee20f9ceb0 <xmlStringText> 0x0
0x55712f606470: 0x632e6b6361706e75 0x7fee215fac00 <_IO_wide_data_0+128>
0x55712f606b50: 0x7fee20f9ceb0 <xmlStringText> 0x0
0x55712f6072f0: 0x7fee20f9ceb0 <xmlStringText> 0x0
0x55712f607390: 0x7fee20f9ceb0 <xmlStringText> 0x0
0x55712f607560: 0x7fee20f9ceb0 <xmlStringText> 0x0
0x55712f607830: 0x55719e34c3c0 0x7fee213beaf0 <pcmk__str_hash>
0x55712f607840: 0x7fee2166d390 <g_date_fill_parse_tokens+272> 0x800000000
0x55712f607850: 0x7fee21499ce0 <_IO_flush_all_lockp+128> 0x7fee21499ce0 <_IO_flush_all_lockp+128>
0x55712f607890: 0x55719e3d2380 0x7fee213beaf0 <pcmk__str_hash>
0x55712f6078a0: 0x7fee2166d390 <g_date_fill_parse_tokens+272> 0x0
0x55712f6078b0: 0x7fee21499ce0 <_IO_flush_all_lockp+128> 0x7fee217c8f60 <pe__free_digests>
0x55712f6078f0: 0x55719e3d2420 0x7fee213beaf0 <pcmk__str_hash>
0x55712f607900: 0x7fee2166d390 <g_date_fill_parse_tokens+272> 0x0
0x55712f607910: 0x7fee21499ce0 <_IO_flush_all_lockp+128> 0x7fee21499ce0 <_IO_flush_all_lockp+128>
0x55712f607950: 0x55719e34c120 0x7fee213beaf0 <pcmk__str_hash>
0x55712f607960: 0x7fee2166d390 <g_date_fill_parse_tokens+272> 0xb00000000
0x55712f607970: 0x7fee21499ce0 <_IO_flush_all_lockp+128> 0x7fee21499ce0 <_IO_flush_all_lockp+128>
0x55712f6079b0: 0x55719e38c480 0x7fee213beaf0 <pcmk__str_hash>
0x55712f6079c0: 0x7fee2166d390 <g_date_fill_parse_tokens+272> 0x0
0x55712f6079d0: 0x7fee21499ce0 <_IO_flush_all_lockp+128> 0x7fee217c8f60 <pe__free_digests>
0x55712f607a10: 0x55719e34c360 0x7fee213beaf0 <pcmk__str_hash>
0x55712f607a20: 0x7fee2166d390 <g_date_fill_parse_tokens+272> 0x0
0x55712f607a30: 0x7fee21499ce0 <_IO_flush_all_lockp+128> 0x7fee21499ce0 <_IO_flush_all_lockp+128>
0x55712f607a70: 0x55719e3a8f70 0x7fee213beaf0 <pcmk__str_hash>
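For reference, reads like the above can be reproduced against the ABRT core with gdb in batch mode; a rough sketch (the dump directory and the start address are examples, not fixed values):

# gdb -q -batch /usr/sbin/sbd <abrt-dump-dir>/coredump -ex 'bt' -ex 'x/512ga 0x55712f603cc0'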
- I have not been able to find out what is causing these allocations quite yet, and unfortunately ran out of time to take the investigation any further.
- Standard memory leak tracing tools unfortunately don't work well here. Valgrind seems to be an issue for pacemaker in general (just from reading through the code), and this sbd area is especially difficult because if anything interrupts the sbd process from running, we will get a reboot from the watchdog timer expiring.
- rhn-engineering-kwenning++ has recommended that we install the upstream pacemaker package below and confirm whether the issue is present there. I haven't had the opportunity to do this yet, but we should be able to do it soon with the reproducer if we still want to go this route: