Bug
Resolution: Unresolved
rhel-9.4.z
rhel-sst-high-availability
Moderate
What were you trying to do that didn't work?
In the customer's environment, nothing special was being done when the issue was observed; the problem occurs with standard sbd watchdog-type fencing (no poison-pill / storage devices). I have additionally been able to reproduce the issue in some cases with a standard configuration:
- Administrative Procedures for RHEL High Availability clusters - Enabling sbd fencing in RHEL 7 and 8
The issue does not manifest in all cases though; it seems that in some environments a standard watchdog sbd config doesn't trigger the leak. I have found some additional steps that seem to make a similar issue occur in any environment (discussed further below).
What is the impact of this issue to you?
The impact to the customer is very low, as the leak is very slow. For my own reproducer environment, I wrote a script to track the change over time (a sketch of such a loop follows the output below). My reproducer shows a 128 kB increase every ~95s (around 100 MB a day), and the customer shows a similar rate of growth:
rss updated from 2491128 to 2491256 ( diff of 128 ) at 10/07/24-13:34:58 ( diff of 95 sec )
rss updated from 2491256 to 2491384 ( diff of 128 ) at 10/07/24-13:36:37 ( diff of 99 sec )
rss updated from 2491384 to 2491512 ( diff of 128 ) at 10/07/24-13:38:12 ( diff of 95 sec )
rss updated from 2491512 to 2491768 ( diff of 256 ) at 10/07/24-13:39:50 ( diff of 98 sec )
rss updated from 2491768 to 2491896 ( diff of 128 ) at 10/07/24-13:41:25 ( diff of 95 sec )
rss updated from 2491896 to 2492024 ( diff of 128 ) at 10/07/24-13:43:00 ( diff of 95 sec )
rss updated from 2492024 to 2492152 ( diff of 128 ) at 10/07/24-13:44:35 ( diff of 95 sec )
-----------------------------------------8<-----------------------------------------
rss updated from 2511864 to 2511992 ( diff of 128 ) at 10/07/24-17:43:55 ( diff of 95 sec )
rss updated from 2511992 to 2512120 ( diff of 128 ) at 10/07/24-17:45:30 ( diff of 95 sec )
rss updated from 2512120 to 2512248 ( diff of 128 ) at 10/07/24-17:47:09 ( diff of 99 sec )
rss updated from 2512248 to 2512376 ( diff of 128 ) at 10/07/24-17:48:44 ( diff of 95 sec )
rss updated from 2512376 to 2512504 ( diff of 128 ) at 10/07/24-17:50:19 ( diff of 95 sec )
rss updated from 2512504 to 2512632 ( diff of 128 ) at 10/07/24-17:51:54 ( diff of 95 sec )
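A minimal sketch of such a tracking loop, assuming a single "sbd: watcher: Pacemaker" process and reading RSS (in kB) from ps; this is a hypothetical reconstruction, not the exact script used:

#!/bin/bash
# Poll the RSS of the sbd Pacemaker watcher and log every change with a timestamp.
pid=$(pgrep -f 'sbd: watcher: Pacemaker')   # assumes exactly one match
prev=$(ps -o rss= -p "$pid" | tr -d ' ')
prev_ts=$(date +%s)
while sleep 1; do
    cur=$(ps -o rss= -p "$pid" | tr -d ' ')
    [ -n "$cur" ] || break                  # stop if the process goes away
    if [ "$cur" -ne "$prev" ]; then
        now=$(date +%s)
        echo "rss updated from $prev to $cur ( diff of $((cur - prev)) ) at $(date '+%d/%m/%y-%H:%M:%S') ( diff of $((now - prev_ts)) sec )"
        prev=$cur
        prev_ts=$now
    fi
done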
Another view of the "ps" output, we can see the RSS for the "sbd watcher: Pacemaker" grow very large eventually and continue to grow:
From "ps" output:
Mon Jun 17 12:01:01 PM EEST 2024 VVV
USER PID PPID %CPU %MEM VSZ    RSS    TT STAT STARTED TIME     WCHAN               COMMAND
root 914 1    0.0  0.2  20808  20456  ?  SL   Jun 14  00:01:13 do_sigtimedwait     sbd: inquisitor
root 915 914  0.2  5.5  450088 449472 ?  SL   Jun 14  00:07:50 do_poll.constprop.0 sbd: watcher: Pacemaker <----
root 916 914  0.0  0.5  43288  42884  ?  SL   Jun 14  00:00:50 do_poll.constprop.0 sbd: watcher: Cluster

Mon Jun 17 12:01:21 PM EEST 2024
USER PID PPID %CPU %MEM VSZ    RSS    TT STAT STARTED TIME     WCHAN               COMMAND
root 914 1    0.0  0.2  20808  20456  ?  SL   Jun 14  00:01:13 do_sigtimedwait     sbd: inquisitor
root 915 914  0.2  5.5  450236 449728 ?  SL   Jun 14  00:07:50 do_poll.constprop.0 sbd: watcher: Pacemaker <<<===
root 916 914  0.0  0.5  43288  42884  ?  SL   Jun 14  00:00:50 do_poll.constprop.0 sbd: watcher: Cluster
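For reference, snapshots like the above can be collected with a loop along these lines (the column list is an assumption based on the header shown, not the exact command used):

# while true; do date; ps -o user,pid,ppid,pcpu,pmem,vsz,rss,tty,stat,start,time,wchan:24,args -C sbd; sleep 20; done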
This memory is never reclaimed and just grows continuously, so given enough time it could potentially lead to low-memory conditions or OOM kills.
Please provide the package NVR for which the bug is seen:
$ grep -e kernel-[0-9] -e pacemaker-[0-9] -e sbd-[0-9] */installed-rpms
fence-agents-sbd-4.10.0-62.el9.noarch    Tue May 28 12:22:08 2024
kernel-5.14.0-427.18.1.el9_4.x86_64      Tue May 28 12:04:02 2024
pacemaker-2.1.7-5.el9_4.x86_64           Tue May 28 12:22:09 2024
sbd-1.5.1-3.el9.x86_64                   Wed Jun 19 09:27:52 2024
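On a running node the same versions can be checked directly with rpm, for example:

# rpm -q sbd pacemaker fence-agents-sbd kernel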
How reproducible is this bug?:
The issue happens in some cases with a standard watchdog config (but not all). From what I can tell, a similar leak is 100% reproducible in any environment by configuring the server to use a `softdog` watchdog timer and then blacklisting the `softdog` module so the device cannot actually be used.
Steps to reproduce
1. Configure a pacemaker cluster with sbd watchdog-type fencing, and configure it to use a watchdog-type stonith device (a configuration sketch follows the wdctl output below):
# wdctl
Device:        /dev/watchdog0                <--- use this device
Identity:      Software Watchdog [version 0] <--- module used
Timeout:       60 seconds
Pre-timeout:    0 seconds
FLAG           DESCRIPTION               STATUS BOOT-STATUS
KEEPALIVEPING  Keep alive ping reply          1           0
MAGICCLOSE     Supports magic close char      0           0
SETTIMEOUT     Set timeout (in seconds)       0           0
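A minimal configuration sketch for this step, assuming pcs and watchdog-only sbd (no shared devices); exact option syntax can differ between releases, so treat this as an outline rather than a verified procedure. sbd changes generally require a full cluster restart to take effect:

# pcs stonith sbd enable --watchdog=/dev/watchdog0 SBD_WATCHDOG_TIMEOUT=60
# pcs property set stonith-watchdog-timeout=120s
# pcs cluster stop --all; pcs cluster start --all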
2. After configuration, additionally blacklist the "softdog" kernel module on one or both nodes, and reboot the node so the watchdog doesn't start on boot.
# cat /proc/cmdline
BOOT_IMAGE=(hd0,gpt2)/vmlinuz-5.14.0-427.16.1.el9_4.x86_64 root=/dev/mapper/rhel-root ro crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M resume=/dev/mapper/rhel-swap rd.lvm.lv=rhel/root rd.lvm.lv=rhel/swap rhgb quiet softdog.blacklist=1 rd.driver.blacklist=softdog
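The same effect can also be achieved with a modprobe configuration file instead of kernel command-line options (sketch; the file name is arbitrary):

# echo 'blacklist softdog' > /etc/modprobe.d/softdog-blacklist.conf
# echo 'install softdog /bin/false' >> /etc/modprobe.d/softdog-blacklist.conf
# reboot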
The nodes with the module blacklisted will see a steady growth of RSS over time, which is never reclaimed.
Expected results
For the sake of this Jira / issue, the expected result is that all memory allocated by the sbd threads should eventually be reclaimed.
The identified reproducer does bring up a possible secondary issue, though: sbd fencing probably shouldn't come up at all if the watchdog device is not available.
Actual results
Memory grows over time for the "sbd: watcher: Pacemaker" thread.
Additional Notes:
- Memory growth is reported as heap ( observed from `/proc/<pid>/smaps` ):
55d4928c0000-55d4b0b97000 rw-p 00000000 00:00 0 [heap]                                    Size: 494428 kB
7f0a9b636000-7f0a9b7be000 r-xp 00036000 fd:02 8758135 /usr/lib64/libgnutls.so.30.37.1     Size: 1568 kB
7f0a9ba28000-7f0a9bb9d000 r-xp 00028000 fd:02 8444269 /usr/lib64/libc.so.6                Size: 1492 kB
7f0a9b02e000-7f0a9b165000 r--p 00049000 fd:02 8444767 /usr/lib64/libunistring.so.2.1.0    Size: 1244 kB
7f0a9b4a6000-7f0a9b5ab000 r-xp 0002f000 fd:02 8444703 /usr/lib64/libxml2.so.2.9.13        Size: 1044 kB
7f0a9ac59000-7f0a9ad3c000 r-xp 0000e000 fd:02 8445317 /usr/lib64/libgcrypt.so.20.4.0      Size: 908 kB
7f0a9b1c0000-7f0a9b293000 r-xp 00035000 fd:02 8445361 /usr/lib64/libp11-kit.so.0.3.1      Size: 844 kB
7f0a9ab79000-7f0a9ac38000 r-xp 00005000 fd:02 8444606 /usr/lib64/libzstd.so.1.5.1         Size: 764 kB
7f0a9bc7c000-7f0a9bd0d000 r-xp 0001d000 fd:02 8758148 /usr/lib64/libglib-2.0.so.0.6800.4  Size: 580 kB
7f0a9bd0d000-7f0a9bd96000 r--p 000ae000 fd:02 8758148 /usr/lib64/libglib-2.0.so.0.6800.4  Size: 548 kB
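The heap mapping can be tracked directly with a one-liner along these lines (sketch; assumes a single "sbd: watcher: Pacemaker" process):

# awk '/\[heap\]/ {inheap=1} inheap && /^Size:/ {print; exit}' /proc/$(pgrep -f 'sbd: watcher: Pacemaker')/smaps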
- Checking ABRT core dumps around the growing thread, the problem looks more like stack growth and allocations. In captured core dumps of the "sbd: watcher: Pacemaker" thread, the stack is just filled with "xmlStringText" objects. Below is a brief example, but there are many more of these observed (just reading values saved at the stack address and up through the listed heap size):
0x7ffe69788a68: 0x7fee216df1fc <set_unescape_error.constprop.0+300> 0x55712f3a6600
0x7ffe69788a98: 0x7fee21699030 <g_option_context_parse+2432> 0x55712f36a7b0
0x7ffe69788ab8: 0x2eeab37a 0x7fee2176e4c0 <glib_mem_vtable>
0x7ffe69788ae8: 0x55712ebe1160 <__func__.16> 0x0
0x7ffe69788b08: 0x7fee216895a3 <g_main_context_unref+163> 0x0
0x7ffe69788b18: 0x55712ebd7f80 <mon_shutdown> 0x55712ebe1160 <__func__.16>
0x7ffe69788b28: 0x55712ebda8e7 <servant_start+1943> 0x3ec
0x7ffe69788b38: 0xc8c974f62767a900 0x55712ebe1270 <__func__.5.lto_priv.0>
0x7ffe69788b58: 0x1 0x55712ebe1270 <__func__.5.lto_priv.0>
0x7ffe69788b78: 0x7fee21326319 <qb_log_from_external_source+169> 0x3000000030
0x55712f603cc0: 0x7fee20f9ceb0 <xmlStringText> 0x0
0x55712f603cc0: 0x7fee20f9ceb0 <xmlStringText> 0x0
0x55712f6041c0: 0x7fee20f9ceb0 <xmlStringText> 0x0
0x55712f6043c0: 0x7fee20f9ceb0 <xmlStringText> 0x0
0x55712f604b20: 0x7fee20f9ceb0 <xmlStringText> 0x0
0x55712f604be0: 0x7fee20f9ceb0 <xmlStringText> 0x0
0x55712f604c60: 0x7fee20f9ceb0 <xmlStringText> 0x0
0x55712f604d20: 0x7fee20f9ceb0 <xmlStringText> 0x0
0x55712f604eb0: 0x632e6b6361706e75 0x7fee215fac00 <_IO_wide_data_0+128>
0x55712f605350: 0x632e6b6361706e75 0x7fee215faf00 <main_arena+640>
0x55712f605450: 0x632e6b6361706e75 0x7fee215fac00 <_IO_wide_data_0+128>
0x55712f6054c0: 0x632e6b6361706e75 0x7fee215fac00 <_IO_wide_data_0+128>
0x55712f605520: 0x7fee20f9ceb0 <xmlStringText> 0x0
0x55712f605610: 0x632e6b6361706e75 0x7fee215fac00 <_IO_wide_data_0+128>
0x55712f605880: 0x632e6b6361706e75 0x7fee215fad00 <main_arena+128>
0x55712f605ca0: 0x632e6b6361706e75 0x7fee215fac00 <_IO_wide_data_0+128>
0x55712f605d00: 0x632e6b6361706e75 0x7fee215fac00 <_IO_wide_data_0+128>
0x55712f605d70: 0x632e6b6361706e75 0x7fee215fac00 <_IO_wide_data_0+128>
0x55712f605de0: 0x632e6b6361706e75 0x7fee215fac00 <_IO_wide_data_0+128>
0x55712f605e40: 0x632e6b6361706e75 0x7fee215fac00 <_IO_wide_data_0+128>
0x55712f605e80: 0x632e6b6361706e75 0x7fee215fac00 <_IO_wide_data_0+128>
0x55712f606190: 0x7fee20f9ceb0 <xmlStringText> 0x0
0x55712f606210: 0x7fee20f9ceb0 <xmlStringText> 0x0
0x55712f606290: 0x7fee20f9ceb0 <xmlStringText> 0x0
0x55712f606310: 0x7fee20f9ceb0 <xmlStringText> 0x0
0x55712f606470: 0x632e6b6361706e75 0x7fee215fac00 <_IO_wide_data_0+128>
0x55712f606b50: 0x7fee20f9ceb0 <xmlStringText> 0x0
0x55712f6072f0: 0x7fee20f9ceb0 <xmlStringText> 0x0
0x55712f607390: 0x7fee20f9ceb0 <xmlStringText> 0x0
0x55712f607560: 0x7fee20f9ceb0 <xmlStringText> 0x0
0x55712f607830: 0x55719e34c3c0 0x7fee213beaf0 <pcmk__str_hash>
0x55712f607840: 0x7fee2166d390 <g_date_fill_parse_tokens+272> 0x800000000
0x55712f607850: 0x7fee21499ce0 <_IO_flush_all_lockp+128> 0x7fee21499ce0 <_IO_flush_all_lockp+128>
0x55712f607890: 0x55719e3d2380 0x7fee213beaf0 <pcmk__str_hash>
0x55712f6078a0: 0x7fee2166d390 <g_date_fill_parse_tokens+272> 0x0
0x55712f6078b0: 0x7fee21499ce0 <_IO_flush_all_lockp+128> 0x7fee217c8f60 <pe__free_digests>
0x55712f6078f0: 0x55719e3d2420 0x7fee213beaf0 <pcmk__str_hash>
0x55712f607900: 0x7fee2166d390 <g_date_fill_parse_tokens+272> 0x0
0x55712f607910: 0x7fee21499ce0 <_IO_flush_all_lockp+128> 0x7fee21499ce0 <_IO_flush_all_lockp+128>
0x55712f607950: 0x55719e34c120 0x7fee213beaf0 <pcmk__str_hash>
0x55712f607960: 0x7fee2166d390 <g_date_fill_parse_tokens+272> 0xb00000000
0x55712f607970: 0x7fee21499ce0 <_IO_flush_all_lockp+128> 0x7fee21499ce0 <_IO_flush_all_lockp+128>
0x55712f6079b0: 0x55719e38c480 0x7fee213beaf0 <pcmk__str_hash>
0x55712f6079c0: 0x7fee2166d390 <g_date_fill_parse_tokens+272> 0x0
0x55712f6079d0: 0x7fee21499ce0 <_IO_flush_all_lockp+128> 0x7fee217c8f60 <pe__free_digests>
0x55712f607a10: 0x55719e34c360 0x7fee213beaf0 <pcmk__str_hash>
0x55712f607a20: 0x7fee2166d390 <g_date_fill_parse_tokens+272> 0x0
0x55712f607a30: 0x7fee21499ce0 <_IO_flush_all_lockp+128> 0x7fee21499ce0 <_IO_flush_all_lockp+128>
0x55712f607a70: 0x55719e3a8f70 0x7fee213beaf0 <pcmk__str_hash>
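For reference, reads like the above can be reproduced against the ABRT core with gdb in batch mode; a rough sketch (the dump directory and the start address are examples, not fixed values):

# gdb -q -batch /usr/sbin/sbd <abrt-dump-dir>/coredump -ex 'bt' -ex 'x/512ga 0x55712f603cc0'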
- I have not been able to find out what is causing these allocations quite yet, and unfortunately ran out of time to take the investigation any further.
- Standard memory leak tracing tools unfortunately don't work well here. Valgrind seems to be an issue for pacemaker in general (just from reading through the code), and this sbd area is especially difficult because if anything interrupts the sbd process from running, we will get a reboot from the watchdog timer expiring.
- rhn-engineering-kwenning++ has recommended that we install the upstream pacemaker package below and confirm whether the issue is present there. I haven't had the opportunity to do this yet, but we should be able to do it soon with the reproducer if we still want to go this route: