- Bug
- Resolution: Done
- rhel-8.10, CentOS Stream 8, CentOS Stream 9, rhel-9.4
- Important
- rhel-sst-pt-libraries
- ssg_platform_tools
- CentOS Stream, Red Hat Enterprise Linux
- x86_64
What were you trying to do that didn't work?
We preload our own build of the jemalloc4 library to replace the standard malloc implementation. Our software needs to load many shared libraries with dlopen(), usually about a hundred. This has worked fine since early versions of RHEL 6, but with the recent glibc updates on RHEL 8.10 and RHEL 9 we now always get a crash after a number of successful dlopen() calls.
Please provide the package NVR for which bug is seen:
- latest glibc on RHEL 8: glibc-2.28-251.el8.2
- latest glibc on RHEL 9: glibc-2.34-113.el9
I was able to narrow the culprit change down to https://issues.redhat.com/browse/RHEL-14497, specifically the following commits:
- https://gitlab.com/redhat/centos-stream/rpms/glibc/-/commit/8514cc782e13e7f117eb5f254c69855f44b24aad in c8s branch
- https://gitlab.com/redhat/centos-stream/rpms/glibc/-/commit/2ea2e4b80215f5f1eb5146d5cab677b4357780e0 in c9s branch
Using my own builds of glibc from that repository, I get the following results on CentOS Stream 8:
- crash on glibc-2.28-238.el8
- no crash when downgraded to glibc-2.28-237.el8
and on CentOS Stream 9:
- crash on glibc-2.34-87.el9
- no crash when downgraded to glibc-2.34-86.el9
How reproducible:
It is always reproducible on a vanilla RHEL 9 system with libmemusage.so from the standard glibc-utils package. The same problem exists on CentOS Stream 8 and 9.
Please see the attached script repro.sh. It is a minimal reproducible example of a problem we had in production with our own build of jemalloc4, built using the LLVM tools: Clang and the LLD linker.
Curiously, it is reproducible with a straightforward build of the jemalloc4 library (latest stable-4 branch version) when the GOLD or LLD linker is used, but not with the BFD linker. It is not reproducible with jemalloc5 (latest master branch version), regardless of whether GOLD, LLD, or BFD is used. I am not sure why this is so, but it seems like a useful detail to share for the investigation.
We use only the x86_64 architecture, and I have tested only on that architecture.
Steps to reproduce
- Run the repro.sh script as root, for convenience of installing the dependency packages
- See either the "Test passed!" or the "Test failed!" message printed at the end.
When the test fails, it is because of the crash, so a core file is generated.
Expected results
The test always succeeds on the latest glibc versions and never crashes.
Actual results
The test fails due to a crash on the latest glibc versions.
Downgrade glibc to the latest known working version, and the test passes:
# dnf downgrade glibc-2.34-86.el9
After that, upgrade glibc to the earliest known broken version, and the test fails:
# dnf upgrade glibc-2.34-87.el9
Similarly on RHEL 8, the test passes on the older glibc version:
# dnf downgrade glibc-2.28-237.el8
and crashes on the newer glibc version:
# dnf upgrade glibc-2.28-238.el8
Additional details
Example of running the repro script. Note how it always fails on loading the sixteenth shared library:
# ./repro.sh
glibc version in use: glibc-2.34-113.el9.x86_64
Built!
DEBUG: opening ./plugin1.so
DEBUG: opening ./plugin2.so
DEBUG: opening ./plugin3.so
DEBUG: opening ./plugin4.so
DEBUG: opening ./plugin5.so
DEBUG: opening ./plugin6.so
DEBUG: opening ./plugin7.so
DEBUG: opening ./plugin8.so
DEBUG: opening ./plugin9.so
DEBUG: opening ./plugin10.so
DEBUG: opening ./plugin11.so
DEBUG: opening ./plugin12.so
DEBUG: opening ./plugin13.so
DEBUG: opening ./plugin14.so
DEBUG: opening ./plugin15.so
DEBUG: opening ./plugin16.so
./repro.sh: line 44: 47772 Segmentation fault (core dumped) LD_PRELOAD=/usr/lib64/libmemusage.so ./test
Test failed!
Example of a stack trace: starting with a call to malloc, there is endless recursion involving the malloc, __tls_get_addr, and _dl_resize_dtv frames, which results in a crash after hitting the stack limit (244483 frames in this example):
$ tail gdb.txt
#244473 0x00007f0f236cf044 in _dl_catch_exception () from /lib64/libc.so.6
#244474 0x00007f0f23ecd8c7 in dl_open_worker (a=0x7ffe27cbecb0) at dl-open.c:787
#244475 0x00007f0f236cf044 in _dl_catch_exception () from /lib64/libc.so.6
#244476 0x00007f0f23ecdb81 in _dl_open (file=0x7ffe27cbef60 "./plugin16.so", mode=-2147483646, caller_dlopen=0x400691 <main+81>, nsid=<optimized out>, argc=1, argv=<optimized out>, env=0x7ffe27cbf0b8) at dl-open.c:895
#244477 0x00007f0f2392cf8a in dlopen_doit () from /lib64/libdl.so.2
#244478 0x00007f0f236cf044 in _dl_catch_exception () from /lib64/libc.so.6
#244479 0x00007f0f236cf103 in _dl_catch_error () from /lib64/libc.so.6
#244480 0x00007f0f2392d52e in _dlerror_run () from /lib64/libdl.so.2
#244481 0x00007f0f2392d02a in dlopen@@GLIBC_2.2.5 () from /lib64/libdl.so.2
#244482 0x0000000000400691 in main () at test.c:9
$ head gdb.txt
#0 0x00007f0f23b31b25 in malloc () from /usr/lib64/libmemusage.so
#1 0x00007f0f23ed25b1 in malloc (size=528) at ../include/rtld-malloc.h:56
#2 _dl_resize_dtv (dtv=dtv@entry=0x7f0f240e30a0, max_modid=max_modid@entry=17) at ../elf/dl-tls.c:494
#3 0x00007f0f23ed30f2 in _dl_update_slotinfo (req_modid=1, new_gen=16) at ../elf/dl-tls.c:810
#4 0x00007f0f23ed31fc in update_get_addr (ti=0x7f0f23d33fc0, gen=<optimized out>) at ../elf/dl-tls.c:917
#5 0x00007f0f23ebee9c in __tls_get_addr () at ../sysdeps/x86_64/tls_get_addr.S:55
#6 0x00007f0f23b315e6 in update_data () from /usr/lib64/libmemusage.so
#7 0x00007f0f23b31bd3 in malloc () from /usr/lib64/libmemusage.so
#8 0x00007f0f23ed25b1 in malloc (size=528) at ../include/rtld-malloc.h:56
#9 _dl_resize_dtv (dtv=dtv@entry=0x7f0f240e30a0, max_modid=max_modid@entry=17) at ../elf/dl-tls.c:494
$ wc -l gdb.txt
244483 gdb.txt
Similarly with jemalloc4:
# LD_PRELOAD=/root/libjemalloc.so.2-stable4-gold ./test
DEBUG: opening ./plugin1.so
DEBUG: opening ./plugin2.so
DEBUG: opening ./plugin3.so
DEBUG: opening ./plugin4.so
DEBUG: opening ./plugin5.so
DEBUG: opening ./plugin6.so
DEBUG: opening ./plugin7.so
DEBUG: opening ./plugin8.so
DEBUG: opening ./plugin9.so
DEBUG: opening ./plugin10.so
DEBUG: opening ./plugin11.so
DEBUG: opening ./plugin12.so
DEBUG: opening ./plugin13.so
DEBUG: opening ./plugin14.so
DEBUG: opening ./plugin15.so
DEBUG: opening ./plugin16.so
Segmentation fault (core dumped)
Stacktrace:
#0 0x00007f12a5ccc196 in _dl_update_slotinfo (req_modid=1, new_gen=16) at ../elf/dl-tls.c:728
#1 0x00007f12a5ccc37c in update_get_addr (ti=0x7f12a5ca4fb0, gen=<optimized out>) at ../elf/dl-tls.c:922
#2 0x00007f12a5ccf61c in __tls_get_addr () at ../sysdeps/x86_64/tls_get_addr.S:55
#3 0x00007f12a5c41bbb in je_tsd_fetch_impl (init=true) at include/jemalloc/internal/tsd.h:699
#4 je_tsd_fetch () at include/jemalloc/internal/tsd.h:718
#5 ialloc_body (slow_path=false, usize=<synthetic pointer>, tsdn=<synthetic pointer>, zero=false, size=528) at src/jemalloc.c:1588
#6 malloc (size=528) at src/jemalloc.c:1644
#7 0x00007f12a5ccb941 in malloc (size=<optimized out>) at ../include/rtld-malloc.h:56
#8 _dl_resize_dtv (dtv=dtv@entry=0x7f12a5c155c0, max_modid=max_modid@entry=17) at ../elf/dl-tls.c:499
#9 0x00007f12a5ccc280 in _dl_update_slotinfo (req_modid=1, new_gen=16) at ../elf/dl-tls.c:815
#10 0x00007f12a5ccc37c in update_get_addr (ti=0x7f12a5ca4fb0, gen=<optimized out>) at ../elf/dl-tls.c:922
#11 0x00007f12a5ccf61c in __tls_get_addr () at ../sysdeps/x86_64/tls_get_addr.S:55
#12 0x00007f12a5c41bbb in je_tsd_fetch_impl (init=true) at include/jemalloc/internal/tsd.h:699
#13 je_tsd_fetch () at include/jemalloc/internal/tsd.h:718
#14 ialloc_body (slow_path=false, usize=<synthetic pointer>, tsdn=<synthetic pointer>, zero=false, size=528) at src/jemalloc.c:1588
#15 malloc (size=528) at src/jemalloc.c:1644
#16 0x00007f12a5ccb941 in malloc (size=<optimized out>) at ../include/rtld-malloc.h:56
#17 _dl_resize_dtv (dtv=dtv@entry=0x7f12a5c155c0, max_modid=max_modid@entry=17) at ../elf/dl-tls.c:499
#18 0x00007f12a5ccc280 in _dl_update_slotinfo (req_modid=1, new_gen=16) at ../elf/dl-tls.c:815
#19 0x00007f12a5ccc37c in update_get_addr (ti=0x7f12a5ca4fb0, gen=<optimized out>) at ../elf/dl-tls.c:922
#20 0x00007f12a5ccf61c in __tls_get_addr () at ../sysdeps/x86_64/tls_get_addr.S:55
#21 0x00007f12a5c41bbb in je_tsd_fetch_impl (init=true) at include/jemalloc/internal/tsd.h:699
#22 je_tsd_fetch () at include/jemalloc/internal/tsd.h:718
#23 ialloc_body (slow_path=false, usize=<synthetic pointer>, tsdn=<synthetic pointer>, zero=false, size=528) at src/jemalloc.c:1588
#24 malloc (size=528) at src/jemalloc.c:1644
...
I have also found https://bugzilla.redhat.com/show_bug.cgi?id=1878932, which involves the use of LD_AUDIT, but that works fine in my environment.
I have tried different values (8 and 123456789) for the glibc.rtld.optional_static_tls tunable in GLIBC_TUNABLES, but that makes no difference for the repro script: it still crashes no matter which value is set:
# GLIBC_TUNABLES=glibc.rtld.optional_static_tls=123456789 LD_PRELOAD=/usr/lib64/libmemusage.so ./test
DEBUG: opening ./plugin1.so
DEBUG: opening ./plugin2.so
DEBUG: opening ./plugin3.so
DEBUG: opening ./plugin4.so
DEBUG: opening ./plugin5.so
DEBUG: opening ./plugin6.so
DEBUG: opening ./plugin7.so
DEBUG: opening ./plugin8.so
DEBUG: opening ./plugin9.so
DEBUG: opening ./plugin10.so
DEBUG: opening ./plugin11.so
DEBUG: opening ./plugin12.so
DEBUG: opening ./plugin13.so
DEBUG: opening ./plugin14.so
DEBUG: opening ./plugin15.so
DEBUG: opening ./plugin16.so
Segmentation fault (core dumped)
- duplicates: RHEL-39994 glibc: Add workaround for certain dynamic TLS usage in interposed malloc [rhel-8.10.z] (Closed)