RHEL-46634

Regression in glibc resulting in crash on dlopen() when libmemusage.so is preloaded

    • Type: Bug
    • Resolution: Done
    • rhel-8.10, CentOS Stream 8, CentOS Stream 9, rhel-9.4
    • glibc
    • Important
    • sst_pt_libraries
    • ssg_platform_tools
    • CentOS Stream, Red Hat Enterprise Linux
    • x86_64

      What were you trying to do that didn't work?

      We preload our own build of the jemalloc4 library to replace the standard malloc implementation. Our software loads many shared libraries with dlopen(), usually about a hundred. This has worked fine since early versions of RHEL 6, but with the recent glibc updates on RHEL 8.10 and RHEL 9 we now always get a crash after a number of successful dlopen() calls.

      Please provide the package NVR for which bug is seen:

      • latest glibc on RHEL 8: glibc-2.28-251.el8.2
      • latest glibc on RHEL 9: glibc-2.34-113.el9

      I was able to narrow the culprit change down to https://issues.redhat.com/browse/RHEL-14497, specifically the following commits:

      Using my own builds of glibc from that repository, I get the following results on CentOS Stream 8:

      • crash on glibc-2.28-238.el8
      • no crash when downgraded to glibc-2.28-237.el8

      and on CentOS Stream 9:

      • crash on glibc-2.34-87.el9
      • no crash when downgraded to glibc-2.34-86.el9
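
The boundaries found above can be encoded as a small check when triaging hosts. This is a minimal sketch, assuming only the build numbers reported here (crash first seen in 2.28-238.el8 and 2.34-87.el9). The function name at_or_after_first_bad is hypothetical, and the check cannot tell whether a later build already contains a fix.

```sh
#!/bin/sh
# Hypothetical helper: succeed if a glibc NVR is at or past the first build
# observed to crash in this report (2.28-238.el8 or 2.34-87.el9).
# Returns 2 for glibc versions this report does not cover.
at_or_after_first_bad() {
    nvr=${1#glibc-}        # e.g. 2.34-113.el9.x86_64
    version=${nvr%%-*}     # e.g. 2.34
    release=${nvr#*-}      # e.g. 113.el9.x86_64
    build=${release%%.*}   # e.g. 113
    case $version in
        2.28) first_bad=238 ;;
        2.34) first_bad=87 ;;
        *) return 2 ;;
    esac
    [ "$build" -ge "$first_bad" ]
}

# Demo on the builds named in this report (all are covered versions).
for nvr in glibc-2.28-237.el8 glibc-2.28-251.el8.2 \
           glibc-2.34-86.el9 glibc-2.34-113.el9.x86_64; do
    if at_or_after_first_bad "$nvr"; then
        echo "$nvr: at or after the first known-bad build"
    else
        echo "$nvr: before the first known-bad build"
    fi
done
```

On a live system the argument could come from `rpm -q glibc`, which prints the installed package name in this form.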

      How reproducible:

      It is always reproducible on a vanilla RHEL 9 system with libmemusage.so from the standard glibc-utils package. The same problem exists on CentOS Stream 8 and 9.

      Please see the attached script repro.sh. It is a minimal reproducible example of a problem we had in production with our own build of jemalloc4, built using the LLVM tools Clang and LLD.

      Curiously, the crash is reproducible with a straightforward build of the jemalloc4 library (latest stable-4 branch) when linked with GOLD or LLD, but not with the BFD linker. It is not reproducible with jemalloc5 (latest master branch), regardless of whether GOLD, LLD, or BFD was used. I am not sure why, but it seems a useful detail for the investigation.

      We use only the x86_64 architecture, and I have tested only on that architecture.

      Steps to reproduce

      1. Run the repro.sh script as root, so that it can install the dependency packages.
      2. Check for the "Test passed!" or "Test failed!" message printed at the end.

      A failed test means the program crashed, so a core file is generated.

      Expected results

      The test always succeeds on the latest glibc versions and never crashes.

      Actual results

      The test fails with a crash on the latest glibc versions.

      Downgrade glibc to the latest known working version, and the test passes:

      # dnf downgrade glibc-2.34-86.el9
      

      After that, upgrade glibc to the earliest known broken version, and the test fails:

      # dnf upgrade glibc-2.34-87.el9
      

      Similarly, on RHEL 8 the test passes on the older glibc version:

      # dnf downgrade glibc-2.28-237.el8
      

      and crashes on the newer glibc version:

      # dnf upgrade glibc-2.28-238.el8
      

      Additional details

      Example of running the repro script. Note that it always fails while loading the sixteenth shared library:

      # ./repro.sh
      glibc version in use:  glibc-2.34-113.el9.x86_64
      Built!
      DEBUG: opening ./plugin1.so
      DEBUG: opening ./plugin2.so
      DEBUG: opening ./plugin3.so
      DEBUG: opening ./plugin4.so
      DEBUG: opening ./plugin5.so
      DEBUG: opening ./plugin6.so
      DEBUG: opening ./plugin7.so
      DEBUG: opening ./plugin8.so
      DEBUG: opening ./plugin9.so
      DEBUG: opening ./plugin10.so
      DEBUG: opening ./plugin11.so
      DEBUG: opening ./plugin12.so
      DEBUG: opening ./plugin13.so
      DEBUG: opening ./plugin14.so
      DEBUG: opening ./plugin15.so
      DEBUG: opening ./plugin16.so
      ./repro.sh: line 44: 47772 Segmentation fault      (core dumped) LD_PRELOAD=/usr/lib64/libmemusage.so ./test
      Test failed!
      

      Example stack trace: starting with a call to malloc, there is endless recursion involving the malloc, __tls_get_addr, and _dl_resize_dtv frames, which ends in a crash once the stack limit is hit (244483 frames in this example):

      $ tail gdb.txt
      #244473 0x00007f0f236cf044 in _dl_catch_exception () from /lib64/libc.so.6
      #244474 0x00007f0f23ecd8c7 in dl_open_worker (a=0x7ffe27cbecb0) at dl-open.c:787
      #244475 0x00007f0f236cf044 in _dl_catch_exception () from /lib64/libc.so.6
      #244476 0x00007f0f23ecdb81 in _dl_open (file=0x7ffe27cbef60 "./plugin16.so", mode=-2147483646, caller_dlopen=0x400691 <main+81>, nsid=<optimized out>, argc=1, argv=<optimized out>, env=0x7ffe27cbf0b8) at dl-open.c:895
      #244477 0x00007f0f2392cf8a in dlopen_doit () from /lib64/libdl.so.2
      #244478 0x00007f0f236cf044 in _dl_catch_exception () from /lib64/libc.so.6
      #244479 0x00007f0f236cf103 in _dl_catch_error () from /lib64/libc.so.6
      #244480 0x00007f0f2392d52e in _dlerror_run () from /lib64/libdl.so.2
      #244481 0x00007f0f2392d02a in dlopen@@GLIBC_2.2.5 () from /lib64/libdl.so.2
      #244482 0x0000000000400691 in main () at test.c:9
      
      $ head gdb.txt
      #0  0x00007f0f23b31b25 in malloc () from /usr/lib64/libmemusage.so
      #1  0x00007f0f23ed25b1 in malloc (size=528) at ../include/rtld-malloc.h:56
      #2  _dl_resize_dtv (dtv=dtv@entry=0x7f0f240e30a0, max_modid=max_modid@entry=17) at ../elf/dl-tls.c:494
      #3  0x00007f0f23ed30f2 in _dl_update_slotinfo (req_modid=1, new_gen=16) at ../elf/dl-tls.c:810
      #4  0x00007f0f23ed31fc in update_get_addr (ti=0x7f0f23d33fc0, gen=<optimized out>) at ../elf/dl-tls.c:917
      #5  0x00007f0f23ebee9c in __tls_get_addr () at ../sysdeps/x86_64/tls_get_addr.S:55
      #6  0x00007f0f23b315e6 in update_data () from /usr/lib64/libmemusage.so
      #7  0x00007f0f23b31bd3 in malloc () from /usr/lib64/libmemusage.so
      #8  0x00007f0f23ed25b1 in malloc (size=528) at ../include/rtld-malloc.h:56
      #9  _dl_resize_dtv (dtv=dtv@entry=0x7f0f240e30a0, max_modid=max_modid@entry=17) at ../elf/dl-tls.c:494
      
      $ wc -l gdb.txt
      244483 gdb.txt
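
One way to confirm that a long backtrace is a cycle rather than a deep but finite call chain is to count how often each function appears. The following is a minimal sketch, assuming the backtrace was saved in the same text form as gdb.txt above. The helper name frame_histogram is hypothetical, and the sample input is the first ten frames quoted in this report.

```sh
#!/bin/sh
# Hypothetical helper: count how often each function name occurs in a saved
# gdb backtrace. Frame lines look like "#N  0xADDR in func (...)" or, for
# frames printed without an address, "#N  func (...)".
frame_histogram() {
    awk '/^#[0-9]+/ {
             name = ($3 == "in") ? $4 : $2
             count[name]++
         }
         END { for (n in count) print count[n], n }' "$1" |
        sort -k1,1nr -k2
}

# Sample input: frames #0 to #9 exactly as quoted in this report.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
#0  0x00007f0f23b31b25 in malloc () from /usr/lib64/libmemusage.so
#1  0x00007f0f23ed25b1 in malloc (size=528) at ../include/rtld-malloc.h:56
#2  _dl_resize_dtv (dtv=dtv@entry=0x7f0f240e30a0, max_modid=max_modid@entry=17) at ../elf/dl-tls.c:494
#3  0x00007f0f23ed30f2 in _dl_update_slotinfo (req_modid=1, new_gen=16) at ../elf/dl-tls.c:810
#4  0x00007f0f23ed31fc in update_get_addr (ti=0x7f0f23d33fc0, gen=<optimized out>) at ../elf/dl-tls.c:917
#5  0x00007f0f23ebee9c in __tls_get_addr () at ../sysdeps/x86_64/tls_get_addr.S:55
#6  0x00007f0f23b315e6 in update_data () from /usr/lib64/libmemusage.so
#7  0x00007f0f23b31bd3 in malloc () from /usr/lib64/libmemusage.so
#8  0x00007f0f23ed25b1 in malloc (size=528) at ../include/rtld-malloc.h:56
#9  _dl_resize_dtv (dtv=dtv@entry=0x7f0f240e30a0, max_modid=max_modid@entry=17) at ../elf/dl-tls.c:494
EOF

# Print "count function" pairs, most frequent first.
frame_histogram "$sample"
```

Run against the full gdb.txt, the functions that form the cycle should dominate the counts, while the dlopen() entry frames at the bottom of the stack appear only a handful of times.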
      

      Similarly with jemalloc4:

      # LD_PRELOAD=/root/libjemalloc.so.2-stable4-gold ./test
      DEBUG: opening ./plugin1.so
      DEBUG: opening ./plugin2.so
      DEBUG: opening ./plugin3.so
      DEBUG: opening ./plugin4.so
      DEBUG: opening ./plugin5.so
      DEBUG: opening ./plugin6.so
      DEBUG: opening ./plugin7.so
      DEBUG: opening ./plugin8.so
      DEBUG: opening ./plugin9.so
      DEBUG: opening ./plugin10.so
      DEBUG: opening ./plugin11.so
      DEBUG: opening ./plugin12.so
      DEBUG: opening ./plugin13.so
      DEBUG: opening ./plugin14.so
      DEBUG: opening ./plugin15.so
      DEBUG: opening ./plugin16.so
      Segmentation fault (core dumped)
      

      Stacktrace:

      #0  0x00007f12a5ccc196 in _dl_update_slotinfo (req_modid=1, new_gen=16) at ../elf/dl-tls.c:728
      #1  0x00007f12a5ccc37c in update_get_addr (ti=0x7f12a5ca4fb0, gen=<optimized out>) at ../elf/dl-tls.c:922
      #2  0x00007f12a5ccf61c in __tls_get_addr () at ../sysdeps/x86_64/tls_get_addr.S:55
      #3  0x00007f12a5c41bbb in je_tsd_fetch_impl (init=true) at include/jemalloc/internal/tsd.h:699
      #4  je_tsd_fetch () at include/jemalloc/internal/tsd.h:718
      #5  ialloc_body (slow_path=false, usize=<synthetic pointer>, tsdn=<synthetic pointer>, zero=false, size=528) at src/jemalloc.c:1588
      #6  malloc (size=528) at src/jemalloc.c:1644
      #7  0x00007f12a5ccb941 in malloc (size=<optimized out>) at ../include/rtld-malloc.h:56
      #8  _dl_resize_dtv (dtv=dtv@entry=0x7f12a5c155c0, max_modid=max_modid@entry=17) at ../elf/dl-tls.c:499
      #9  0x00007f12a5ccc280 in _dl_update_slotinfo (req_modid=1, new_gen=16) at ../elf/dl-tls.c:815
      #10 0x00007f12a5ccc37c in update_get_addr (ti=0x7f12a5ca4fb0, gen=<optimized out>) at ../elf/dl-tls.c:922
      #11 0x00007f12a5ccf61c in __tls_get_addr () at ../sysdeps/x86_64/tls_get_addr.S:55
      #12 0x00007f12a5c41bbb in je_tsd_fetch_impl (init=true) at include/jemalloc/internal/tsd.h:699
      #13 je_tsd_fetch () at include/jemalloc/internal/tsd.h:718
      #14 ialloc_body (slow_path=false, usize=<synthetic pointer>, tsdn=<synthetic pointer>, zero=false, size=528) at src/jemalloc.c:1588
      #15 malloc (size=528) at src/jemalloc.c:1644
      #16 0x00007f12a5ccb941 in malloc (size=<optimized out>) at ../include/rtld-malloc.h:56
      #17 _dl_resize_dtv (dtv=dtv@entry=0x7f12a5c155c0, max_modid=max_modid@entry=17) at ../elf/dl-tls.c:499
      #18 0x00007f12a5ccc280 in _dl_update_slotinfo (req_modid=1, new_gen=16) at ../elf/dl-tls.c:815
      #19 0x00007f12a5ccc37c in update_get_addr (ti=0x7f12a5ca4fb0, gen=<optimized out>) at ../elf/dl-tls.c:922
      #20 0x00007f12a5ccf61c in __tls_get_addr () at ../sysdeps/x86_64/tls_get_addr.S:55
      #21 0x00007f12a5c41bbb in je_tsd_fetch_impl (init=true) at include/jemalloc/internal/tsd.h:699
      #22 je_tsd_fetch () at include/jemalloc/internal/tsd.h:718
      #23 ialloc_body (slow_path=false, usize=<synthetic pointer>, tsdn=<synthetic pointer>, zero=false, size=528) at src/jemalloc.c:1588
      #24 malloc (size=528) at src/jemalloc.c:1644
      ...
      

      I also found https://bugzilla.redhat.com/show_bug.cgi?id=1878932, which involves the use of LD_AUDIT, but that case works fine in my environment.

      I tried different values (8 and 123456789) for the glibc.rtld.optional_static_tls tunable via GLIBC_TUNABLES, but that makes no difference for the repro script; it crashes regardless of the value:

      # GLIBC_TUNABLES=glibc.rtld.optional_static_tls=123456789 LD_PRELOAD=/usr/lib64/libmemusage.so ./test
      DEBUG: opening ./plugin1.so
      DEBUG: opening ./plugin2.so
      DEBUG: opening ./plugin3.so
      DEBUG: opening ./plugin4.so
      DEBUG: opening ./plugin5.so
      DEBUG: opening ./plugin6.so
      DEBUG: opening ./plugin7.so
      DEBUG: opening ./plugin8.so
      DEBUG: opening ./plugin9.so
      DEBUG: opening ./plugin10.so
      DEBUG: opening ./plugin11.so
      DEBUG: opening ./plugin12.so
      DEBUG: opening ./plugin13.so
      DEBUG: opening ./plugin14.so
      DEBUG: opening ./plugin15.so
      DEBUG: opening ./plugin16.so
      Segmentation fault (core dumped)
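
The tunable experiment above can be repeated for several values in one go. This is a minimal sketch, assuming the ./test binary built by repro.sh is in the current directory. The helper name try_static_tls is hypothetical, and the loop is skipped when ./test is absent.

```sh
#!/bin/sh
# Hypothetical helper: run a command with one value of the
# glibc.rtld.optional_static_tls tunable and report its exit status.
# A shell reports status 139 (128 + SIGSEGV) for a segmentation fault.
try_static_tls() {
    value=$1
    shift
    if env GLIBC_TUNABLES="glibc.rtld.optional_static_tls=$value" "$@" \
           >/dev/null 2>&1; then
        status=0
    else
        status=$?
    fi
    echo "optional_static_tls=$value: exit status $status"
}

# The two values tried in this report; skipped if ./test has not been built.
if [ -x ./test ]; then
    for value in 8 123456789; do
        try_static_tls "$value" env LD_PRELOAD=/usr/lib64/libmemusage.so ./test
    done
fi
```

The command to run is passed as arguments, so the same helper can be pointed at the jemalloc4 preload from this report instead of libmemusage.so.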
      

            glibc-bugzilla Platform Tools - Libraries Bot
            antonsmyk_br Anton Smyk
            Platform Tools - Libraries Bot Platform Tools - Libraries Bot
            qe-baseos-tools-bugs@redhat.com