Uploaded image for project: 'RHEL'
  1. RHEL
  2. RHEL-57717

pmdastatsd random segfaults in pmdaRehash

Linking RHIVOS CVEs to...Migration: Automation ...SWIFT: POC ConversionSync from "Extern...XMLWordPrintable

    • Icon: Bug Bug
    • Resolution: Done-Errata
    • Icon: Normal Normal
    • rhel-9.6
    • rhel-8.10
    • pcp
    • None
    • pcp-6.3.2-1.el9
    • No
    • Moderate
    • 2
    • rhel-pt-pcp
    • ssg_platform_tools
    • 21
    • 2
    • QE ack, Dev ack
    • False
    • False
    • Hide

      None

      Show
      None
    • No
    • PCP Sprint 12, PCP Sprint 13
    • Unspecified Release Note Type - Unknown
    • x86_64
    • None

      What were you trying to do that didn't work?

      Sending approx 2k metrics from Satellite via statsd to PCP, a random segfault of pmdastatsd happens:

      Sep  5 21:50:53 pmoravec-sat615 kernel: pmdastatsd[921663]: segfault at 7fafc11a5018 ip 00007fafc90b7516 sp 00007ffda859a440 error 4 in libpcp_pmda.so.3[7fafc90ae000+18000]
      

      Please provide the package NVR for which bug is seen:

      pcp-pmda-statsd-5.3.7-20.el8_10.x86_64

      How reproducible:

      100% within a hour

      Steps to reproduce

      1. Install Satellite6.15
      2. Follow https://github.com/pmoravec/sat-perf-correlation
      3. Enable all Satellite metrics to send via statsd (modify the "Add needed allowed_labels" in Asnible playbook in the sat-perf-correlation by '.*' everywhere)
      4. Generate some load to Satellite (see reproducer below)
      5. Wait a hour.

      Expected results

      No segfault

      Actual results

      Segfault with backtrace:

      (gdb) thread apply all bt full
      
      Thread 4 (Thread 0x7f2d3980f700 (LWP 1039630)):
      #0  futex_wait_cancelable (private=0, expected=0, futex_word=0x55997bf59960) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
              __ret = -512
              oldtype = 0
              err = <optimized out>
              oldtype = <optimized out>
              err = <optimized out>
              __ret = <optimized out>
              resultvar = <optimized out>
              __arg4 = <optimized out>
              __arg3 = <optimized out>
              __arg2 = <optimized out>
              __arg1 = <optimized out>
              _a4 = <optimized out>
              _a3 = <optimized out>
              _a2 = <optimized out>
              _a1 = <optimized out>
      #1  __pthread_cond_wait_common (abstime=0x0, mutex=0x55997bf59910, cond=0x55997bf59938) at pthread_cond_wait.c:502
              spin = 0
              buffer = {__routine = 0x7f2d40d56200 <__condvar_cleanup_waiting>, __arg = 0x7f2d3980eb70, __canceltype = 738206992, __prev = 0x0}
              cbuffer = {wseq = 1331888, cond = 0x55997bf59938, mutex = 0x55997bf59910, private = 0}
              rt = <optimized out>
              err = <optimized out>
              g = 0
              flags = <optimized out>
              g1_start = <optimized out>
              signals = <optimized out>
              result = 0
              wseq = 1331888
              seq = 665944
              private = 0
              maxspin = <optimized out>
              err = <optimized out>
              result = <optimized out>
              wseq = <optimized out>
              g = <optimized out>
              seq = <optimized out>
              flags = <optimized out>
              private = <optimized out>
              signals = <optimized out>
              g1_start = <optimized out>
              spin = <optimized out>
              buffer = <optimized out>
              cbuffer = <optimized out>
              rt = <optimized out>
              s = <optimized out>
      --Type <RET> for more, q to quit, c to continue without paging--
      #2  __pthread_cond_wait (cond=cond@entry=0x55997bf59938, mutex=mutex@entry=0x55997bf59910) at pthread_cond_wait.c:655
      No locals.
      #3  0x00007f2d40f6978e in buffered_chan_recv (data=0x7f2d3980ec40, chan=0x55997bf598b0) at src/chan.c:280
              msg = <optimized out>
              msg = <optimized out>
      #4  chan_recv (chan=chan@entry=0x55997bf598b0, data=data@entry=0x7f2d3980ec40) at src/chan.c:239
      No locals.
      #5  0x000055997a753424 in aggregator_exec (args=<optimized out>) at aggregators.c:61
              success_recv = <optimized out>
              config = 0x55997a961b80 <config>
              metrics_container = 0x55997bf559c0
              stats_container = 0x55997bf55a30
              parser_to_aggregator = 0x55997bf598b0
              message = 0x7f2d34019810
              t0 = {tv_sec = 97881, tv_nsec = 670904310}
              t1 = {tv_sec = 97881, tv_nsec = 670904821}
              time_spent_aggregating = 511
              should_exit = 0
      #6  0x00007f2d40d501ca in start_thread (arg=<optimized out>) at pthread_create.c:479
              ret = <optimized out>
              pd = <optimized out>
              unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139832215009024, -7359956588598194139, 140731863178062, 140731863178063, 0, 139832215006592, 7459752619671490597, 7459843141556572197}, 
                    mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
              not_first_call = <optimized out>
      #7  0x00007f2d3f11b8d3 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
      No locals.
      
      Thread 3 (Thread 0x7f2d3a010700 (LWP 1039629)):
      #0  futex_wait_cancelable (private=0, expected=0, futex_word=0x55997bf57834) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
              __ret = 0
              oldtype = 0
              err = <optimized out>
              oldtype = <optimized out>
              err = <optimized out>
              __ret = <optimized out>
              resultvar = <optimized out>
              __arg4 = <optimized out>
              __arg3 = <optimized out>
              __arg2 = <optimized out>
              __arg1 = <optimized out>
              _a4 = <optimized out>
              _a3 = <optimized out>
              _a2 = <optimized out>
              _a1 = <optimized out>
      #1  __pthread_cond_wait_common (abstime=0x0, mutex=0x55997bf577e0, cond=0x55997bf57808) at pthread_cond_wait.c:502
              spin = 0
              buffer = {__routine = 0x7f2d40d56200 <__condvar_cleanup_waiting>, __arg = 0x7f2d3a00fb70, __canceltype = 872512320, __prev = 0x0}
      --Type <RET> for more, q to quit, c to continue without paging--
              cbuffer = {wseq = 1395111, cond = 0x55997bf57808, mutex = 0x55997bf577e0, private = 0}
              rt = <optimized out>
              err = <optimized out>
              g = 1
              flags = <optimized out>
              g1_start = <optimized out>
              signals = <optimized out>
              result = 0
              wseq = 1395111
              seq = 697555
              private = 0
              maxspin = <optimized out>
              err = <optimized out>
              result = <optimized out>
              wseq = <optimized out>
              g = <optimized out>
              seq = <optimized out>
              flags = <optimized out>
              private = <optimized out>
              signals = <optimized out>
              g1_start = <optimized out>
              spin = <optimized out>
              buffer = <optimized out>
              cbuffer = <optimized out>
              rt = <optimized out>
              s = <optimized out>
      #2  __pthread_cond_wait (cond=cond@entry=0x55997bf57808, mutex=mutex@entry=0x55997bf577e0) at pthread_cond_wait.c:655
      No locals.
      #3  0x00007f2d40f6978e in buffered_chan_recv (data=0x7f2d3a00fc40, chan=0x55997bf57780) at src/chan.c:280
              msg = <optimized out>
              msg = <optimized out>
      #4  chan_recv (chan=chan@entry=0x55997bf57780, data=data@entry=0x7f2d3a00fc40) at src/chan.c:239
      No locals.
      #5  0x000055997a7574f0 in parser_exec (args=<optimized out>) at parsers.c:55
              success_recv = <optimized out>
              parsed = 0x7f2d2c005110
              tok = <optimized out>
              network_end_message = 0x55997a75d08f "PMDASTATSD_EXIT"
              config = <optimized out>
              network_listener_to_parser = <optimized out>
              parser_to_aggregator = <optimized out>
              parse_datagram = 0x55997a757d00 <basic_parser_parse>
              datagram = 0x7f2d3401a630
              delim = "\n"
              t0 = {tv_sec = 97881, tv_nsec = 670901044}
              t1 = {tv_sec = 97881, tv_nsec = 670901275}
              time_spent_parsing = <optimized out>
      --Type <RET> for more, q to quit, c to continue without paging--
              should_exit = 0
              message = <optimized out>
      #6  0x00007f2d40d501ca in start_thread (arg=<optimized out>) at pthread_create.c:479
              ret = <optimized out>
              pd = <optimized out>
              unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139832223401728, -7359956588598194139, 140731863178062, 140731863178063, 0, 139832223399296, 7459749320599736357, 7459843141556572197}, 
                    mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
              not_first_call = <optimized out>
      #7  0x00007f2d3f11b8d3 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
      No locals.
      
      Thread 2 (Thread 0x7f2d3a811700 (LWP 1039628)):
      #0  0x00007f2d3f2171ff in __GI___select (nfds=nfds@entry=4, readfds=readfds@entry=0x7f2d3a810b70, writefds=writefds@entry=0x0, exceptfds=exceptfds@entry=0x0, 
          timeout=timeout@entry=0x7f2d3a810b30) at ../sysdeps/unix/sysv/linux/select.c:41
              resultvar = 18446744073709551102
              sc_cancel_oldtype = 0
              sc_ret = <optimized out>
      #1  0x000055997a75712e in network_listener_exec (args=<optimized out>) at network-listener.c:80
              end_message = 0x55997a75d08f "PMDASTATSD_EXIT"
              config = <optimized out>
              network_listener_to_parser = 0x55997bf57780
              hints = {ai_flags = 33, ai_family = 0, ai_socktype = 2, ai_protocol = 0, ai_addrlen = 0, ai_addr = 0x0, ai_canonname = 0x0, ai_next = 0x0}
              readfds = {fds_bits = {8, 0 <repeats 15 times>}}
              res = 0x7f2d340011e0
              port_buffer = "8125\000"
              err = <optimized out>
              fd = <optimized out>
              tv = {tv_sec = 0, tv_usec = 997461}
              max_udp_packet_size = 1472
              buffer = 0x7f2d34001420 ""
              src_addr = {ss_family = 2, __ss_padding = "\277\320\177\000\000\001", '\000' <repeats 111 times>, __ss_align = 0}
              src_addr_len = 16
              rv = <optimized out>
              datagram = <optimized out>
              length = <optimized out>
      #2  0x00007f2d40d501ca in start_thread (arg=<optimized out>) at pthread_create.c:479
              ret = <optimized out>
              pd = <optimized out>
              unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139832231794432, -7359956588598194139, 140731863178062, 140731863178063, 0, 139832231792000, 7459750404542107685, 7459843141556572197}, 
                    mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
              not_first_call = <optimized out>
      #3  0x00007f2d3f11b8d3 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
      No locals.
      
      Thread 1 (Thread 0x7f2d4179d980 (LWP 1039627)):
      #0  0x00007f2d3f984516 in pmdaRehash () from /lib64/libpcp_pmda.so.3
      No symbol table info available.
      --Type <RET> for more, q to quit, c to continue without paging--
      #1  0x000055997a75926b in statsd_possible_reload (pmda=pmda@entry=0x55997bf54910) at pmda-callbacks.c:462
              data = 0x55997a961b00 <data>
              need_reload = <optimized out>
      #2  0x000055997a759b4b in statsd_fetch (num_pm_id=340, pm_id_list=0x55997bfb452c, resp=0x7ffeb0b69b98, pmda=0x55997bf54910) at pmda-callbacks.c:685
              data = 0x55997a961b00 <data>
      #3  0x00007f2d3f985d69 in __pmdaMainPDU () from /lib64/libpcp_pmda.so.3
      No symbol table info available.
      #4  0x000055997a752d48 in main_PDU_loop (dispatch=0x7ffeb0b69c40) at pmdastatsd.c:250
      No locals.
      #5  main (argc=3, argv=0x7ffeb0b69ee8) at pmdastatsd.c:417
              new_action = {__sigaction_handler = {sa_handler = 0x55997a75a380 <signal_handler>, sa_sigaction = 0x55997a75a380 <signal_handler>}, sa_mask = {__val = {0 <repeats 16 times>}}, 
                sa_flags = 536870912, sa_restorer = 0xa30}
              old_action = {__sigaction_handler = {sa_handler = 0x0, sa_sigaction = 0x0}, sa_mask = {__val = {0, 0, 0, 0, 1024, 0, 1725472837, 533999999, 1725472837, 533999999, 1725472837, 
                    533999999, 0, 0, 0, 94117698027536}}, sa_flags = 0, sa_restorer = 0x0}
              sep = <optimized out>
              dispatch = {domain = 57, comm = {pmda_interface = 7, pmapi_version = 2, flags = 0}, status = 0, version = {any = {ext = 0x55997bf54910, profile = 0x7f2d3f981960 <pmdaProfile>, 
                    fetch = 0x55997a759b20 <statsd_fetch>, desc = 0x55997a759870 <statsd_desc>, instance = 0x55997a759ae0 <statsd_instance>, text = 0x55997a7598a0 <statsd_text>, 
                    store = 0x7f2d3f983530 <pmdaStore>}, two = {ext = 0x55997bf54910, profile = 0x7f2d3f981960 <pmdaProfile>, fetch = 0x55997a759b20 <statsd_fetch>, 
                    desc = 0x55997a759870 <statsd_desc>, instance = 0x55997a759ae0 <statsd_instance>, text = 0x55997a7598a0 <statsd_text>, store = 0x7f2d3f983530 <pmdaStore>}, three = {
                    ext = 0x55997bf54910, profile = 0x7f2d3f981960 <pmdaProfile>, fetch = 0x55997a759b20 <statsd_fetch>, desc = 0x55997a759870 <statsd_desc>, 
                    instance = 0x55997a759ae0 <statsd_instance>, text = 0x55997a7598a0 <statsd_text>, store = 0x7f2d3f983530 <pmdaStore>}, four = {ext = 0x55997bf54910, 
                    profile = 0x7f2d3f981960 <pmdaProfile>, fetch = 0x55997a759b20 <statsd_fetch>, desc = 0x55997a759870 <statsd_desc>, instance = 0x55997a759ae0 <statsd_instance>, 
                    text = 0x55997a7598a0 <statsd_text>, store = 0x7f2d3f983530 <pmdaStore>, pmid = 0x55997a759b90 <statsd_pmid>, name = 0x55997a759be0 <statsd_name>, 
                    children = 0x55997a759c30 <statsd_children>}, five = {ext = 0x55997bf54910, profile = 0x7f2d3f981960 <pmdaProfile>, fetch = 0x55997a759b20 <statsd_fetch>, 
                    desc = 0x55997a759870 <statsd_desc>, instance = 0x55997a759ae0 <statsd_instance>, text = 0x55997a7598a0 <statsd_text>, store = 0x7f2d3f983530 <pmdaStore>, 
                    pmid = 0x55997a759b90 <statsd_pmid>, name = 0x55997a759be0 <statsd_name>, children = 0x55997a759c30 <statsd_children>}, six = {ext = 0x55997bf54910, 
                    profile = 0x7f2d3f981960 <pmdaProfile>, fetch = 0x55997a759b20 <statsd_fetch>, desc = 0x55997a759870 <statsd_desc>, instance = 0x55997a759ae0 <statsd_instance>, 
                    text = 0x55997a7598a0 <statsd_text>, store = 0x7f2d3f983530 <pmdaStore>, pmid = 0x55997a759b90 <statsd_pmid>, name = 0x55997a759be0 <statsd_name>, 
                    children = 0x55997a759c30 <statsd_children>, attribute = 0x7f2d3f9835a0 <pmdaAttribute>}, seven = {ext = 0x55997bf54910, profile = 0x7f2d3f981960 <pmdaProfile>, 
                    fetch = 0x55997a759b20 <statsd_fetch>, desc = 0x55997a759870 <statsd_desc>, instance = 0x55997a759ae0 <statsd_instance>, text = 0x55997a7598a0 <statsd_text>, 
                    store = 0x7f2d3f983530 <pmdaStore>, pmid = 0x55997a759b90 <statsd_pmid>, name = 0x55997a759be0 <statsd_name>, children = 0x55997a759c30 <statsd_children>, 
                    attribute = 0x7f2d3f9835a0 <pmdaAttribute>, label = 0x55997a759c90 <statsd_label>}}}
      (gdb) 
      

      Coredump and access to my reproducer system will be provided in next comment.

              nathans@redhat.com Nathan Scott
              rhn-support-pmoravec Pavel Moravec
              pcp-maint pcp-maint
              Jan Kurik Jan Kurik
              Tomas Capek Tomas Capek
              Votes:
              0 Vote for this issue
              Watchers:
              10 Start watching this issue

                Created:
                Updated:
                Resolved: