RHEL-60135

Deadlock during cleanAllRUV


    • 389-ds-base-2.6.1-1.el9
    • No
    • Important
    • ZStream
    • rhel-idm-ds
    • ssg_idm
    • 26
    • 0
    • False
    • False
    • None
    • Yes
    • None
    • Approved Blocker
    • Bug Fix
    • .`cleanAllRUV` no longer blocks itself

      Before this update, when you ran the `cleanAllRUV` task after deleting a replica from the replication topology, the task tried to update the replication configuration entry while it was still purging the old replica ID (`rid`) from the replication changelog. As a result, the task blocked itself and the server became unresponsive.

      With this update, `cleanAllRUV` cleans up the replication configuration only after the changelog purging is complete.
    • Done
    • x86_64
    • None

      What were you trying to do that didn't work?

      This issue typically happens right after an IPA replica is deleted (ipa server-del).
      The deletion triggers purging of the replica ID(s) used by the removed replica.
      The purging thread consumes close to 100% CPU (99.9% in the samples below) and the LDAP server stops responding to requests.
      Killing the LDAP server is sometimes the only way to recover.
      A few customers have noticed this behaviour; the common pattern is that they run RHEL 9.4.

      Stack traces from two different servers:

         PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
      1033655 dirsrv    20   0 1805076 546380 327680 R  99.9   9.4   2:25.51 ns-slapd

      Thread 43 (Thread 0x7f0ba44af640 (LWP 1033655) "ns-slapd"):
      #0  0x00007f0bc6015c75 in __db_tas_mutex_lock_int () at target:/lib64/libdb-5.3.so
      #1  0x00007f0bc60d07fc in __db_cursor_int () at target:/lib64/libdb-5.3.so
      #2  0x00007f0bc60d3481 in __dbc_idup () at target:/lib64/libdb-5.3.so
      #3  0x00007f0bc60d3cb6 in __dbc_iget () at target:/lib64/libdb-5.3.so
      #4  0x00007f0bc60e25e1 in __dbc_get_pp () at target:/lib64/libdb-5.3.so
      #5  0x00007f0bc623e304 in bdb_dblayer_cursor_iterate () at target:/usr/lib64/dirsrv/plugins/libback-ldbm.so
      #6  0x00007f0bc815995c in _cl5Iterate () at target:/usr/lib64/dirsrv/plugins/libreplication-plugin.so
      #7  0x00007f0bc8159e3c in _cl5PurgeRID () at target:/usr/lib64/dirsrv/plugins/libreplication-plugin.so
      #8  0x00007f0bc815c98e in trigger_cl_purging_thread () at target:/usr/lib64/dirsrv/plugins/libreplication-plugin.so
      #9  0x00007f0bca76ebd4 in _pt_root () at target:/lib64/libnspr4.so
      #10 0x00007f0bca089c02 in start_thread () at target:/lib64/libc.so.6
      #11 0x00007f0bca10ec40 in clone3 () at target:/lib64/libc.so.6
      
      
          PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
       994148 dirsrv    20   0 1668720 251560 137984 R  99.9   4.3   1:36.64 ns-slapd

      Thread 42 (Thread 0x7fe57ba61640 (LWP 994148) "ns-slapd"):
      #0  0x00007fe59e01db74 in __bamc_next () at target:/lib64/libdb-5.3.so
      #1  0x00007fe59e0220dc in __bamc_get () at target:/lib64/libdb-5.3.so
      #2  0x00007fe59e0d3d4d in __dbc_iget () at target:/lib64/libdb-5.3.so
      #3  0x00007fe59e0e25e1 in __dbc_get_pp () at target:/lib64/libdb-5.3.so
      #4  0x00007fe59e23e304 in bdb_dblayer_cursor_iterate () at target:/usr/lib64/dirsrv/plugins/libback-ldbm.so
      #5  0x00007fe59df5195c in _cl5Iterate () at target:/usr/lib64/dirsrv/plugins/libreplication-plugin.so
      #6  0x00007fe59df51e3c in _cl5PurgeRID () at target:/usr/lib64/dirsrv/plugins/libreplication-plugin.so
      #7  0x00007fe59df5498e in trigger_cl_purging_thread () at target:/usr/lib64/dirsrv/plugins/libreplication-plugin.so
      #8  0x00007fe5a2761bd4 in _pt_root () at target:/lib64/libnspr4.so
      #9  0x00007fe5a2089c02 in start_thread () at target:/lib64/libc.so.6
      #10 0x00007fe5a210ec40 in clone3 () at target:/lib64/libc.so.6
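
For triage on other servers, a check along the following lines might help confirm whether a busy thread is in the same changelog purge path as the two traces above. This is only a sketch: the helper name `threads_in_function` is hypothetical, and the regular expressions assume gdb output shaped like the traces in this report (a `Thread N (Thread 0x... (LWP ...)` header followed by `#N 0x... in func ()` frames).

```python
import re

# Thread header as printed by gdb, e.g.
#   Thread 43 (Thread 0x7f0ba44af640 (LWP 1033655) "ns-slapd"):
THREAD_RE = re.compile(r"Thread (\d+) \(Thread 0x[0-9a-f]+ \(LWP (\d+)\)")
# Stack frame, e.g.
#   #7  0x00007f0bc8159e3c in _cl5PurgeRID () at target:/usr/lib64/...
FRAME_RE = re.compile(r"^\s*#\d+\s+0x[0-9a-f]+ in (\S+) \(")


def threads_in_function(backtrace, function="_cl5PurgeRID"):
    """Return the LWP ids of threads whose stack contains `function`.

    Hypothetical helper; the default function name is the purge
    routine seen in the traces of this report.
    """
    hits = []
    current_lwp = None
    for line in backtrace.splitlines():
        header = THREAD_RE.search(line)
        if header:
            # A new thread section starts; remember its LWP id.
            current_lwp = int(header.group(2))
            continue
        frame = FRAME_RE.match(line)
        if frame and frame.group(1) == function and current_lwp is not None:
            if current_lwp not in hits:
                hits.append(current_lwp)
    return hits
```

The returned LWP ids can then be compared with the PID column of `top` in thread mode to see whether the thread at 99.9% CPU is the purging thread.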

      What is the impact of this issue to you?

      The LDAP server becomes unresponsive.

      Please provide the package NVR for which the bug is seen:

        cat etc/redhat-release 
      Red Hat Enterprise Linux release 9.4 (Plow)
      
        grep ^ipa installed-rpms 
      ipa-client-4.11.0-15.el9_4.x86_64                           Mon Jun 24 12:26:48 2024
      ipa-client-common-4.11.0-15.el9_4.noarch                    Mon Jun 24 12:26:32 2024
      ipa-common-4.11.0-15.el9_4.noarch                           Mon Jun 24 12:26:48 2024
      ipa-healthcheck-0.16-3.el9.noarch                           Mon May  6 11:54:23 2024
      ipa-healthcheck-core-0.16-3.el9.noarch                      Mon May  6 11:53:01 2024
      ipa-selinux-4.11.0-15.el9_4.noarch                          Mon Jun 24 12:26:40 2024
      ipa-server-4.11.0-15.el9_4.x86_64                           Mon Jun 24 12:27:10 2024
      ipa-server-common-4.11.0-15.el9_4.noarch                    Mon Jun 24 12:26:32 2024
      ipa-server-dns-4.11.0-15.el9_4.noarch                       Mon Jun 24 12:27:12 2024

      How reproducible is this bug?:

      Quite often on RHEL 9.4.

      Steps to reproduce

      1. Delete an IPA server
      2. Check the CPU usage of the LDAP server on other replicas in the topology
      3. Check for the pattern "CleanAllRUV" in the LDAP errors log
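
Step 3 could be scripted roughly as follows. This is a minimal sketch, not part of the original report: the helper name `find_cleanallruv_lines` is hypothetical, and the match is case-insensitive on the assumption that the log may spell the task name with different capitalisation.

```python
import re

# Pattern named in step 3; matched case-insensitively as a precaution.
PATTERN = re.compile(r"CleanAllRUV", re.IGNORECASE)


def find_cleanallruv_lines(lines):
    """Return (line_number, text) pairs for lines that mention CleanAllRUV.

    `lines` is any iterable of strings, for example an open file object.
    """
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if PATTERN.search(line)
    ]
```

A typical call would open the errors log and print the hits. The path below assumes the usual default location, with INSTANCE standing in for the real instance name:

    with open("/var/log/dirsrv/slapd-INSTANCE/errors", errors="replace") as fh:
        for number, text in find_cleanallruv_lines(fh):
            print(f"{number}: {text}")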

      Expected results

      Working LDAP server.

      Actual results

      Unresponsive LDAP server.

              rhn-engineering-mareynol Mark Reynolds
              rhn-support-tmihinto Têko Mihinto
              IdM DS Dev
              Viktor Ashirov
              Evgenia Martyniuk