RHEL-32984

Recurring crashes in 'sqlite3WalFindFrame.constprop.0' (sqlite3.c) on fapolicyd

    • Bug
    • Resolution: Unresolved
    • Major
    • None
    • rhel-9.3.0, rhel-9.4
    • fapolicyd
    • None
    • None
    • Important
    • rhel-sst-security-special-projects
    • ssg_security
    • None
    • False
    • None
    • Red Hat Enterprise Linux
    • None
    • None
    • None
    • x86_64
    • None

       What were you trying to do that didn't work?

      fapolicyd crashes while reloading. The crash indicates that the SQLite database may be the culprit, and it leaves the system in an unresponsive state.

      Please provide the package NVR for which bug is seen:

      sqlite-3.34.1-7.el9_3.x86_64
      rpm-4.16.1.3-27.el9_3.x86_64
      fapolicyd-1.3.2-100.el9.x86_64

      How reproducible:

      Very often on the customer's system. (We gave the customer a fapolicyd hotfix that handles signals better, so that the coredump could be gathered.)

      Steps to reproduce

      Coredump available at appcore.usersys (crashing thread #3)
      ID: 5e18ee0f949d969f22d0678e5e8f383e2b667adbbb2e965078e66c2340fabdc8

      $ appcore-cli gdb --id 5e18ee0f949d969f22d0678e5e8f383e2b667adbbb2e965078e66c2340fabdc8
      

      Expected results

      The thread running the SQLite code should not crash.

      Actual results

      RMetrich's analysis:

       

      The coredump shows that Thread 3 is the one crashing, even though Thread 1 got the signal.
      This is because Thread 3 sent the signal to Thread 1 and then paused itself (this is the patch I made to avoid the deadlock in fapolicyd).
      
      (gdb) info threads 
        Id   Target Id                          Frame 
      * 1    Thread 0x7f9b7641b780 (LWP 217901) 0x00007f9b763426ff in __GI___poll (fds=0x7ffc0b4c0b10, nfds=2, timeout=-1)
          at ../sysdeps/unix/sysv/linux/poll.c:29
        2    Thread 0x7f9b719ff640 (LWP 217903) __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, 
          expected=0, futex_word=0x55af20de24cc <do_decision+44>) at futex-internal.c:57
        3    Thread 0x7f9b729ff640 (LWP 217902) 0x00007f9b763184c2 in __libc_pause () at ../sysdeps/unix/sysv/linux/pause.c:29
        4    Thread 0x7f9b711fe640 (LWP 217904) 0x00007f9b76313975 in __GI___clock_nanosleep (clock_id=clock_id@entry=0, 
          flags=flags@entry=0, req=req@entry=0x7f9b711fdd40, rem=rem@entry=0x7f9b711fdd40)
          at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:48
      (gdb) thread 3
      [Switching to thread 3 (Thread 0x7f9b729ff640 (LWP 217902))]
      #0  0x00007f9b763184c2 in __libc_pause () at ../sysdeps/unix/sysv/linux/pause.c:29
      29      return SYSCALL_CANCEL (pause);
      (gdb) bt
      #0  0x00007f9b763184c2 in __libc_pause () at ../sysdeps/unix/sysv/linux/pause.c:29
      #1  0x000055af20dcaee5 in coredump_handler (sig=7) at daemon/fapolicyd.c:226
      #2  <signal handler called>
      #3  0x00007f9b7617bbc9 in sqlite3WalFindFrame.constprop.0 (pWal=0x7f9b6c0ae298, pgno=2945, 
          piRead=piRead@entry=0x7f9b729faf34) at /usr/src/debug/sqlite-3.34.1-7.el9_3.x86_64/sqlite3.c:62715
      #4  0x00007f9b760a24e2 in readDbPage (pPg=pPg@entry=0x7f9b6c096a30)
          at /usr/src/debug/sqlite-3.34.1-7.el9_3.x86_64/sqlite3.c:54887
      #5  0x00007f9b760a49f7 in getPageNormal (pPager=0x7f9b6c115848, pgno=2945, ppPage=0x7f9b729fafa0, flags=<optimized out>)
          at /usr/src/debug/sqlite-3.34.1-7.el9_3.x86_64/sqlite3.c:57461
      [...]
      The code dies in frame 3:
      
      (gdb) f 3
      #3  0x00007f9b7617bbc9 in sqlite3WalFindFrame.constprop.0 (pWal=0x7f9b6c0ae298, pgno=2945, 
          piRead=piRead@entry=0x7f9b729faf34) at /usr/src/debug/sqlite-3.34.1-7.el9_3.x86_64/sqlite3.c:62715
      62715        while( (iH = AtomicLoad(&sLoc.aHash[iKey]))!=0 ){
      (gdb) p &sLoc.aHash[iKey]
      $1 = (volatile ht_slot *) 0x7f9b76419bfe
      (gdb) p sLoc.aHash[iKey]
      $2 = 0
      The code looks valid to me.
      Indeed, we have an AtomicLoad() call on the pointer, which expands to one of the following (line 206 or 209):
      
       205 #if GCC_VERSION>=4007000 || __has_extension(c_atomic)
       206 # define AtomicLoad(PTR)       __atomic_load_n((PTR),__ATOMIC_RELAXED)
       207 # define AtomicStore(PTR,VAL)  __atomic_store_n((PTR),(VAL),__ATOMIC_RELAXED)
       208 #else
       209 # define AtomicLoad(PTR)       (*(PTR))
       210 # define AtomicStore(PTR,VAL)  (*(PTR) = (VAL))
       211 #endif
      Assuming it's line 209 (the semantics are similar), we would have (*(&sLoc.aHash[iKey])), which evaluates to (sLoc.aHash[iKey]), which here is 0.
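
      To illustrate why the two AtomicLoad() variants are interchangeable for this read, here is a minimal standalone sketch. It assumes GCC or Clang (for the __atomic_load_n builtin). The macro names AtomicLoadBuiltin and AtomicLoadPlain and the helper both_loads_agree are made up for this example, and ht_slot is declared locally as a 16-bit unsigned type to mirror the SQLite typedef.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for SQLite's ht_slot (a 16-bit unsigned hash slot). */
typedef uint16_t ht_slot;

/* The two shapes AtomicLoad() can take, under hypothetical names so that
 * both can exist in the same file. */
#define AtomicLoadBuiltin(PTR) __atomic_load_n((PTR), __ATOMIC_RELAXED)
#define AtomicLoadPlain(PTR)   (*(PTR))

/* Reads slot iKey through both variants and reports whether they agree.
 * Without a concurrent writer, both are a plain read of aHash[iKey]. */
int both_loads_agree(volatile ht_slot *aHash, int iKey)
{
    ht_slot viaBuiltin = AtomicLoadBuiltin(&aHash[iKey]);
    ht_slot viaPlain = AtomicLoadPlain(&aHash[iKey]);
    return viaBuiltin == viaPlain;
}
```

      With a small table such as {0, 7, 0, 42}, both variants return 0 for slot 0 and 42 for slot 3. Either way the expression is an ordinary 16-bit read from the address gdb printed above.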
      

       

              rsroka@redhat.com Radovan Sroka
              rhn-support-jgamba Juan Gamba
              Radovan Sroka
              SSG Security QE