-
Bug
-
Resolution: Unresolved
-
Major
-
None
-
rhel-9.3.0, rhel-9.4
-
None
-
None
-
Important
-
rhel-sst-security-special-projects
-
ssg_security
-
None
-
False
-
-
None
-
Red Hat Enterprise Linux
-
None
-
None
-
None
-
-
x86_64
-
None
What were you trying to do that didn't work?
fapolicyd crashes while reloading. The crash points to SQLite as the possible culprit and leaves the system in an unresponsive state.
Please provide the package NVR for which bug is seen:
sqlite-3.34.1-7.el9_3.x86_64
rpm-4.16.1.3-27.el9_3.x86_64
fapolicyd-1.3.2-100.el9.x86_64
How reproducible:
Very often on the customer's system (we gave the customer a fapolicyd hotfix that handles signals better, so that the coredump could be gathered)
Steps to reproduce
Coredump available at appcore.usersys (crashing thread #3)
ID: 5e18ee0f949d969f22d0678e5e8f383e2b667adbbb2e965078e66c2340fabdc8
$ appcore-cli gdb --id 5e18ee0f949d969f22d0678e5e8f383e2b667adbbb2e965078e66c2340fabdc8
Expected results
SQLite thread not to crash
Actual results
RMetrich's analysis:
The coredump shows Thread 3 is crashing, even though Thread 1 got the signal. This is because Thread 3 sent the signal to Thread 1 then paused itself (it's the patch I made to avoid the deadlock in fapolicyd).

(gdb) info threads
  Id   Target Id                            Frame
* 1    Thread 0x7f9b7641b780 (LWP 217901)   0x00007f9b763426ff in __GI___poll (fds=0x7ffc0b4c0b10, nfds=2, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
  2    Thread 0x7f9b719ff640 (LWP 217903)   __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x55af20de24cc <do_decision+44>) at futex-internal.c:57
  3    Thread 0x7f9b729ff640 (LWP 217902)   0x00007f9b763184c2 in __libc_pause () at ../sysdeps/unix/sysv/linux/pause.c:29
  4    Thread 0x7f9b711fe640 (LWP 217904)   0x00007f9b76313975 in __GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0, req=req@entry=0x7f9b711fdd40, rem=rem@entry=0x7f9b711fdd40) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:48
(gdb) thread 3
[Switching to thread 3 (Thread 0x7f9b729ff640 (LWP 217902))]
#0  0x00007f9b763184c2 in __libc_pause () at ../sysdeps/unix/sysv/linux/pause.c:29
29        return SYSCALL_CANCEL (pause);
(gdb) bt
#0  0x00007f9b763184c2 in __libc_pause () at ../sysdeps/unix/sysv/linux/pause.c:29
#1  0x000055af20dcaee5 in coredump_handler (sig=7) at daemon/fapolicyd.c:226
#2  <signal handler called>
#3  0x00007f9b7617bbc9 in sqlite3WalFindFrame.constprop.0 (pWal=0x7f9b6c0ae298, pgno=2945, piRead=piRead@entry=0x7f9b729faf34) at /usr/src/debug/sqlite-3.34.1-7.el9_3.x86_64/sqlite3.c:62715
#4  0x00007f9b760a24e2 in readDbPage (pPg=pPg@entry=0x7f9b6c096a30) at /usr/src/debug/sqlite-3.34.1-7.el9_3.x86_64/sqlite3.c:54887
#5  0x00007f9b760a49f7 in getPageNormal (pPager=0x7f9b6c115848, pgno=2945, ppPage=0x7f9b729fafa0, flags=<optimized out>) at /usr/src/debug/sqlite-3.34.1-7.el9_3.x86_64/sqlite3.c:57461
[...]
The code dies in frame 3:

(gdb) f 3
#3  0x00007f9b7617bbc9 in sqlite3WalFindFrame.constprop.0 (pWal=0x7f9b6c0ae298, pgno=2945, piRead=piRead@entry=0x7f9b729faf34) at /usr/src/debug/sqlite-3.34.1-7.el9_3.x86_64/sqlite3.c:62715
62715       while( (iH = AtomicLoad(&sLoc.aHash[iKey]))!=0 ){
(gdb) p &sLoc.aHash[iKey]
$1 = (volatile ht_slot *) 0x7f9b76419bfe
(gdb) p sLoc.aHash[iKey]
$2 = 0

The code looks valid to me. Indeed, we have an AtomicLoad() call on the pointer, which does this (line 206 or 209):

205 #if GCC_VERSION>=4007000 || __has_extension(c_atomic)
206 # define AtomicLoad(PTR)       __atomic_load_n((PTR),__ATOMIC_RELAXED)
207 # define AtomicStore(PTR,VAL)  __atomic_store_n((PTR),(VAL),__ATOMIC_RELAXED)
208 #else
209 # define AtomicLoad(PTR)       (*(PTR))
210 # define AtomicStore(PTR,VAL)  (*(PTR) = (VAL))
211 #endif

Assuming it's line 209 (the semantics are similar), we would have (*(&sLoc.aHash[iKey])), which evaluates to (sLoc.aHash[iKey]), which is (0) here.
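
To make the comparison between the two AtomicLoad() variants concrete, here is a minimal C sketch. It assumes GCC or Clang (the original GCC_VERSION test is replaced by a simple compiler check so the snippet builds on its own), and read_slot() and read_slot_plain() are hypothetical helpers written for illustration, not SQLite functions. It only shows that both forms read the same slot value; it does not reproduce the crash.

```c
#include <stdint.h>

/* Same shape as the macros quoted above. The compiler check is an
 * assumption made for this sketch; SQLite tests GCC_VERSION instead. */
#if defined(__GNUC__) || defined(__clang__)
# define AtomicLoad(PTR) __atomic_load_n((PTR), __ATOMIC_RELAXED)
#else
# define AtomicLoad(PTR) (*(PTR))
#endif

/* Stand-in for SQLite's ht_slot, assumed here to be a 16-bit unsigned. */
typedef uint16_t ht_slot;

/* Hypothetical helper: read one hash slot the way sqlite3.c:62715 does. */
ht_slot read_slot(volatile ht_slot *aHash, int iKey) {
    return AtomicLoad(&aHash[iKey]);
}

/* Hypothetical helper: the plain dereference that line 209 expands to. */
ht_slot read_slot_plain(volatile ht_slot *aHash, int iKey) {
    return *(&aHash[iKey]);
}
```

Calling both helpers on the same table and index returns the same value, which matches the reasoning that either macro variant amounts to reading sLoc.aHash[iKey].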