-
Bug
-
Resolution: Unresolved
-
rhel-8.10, rhel-9.6, rhel-10.0, rhel-9.7
-
Moderate
-
rhel-security-selinux
What were you trying to do that didn't work?
Dumping the state by running pkill -USR1 fapolicyd (or just waiting for the periodic state dump) makes the system hang.
The root cause is that fapolicyd starts monitoring the /run file system, which was initially excluded. This causes a deadlock when the state is dumped to /run/fapolicyd/fapolicyd.state, because the dump happens while the decision thread lock is held.
The state is dumped by the rpt_write() function, which is called on lines 395 and 411 while holding decision_lock:
336 // write a stat report to file at the standard location
337 static void rpt_write(void)
338 {
339         FILE *f = fopen(STAT_REPORT, "w");
:
 37 #define STAT_REPORT "/run/fapolicyd/fapolicyd.state"

346 static void *decision_thread_main(void *arg)
347 {
:
375         pthread_mutex_lock(&decision_lock);
376         while (get_ready() == 0) {
:
378                 if (rpt_interval) {
:
390                         if (expired || run_stats) {
391                                 // write a new report only when one of
392                                 // 1. new events observed since last report
393                                 // 2. explicitly requested with run_stats
394                                 if (rpt_is_stale || run_stats) {
395 >>>                                     rpt_write();
:
409                 } else {
410                         if (run_stats) {
411 >>>                             rpt_write();
:
Because /run is now monitored, opening the state file triggers a fanotify event, which is "pre-processed" by the main thread in handle_events():
474 void handle_events(void)
475 {
:
499         pthread_mutex_lock(&decision_lock);
500         metadata = (const struct fanotify_event_metadata *)buf;
501         while (FAN_EVENT_OK(metadata, len)) {
:
Because the decision thread already holds the lock, and is itself blocked waiting for a reply to the event it triggered, the main thread hangs forever waiting for the lock to be released.
This affects RHEL9, including RHEL9.7 (still unreleased), and RHEL10.0.
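The lock pattern described above can be reproduced in isolation. The following is a minimal sketch, not fapolicyd code: every name in it (demo_lock, reporter, demo_main_thread_blocks) is hypothetical, and a condition variable stands in for the permission reply that the kernel would be waiting for. To keep the demo from hanging, the "main thread" side uses pthread_mutex_timedlock() instead of pthread_mutex_lock() and reports whether it timed out.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-ins; none of these names come from fapolicyd. */
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;  /* plays decision_lock */
static pthread_mutex_t state_mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t state_cond = PTHREAD_COND_INITIALIZER;
static bool lock_taken;  /* reporter currently holds demo_lock */
static bool reply_sent;  /* stands in for the permission reply */

/* Plays the decision thread: takes the lock, then blocks until the
 * "reply" arrives, like an open() waiting for a permission verdict. */
static void *reporter(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&demo_lock);
    pthread_mutex_lock(&state_mtx);
    lock_taken = true;
    pthread_cond_broadcast(&state_cond);
    while (!reply_sent)
        pthread_cond_wait(&state_cond, &state_mtx);
    pthread_mutex_unlock(&state_mtx);
    pthread_mutex_unlock(&demo_lock);
    return NULL;
}

/* Plays the main thread: returns true if demo_lock could not be
 * acquired within timeout_ms because the reporter still holds it. */
bool demo_main_thread_blocks(int timeout_ms)
{
    pthread_t tid;
    struct timespec deadline;
    int rc;

    pthread_mutex_lock(&state_mtx);
    lock_taken = false;
    reply_sent = false;
    pthread_mutex_unlock(&state_mtx);

    if (pthread_create(&tid, NULL, reporter, NULL) != 0)
        return false;

    /* Wait until the reporter really holds demo_lock. */
    pthread_mutex_lock(&state_mtx);
    while (!lock_taken)
        pthread_cond_wait(&state_cond, &state_mtx);
    pthread_mutex_unlock(&state_mtx);

    /* Build an absolute deadline timeout_ms from now. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_nsec += (long)(timeout_ms % 1000) * 1000000L;
    deadline.tv_sec += timeout_ms / 1000 + deadline.tv_nsec / 1000000000L;
    deadline.tv_nsec %= 1000000000L;

    /* A plain pthread_mutex_lock() here would block forever. */
    rc = pthread_mutex_timedlock(&demo_lock, &deadline);
    if (rc == 0)
        pthread_mutex_unlock(&demo_lock);

    /* Only now send the "reply" so the demo can terminate. */
    pthread_mutex_lock(&state_mtx);
    reply_sent = true;
    pthread_cond_broadcast(&state_cond);
    pthread_mutex_unlock(&state_mtx);
    pthread_join(tid, NULL);

    return rc == ETIMEDOUT;
}
```

Calling demo_main_thread_blocks(100) from a test program should return true: the reply is only sent after the lock attempt has given up, which mirrors the real situation where the reply can only be produced by the thread that is stuck waiting for the lock.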
What is the impact of this issue to you?
System hang, even in permissive mode
Please provide the package NVR for which the bug is seen:
fapolicyd-1.3.2-1.el8 (RHEL8.10)
fapolicyd-1.3.3-106.el9 (RHEL9.7)
fapolicyd-1.3.3-102.el10 (RHEL10.0)
How reproducible is this bug?:
Always on RHEL9.6 and RHEL10.0
Could not reproduce on RHEL8.10, but at least one customer hit this (we have a vmcore showing that /run/fapolicyd.state was being dumped while the main thread was deadlocked)
Steps to reproduce
- Configure fapolicyd with allow_filesystem_mark = 1
- Create a directory just under /run
# mkdir /run/netns
- Do a bind mount on itself
# mount --bind /run/netns /run/netns
- Dump state
# pkill -USR1 fapolicyd
Expected results
No system hang
Actual results
System deadlocks.
Additional info
The attached SystemTap script demonstrates that /run becomes monitored, which causes the deadlock. The fapolicyd decision thread opens /run/fapolicyd/fapolicyd.state for writing, which triggers an fsnotify event that never gets a reply:
# stap -v ./fapolicyd_openat_catcher.stp
[...]
1771: openat(AT_FDCWD, "/run/fapolicyd/fapolicyd.state", O_WRONLY|O_CREAT|O_TRUNC, 0666) ->
1771: fsnotify() being called ->
0xffffffffad8f8414 : fsnotify+0x4/0xb90 [kernel]
0xffffffffad8f90f3 : __fsnotify_parent+0x143/0x3a0 [kernel]
0xffffffffad88ddf9 : do_dentry_open+0xe9/0x440 [kernel]
0xffffffffad8905be : vfs_open+0x2e/0xe0 [kernel]
0xffffffffad8a6b42 : do_open+0x162/0x3d0 [kernel]
0xffffffffad8ac494 : path_openat+0x124/0x2d0 [kernel]
0xffffffffad8ac714 : do_filp_open+0xc4/0x170 [kernel]
0xffffffffad890a9e : do_sys_openat2+0xae/0xe0 [kernel]
0xffffffffad890f25 : __x64_sys_openat+0x55/0xa0 [kernel]
0xffffffffad488290 : arch_rethook_trampoline+0x0/0x60 [kernel]
Important: audit_steve recently redesigned this code in upstream commit a81d68df8fff366b4479094d1ef5df0db5dd524c:
commit a81d68df8fff366b4479094d1ef5df0db5dd524c
Author: Steve Grubb <ausearch.1@gmail.com>
Date:   Sun Aug 31 17:41:13 2025 -0400

    Enhance queue struct and API for thread safety

    Introduced internal synchronization primitives to the queue, adding a mutex, condition variable, and shutdown flag alongside new enqueue/dequeue APIs and a timed variant for bounded waits. Refactored the daemon to rely on the thread-safe queue, removing custom locks, updating event handling to use q_enqueue, and coordinating shutdown via the new q_shutdown helper.

This commit should avoid the issue (I have not confirmed this yet), since no lock is held any more while dumping the state and the main thread no longer takes the lock either.
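To illustrate the kind of design the commit message describes, here is a minimal sketch of a queue with its own mutex, condition variable, and shutdown flag, plus a timed dequeue. It is an assumption-laden illustration, not the upstream implementation: the commit message names q_enqueue and q_shutdown but does not show their signatures, so all names, types, and return values below (demo_queue, demo_q_*) are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

#define DEMO_Q_CAP 8  /* hypothetical fixed capacity */

struct demo_queue {
    int items[DEMO_Q_CAP];
    size_t head, count;
    bool shutdown;
    pthread_mutex_t lock;       /* protects every field above */
    pthread_cond_t not_empty;   /* signalled on enqueue and shutdown */
};

void demo_q_init(struct demo_queue *q)
{
    q->head = q->count = 0;
    q->shutdown = false;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
}

/* Returns 0 on success, -1 if the queue is full or shut down.
 * The lock is held only for the copy, never across I/O. */
int demo_q_enqueue(struct demo_queue *q, int item)
{
    int rc = -1;

    pthread_mutex_lock(&q->lock);
    if (!q->shutdown && q->count < DEMO_Q_CAP) {
        q->items[(q->head + q->count) % DEMO_Q_CAP] = item;
        q->count++;
        pthread_cond_signal(&q->not_empty);
        rc = 0;
    }
    pthread_mutex_unlock(&q->lock);
    return rc;
}

/* Returns 0 and stores an item in *out on success, -1 once the queue
 * is shut down and drained, or the wait error (normally ETIMEDOUT). */
int demo_q_timed_dequeue(struct demo_queue *q, int *out, int timeout_ms)
{
    struct timespec deadline;
    int wait_rc = 0, rc;

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_nsec += (long)(timeout_ms % 1000) * 1000000L;
    deadline.tv_sec += timeout_ms / 1000 + deadline.tv_nsec / 1000000000L;
    deadline.tv_nsec %= 1000000000L;

    pthread_mutex_lock(&q->lock);
    while (q->count == 0 && !q->shutdown && wait_rc == 0)
        wait_rc = pthread_cond_timedwait(&q->not_empty, &q->lock, &deadline);

    if (q->count > 0) {
        *out = q->items[q->head];
        q->head = (q->head + 1) % DEMO_Q_CAP;
        q->count--;
        rc = 0;
    } else if (q->shutdown) {
        rc = -1;
    } else {
        rc = wait_rc;
    }
    pthread_mutex_unlock(&q->lock);
    return rc;
}

/* Wakes every waiter so consumers can exit instead of blocking. */
void demo_q_shutdown(struct demo_queue *q)
{
    pthread_mutex_lock(&q->lock);
    q->shutdown = true;
    pthread_cond_broadcast(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}
```

The point of the sketch is that the producer and the consumer only ever hold the queue's internal lock for a short copy, so a consumer that later blocks on file I/O (such as writing a state report) would do so without holding any lock the producer needs.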
- clones: RHEL-120827 fapolicyd deadlocks itself when writing its state file (/run/fapolicyd/fapolicyd.state) (Planning)
- is incorporated by: RHEL-118363 [RHEL 9] Update fapolicyd to v1.3.7 (Planning)