Issue reproduced. I noticed that `watch cat /proc/$(pgrep fapolicyd)/stack` stopped ticking while Firefox was hanging, so it seems fapolicyd is blocking everything on the server during that time. I'm looking into a vmcore and a packet capture to see what is happening during the issue.

vmcore task queued on bumblebee-x86.sush-001.prod.us-west-2.aws.redhat.com.

```
$ allspark revive 275538286

crash> ps -m | grep UN
[0 00:00:04.215] [UN]  PID: 4590  TASK: ffff9d3f40b4d7c0  CPU: 3  COMMAND: "nfsd"
[0 00:00:04.221] [UN]  PID: 4589  TASK: ffff9d3f4cb78000  CPU: 3  COMMAND: "nfsd"
[0 00:00:04.225] [UN]  PID: 4593  TASK: ffff9d3f47679d40  CPU: 0  COMMAND: "nfsd"

PID: 4593  TASK: ffff9d3f47679d40  CPU: 0  COMMAND: "nfsd"
 #0 [ffffaa458082f918] __schedule at ffffffffb612f879
 #1 [ffffaa458082f980] schedule at ffffffffb612fb2e
 #2 [ffffaa458082f998] fanotify_get_response.constprop.0 at ffffffffb58c63b0
 #3 [ffffaa458082f9f0] fanotify_handle_event at ffffffffb58c7018
 #4 [ffffaa458082fa50] send_to_group at ffffffffb58c1361
 #5 [ffffaa458082fab0] fsnotify at ffffffffb58c1753
 #6 [ffffaa458082fb50] __fsnotify_parent at ffffffffb58c198f
 #7 [ffffaa458082fc18] do_dentry_open at ffffffffb586156a
 #8 [ffffaa458082fc50] dentry_open at ffffffffb58619cd
 #9 [ffffaa458082fc70] __nfsd_open.constprop.0 at ffffffffc0e04cfb [nfsd]
#10 [ffffaa458082fcb0] nfsd_file_do_acquire at ffffffffc0e0faec [nfsd]
#11 [ffffaa458082fd58] nfsd4_commit at ffffffffc0e161f5 [nfsd]
#12 [ffffaa458082fd90] nfsd4_proc_compound at ffffffffc0e1a590 [nfsd]
#13 [ffffaa458082fdd8] nfsd_dispatch at ffffffffc0dff1e6 [nfsd]
#14 [ffffaa458082fe20] svc_process_common at ffffffffc04cf87e [sunrpc]
#15 [ffffaa458082fe68] svc_process at ffffffffc04cff1d [sunrpc]
#16 [ffffaa458082fe80] svc_handle_xprt at ffffffffc04e41c8 [sunrpc]
#17 [ffffaa458082fec8] svc_recv at ffffffffc04e479a [sunrpc]
#18 [ffffaa458082fef8] nfsd at ffffffffc0dfdaf4 [nfsd]
#19 [ffffaa458082ff18] kthread at ffffffffb553eaad
#20 [ffffaa458082ff50] ret_from_fork at ffffffffb5408549
```
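The same picture can be sampled on a live reproducer without waiting for a vmcore; a minimal sketch of my own (not part of the case notes; reading `/proc/PID/stack` needs root):

```shell
#!/bin/sh
# Print the kernel stack of every task in uninterruptible (D) sleep --
# a rough live-system equivalent of `crash> ps -m | grep UN` plus `bt`.
for pid in $(ps -eo pid=,stat= | awk '$2 ~ /^D/ { print $1 }'); do
    comm=$(cat "/proc/$pid/comm" 2>/dev/null)
    echo "=== PID $pid ($comm) ==="
    cat "/proc/$pid/stack" 2>/dev/null
done
exit 0   # no D-state tasks (or no root) just means no output, not an error
```

On this box it would have shown the nfsd threads parked in `fanotify_get_response`.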
The other two nfsd tasks have similar stacks.

```
crash> ps -m | grep fapolicyd
[0 00:00:00.000] [IN]  PID: 5565  TASK: ffff9d3f51bfd7c0  CPU: 0  COMMAND: "fapolicyd"  <--- breaking a lease
[0 00:00:00.371] [IN]  PID: 5566  TASK: ffff9d3f40841d40  CPU: 0  COMMAND: "fapolicyd"  <--- polling
[0 00:00:02.624] [IN]  PID: 5568  TASK: ffff9d3f40358000  CPU: 3  COMMAND: "fapolicyd"  <--- waiting on futex
[0 00:00:04.233] [IN]  PID: 5567  TASK: ffff9d3f4cac3a80  CPU: 0  COMMAND: "fapolicyd"  <--- usleeping

PID: 5565  TASK: ffff9d3f51bfd7c0  CPU: 0  COMMAND: "fapolicyd"
 #0 [ffffaa458162fa18] __schedule at ffffffffb612f879
 #1 [ffffaa458162fa80] schedule at ffffffffb612fb2e
 #2 [ffffaa458162fa98] schedule_timeout at ffffffffb61356b8
 #3 [ffffaa458162fb00] __break_lease at ffffffffb58eb09a
 #4 [ffffaa458162fb88] do_dentry_open at ffffffffb58615a9
 #5 [ffffaa458162fbc0] dentry_open at ffffffffb58619cd
 #6 [ffffaa458162fbe0] copy_event_to_user at ffffffffb58c8a42
 #7 [ffffaa458162fc60] fanotify_read at ffffffffb58c9014
 #8 [ffffaa458162fd00] vfs_read at ffffffffb5866d1b
 #9 [ffffaa458162fd98] ksys_read at ffffffffb5867d4f
#10 [ffffaa458162fdd0] do_syscall_64 at ffffffffb611f8dc
#11 [ffffaa458162ff50] entry_SYSCALL_64_after_hwframe at ffffffffb6200130
```

So while nfsd sleeps in `fanotify_get_response()` waiting for fapolicyd's verdict, fapolicyd itself sleeps in `__break_lease()` while opening a file to read the fanotify event: neither side can make progress until the lease break completes. Which file is it?

```
crash> file.f_inode ffff9d3f503c7900
  f_inode = 0xffff9d3f4845b878,
crash> inode.i_dentry 0xffff9d3f4845b878
  i_dentry = {
    first = 0xffff9d3f42400170
  },
crash> kmem 0xffff9d3f42400170
CACHE             OBJSIZE  ALLOCATED  TOTAL  SLABS  SSIZE  NAME
ffff9d3f401f9400      192      49961  50085   2385     4k  dentry
  SLAB              MEMORY            NODE  TOTAL  ALLOCATED  FREE
  ffffea3344090000  ffff9d3f42400000     0     21         21     0
  FREE / [ALLOCATED]
  [ffff9d3f424000c0]
crash> files -d ffff9d3f424000c0
     DENTRY            INODE            SUPERBLK      TYPE  PATH
ffff9d3f424000c0  ffff9d3f4845b878  ffff9d3f4410d800  REG   /home/tbecker/.mozilla/firefox/o7c6gkwc.default-default/places.sqlite
```

Is breaking leases causing this issue? Running fapolicyd under the `trace` command changes the behavior. Nice.
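On a live system the fanotify marks behind this can also be inspected without crash; a minimal sketch of my own (the permission-event groups are the ones that make openers block):

```shell
#!/bin/sh
# Dump fanotify state from fapolicyd's fdinfo entries; for fanotify file
# descriptors, /proc/PID/fdinfo lines starting with "fanotify" describe the
# group's flags and per-object marks/masks.
pid=$(pgrep -o fapolicyd) || exit 0   # nothing to inspect if it is not running
grep -H '^fanotify' "/proc/$pid/fdinfo/"* 2>/dev/null || true
```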
Packet capture.

```
$ capinfos fapolicyd.pcap
File name:                 fapolicyd.pcap
File type:                 Wireshark/tcpdump/... - pcap
File encapsulation:        Ethernet
File timestamp precision:  microseconds (6)
Packet size limit:         file hdr: 262144 bytes
Number of packets:         4,094
File size:                 5,139 kB
Data size:                 5,074 kB
Capture duration:          15.047295 seconds

$ tshark -2r fapolicyd.pcap -Y 'rpc.msgtyp == 0 && not rpc.reqframe'
 4028 11.276330 192.168.2.109 → 192.168.2.111 NFS 274 V4 Call COMMIT FH: 0x7e082237 Offset: 0 Len: 0
 4033 11.277271 192.168.2.109 → 192.168.2.111 NFS 346 V4 Call OPEN DH: 0x7e082237/
 4062 11.283084 192.168.2.109 → 192.168.2.111 NFS 410 V4 Call OPEN DH: 0xaf049240/webext.sc.lz4.tmp
```

These calls never got a reply within the capture: fapolicyd is holding nfsd's replies back. ftracing by hand shows a lot of lease breaks when the hang occurs.

```
# for i in break_lease_block break_lease_noblock break_lease_unblock ; do echo 1 > /sys/kernel/tracing/events/filelock/$i/enable ; done
# fapolicyd --debug
# for i in break_lease_block break_lease_noblock break_lease_unblock ; do echo 0 > /sys/kernel/tracing/events/filelock/$i/enable ; done
# cat /sys/kernel/tracing/trace | tee trace.txt

# grep fapolicyd-6270 trace.txt | awk '{ print $6 }' | sort | uniq -c
   5410 fl=00000000c7b72c78

# grep fapolicyd trace.txt | awk '{ print $1,$3,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14 }' | sort | uniq -c
   2705 fapolicyd-6270 ....1.. break_lease_block: fl=00000000c7b72c78 dev=0xfc:0x11 ino=0xc00009c fl_blocker=00000000da0302b9 fl_owner=0000000000000000 fl_flags=FL_LEASE fl_type=F_RDLCK fl_break_time=0 fl_downgrade_time=0
   2705 fapolicyd-6270 ....1.. break_lease_unblock: fl=00000000c7b72c78 dev=0xfc:0x11 ino=0xc00009c fl_blocker=00000000da0302b9 fl_owner=0000000000000000 fl_flags=FL_LEASE fl_type=F_RDLCK fl_break_time=0 fl_downgrade_time=0
```

The `fl=` values are hashed so as not to expose kernel pointers. Every block event has a matching unblock; how long does one take?

```
# grep fapolicyd-6270 trace.txt | (head -1 ; tail -1)
fapolicyd-6270 [001] ....1.. 44237.443140: break_lease_block: fl=00000000c7b72c78 dev=0xfc:0x11 ino=0xc00009c fl_blocker=00000000da0302b9 fl_owner=0000000000000000 fl_flags=FL_LEASE fl_type=F_RDLCK fl_break_time=0 fl_downgrade_time=0
fapolicyd-6270 [001] ....1.. 44242.871457: break_lease_unblock: fl=00000000c7b72c78 dev=0xfc:0x11 ino=0xc00009c fl_blocker=00000000da0302b9 fl_owner=0000000000000000 fl_flags=FL_LEASE fl_type=F_RDLCK fl_break_time=0 fl_downgrade_time=0
```

About 5.4 seconds to break the lease. Any callback in the packet capture?

```
$ tshark -2r fapolicyd.pcap -Y 'nfs.cb'
 4031 11.276615 192.168.2.111 → 192.168.2.109 NFS CB 266 V1 CB_COMPOUND Call (Reply In 4032) CB_SEQUENCE;CB_RECALL
 4032 11.277243 192.168.2.109 → 192.168.2.111 NFS CB 154 V1 CB_COMPOUND Reply (Call In 4031) CB_SEQUENCE;CB_RECALL
```

Recalling a delegation. The client looks responsive: it replied to the CB_RECALL in under a millisecond.

-----

Since breaking leases seems to be causing this issue, disabling leases should resolve it. Tested this, and it seems to resolve the issue:

```
sysctl -w fs.leases-enable=0
```
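If the workaround sticks, it should survive reboots; a minimal sketch (the `99-disable-leases.conf` filename is my choice, not from the case; any `*.conf` under `/etc/sysctl.d` works):

```shell
#!/bin/sh
# Current setting: 1 = file leases (and thus NFSv4 delegations) enabled.
cat /proc/sys/fs/leases-enable

# Disable at runtime; needs root (ignored where /proc/sys is read-only,
# e.g. in a container).
sysctl -w fs.leases-enable=0 2>/dev/null || true

# Persist across reboots (hypothetical filename).
mkdir -p /etc/sysctl.d 2>/dev/null || true
echo 'fs.leases-enable = 0' > /etc/sysctl.d/99-disable-leases.conf 2>/dev/null || true
```

Note the trade-off: with leases off, knfsd stops handing out delegations entirely, which is exactly what stops the fanotify/lease-break pileup but also gives up the delegation caching win.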