- Bug
- Resolution: Done-Errata
- Critical
- None
- 4.17.z, 4.16.z
This is a clone of issue RHEL-44944 to use for version rhel-9.4.z
–
Original description:
This is a clone of issue RHEL-7984 to use for version rhel-9.5
–
Original description:
Description of problem:
- Filesystem mount blocked for a long time; vmcore collected (594311032)
- we can see the original mount process blocked on this stack :
crash> bt
PID: 20412  TASK: ffff9b55dc432800  CPU: 0  COMMAND: "mount"
 #0 [ffffc0d4e40238c8] __schedule at ffffffff91beafe6
 #1 [ffffc0d4e4023920] schedule at ffffffff91beb735
 #2 [ffffc0d4e4023930] xfs_extent_busy_flush at ffffffffc03b49aa [xfs]
 #3 [ffffc0d4e4023980] xfs_alloc_ag_vextent_size at ffffffffc03622c5 [xfs]
 #4 [ffffc0d4e40239f0] xfs_alloc_ag_vextent at ffffffffc0362f16 [xfs]
 #5 [ffffc0d4e4023a00] xfs_alloc_fix_freelist at ffffffffc036369b [xfs]
 #6 [ffffc0d4e4023b08] xfs_free_extent_fix_freelist at ffffffffc0363e54 [xfs]
 #7 [ffffc0d4e4023bb8] __xfs_free_extent at ffffffffc0363ef8 [xfs]
 #8 [ffffc0d4e4023c18] xfs_trans_free_extent at ffffffffc03df456 [xfs]
 #9 [ffffc0d4e4023c70] xfs_efi_item_recover at ffffffffc03df6c1 [xfs]
#10 [ffffc0d4e4023cb8] xlog_recover_process_intents at ffffffffc03e7433 [xfs]
#11 [ffffc0d4e4023d60] xlog_recover_finish at ffffffffc03e794c [xfs]
#12 [ffffc0d4e4023d78] xfs_log_mount_finish at ffffffffc03d93d2 [xfs]
#13 [ffffc0d4e4023da0] xfs_mountfs at ffffffffc03cbf92 [xfs]
#14 [ffffc0d4e4023df8] xfs_fs_fill_super at ffffffffc03d08ac [xfs]
#15 [ffffc0d4e4023e30] get_tree_bdev at ffffffff915603ff
#16 [ffffc0d4e4023e70] vfs_get_tree at ffffffff9155ead5
#17 [ffffc0d4e4023e90] do_mount at ffffffff91585209
#18 [ffffc0d4e4023ef8] ksys_mount at ffffffff91585c1e
#19 [ffffc0d4e4023f30] __x64_sys_mount at ffffffff91585c61
#20 [ffffc0d4e4023f38] do_syscall_64 at ffffffff912043ab
#21 [ffffc0d4e4023f50] entry_SYSCALL_64_after_hwframe at ffffffff91c000a9
    RIP: 00007f1878cb135e  RSP: 00007fff7025cd78  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 00005633b04ada40  RCX: 00007f1878cb135e
    RDX: 00005633b04b5250  RSI: 00005633b04adc40  RDI: 00005633b04adc20
    RBP: 00007f1879b23184   R8: 0000000000000000   R9: 0000000000000002
    R10: 00000000c0ed0000  R11: 0000000000000246  R12: 0000000000000000
    R13: 00000000c0ed0000  R14: 00005633b04adc20  R15: 00005633b04b5250
    ORIG_RAX: 00000000000000a5  CS: 0033  SS: 002b

// here :
581 void
582 xfs_extent_busy_flush(
583         struct xfs_mount        *mp,
584         struct xfs_perag        *pag,
585         unsigned                busy_gen)
586 {
587         DEFINE_WAIT             (wait);
588         int                     error;
589
590         error = xfs_log_force(mp, XFS_LOG_SYNC);
591         if (error)
592                 return;
593
594         do {
595                 prepare_to_wait(&pag->pagb_wait, &wait, TASK_KILLABLE);
596                 if (busy_gen != READ_ONCE(pag->pagb_gen))
597                         break;
598                 schedule();    <<<<---
599         } while (1);

// we've been running for a long time :
crash> ps -t 20412
PID: 20412  TASK: ffff9b55dc432800  CPU: 0  COMMAND: "mount"
    RUN TIME: 02:49:31
  START TIME: 304267682759
       UTIME: 5832582
       STIME: 9204315314778

// for some reason the process shows as being in RUN state in the vmcore, but I suspect that's not really the case
// if I look for busy extents I don't find any except the one on the current transaction :
crash> xfs_cil_ctx.busy_extents 0xffff9b5528f60300
  busy_extents = {
    next = 0xffff9b5528f60330,
    prev = 0xffff9b5528f60330
  },
crash> xfs_cil_ctx.busy_extents 0xffff9b5528f60300 -ox
struct xfs_cil_ctx {
  [ffff9b5528f60330] struct list_head busy_extents;
}
crash> xfs_trans.t_busy ffff9b54c9bfbed0 -ox
struct xfs_trans {
  [ffff9b54c9bfbf90] struct list_head t_busy;
}
crash> list -H ffff9b54c9bfbf90 -o xfs_extent_busy.list
ffff9b5598c14980
crash> xfs_extent_busy ffff9b5598c14980
struct xfs_extent_busy {
  rb_node = {
    __rb_parent_color = 1,
    rb_right = 0x0,
    rb_left = 0x0
  },
  list = {
    next = 0xffff9b54c9bfbf90,
    prev = 0xffff9b54c9bfbf90
  },
  agno = 13,
  bno = 14770,
  length = 19,
  flags = 0
}
// which seems strange; we're on AG 13, and it's true this fs has a large number of
// AGs, but it looks like we're deadlocked/stuck flushing that particular busy extent
// a metadump was collected and I can reproduce the issue
// I suspect we might be dealing with the issue described in :
https://www.spinics.net/lists/linux-xfs/msg72870.html
// in that case we may need :
8ebbf262d468 xfs: don't block in busy flushing when freeing extents
// on RHEL8/9
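To make the suspected cycle concrete, here is a minimal userspace sketch (plain C, not kernel code; every type and helper in it is invented for illustration) of the pattern the backtrace and the t_busy list suggest: the EFI-recovery transaction frees an extent, that extent lands on the transaction's own busy list, and the allocator then waits for the AG busy generation to advance, which can only happen once this same transaction commits.

/*
 * Minimal userspace sketch -- NOT kernel code, all names invented -- of the
 * suspected self-deadlock: a transaction frees an extent, the extent becomes
 * "busy" on that same transaction's list, and the allocator then waits for
 * the AG busy generation to change, which only happens when that very
 * transaction commits.
 */
#include <stdio.h>
#include <stdbool.h>

struct fake_pag {
    unsigned busy_gen;   /* bumped whenever busy extents are cleared */
    int      nr_busy;    /* busy extents currently tracked in this AG */
};

struct fake_trans {
    int nr_busy;         /* busy extents owned by this transaction */
};

/* an extent freed inside the transaction stays "busy" until commit */
static void free_extent(struct fake_pag *pag, struct fake_trans *tp)
{
    pag->nr_busy++;
    tp->nr_busy++;
}

/* commit clears this transaction's busy extents and bumps the generation */
static void commit_trans(struct fake_pag *pag, struct fake_trans *tp)
{
    pag->nr_busy -= tp->nr_busy;
    tp->nr_busy = 0;
    pag->busy_gen++;
}

/* same shape as the wait loop in xfs_extent_busy_flush(): wait for the
 * generation to move; the spin cap only exists so this demo terminates */
static bool busy_flush_blocking(const struct fake_pag *pag, unsigned busy_gen,
                                long max_spins)
{
    while (busy_gen == pag->busy_gen) {
        if (--max_spins == 0)
            return false;    /* stand-in for "stuck in schedule() forever" */
    }
    return true;
}

int main(void)
{
    struct fake_pag   pag = { .busy_gen = 0, .nr_busy = 0 };
    struct fake_trans tp  = { .nr_busy = 0 };

    unsigned gen = pag.busy_gen;   /* snapshot, like busy_gen in the trace */
    free_extent(&pag, &tp);        /* EFI recovery frees an extent */

    /* the allocator cannot reuse the busy range, so it tries to flush: */
    if (!busy_flush_blocking(&pag, gen, 1000000))
        printf("stuck: waiting on a busy extent our own transaction holds\n");

    commit_trans(&pag, &tp);       /* would only ever run after the wait */
    return 0;
}

Built with plain gcc this prints the "stuck" line; the real kernel has no spin cap, so the mount thread just sits in schedule() as in the backtrace above.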
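For reference, a second sketch of the direction the upstream commit referenced above appears to take, going only by its subject line ("don't block in busy flushing when freeing extents"); the helper name, the caller_is_freeing flag and the retry convention below are my assumptions, not the actual patch.

/*
 * Userspace sketch of the suspected fix direction, not the real kernel
 * change: when the caller is itself freeing extents, the busy flush must
 * not sleep; it signals "retry" so the caller can commit the transaction
 * (clearing its own busy extents and bumping the generation) and try again.
 */
#include <stdio.h>
#include <stdbool.h>

struct fake_pag {
    unsigned busy_gen;   /* bumped whenever busy extents are cleared */
};

/* non-blocking flush: true means "back out, commit what you hold, retry" */
static bool busy_flush_wants_retry(const struct fake_pag *pag,
                                   unsigned busy_gen, bool caller_is_freeing)
{
    if (busy_gen != pag->busy_gen)
        return false;            /* busy extents already cleared elsewhere */
    return caller_is_freeing;    /* freeing path must never sleep here */
}

int main(void)
{
    struct fake_pag pag = { .busy_gen = 0 };
    unsigned gen = pag.busy_gen;           /* allocator's snapshot */

    if (busy_flush_wants_retry(&pag, gen, true)) {
        /* instead of sleeping forever, commit the recovery transaction:
         * its busy extents are cleared and the generation advances */
        pag.busy_gen++;
        printf("committed own transaction (gen %u -> %u), retrying the free\n",
               gen, pag.busy_gen);
    }
    return 0;
}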
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1.
2.
3.
Actual results:
Expected results:
Additional info:
- is cloned by OCPBUGS-56574 RHCOS 4.16 XFS: recovery blocked on xfs_extent_busy_flush [rhel-9.4.z] (Closed)
- links to RHBA-2025:8280 OpenShift Container Platform 4.17.32 bug fix update