What were you trying to do that didn't work?
When run in read-only mode against a mounted filesystem, some xfs_db commands can segfault. The segfault is believed to occur when xfs_db encounters free space btrees that are being modified while it is reading them.
This is seen in customer environments because the insights-client runs several xfs_db commands against the block devices of mounted filesystems:
xfs_db -r -c frag /dev/device
xfs_db -r -c freesp /dev/device
Please provide the package NVR for which the bug is seen:
xfsprogs-5.0.0-10.el8
Also seen in RHEL 7.
How reproducible:
unknown, but seen in at least 3 customer cases thus far
Steps to reproduce
unknown
Expected results
no segfaults
Actual results
#0 __fswab16 (x=<optimized out>) at ../include/xfs_arch.h:145
#1 process_inode (agf=0x559582375800, dip=0x600, agino=3092803) at frag.c:308
#2 scanfunc_ino (block=0x55958237ce00, level=level@entry=0, agf=agf@entry=0x559582375800) at frag.c:513
#3 0x00005595801e7d45 in scan_sbtree (agf=agf@entry=0x559582375800, root=3, nlevels=nlevels@entry=1, btype=TYP_INOBT, func=0x5595801e77d0 <scanfunc_ino at frag.c:461>) at frag.c:416
#4 0x00005595801e786c in scanfunc_ino (block=0x55958237ae00, level=level@entry=1, agf=agf@entry=0x559582375800) at ../include/xfs_arch.h:158
#5 0x00005595801e7d45 in scan_sbtree (agf=agf@entry=0x559582375800, root=1447206, nlevels=2, btype=TYP_INOBT, func=0x5595801e77d0 <scanfunc_ino at frag.c:461>) at frag.c:416
#6 0x00005595801e7fcd in scan_ag (agno=0) at ../include/xfs_arch.h:158
#7 frag_f (argc=<optimized out>, argv=<optimized out>) at frag.c:155
#8 frag_f (argc=<optimized out>, argv=<optimized out>) at frag.c:145
#9 0x00005595801d24ee in main (argc=<optimized out>, argv=<optimized out>) at init.c:195
#1 process_inode (agf=0x559582375800, dip=0x600, agino=3092803) at frag.c:308
308 switch (be16_to_cpu(dip->di_mode) & S_IFMT) {
(gdb) p dip->di_mode
Cannot access memory at address 0x602
So 'dip' holds an invalid pointer, and the segfault comes from dereferencing it.
(gdb) frame 2
#2 scanfunc_ino (block=0x55958237ce00, level=level@entry=0, agf=agf@entry=0x559582375800) at frag.c:513
(gdb) list
508 for (j = 0; j < inodes_per_buf; j++) {
509 if (XFS_INOBT_IS_FREE_DISK(&rp[i], ioff + j))
510 continue;
511 dip = (xfs_dinode_t *)((char *)iocur_top->data +
512 ((off + j) << mp->m_sb.sb_inodelog));
513 process_inode(agf, agino + ioff + j, dip);
514 }
(gdb) p mp->m_sb.sb_inodelog
$15 = 9 '\t'
(gdb) p iocur_top->data
$16 = (void *) 0x0
(gdb) p off
$17 = <optimized out>
(gdb) p j
$18 = 3
Since iocur_top->data is NULL, j is 3 and sb_inodelog is 9, the observed dip of 0x600 (3 << 9) implies that 'off' is 0:
(gdb) p (xfs_dinode_t *)((char *)iocur_top->data + ((0 + j) << mp->m_sb.sb_inodelog))
$21 = (xfs_dinode_t *) 0x600
The open question is how iocur_top->data came to be 0 (NULL). Just before the loop above, the code explicitly checks that iocur_top->data is NOT NULL:
502 if (iocur_top->data == NULL) { <<<<<<<<<
503 dbprintf(_("can't read inode block %u/%u\n"),
504 seqno, agbno);
505 goto next_buf;
506 }
507
508 for (j = 0; j < inodes_per_buf; j++) {
509 if (XFS_INOBT_IS_FREE_DISK(&rp[i], ioff + j))
510 continue;
511 dip = (xfs_dinode_t *)((char *)iocur_top->data +
512 ((off + j) << mp->m_sb.sb_inodelog));
513 process_inode(agf, agino + ioff + j, dip);
514 }
I'll attach a coredump along with a prebuilt root environment and debuginfo tree.