-
Bug
-
Resolution: Done-Errata
-
Undefined
-
None
-
None
-
None
-
No
-
Important
-
ZStream
-
rhel-fs
-
ssg_filesystems_storage_and_HA
-
None
-
False
-
False
-
None
-
None
-
Approved Blocker
-
None
-
None
-
Unspecified
-
Unspecified
-
Unspecified
-
-
All
-
Linux
-
None
Description of problem:
kernel-4.18.0-305.17.1.el8_4
Cust says "I have observed the same pattern on 4 systems."
We're doing the lookup of "/tmp/isvinstall.20980/.git/tXtCid1".
We panic because inode corresponding to the last component is supposed to be a symlink, but it isn't. When following it, calling 'get_link' , it's NULL and we panic.
Last component dentry, is a negative dentry (has d_inode set to NULL), and the dentry.d_seq has changed so we have removed the symlink while doing the lookup (see detailed analysis below).
The analysis clearly shows that the dentry has changed after the checks in 'step_into()' function and the inode has been reused and is linked with a different dentry. Because that new inode is not a symlink, we panic trying the 'get_link()' method.
So the problem seems to be that even following this rule (since we're not freeing the inode):
"...
RCU is, unsurprisingly, critical to RCU-walk mode. The rcu_read_lock() is held for the entire time that RCU-walk
is walking down a path. The particular guarantee it provides is that the key data structures — dentries, inodes,
super_blocks, and mounts — will not be freed while the lock is held. They might be unlinked or invalidated in one
way or another, but the memory will not be repurposed, so values in various fields will still be meaningful. This
is the only guarantee that RCU provides; everything else is done using seqlocks.
..."
We're still with a situation that lead to a panic.
I've been doing some tests, and indeed I can see situations where we reuse an inode without freeing it:
ln -s test mytest;ls mytest;rm -f mytest;touch newtest;ls newtest;rm -f newtest
...
xfs_fs_destroy_inode ffff925d03f7f448
audit_inode: ffff925d03f7f448 mytest get_link() => ffffffffc030aa70
audit_inode: ffff925d03f7f448 mytest get_link() => ffffffffc030aa70
audit_inode: ffff925d03f7f448 mytest get_link() => ffffffffc030aa70
xfs_fs_destroy_inode ffff925d03f7f448
audit_inode: ffff925d03f7f448 newtest get_link() => 0
audit_inode: ffff925d03f7f448 newtest get_link() => 0
audit_inode: ffff925d03f7f448 newtest get_link() => 0
audit_inode: ffff925d03f7f448 newtest get_link() => 0
I'm also tracing calls to __xfs_inode_free() (where we do the call_rcu(...) ) for this inode, but we are not calling it...
So somehow, we're allowing to reuse the inode while doing the lookup, and because get_link() changes, we end up using
a NULL and system panics.
Not sure if this is a name lookup issue or XFS.
Will try first with name lookup.
-
-
- Detailed analysis ***
-
crash> bt
PID: 10964 TASK: ffff951c8aa92f80 CPU: 3 COMMAND: "TaniumCX"
…
#7 [ffffae44d0a6fbe0] page_fault at ffffffff8d6010fe
[exception RIP: unknown or invalid address]
RIP: 0000000000000000 RSP: ffffae44d0a6fc90 RFLAGS: 00010246
RAX: ffffffff8da3cc80 RBX: ffffae44d0a6fd30 RCX: 0000000000000000
RDX: ffffae44d0a6fd98 RSI: ffff951aa9af3008 RDI: 0000000000000000
RBP: 0000000000000000 R8: ffffae44d0a6fb94 R9: 0000000000000000
R10: ffff951c95d8c318 R11: 0000000000080000 R12: ffffae44d0a6fd98
R13: ffff951aa9af3008 R14: ffff951c8c9eb840 R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#8 [ffffae44d0a6fc90] trailing_symlink at ffffffff8cf24e61
#9 [ffffae44d0a6fcc8] path_lookupat at ffffffff8cf261d1
#10 [ffffae44d0a6fd28] filename_lookup at ffffffff8cf2a700
#11 [ffffae44d0a6fe40] vfs_statx at ffffffff8cf1dbc4
#12 [ffffae44d0a6fe98] __do_sys_newstat at ffffffff8cf1e1f9
#13 [ffffae44d0a6ff38] do_syscall_64 at ffffffff8cc0420b
…
crash> dis -lr ffffffff8cf24e61
…
/usr/src/debug/kernel-4.18.0-305.17.1.el8_4/linux-4.18.0-305.17.1.el8_4.x86_64/fs/namei.c: 1076
…
0xffffffff8cf24e5c <trailing_symlink+0x1ec>: callq 0xffffffff8d801600 <__x86_indirect_thunk_r15>
…
1051 struct inode *inode = nd->link_inode;
…
1074 get = inode->i_op->get_link;
…
1076 res = get(NULL, inode, &last->done);
crash> whatis trailing_symlink
const char *trailing_symlink(struct nameidata *);
crash> dis trailing_symlink | grep -E 'mov.*rdi,'
0xffffffff8cf24c85 <trailing_symlink+0x15>: mov %rdi,%rbx
crash> nameidata.link_inode ffffae44d0a6fd30
link_inode = 0xffff951aa9af3008
crash> inode.i_sb,i_op,i_fop,i_dentry 0xffff951aa9af3008
i_sb = 0xffff951c87c32800
i_op = 0xffffffffc0520e40 <xfs_dir_inode_operations>
i_fop = 0xffffffffc0520580 <xfs_dir_file_operations>
i_dentry =
crash> inode_operations.get_link xfs_dir_inode_operations
get_link = 0x0 // ?????
crash> mount | grep ffff951c87c32800
ffff951b51581e00 ffff951c87c32800 xfs /dev/mapper/volgrp01-root /
crash> kmem 0xffff951c8c9eb170
CACHE OBJSIZE ALLOCATED TOTAL SLABS SSIZE NAME
ffff951587d4f500 192 106009 106764 5084 4k dentry
SLAB MEMORY NODE TOTAL ALLOCATED FREE
fffff4dc20327ac0 ffff951c8c9eb000 0 21 21 0
FREE / [ALLOCATED]
[ffff951c8c9eb0c0]
…
crash> files -d ffff951c8c9eb0c0
DENTRY INODE SUPERBLK TYPE PATH
ffff951c8c9eb0c0 ffff951aa9af3008 ffff951c87c32800 DIR /tmp/isvinstall.20980/.git/objects/pack
This is the code flow i think we're following. I've marked RCU important checks with asterisks at the end of the line
filename_lookup()
-> path_lookupat()
...
while (!(err = link_path_walk(s, nd))
-> link_path_walk()
...
if (unlikely(!*name)) {
OK:
/* pathname body, done */
if (!nd->depth)
return 0;
name = nd->stack[nd->depth - 1].name;
/* trailing symlink, done */
if (!name)
return 0;
<- link_path_walk()
...
&& ((err = lookup_last(nd)) > 0))
... |
/* make sure that d_is_symlink above matches inode */ |
if (nd->flags & LOOKUP_RCU) { | if (read_seqcount_retry(&path->dentry->d_seq, seq)) <<-- at this point we know that *IS* a symlink (***) 3rd check | return -ECHILD; | } |
... |
-> pick_link(nd, path, inode, seq); |
... |
error = nd_alloc_stack(nd) |
... |
last = nd->stack + nd->depth++; <<-- here we setup the stack |
last->link = *link; |
clear_delayed_call(&last->done); |
nd->link_inode = inode; <<-- problematic inode |
last->seq = seq; |
return 1; |
<- pick_link() |
<- step_into() |
<- walk_component() |
<- lookup_last() |
-> trailing_symlink(nd) |
... |
nd->stack[0].name = NULL; |
-> get_link(nd); |
... |
nd->last_type = LAST_BIND; |
res = inode->i_link; |
if (!res) { |
const char * (*get)(struct dentry *, struct inode *, |
struct delayed_call *); |
get = inode->i_op->get_link; |
if (nd->flags & LOOKUP_RCU) { |
res = get(NULL, inode, &last->done); <<-- BOOM! |
+-> lookup_fast()
...
if (nd->flags & LOOKUP_RCU) {
...
dentry = __d_lookup_rcu(parent, &nd->last, &seq); <<-- get the last component dentry
...
*inode = d_backing_inode(dentry); <<-- get the inode from it
if (unlikely(read_seqcount_retry(&dentry->d_seq, seq))) <<-- We validate that inode is from the dentry 1st check
return -ECHILD;
/*
- This sequence count validates that the parent had no
- changes while we did the lookup of the dentry above.
* - The memory barrier in read_seqcount_begin of child is
- enough, we can use __read_seqcount_retry here.
*/
if (unlikely(__read_seqcount_retry(&parent->d_seq, nd->seq))) <<-- see the comment (**) second check
return -ECHILD;
*seqp = seq;
status = d_revalidate(dentry, nd->flags);
if (likely(status > 0)) {
/*
- Note: do negative dentry check after revalidation in
- case that drops it.
path->mnt = mnt;
path->dentry = dentry;
if (likely(__follow_mount_rcu(nd, path, inode, seqp)))
return 1;
This is what we have in nameidata:
crash> struct nameidata ffffae44d0a6fd30
struct nameidata {
path =
,
last = {
{
hash_len = 0x76e47baad
},
name = 0xffff951c8a20603b "tXtCid1" -> Last one
},
root = { mnt = 0xffff951b51581e20, dentry = 0xffff951c88cf7180 },
inode = 0xffff951c8c9409c8,

flags = 0x51, -> LOOKUP_FOLLOW|LOOKUP_PARENT|LOOKUP_RCU
seq = 0x2, -> Good, same as parent dentry seq
m_seq = 0x1a52,
last_type = 0x4, -> LAST_BIND (set in get_link() )
depth = 0x1, -> We've an entry in the stack
total_link_count = 0x1, -> One link
+- stack = 0xffffae44d0a6fd88,
+-> internal = {{
link = { mnt = 0xffff951b51581e20, dentry = 0xffff951c8c9eb840 -> /tmp/isvinstall.20980/.git/tXtCid1, note that dentry.d_seq = 8 > 6 },
done = { fn = 0x0, arg = 0xffff951c8b632460 },
name = 0x0, -> set in get_link()
seq = 0x6 -> NOTE THAT OUR dentry HAS CHANGED!!!
...
name = 0xffff951c8a206000,
saved = 0x0,
link_inode = 0xffff951aa9af3008, -> /tmp/isvinstall.20980/.git/objects/pack (dentry => ffff951c8c9eb0c0)
root_seq = 0x2,
dfd = 0xffffff9c
}
The above state means we have called pick_link(nd => ffffae44d0a6fd30, path->dentry => 0xffff951c8c9eb840, inode => 0xffff951aa9af3008)
crash> dentry 0xffff951c8c9eb840 -> /tmp/isvinstall.20980/.git/tXtCid1
struct dentry {
d_flags = 0x80040, -> DCACHE_REFERENCED|DCACHE_LRU_LIST
...
d_seq = {
sequence = 0x8 -> Changed after (***) 3rd check (see above)
...
d_parent = 0xffff951aa43b1240, -> /tmp/isvinstall.20980/.git/
d_name = {
{
{ hash = 0x6e47baad, len = 0x7 }
,
hash_len = 0x76e47baad
},
name = 0xffff951c8c9eb878 "tXtCid1"
},
d_inode = 0x0, -> Negative dentry. We have called d_delete() on it. This also explain why we don't have the DCACHE_SYMLINK_TYPE.
d_iname = "tXtCid1...
...
d_sb = 0xffff951c87c32800,
...
}
I've also found a file struct, not used any longer, but still allocated linking 0xffff951c8c9eb840 dentry with inode 0xffff951aa9af3008:
crash> struct file.f_path,f_inode,f_op,f_count ffff951c6198fe00
f_path =
f_inode = 0xffff951aa9af3008
f_op = 0xffffffffc05206a0 <xfs_file_operations>
f_count =
But now we have:
crash> files -d ffff951c8c9eb0c0
DENTRY INODE SUPERBLK TYPE PATH
ffff951c8c9eb0c0 ffff951aa9af3008 ffff951c87c32800 DIR /tmp/isvinstall.20980/.git/objects/pack
Which shows that inode has been reused...
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1.
2.
3.
Actual results:
System should not panic.
Expected results:
Additional info:
vmcore available in galvatron.cee.redhat.com:
/cores/retrace/tasks/605395386/crash/vmcore
- external trackers