RHEL-56652

nfsd: crash while testing a lock on a re-exported nfs v3 mount simultaneously from local system and client [KWF:kernel-rt]

    • Task
    • Resolution: Done
    • Undefined
    • rhel-8.10.z
    • None
    • kernel-rt / Other
    • None
    • rhel-sst-filesystems
    • None
    • False

      This Task tracks the testing of RHEL-31515 on the Kernel RT variant. It will transition to In-progress once builds are ready; build information is in the Testable build field of RHEL-31515. To facilitate merging of the main kernel ticket, please transition the Task to CLOSED status after verifying or addressing any blocking issues.

      What were you trying to do that didn't work?

      While locking and testing locks on a re-exported NFSv3 mount simultaneously from both the client-server (the system that re-exports the mount) and a client that has mounted the re-exported filesystem, nfsd can crash the system in nlmclnt_setlockargs while servicing a LOCKT, due to a NULL file_lock->fl_file.

      Please provide the package NVR for which bug is seen:

      kernel-4.18.0-513.5.1.el8_9.x86_64

      How reproducible:

      easy, see reproducer

      Steps to reproduce

      reproducer requires 3 systems and attached test program

       system 1 (nfs server):
        # mkdir /exports
        # touch /exports/testfile
        /etc/exports:
          /exports *(rw,no_root_squash)
        # exportfs -av
      
      system 2 (nfs client + server):
        # mkdir /exports
        # mount system1:/exports /exports -o vers=3
        /etc/exports:
          /exports *(rw,no_root_squash,fsid=50)
        # exportfs -av
        copy test_lock_crash.c to /tmp
        # gcc /tmp/test_lock_crash.c -o /tmp/test_lock_crash
        # /tmp/test_lock_crash /exports/testfile
      
      system 3 (nfs client):
        # mkdir /mnt
        # mount system2:/exports /mnt -o vers=4
        copy test_lock_crash.c to /tmp
        # gcc /tmp/test_lock_crash.c -o /tmp/test_lock_crash
        # /tmp/test_lock_crash /mnt/testfile
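
      The attached test_lock_crash.c is not reproduced in this ticket. As a rough illustration only, a program that repeatedly tests and takes a POSIX lock on the file named on the command line might look like the sketch below. The function names (whole_file_lock, lock_and_test_loop) and the iteration count are hypothetical and are not taken from the attachment. On an NFSv4 mount, an F_GETLK request is typically what results in a LOCKT being sent to the server.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: describe a lock of the given type covering the whole file. */
static void whole_file_lock(struct flock *fl, short type)
{
    memset(fl, 0, sizeof(*fl));
    fl->l_type = type;
    fl->l_whence = SEEK_SET;
    fl->l_start = 0;
    fl->l_len = 0;              /* 0 means "to end of file" */
}

/*
 * Hypothetical sketch, not the attached reproducer: test for a conflicting
 * lock, try to take a write lock, and release it again, `iterations` times.
 * Returns the number of completed iterations, or -1 if the file cannot be
 * opened.
 */
int lock_and_test_loop(const char *path, int iterations)
{
    int fd = open(path, O_RDWR);
    if (fd < 0) {
        perror("open");
        return -1;
    }

    int done = 0;
    for (int i = 0; i < iterations; i++) {
        struct flock fl;

        /* Test the lock (F_GETLK). */
        whole_file_lock(&fl, F_WRLCK);
        if (fcntl(fd, F_GETLK, &fl) < 0)
            perror("fcntl(F_GETLK)");

        /* Try to take the lock without blocking, then drop it. */
        whole_file_lock(&fl, F_WRLCK);
        if (fcntl(fd, F_SETLK, &fl) == 0) {
            whole_file_lock(&fl, F_UNLCK);
            if (fcntl(fd, F_SETLK, &fl) < 0)
                perror("fcntl(F_UNLCK)");
        } else if (errno != EAGAIN && errno != EACCES) {
            /* EAGAIN/EACCES only mean another holder currently has the lock. */
            perror("fcntl(F_SETLK)");
        }
        done++;
    }

    close(fd);
    return done;
}
```

      A small main() that passes argv[1] and a large iteration count to lock_and_test_loop() would let such a sketch be run concurrently on system 2 and system 3 in the way the steps above describe.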
      

       

      Expected results

      no crash

      Actual results

      nfsd process crashes while dereferencing null pointer:

      [70489.133762] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
      [70489.133899] PGD 0 P4D 0 
      [70489.133935] Oops: 0000 [#1] SMP NOPTI
      [70489.133982] CPU: 10 PID: 49117 Comm: nfsd Kdump: loaded Not tainted 4.18.0-513.18.1.el8_9.x86_64 #1
      [70489.134077] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
      [70489.134185] RIP: 0010:nlmclnt_setlockargs+0x3a/0xf0 [lockd]
        

      The crash occurs in nlmclnt_setlockargs on the following instruction:

       0xffffffffc07590ba <nlmclnt_setlockargs+0x3a>:  mov    0x20(%rax),%rdx
      
          RAX: 0000000000000000  RBX: ffff947ae5cc7c00  RCX: ffff947ae5cc7c44
       PID: 49117    TASK: ffff947b17a44000  CPU: 10   COMMAND: "nfsd"
          [exception RIP: nlmclnt_setlockargs+0x3a]
          RIP: ffffffffc07590ba  RSP: ffffa052045efd38  RFLAGS: 00010286
      ...
       #8 [ffffa052045efd50] nlmclnt_proc at ffffffffc075935a [lockd]
       #9 [ffffa052045efda8] nfsd4_lockt at ffffffffc07a7443 [nfsd]
      #10 [ffffa052045efdf8] nfsd4_proc_compound at ffffffffc07936f1 [nfsd]
      #11 [ffffa052045efe58] nfsd_dispatch at ffffffffc077ecee [nfsd]
      #12 [ffffa052045efe80] svc_process_common at ffffffffc06b4320 [sunrpc]
      #13 [ffffa052045efed8] svc_process at ffffffffc06b4637 [sunrpc]
      #14 [ffffa052045efef0] nfsd at ffffffffc077e663 [nfsd]
      #15 [ffffa052045eff10] kthread at ffffffff8491eb44

      crashing instruction is on line 132 of fs/lockd/clntproc.c:

      125 static void nlmclnt_setlockargs(struct nlm_rqst *req, struct file_lock *fl)
      126 {
      127         struct nlm_args *argp = &req->a_args;
      128         struct nlm_lock *lock = &argp->lock;
      129         char *nodename = req->a_host->h_rpcclnt->cl_nodename;
      130 
      131         nlmclnt_next_cookie(&argp->cookie);
      132         memcpy(&lock->fh, NFS_FH(locks_inode(fl->fl_file)), sizeof(struct nfs_fh));

      the file_lock (fl) passed into the function is still in %rbp

           RBP: ffff947a8afa1bd8   R8: 0000000000000000   R9: 0000000000000000
      
      crash> file_lock.fl_file ffff947a8afa1bd8
        fl_file = 0x0,

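      The NULL 'struct file' pointer was then dereferenced as 'file->f_inode' inside locks_inode(). This matches the faulting instruction, which loads from offset 0x20 of a zero %rax and faults at address 0000000000000020.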
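
      The arithmetic behind the fault address can be sketched in user space. The layout below is a simplified stand-in, assuming that on x86_64 'struct file' begins with a 16-byte union followed by a 16-byte 'struct path' ahead of f_inode. The names mock_file, mock_inode and fault_address are hypothetical and are not kernel definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct inode; its contents do not matter here. */
struct mock_inode {
    int unused;
};

/*
 * Hypothetical, simplified model of the start of struct file on an LP64
 * system. Assumption: two pointer-sized members (16 bytes) followed by a
 * two-pointer path (16 bytes) precede f_inode.
 */
struct mock_file {
    struct {
        void *next;
        void (*func)(void *);
    } f_u;                        /* 16 bytes on LP64 */
    struct {
        void *mnt;
        void *dentry;
    } f_path;                     /* 16 bytes on LP64 */
    struct mock_inode *f_inode;   /* offset 0x20 under these assumptions */
};

/*
 * Address that a load of file->f_inode would touch for a given file pointer.
 * With a NULL pointer (0), this is simply the member offset.
 */
uintptr_t fault_address(uintptr_t file_ptr)
{
    return file_ptr + offsetof(struct mock_file, f_inode);
}
```

      Under these assumptions, fault_address(0) evaluates to 0x20, the address reported in the oops.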

       

              rhn-support-yoyang Yongcheng Yang
              rhel-process-autobot RHEL Jira bot