RHEL-36222

cifs: crash while filling pages with received data with cache=none

    • kernel-4.18.0-553.10.1.el8_10
    • None
    • Critical
    • rhel-sst-filesystems
    • ssg_filesystems_storage_and_HA
    • 8
    • False

      What were you trying to do that didn't work?

      Multiple customers are getting frequent crashes with the following:

      [67723.909254] BUG: unable to handle kernel paging request at 0000000000038108
      [67723.909676] PGD 0 P4D 0 
      [67723.909905] Oops: 0000 [#1] SMP NOPTI
      [67723.910065] CPU: 0 PID: 16882 Comm: cifsd Kdump: loaded Not tainted 4.18.0-513.18.1.el8_9.x86_64 #1
      [67723.910221] Hardware name: VMware, Inc. VMware7,1/440BX Desktop Reference Platform, BIOS VMW71.00V.18227214.B64.2106252220 06/25/2021
      [67723.910378] RIP: 0010:uncached_fill_pages+0x113/0x1b0 [cifs]
      

      In addition to the attempted dereference of address 0000000000038108, several fields of the cifs_readdata struct are corrupted:

        pagesz = 0x1000,
        page_offset = 0x6080c0,
        tailsz = 0x1000,
        credits = {
          value = 0x0,
          instance = 0x8
        },
        nr_pages = 0x608,
        pages = 0xffff9c4ec0004e00
      

      In all provided vmcores, the invalid address is 0000000000038108, page_offset is 0x6080c0, nr_pages is 0x608, and pages resolves to the kmalloc-192 slab cache:

      crash> kmem -s 0xffff9c4ec0004e00
      CACHE             OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE  NAME
      ffff9c4ec0004e00      192      45428     45990   2190     4k  kmalloc-192
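
Because every vmcore so far shows the same faulting address, page_offset, and nr_pages, a new vmcore can be checked against that signature quickly. The following is a minimal sketch, not part of the attached repro.c; the helper names (`parse_fault_address`, `parse_crash_fields`, `matches_signature`) are hypothetical, and it assumes the oops line and the struct dump are available as plain text in the form quoted above.

```python
import re

# Signature values copied from the vmcores described in this report.
SIGNATURE = {
    "fault_address": 0x38108,
    "page_offset": 0x6080C0,
    "nr_pages": 0x608,
}

def parse_fault_address(oops_text):
    """Return the faulting address from a 'paging request at <hex>' line, or None."""
    m = re.search(r"paging request at ([0-9a-fA-F]+)", oops_text)
    return int(m.group(1), 16) if m else None

def parse_crash_fields(struct_text):
    """Collect 'name = 0x...' lines from a struct dump into a dict of ints.

    Lines that open or close a nested struct (e.g. 'credits = {') are skipped.
    """
    pattern = re.compile(r"^\s*(\w+)\s*=\s*(0x[0-9a-fA-F]+),?\s*$", re.MULTILINE)
    return {name: int(value, 16) for name, value in pattern.findall(struct_text)}

def matches_signature(oops_text, struct_text):
    """True if the fault address, page_offset and nr_pages all match the signature."""
    observed = parse_crash_fields(struct_text)
    observed["fault_address"] = parse_fault_address(oops_text)
    return all(observed.get(key) == value for key, value in SIGNATURE.items())
```

A match only indicates that a vmcore resembles the ones already collected; it does not identify the cause of the corruption.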
      

      Please provide the package NVR for which bug is seen:

      4.18.0-513.18.1.el8_9.x86_64
      4.18.0-477.27.1.el8_8.x86_64

      How reproducible:

      Very frequent on multiple servers for one customer; a single report from a second customer.

      Steps to reproduce

      No confirmed steps yet. The customer with multiple crashing servers reports that the crashes started only after they began mounting with cache=none. All vmcores appear to have mounts using cache=none.
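
Since cache=none is the common factor, it can help to list which CIFS mounts on a host use it. The sketch below is an illustration under stated assumptions, not a verified reproducer: the function name `cifs_mounts_with_cache_none` is hypothetical, it expects text in /proc/mounts format, and it assumes the mount type appears as `cifs` or `smb3` with the cache mode shown as a `cache=none` option.

```python
def cifs_mounts_with_cache_none(mounts_text):
    """Return mount points of CIFS mounts whose options include cache=none.

    `mounts_text` is expected in /proc/mounts format:
    <source> <mountpoint> <fstype> <options> <dump> <pass>
    """
    hits = []
    for line in mounts_text.splitlines():
        parts = line.split()
        if len(parts) < 4:
            continue  # skip blank or malformed lines
        _source, mountpoint, fstype, options = parts[:4]
        # Assumption: CIFS mounts are reported with fstype "cifs" or "smb3".
        if fstype in ("cifs", "smb3") and "cache=none" in options.split(","):
            hits.append(mountpoint)
    return hits
```

On a live system the input would be the contents of /proc/mounts, for example `cifs_mounts_with_cache_none(open("/proc/mounts").read())`.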

      Expected results

      no system panic

      Actual results

      kernel panic

        1. repro.c
          0.5 kB
          Paulo Alcantara

              Paulo Alcantara (paalcant@redhat.com)
              Frank Sorenson (rhn-support-fsorenso)
              CIFS Team
              Murphy Zhou