RHEL / RHEL-83857

[RHEL 9.5][vdo]NULL ptr duplicate.zone dereference in assert_data_vio_in_duplicate_zone()

    • Bug
    • Resolution: Unresolved
    • Major
    • rhel-9.7
    • rhel-9.5.z
    • vdo
    • kmod-kvdo-8.2.6.3-165.el9
    • No
    • Important
    • ZStream
    • Customer Facing
    • rhel-storage-dm
    • ssg_platform_storage
    • 16
    • 22
    • 2
    • Dev ack, PXE ack
    • False
    • False
    • None
    • Yes
    • None
    • Approved Blocker
    • Bug Fix
    • .VDO driver no longer crashes due to null pointer dereference

      Before this update, writing a mix of new and duplicate data to a VDO device under certain timing conditions left a dangling pointer. As a consequence, the VDO driver dereferenced a null pointer and the system crashed. With this release, the dangling pointer issue is fixed. As a result, the VDO driver continues to run and user data is saved.
    • Done
    • Done
    • Done
    • Not Required
    • x86_64
    • None

      Problem:

      This is essentially a reopen of Jira https://issues.redhat.com/browse/RHEL-42515 

      The system crashes with the following kernel panic stack trace:

      crash> bt
      PID: 1375     TASK: ff3746aac7fda300  CPU: 19   COMMAND: "kvdo0:hashQ0"
       #0 [ff7b836248aa3bf0] machine_kexec at ffffffffb6e7a897
       #1 [ff7b836248aa3c48] __crash_kexec at ffffffffb6ffaeba
       #2 [ff7b836248aa3d08] crash_kexec at ffffffffb6ffbfe8
       #3 [ff7b836248aa3d10] oops_end at ffffffffb6e31dea
       #4 [ff7b836248aa3d30] page_fault_oops at ffffffffb6e8c25b
       #5 [ff7b836248aa3d88] exc_page_fault at ffffffffb7ad2d62
       #6 [ff7b836248aa3db0] asm_exc_page_fault at ffffffffb7c00bb2
          [exception RIP: finish_querying+0xca]
          RIP: ffffffffc0e6c87a  RSP: ff7b836248aa3e60  RFLAGS: 00010246
          RAX: 0000000000000000  RBX: ff7b83624a2638b8  RCX: 0000000000000017
          RDX: ff7b83624a248410  RSI: 0000000000000004  RDI: ff7b836249b60f50
          RBP: ff7b836249b60f50   R8: ff7b83624a248410   R9: ff7b83624a248410
          R10: 000000000000002a  R11: ff7b836249b70fe0  R12: ff7b83624a2e2e28
          R13: ff7b83624a23d148  R14: ff7b83624a263950  R15: ff7b83624a23d1c0
          ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
       #7 [ff7b836248aa3e98] service_work_queue at ffffffffc0eb2cb3 [kvdo]
       #8 [ff7b836248aa3f00] work_queue_runner at ffffffffc0eb2ef8 [kvdo]
       #9 [ff7b836248aa3f18] kthread at ffffffffb6f38abd
      #10 [ff7b836248aa3f50] ret_from_fork at ffffffffb6e03e89

       

      The NULL pointer dereference happens in:

      finish_querying()
         start_locking() — inlined
            launch_data_vio_duplicate_zone_callback() — inlined
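The inlined chain can be modeled in user space to show why a NULL duplicate.zone is fatal as soon as the request is routed to the duplicate zone. This is a minimal sketch, assuming that launch_data_vio_duplicate_zone_callback() reads duplicate.zone->thread_id to choose the thread for the next callback (which would be consistent with the fault being reported inside finish_querying()). The types are simplified stand-ins and pick_duplicate_zone_thread() is a hypothetical name, not part of kvdo.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins: only the field names
 * (duplicate.pbn, duplicate.zone, thread_id) come from the report. */
typedef unsigned int thread_id_t;

struct physical_zone {
	thread_id_t thread_id;
};

struct data_vio {
	struct {
		unsigned long long pbn;
		struct physical_zone *zone;
	} duplicate;
};

/*
 * Hypothetical model of the inlined routing step: before the next callback
 * can be queued, the thread that owns the duplicate zone is looked up via
 * duplicate.zone->thread_id. The crash suggests the kernel path performs
 * this read unconditionally; this model returns false instead of faulting
 * so that the NULL case can be observed.
 */
static bool pick_duplicate_zone_thread(const struct data_vio *data_vio,
				       thread_id_t *thread_out)
{
	assert(data_vio != NULL && thread_out != NULL);

	if (data_vio->duplicate.zone == NULL)
		return false; /* the unguarded kernel read would fault here */

	*thread_out = data_vio->duplicate.zone->thread_id;
	return true;
}
```

With a data_vio whose duplicate.zone is NULL, as in the vmcore, the model returns false; the unguarded read at the same point in the kernel faults instead.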
      static void lock_duplicate_pbn(struct vdo_completion *completion)
      {
         unsigned int increment_limit;
         struct pbn_lock *lock;
         int result;
         struct data_vio *agent = as_data_vio(completion);
         struct slab_depot *depot = vdo_from_data_vio(agent)->depot;
         struct physical_zone *zone = agent->duplicate.zone; /* <--- this was NULL */
         assert_data_vio_in_duplicate_zone(agent); /* <-- dereferences the NULL pointer inside this function (see below) */
      /* ... */
      }
      /**
       * assert_data_vio_in_duplicate_zone() - Check that a data_vio is running on the correct thread for its duplicate zone.
       * @data_vio: The data_vio in question.
       */
      static inline void assert_data_vio_in_duplicate_zone(struct data_vio *data_vio)
      {
          thread_id_t expected = data_vio->duplicate.zone->thread_id; /* <-- this is the place of dereference */
          /* ... */
      } 
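The assertion above reads duplicate.zone->thread_id before it can check anything, so a missing zone turns a debug check into a page fault. A NULL-tolerant variant that reports the bad state instead could look like the sketch below. It is a hypothetical debugging aid under simplified stand-in types; check_data_vio_in_duplicate_zone() and its current_thread parameter are invented for illustration and are not the upstream fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins; field names follow the crash output. */
typedef unsigned int thread_id_t;

struct physical_zone {
	thread_id_t thread_id;
};

struct data_vio {
	struct {
		unsigned long long pbn;
		struct physical_zone *zone;
	} duplicate;
};

/*
 * Hypothetical NULL-tolerant check: report a missing zone instead of
 * dereferencing it, then compare the zone's thread with the thread the
 * caller says it is running on. Returns true only if both checks pass.
 */
static bool check_data_vio_in_duplicate_zone(const struct data_vio *data_vio,
					     thread_id_t current_thread)
{
	const struct physical_zone *zone;

	assert(data_vio != NULL);
	zone = data_vio->duplicate.zone;

	if (zone == NULL) {
		fprintf(stderr, "data_vio %p: duplicate.zone is NULL (pbn %llu)\n",
			(const void *)data_vio, data_vio->duplicate.pbn);
		return false;
	}
	if (zone->thread_id != current_thread) {
		fprintf(stderr, "data_vio %p: on thread %u, expected %u\n",
			(const void *)data_vio, current_thread, zone->thread_id);
		return false;
	}
	return true;
}
```

Checking for NULL first keeps the diagnostic path from faulting on exactly the state that the vmcore shows.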
      crash> struct data_vio.duplicate 0xff7b836249b60f50
        duplicate = {
          pbn = 0x0,
          state = VDO_MAPPING_STATE_UNMAPPED,
          zone = 0x0   <-- reason for the crash
        },
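The dumped duplicate location (pbn 0x0, state UNMAPPED, zone 0x0) describes no usable physical block at all. The sketch below encodes that snapshot and a predicate that rejects it, assuming that a duplicate must be mapped and carry a physical zone before the locking step can use it. The enum, struct, and duplicate_is_lockable() are hypothetical models; only MODEL_STATE_UNMAPPED corresponds to a value seen in the dump (VDO_MAPPING_STATE_UNMAPPED).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int thread_id_t;

struct physical_zone {
	thread_id_t thread_id;
};

/* MODEL_STATE_UNMAPPED mirrors VDO_MAPPING_STATE_UNMAPPED from the dump;
 * MODEL_STATE_MAPPED is a hypothetical placeholder for any mapped state. */
enum mapping_state_model {
	MODEL_STATE_UNMAPPED,
	MODEL_STATE_MAPPED,
};

/* Hypothetical model of data_vio.duplicate as printed by crash. */
struct duplicate_model {
	unsigned long long pbn;
	enum mapping_state_model state;
	struct physical_zone *zone;
};

/*
 * Assumed rule: a duplicate location is usable for locking only if it is
 * mapped and has a physical zone. The crashed snapshot fails both tests.
 */
static bool duplicate_is_lockable(const struct duplicate_model *dup)
{
	assert(dup != NULL);
	return dup->state != MODEL_STATE_UNMAPPED && dup->zone != NULL;
}
```

Feeding the vmcore values into the model (pbn 0, MODEL_STATE_UNMAPPED, zone NULL) makes duplicate_is_lockable() return false.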
      

      Further vmcore analysis notes (and the core location) are in subsequent comments.

       

      What is the impact of this issue to you?

      System panic and crash, disruption to production

      Please provide the package NVR for which the bug is seen:

      • RHEL 9.5.z, kernel version 5.14.0-503.29.1.el9_5.x86_64
      • kmod-kvdo 8.2.4.15-141.el9_5

      How reproducible is this bug?:

      Not reproducible at will; it occurs during normal production.

       

              Chung Chung (cchung@redhat.com)
              Stan Saner (rhn-support-ssaner)
              Krishnaswamy Krishna Kumar
              Kenneth Raeburn
              Filip Suba
              Angana Chakraborty
              Votes: 0
              Watchers: 16