Type: Bug
Resolution: Done-Errata
Priority: Major
Fix Version/s: rhel-9.7
Affects Version/s: rhel-9.5.z
Component/s: vdo
Labels:
- rn-yml

Fixed in Build:
kmod-kvdo-8.2.6.3-165.el9
Regression:
No
Severity:
Important
Keywords:

ZStream
Customer Impact:

Customer Facing

AssignedTeam:
rhel-storage-dm
Sub-System Group:

ssg_platform_storage

Dev Target Milestone:
16
Internal Target Milestone:
22
Story Points:
2
ACKs Check:

Dev ack, PXE ack
Blocked:
False
Ready:
False
Blocked Reason:

None
Product Documentation Required:
Yes
Sprint:
None
Release Blocker:
Approved Blocker
Target Backport Versions:

rhel-9.6.z

Preliminary Testing:
Pass
Errata Link:
https://errata.engineering.redhat.com/advisory/146896
Test Coverage:

RegressionOnly

Release Note Type:
Bug Fix
Release Note Text:

.VDO driver no longer crashes due to null pointer dereference

Before this update, writing a mix of new and duplicate data to a VDO device under certain timing conditions left a dangling pointer. As a consequence, this caused a null pointer dereference and system crash. With this release, the dangling pointer issue is fixed. As a result, the VDO driver continues to run and saves user data.

Release Note Status:
Done
ProdDocsReview-CCS:
Done
ProdDocsReview-Dev:
Done
ProdDocsReview-QE:
Not Required

Experience:
Architecture:

x86_64


Planning:
None

Problem:

This is essentially a reopening of https://issues.redhat.com/browse/RHEL-42515

The system crashes with the following kernel panic stack trace:

crash> bt
PID: 1375 TASK: ff3746aac7fda300 CPU: 19 COMMAND: "kvdo0:hashQ0"
#0 [ff7b836248aa3bf0] machine_kexec at ffffffffb6e7a897
#1 [ff7b836248aa3c48] __crash_kexec at ffffffffb6ffaeba
#2 [ff7b836248aa3d08] crash_kexec at ffffffffb6ffbfe8
#3 [ff7b836248aa3d10] oops_end at ffffffffb6e31dea
#4 [ff7b836248aa3d30] page_fault_oops at ffffffffb6e8c25b
#5 [ff7b836248aa3d88] exc_page_fault at ffffffffb7ad2d62
#6 [ff7b836248aa3db0] asm_exc_page_fault at ffffffffb7c00bb2
[exception RIP: finish_querying+0xca]
RIP: ffffffffc0e6c87a RSP: ff7b836248aa3e60 RFLAGS: 00010246
RAX: 0000000000000000 RBX: ff7b83624a2638b8 RCX: 0000000000000017
RDX: ff7b83624a248410 RSI: 0000000000000004 RDI: ff7b836249b60f50
RBP: ff7b836249b60f50 R8: ff7b83624a248410 R9: ff7b83624a248410
R10: 000000000000002a R11: ff7b836249b70fe0 R12: ff7b83624a2e2e28
R13: ff7b83624a23d148 R14: ff7b83624a263950 R15: ff7b83624a23d1c0
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#7 [ff7b836248aa3e98] service_work_queue at ffffffffc0eb2cb3 [kvdo]
#8 [ff7b836248aa3f00] work_queue_runner at ffffffffc0eb2ef8 [kvdo]
#9 [ff7b836248aa3f18] kthread at ffffffffb6f38abd
#10 [ff7b836248aa3f50] ret_from_fork at ffffffffb6e03e89
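
For readers decoding the exception frame: `finish_querying+0xca` means the faulting instruction sits 0xca bytes past the start of `finish_querying`, so subtracting the offset from the RIP value gives the function's load address (0xffffffffc0e6c7b0 for this vmcore). A minimal sketch of that arithmetic follows; the helper name `symbol_base` is hypothetical and is not part of the crash utility or kvdo.

```c
#include <stdint.h>

/*
 * Hypothetical helper: given the faulting RIP and the offset printed
 * after the '+' in "symbol+offset", return the address at which the
 * symbol (function) starts.
 */
uint64_t symbol_base(uint64_t rip, uint64_t offset)
{
    return rip - offset;
}
```

With the values from the trace, `symbol_base(0xffffffffc0e6c87a, 0xca)` yields `0xffffffffc0e6c7b0`, which falls in the same module address range as the `[kvdo]` frames #7 and #8.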

The NULL pointer dereference happens in:

finish_querying()
   start_locking() — inlined
      launch_data_vio_duplicate_zone_callback() — inlined

static void lock_duplicate_pbn(struct vdo_completion *completion)
{
   unsigned int increment_limit;
   struct pbn_lock *lock;
   int result;
   struct data_vio *agent = as_data_vio(completion);
   struct slab_depot *depot = vdo_from_data_vio(agent)->depot;
   struct physical_zone *zone = agent->duplicate.zone; /* <-- this was NULL */
   assert_data_vio_in_duplicate_zone(agent); /* <-- dereferences the NULL pointer (see below) */
...
}

/**
 * assert_data_vio_in_duplicate_zone() - Check that a data_vio is running on the
 *                                       correct thread for its duplicate zone.
 * @data_vio: The data_vio in question.
 */
static inline void assert_data_vio_in_duplicate_zone(struct data_vio *data_vio)
{
    thread_id_t expected = data_vio->duplicate.zone->thread_id; /* <-- this is where the dereference happens */
...
}

 
crash> struct data_vio.duplicate 0xff7b836249b60f50
  duplicate = {
    pbn = 0x0,
    state = VDO_MAPPING_STATE_UNMAPPED,
    zone = 0x0   <-- reason for the crash
  },
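
The failure pattern above can be sketched in isolation. The structures below are hypothetical, simplified stand-ins that model only the fields visible in the excerpts and the `crash` output; they are not the real kvdo definitions. The helper `duplicate_zone_thread` is likewise an assumption made for illustration: it shows a defensive check before the dereference and is not the fix shipped in the errata, which addresses the dangling pointer itself.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kvdo types shown above. */
typedef unsigned int thread_id_t;

struct physical_zone {
    thread_id_t thread_id;
};

struct zoned_pbn {
    unsigned long long pbn;
    int state;                  /* placeholder for the mapping-state enum */
    struct physical_zone *zone; /* 0x0 in the vmcore */
};

struct data_vio {
    struct zoned_pbn duplicate;
};

/*
 * Hypothetical helper: read the duplicate zone's thread id only when a
 * zone is set. Returns false, leaving *out untouched, instead of
 * dereferencing a NULL zone pointer.
 */
bool duplicate_zone_thread(const struct data_vio *data_vio, thread_id_t *out)
{
    const struct physical_zone *zone = data_vio->duplicate.zone;

    if (zone == NULL)
        return false;
    *out = zone->thread_id;
    return true;
}
```

With `duplicate.zone` set to NULL, as in the vmcore, the helper returns false, whereas the unguarded `data_vio->duplicate.zone->thread_id` expression faults.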

Further vmcore analysis notes (and the core location) are in subsequent comments.

What is the impact of this issue to you?

System panic and crash, disruption to production

Please provide the package NVR for which the bug is seen:

RHEL 9.5.z, kernel version 5.14.0-503.29.1.el9_5.x86_64
kmod-kvdo 8.2.4.15-141.el9_5

How reproducible is this bug?:

Not reproducible at will; the crash occurred during normal production use.

links to

NULL pointer dereference in kvdo module code

RHBA-2025:146896 kmod-kvdo bug fix and enhancement update

Assignee: Chung Chung
Reporter: Stan Saner
Contributors: Krishnaswamy Krishna Kumar
Developer: Kenneth Raeburn
QA Contact: Filip Suba
Doc Contact: Angana Chakraborty
Votes: 0
Watchers: 16

Created: 2025/03/17 9:22 PM
Updated: 2025/11/11 10:19 AM
Resolved: 2025/11/11 10:19 AM
Dev Target end: 2025/06/16
Target end: 2025/07/28
Next Planned Release Date: 2025/11/11
Release Date: 2025/11/11
