RHEL-135457

[RHEL10.2] Guest gets rebooted due to memory error injection


    • Bug
    • Resolution: Duplicate
    • Normal
    • rhel-10.2
    • rhel-10.2
    • qemu-kvm
    • Yes
    • Important
    • 1
    • rhel-virt-hwe-arm-1
    • 23
    • 27
    • None
    • QE ack, Dev ack
    • False
    • False
    • No
    • Split items
    • Unspecified Release Note Type - Unknown
    • Unspecified
    • Unspecified
    • Unspecified
    • aarch64
    • None
    • Merge Request passes all submitter checks, Merge Request finished CI testing, Merge Request passed CI testing, Merge Request approved by peer review

      What were you trying to do that didn't work?

      The qemu-kvm process on the host is terminated by a SIGBUS signal during non-fatal memory error injection, causing the guest VM to reboot.

      What is the impact of this issue to you?

      The memory RAS feature is broken: a non-fatal memory error takes down the guest instead of being reported to it.

      Please provide the package NVR for which the bug is seen:

      Using RHEL-10.2-20251118.1 BaseOS aarch64 
      qemu-kvm-10.1.0-8.el10.aarch64
      libvirt-11.10.0-1.el10.aarch64
      Kernel 6.12.0-171.el10.aarch64 (both 4k and 64k page-size variants)

      How reproducible is this bug?:

      100%

      Steps to reproduce

      1. Follow the setup steps in the linked document (there are many). The failing case is error type 0x10. https://docs.google.com/document/d/1vboOkC7I8WlTItgKpSDwuegY3HniCkh7G6YY5NA2m7Q/edit?tab=t.0#heading=h.rrbpzx9u90i5
      2. [root@ampere-mtsnow-altramax-37 /]# ./einj.sh 0x801540f7000 0x10 
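
For readers without access to the linked document, here is a minimal sketch of what an injection helper such as `einj.sh` might do, assuming it wraps the kernel's ACPI EINJ debugfs interface under `/sys/kernel/debug/apei/einj`. The script's actual contents are not shown in this report, so the function names, the order of writes, and the page mask below are assumptions. The error-type names follow the kernel's EINJ documentation, where 0x10 is "Memory Uncorrectable non-fatal". The code only builds the list of writes and never touches the system.

```python
# Hypothetical dry-run sketch; this is NOT the actual einj.sh from the linked document.
EINJ_DIR = "/sys/kernel/debug/apei/einj"  # usual EINJ debugfs location

# Error type bits as listed in the kernel's EINJ documentation.
EINJ_ERROR_TYPES = {
    0x001: "Processor Correctable",
    0x002: "Processor Uncorrectable non-fatal",
    0x004: "Processor Uncorrectable fatal",
    0x008: "Memory Correctable",
    0x010: "Memory Uncorrectable non-fatal",
    0x020: "Memory Uncorrectable fatal",
    0x040: "PCI Express Correctable",
    0x080: "PCI Express Uncorrectable non-fatal",
    0x100: "PCI Express Uncorrectable fatal",
    0x200: "Platform Correctable",
    0x400: "Platform Uncorrectable non-fatal",
    0x800: "Platform Uncorrectable fatal",
}


def describe_error_type(value: int) -> str:
    """Return the documented name of a single EINJ error type bit."""
    return EINJ_ERROR_TYPES.get(value, "unknown")


def plan_injection(addr: int, err_type: int, mask: int = 0xFFFFFFFFFFFFF000):
    """Return the (path, value) writes an injection would perform, in order.

    The default mask (assumed here) selects a 4 KiB-aligned address range.
    """
    return [
        (f"{EINJ_DIR}/error_type", hex(err_type)),
        (f"{EINJ_DIR}/param1", hex(addr)),   # physical address to poison
        (f"{EINJ_DIR}/param2", hex(mask)),   # address mask
        (f"{EINJ_DIR}/error_inject", "1"),   # trigger the injection
    ]


if __name__ == "__main__":
    print(describe_error_type(0x10))
    for path, value in plan_injection(0x801540F7000, 0x10):
        print(f"echo {value} > {path}")
```

Printing the plan instead of performing the writes keeps the sketch safe to run on any machine; a real injection requires root, a mounted debugfs, and firmware that supports EINJ.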

      Expected results

      The injected memory error is detected by QEMU and reported to the guest kernel without causing the guest to reboot.

      Actual results

      The injected memory error causes the guest to reboot: the host kernel sends SIGBUS to qemu-kvm, and the process is terminated.

      Key lines from running `dmesg -w` on the host machine:

      [  874.423343] EDAC MC0: 1 UE multi-bit ECC on unknown memory (node:0 card:2 page:0x801540f7 offset:0x0 grain:1 - APEI location: node:0 card:2 status(0x0000000000000400): Storage error in DRAM memory)
      [  874.446545] Memory failure: 0x801540f7: Sending SIGBUS to qemu-kvm:4470 due to hardware memory corruption
      [  874.456118] Memory failure: 0x801540f7: recovery action for dirty LRU page: Recovered
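
To tie the log back to the injection command, the following sketch parses the "Sending SIGBUS" line and checks that the reported page frame number matches the injected physical address. The helper names and the regular expression are illustrative assumptions, not part of any existing tool. The default shift of 12 assumes 4 KiB pages, which is consistent with the page number in the log above; a 64k-page kernel would presumably report a different page number for the same address.

```python
import re

# Matches the host kernel's memory-failure SIGBUS message as it appears in the log above.
SIGBUS_RE = re.compile(
    r"Memory failure: (0x[0-9a-f]+): Sending SIGBUS to (\S+):(\d+) "
    r"due to hardware memory corruption"
)


def parse_sigbus(line: str):
    """Extract the PFN, process name, and PID from a SIGBUS log line, or None."""
    m = SIGBUS_RE.search(line)
    if m is None:
        return None
    return {"pfn": int(m.group(1), 16), "comm": m.group(2), "pid": int(m.group(3))}


def pfn_of(addr: int, page_shift: int = 12) -> int:
    """Page frame number of a physical address (4 KiB pages by default)."""
    return addr >> page_shift


if __name__ == "__main__":
    line = (
        "[  874.446545] Memory failure: 0x801540f7: Sending SIGBUS to "
        "qemu-kvm:4470 due to hardware memory corruption"
    )
    event = parse_sigbus(line)
    injected = 0x801540F7000  # address passed to einj.sh in the reproduction step
    print(event["comm"], event["pid"], event["pfn"] == pfn_of(injected))
```

A match confirms that the SIGBUS delivered to qemu-kvm was raised for the injected page rather than for an unrelated memory failure.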

              rh-ee-gshan Guowen Shan
              rh-ee-jugraham Julia Graham
              virt-maint virt-maint
              virt-bugs virt-bugs