• No
    • Moderate
    • TestOnly
    • 3
    • rhel-sst-virt-arm
    • ssg_virtualization
    • 5
    • False
    • None
    • Red Hat Enterprise Linux
    • Virt ARM 25-1, Virt ARM 25-2, Virt ARM 25-3
    • Pass
    • Automated
    • aarch64
    • None

      What were you trying to do that didn't work?

      Configure the guest with its vCPUs pinned to a single host CPU, then start the guest. After about 3 seconds, the guest has crashed and left a core dump.

      Please provide the package NVR for which the bug is seen:

      beaker host: fujitsu-fx700-01-n00.khw.eng.bos2.dc.redhat.com

      host packages versions:

      # rpm -q libvirt qemu-kvm kernel-64k
      libvirt-10.9.0-1.el9.aarch64
      qemu-kvm-9.1.0-1.el9.aarch64
      kernel-64k-5.14.0-527.el9.aarch64

       

      guest kernel: 5.14.0-524.el9.aarch64+64k

      How reproducible is this bug?:  5%
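
      For planning test loops, the 5% rate can be turned into a rough iteration count. The sketch below assumes that every start attempt fails independently with the same probability, which this report does not establish; `runs_needed` and `miss_probability` are hypothetical helper names.

```python
import math


def runs_needed(failure_rate: float, confidence: float = 0.95) -> int:
    """Smallest n such that 1 - (1 - failure_rate)**n >= confidence.

    Assumes independent attempts with a constant failure rate
    (an assumption, not something the report confirms).
    """
    if not 0.0 < failure_rate < 1.0:
        raise ValueError("failure_rate must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))


def miss_probability(failure_rate: float, runs: int) -> float:
    """Probability of seeing no failure at all in `runs` independent attempts."""
    return (1.0 - failure_rate) ** runs


if __name__ == "__main__":
    # 5% is the reproduction rate stated in this report.
    print(runs_needed(0.05))
```

      Under these assumptions, roughly 59 start attempts give a 95% chance of observing at least one crash.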

      Steps to reproduce

      1. Configure the guest with the following libvirt XML:

      <memory unit='KiB'>4194304</memory>
      <currentMemory unit='KiB'>4194304</currentMemory>
      <vcpu placement='static' cpuset='0'>4</vcpu>

      ...
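
      To make the pinning explicit, the fragment above can be inspected programmatically. This is a minimal sketch: the `<domain>` wrapper is added only so that the fragment parses on its own, the elided elements are left out, and `parse_cpuset` and `vcpu_summary` are hypothetical helper names.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Fragment from step 1, wrapped in a <domain> root so it parses standalone.
# The wrapper is an assumption; the rest of the original XML is elided.
DOMAIN_XML = """
<domain>
  <memory unit='KiB'>4194304</memory>
  <currentMemory unit='KiB'>4194304</currentMemory>
  <vcpu placement='static' cpuset='0'>4</vcpu>
</domain>
"""


def parse_cpuset(cpuset: str) -> set[int]:
    """Expand a simple cpuset string such as '0' or '0-3,8' into CPU numbers.

    Only plain numbers and ranges are handled; the '^' exclusion syntax
    that libvirt also accepts is not supported in this sketch.
    """
    cpus: set[int] = set()
    for part in cpuset.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def vcpu_summary(domain_xml: str) -> tuple[int, set[int]]:
    """Return (number of vCPUs, set of host CPUs they may run on)."""
    vcpu = ET.fromstring(domain_xml.strip()).find("vcpu")
    if vcpu is None or vcpu.text is None:
        raise ValueError("no <vcpu> element with a count found")
    return int(vcpu.text), parse_cpuset(vcpu.get("cpuset", ""))


if __name__ == "__main__":
    count, host_cpus = vcpu_summary(DOMAIN_XML)
    print(f"{count} vCPUs share host CPUs {sorted(host_cpus)}")
```

      For this configuration the summary is four vCPUs sharing host CPU 0, which is the single-CPU pinning the report describes.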

      2. Start the guest

      # virsh start avocado-vt-vm1
      Domain 'avocado-vt-vm1' started
      

       

      3. Wait 3 seconds. The guest is then found shut off, and a core dump has been generated for the guest.

      # virsh list --all
       Id   Name             State
      ---------------------------------
       -    avocado-vt-vm1   shut off
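
      Because the crash is intermittent, steps 2 and 3 are usually run in a loop. The sketch below is one possible way to do that, not the test that was actually used: it treats any guest found in the "shut off" state after the wait as a crash (an assumption, since it does not check for the core dump itself), and `run_virsh` and `count_crashes` are hypothetical names. Only the `virsh start`, `virsh domstate`, and `virsh destroy` subcommands are used.

```python
import subprocess
import time
from typing import Callable, List


def run_virsh(args: List[str]) -> str:
    """Run a virsh subcommand and return its standard output."""
    result = subprocess.run(
        ["virsh", *args], capture_output=True, text=True, check=True
    )
    return result.stdout


def count_crashes(
    domain: str,
    loops: int,
    wait_seconds: float = 3.0,
    run: Callable[[List[str]], str] = run_virsh,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Start the guest `loops` times and count how often it is shut off
    after `wait_seconds`.

    Assumption: a guest that is already 'shut off' this soon after start
    has crashed. Guests that are still up are destroyed before the next
    iteration.
    """
    crashes = 0
    for _ in range(loops):
        run(["start", domain])
        sleep(wait_seconds)
        state = run(["domstate", domain]).strip()
        if state == "shut off":
            crashes += 1
        else:
            run(["destroy", domain])
    return crashes


if __name__ == "__main__":
    # Domain name taken from the report; the loop count is arbitrary.
    print(count_crashes("avocado-vt-vm1", loops=100))
```

      The `run` and `sleep` parameters exist so that the loop can be exercised without a hypervisor by passing in stand-ins.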
      

       

      The core dump backtrace is as follows:

      (gdb) bt
      #0  0x0000ffff990c23c8 in __pthread_kill_implementation () from /lib64/libc.so.6
      #1  0x0000ffff9907a6bc in raise () from /lib64/libc.so.6
      #2  0x0000ffff99066fb4 in abort () from /lib64/libc.so.6
      #3  0x0000ffff99074010 in __assert_fail_base () from /lib64/libc.so.6
      #4  0x0000ffff99074080 in __assert_fail () from /lib64/libc.so.6
      #5  0x0000aaaad827f89c in render_memory_region ()
      #6  0x0000aaaad827f4dc in render_memory_region ()
      #7  0x0000aaaad827f59c in render_memory_region ()
      #8  0x0000aaaad827f004 in generate_memory_topology ()
      #9  0x0000aaaad8278334 in memory_region_transaction_commit ()
      #10 0x0000aaaad7ed1848 in pci_bridge_write_config ()
      #11 0x0000aaaad7ed9d2c in rp_write_config ()
      #12 0x0000aaaad8279828 in memory_region_write_accessor ()
      #13 0x0000aaaad8279654 in access_with_adjusted_size ()
      #14 0x0000aaaad82793e4 in memory_region_dispatch_write ()
      #15 0x0000aaaad828cdfc in flatview_write_continue_step ()
      #16 0x0000aaaad8286fbc in flatview_write ()
      #17 0x0000aaaad8286e50 in address_space_write ()
      #18 0x0000aaaad82dc398 in kvm_cpu_exec ()
      #19 0x0000aaaad82e1c5c in kvm_vcpu_thread_fn ()
      #20 0x0000aaaad848ff74 in qemu_thread_start ()
      #21 0x0000ffff990c0778 in start_thread () from /lib64/libc.so.6
      #22 0x0000ffff9912ad5c in thread_start () from /lib64/libc.so.6
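
      When comparing many core dumps, it helps to reduce each backtrace to the first frame outside libc, which here is the function containing the failed assertion. The following is an illustrative sketch only: the regular expression covers just the frame layout shown above (no symbols with arguments or source locations), and `parse_frames` and `first_application_frame` are hypothetical names.

```python
import re
from typing import List, Optional, Tuple

Frame = Tuple[int, str, Optional[str]]

# Matches lines such as:
#   #5  0x0000aaaad827f89c in render_memory_region ()
#   #4  0x0000ffff99074080 in __assert_fail () from /lib64/libc.so.6
FRAME_RE = re.compile(
    r"^\s*#(?P<index>\d+)\s+0x[0-9a-fA-F]+ in (?P<func>\S+) \(\)"
    r"(?: from (?P<lib>\S+))?"
)


def parse_frames(backtrace: str) -> List[Frame]:
    """Return (frame index, function name, shared library or None) tuples."""
    frames: List[Frame] = []
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line)
        if match:
            frames.append(
                (int(match.group("index")), match.group("func"), match.group("lib"))
            )
    return frames


def first_application_frame(frames: List[Frame]) -> Optional[Frame]:
    """First frame that gdb did not attribute to a shared library."""
    for frame in frames:
        if frame[2] is None:
            return frame
    return None


if __name__ == "__main__":
    sample = """\
#3  0x0000ffff99074010 in __assert_fail_base () from /lib64/libc.so.6
#4  0x0000ffff99074080 in __assert_fail () from /lib64/libc.so.6
#5  0x0000aaaad827f89c in render_memory_region ()
#10 0x0000aaaad7ed1848 in pci_bridge_write_config ()
"""
    print(first_application_frame(parse_frames(sample)))
```

      Applied to the backtrace above, the first non-libc frame is `render_memory_region`, reached from `pci_bridge_write_config` via `memory_region_transaction_commit`.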

      Expected results

      Guest should not crash.

      Actual results

      Guest crashed with core dump file.

      Additional info

      This issue is not reproducible if the guest is pinned to multiple host CPUs.

      This issue is not reproducible on the servers "ampere-mtjade-altra.." or "nvidia-grace-grace..".

       

            [RHEL-67106] Guest crash with single host cpu pinned

            Liang Cong added a comment - Verified as comment: https://issues.redhat.com/browse/RHEL-68997?focusedId=26770242&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-26770242

            RHEL Jira bot added a comment - The 'blocked by' issue RHEL-68997 is transitioned to Release Pending.

            Liang Cong added a comment - Based on the test results in this comment: https://issues.redhat.com/browse/RHEL-22598?focusedId=26644564&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-26644564 , marking the Preliminary Testing result as Pass.

            Eric Auger added a comment -

            lcong@redhat.com 

            I posted https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/6397

            Brew builds will be available soon. Could you please test on FJ HW? Thank you in advance.

            Liang Cong added a comment -

            Hi eauger, I am currently running libvirt-related test loops according to https://issues.redhat.com/browse/RHEL-68997 . I think that is the issue you are talking about, right?

            lcong@redhat.com OK thanks. This validates that removing the memmove fixes the issue. Now we need some tests with the original qemu code and various glibc brew build provided by Florian. See https://issues.redhat.com/issues/?filter=12434265

            Eric Auger added a comment -

            lcong@redhat.com OK thanks. This validates that removing the memmove fixes the issue. Now we need some tests with the original qemu code and various glibc brew build provided by Florian. See https://issues.redhat.com/issues/?filter=12434265

            Liang Cong added a comment - Hi eauger, following https://issues.redhat.com/browse/RHEL-22598?focusedId=26186664&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-26186664 , I have run 60000 test loops and did not hit the core dump issue.

            Eric Auger added a comment -

            lcong@redhat.com Please could you test with this qemu brew.

            http://brewweb.engineering.redhat.com/brew/taskinfo?taskID=66028240

            This is not a final fix, but it would help to determine whether it is the cause. On my end I can no longer reproduce the crash.

            Alexander Lougovski added a comment -

            Yay! Cool! So we potentially have a reproducer that hits 1 out of 5 times instead of 1 out of 1000! Hopefully the root cause is the same.

            Please add your debugging updates to https://issues.redhat.com/browse/RHEL-67831

            Eric Auger added a comment -

            I am able to reproduce with 5.14.0-529.el9.aarch64+64k and a 9.6 64kB guest. I needed to reboot the guest about 5 times. At first glance this looks the same as RHEL-22598, except that the assertion is slightly different. However, it also hits in generate_memory_topology(), so it looks really similar. This time it is an access to the PCI bridge config that triggers the assertion, not pflash.

              eauger Eric Auger
              lcong@redhat.com Liang Cong
              virt-maint virt-maint
              Liang Cong Liang Cong