RHEL-7113

Different behavior when hotplugging DIMM memory into a guest depending on the access attribute defined, when an NVDIMM device is plugged


      Description of problem:
      Hotplugging DIMM memory into a guest behaves differently depending on the access attribute defined on the DIMM device, when an NVDIMM device is also plugged.

      Version-Release number of selected component (if applicable):
      libvirt-9.0.0-8.el9_2.x86_64
      qemu-kvm-7.2.0-11.el9_2.x86_64

      Guest version:
      os version: RHEL9.2
      kernel version: 5.14.0-284.el9.x86_64

      How reproducible:
      100%

      Steps to Reproduce:
      1. Create a 512M file
      truncate -s 512M /tmp/nvdimm

      2. Define and start a guest with memory, NUMA and NVDIMM configuration in the domain XML as below:
      <maxMemory slots='16' unit='KiB'>52428800</maxMemory>
      <memory unit='KiB'>2097152</memory>
      <currentMemory unit='KiB'>2097152</currentMemory>
      ...
      <numa>
        <cell id='0' cpus='0-1' memory='1048576' unit='KiB'/>
        <cell id='1' cpus='2-3' memory='1048576' unit='KiB'/>
      </numa>
      ...
      <memory model='nvdimm'>
        <source>
          <path>/tmp/nvdimm</path>
        </source>
        <target>
          <size unit='KiB'>524288</size>
          <node>1</node>
          <label>
            <size unit='KiB'>256</size>
          </label>
        </target>
      </memory>
      ...
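
      For reference, a minimal sketch of defining and starting the guest (the filename vm1.xml is assumed; the domain name vm1 matches the attach-device command used below):

      virsh define vm1.xml
      virsh start vm1
      virsh dumpxml vm1 | grep -A 10 "model='nvdimm'"   # confirm the NVDIMM is present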

      3. Check the guest memory
      [in guest]

      cat /proc/meminfo | grep MemTotal
        MemTotal: 1736156 kB

      4. Prepare a DIMM memory device config XML with the access attribute defined:

      cat memory1.xml
        <memory model='dimm' access='shared'> <!-- or access='private' -->
          <source>
            <pagesize unit='KiB'>4</pagesize>
          </source>
          <target>
            <size unit='KiB'>524288</size>
            <node>0</node>
          </target>
        </memory>

      5. Hot plug the DIMM memory device using the config XML from step 4

      virsh attach-device vm1 memory1.xml
        Device attached successfully
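
      (Optional) The guest-physical base address that QEMU assigned to the hotplugged DIMM can be inspected from the host, which helps to see the alignment issue below; a diagnostic sketch using QEMU's HMP "info memory-devices" command:

      virsh qemu-monitor-command vm1 --hmp 'info memory-devices'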

      6. Check the guest memory again; the guest memory has not increased.
      [in guest]

      cat /proc/meminfo | grep MemTotal
        MemTotal: 1736156 kB

      7. Check dmesg in the guest and find the related error
      [in guest]

      dmesg
        ...
        [ 198.482981] Block size [0x8000000] unaligned hotplug range: start 0x11ffc0000, size 0x20000000
        [ 198.483017] acpi PNP0C80:01: add_memory failed
        [ 198.485362] acpi PNP0C80:01: acpi_memory_enable_device() error
        [ 198.486377] acpi PNP0C80:01: Enumeration failure
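
      The error can be decoded with simple arithmetic: 0x8000000 is the 128 MiB memory block size, and the start address 0x11ffc0000 is not a multiple of it; the remainder shows the start sits 256 KiB short of a 128 MiB boundary, which matches the 256 KiB NVDIMM label size. A quick check (sketch):

      # remainder of the hotplug start address modulo the 128 MiB block size
      printf '0x%x\n' $(( 0x11ffc0000 % 0x8000000 ))   # -> 0x7fc0000 = 128 MiB - 256 KiB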

      8. If in step 4 the memory device is defined without the access attribute, like:

      cat memory1.xml
        <memory model='dimm'>
          <source>
            <pagesize unit='KiB'>4</pagesize>
          </source>
          .....

      Then in step 6 the guest memory increases:
      [in guest]

      cat /proc/meminfo | grep MemTotal
        MemTotal: 2260444 kB

      Actual results:
      Hotplugging a DIMM device into the guest behaves differently depending on the access attribute defined.

      Expected results:
      A DIMM device with access='shared' or access='private' defined should behave the same as a DIMM device with no access attribute defined.

      Additional info:
      Also checked other scenarios:
      Note: the NVDIMM guest area memory size is 524288 KiB - 256 KiB = 524032 KiB, which is not a multiple of 128 MiB.

      If the NVDIMM guest area memory (total size minus label size) is a multiple of 128 MiB, e.g. with the label size set to 0 (no label size defined), 128 MiB, 256 MiB or 384 MiB, then the DIMM device can be hotplugged successfully in the guest no matter how the access attribute is set.

      For a DIMM device with no access attribute defined, if the NVDIMM label size is set within [0, 2 MiB), [128 MiB, 130 MiB), [256 MiB, 258 MiB), etc., the DIMM device can be hotplugged successfully in the guest.

      So, per the info above, the behavior differs depending on which access attribute is defined.
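
      To reproduce the arithmetic above, a small sketch (128 MiB is the memory block size reported in the guest dmesg):

      # check whether the NVDIMM guest area (total size minus label size) is a
      # multiple of the 128 MiB memory block size
      total_kib=524288; label_kib=256
      guest_area_kib=$(( total_kib - label_kib ))
      if (( guest_area_kib % (128 * 1024) == 0 )); then
          echo "guest area ${guest_area_kib} KiB is 128 MiB aligned"
      else
          echo "guest area ${guest_area_kib} KiB is NOT 128 MiB aligned"
      fi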

            Comments:

            Jaroslav Suchanek added a comment -

            There is no demand for this and it has been kept open for a long time. Please reopen if needed.

            Michal Privoznik added a comment -

            Agreed. Let's close this. We can always reopen if needed.

            John Ferlan added a comment -

            mprivozn@redhat.com - I see no activity on this for an extended period of time - are you OK with closing this as WONTFIX? Additionally, if it's still important, perhaps create an upstream tracker.


            pm-rhel added a comment -

            Issue migration from Bugzilla to Jira is in process at this time. This will be the last message in Jira copied from the Bugzilla bug.


            David Hildenbrand added a comment -

            (In reply to Michal Privoznik from comment #22)
            > (In reply to David Hildenbrand from comment #21)
            >
            > Spoiler alert: I know next to nothing about memory mgmt.
            >
            > > It's still sub-optimal, though. Hotplugging a 128 MiB DIMM first followed by
            > > a 256 MiB DIMM would unnecessarily create a hole ...
            >
            > Can you enlighten me please - why are holes bad? Is it because if a DIMM is
            > backed by a hugepage then it's wasteful?

            Because the GPA space will be fragmented. For Linux, this implies that certain operations, such as memory compaction, get more expensive, because Linux has to consider holes in memory zones and has to scan over these holes.

            Further, Linux cannot make use of that memory for larger allocations (such as gigantic pages). It's a secondary concern, though.

            > Also - how is this solved at real HW level? I mean, when I plug a DIMM into
            > a slot, it might too create a hole, couldn't it?

            I was told by Intel a while ago that real HW does not support hotplug of individual DIMMs, but only complete NUMA nodes. Holes between other nodes are less of a concern (in Linux, it's separate memory zones either way). So it's not really an issue on real HW.


            Michal Privoznik added a comment -

            (In reply to David Hildenbrand from comment #21)

            Spoiler alert: I know next to nothing about memory mgmt.

            > It's still sub-optimal, though. Hotplugging a 128 MiB DIMM first followed by
            > a 256 MiB DIMM would unnecessarily create a hole ...

            Can you enlighten me please - why are holes bad? Is it because if a DIMM is backed by a hugepage then it's wasteful?
            Also - how is this solved at real HW level? I mean, when I plug a DIMM into a slot, it might too create a hole, couldn't it?


            David Hildenbrand added a comment -

            (In reply to David Hildenbrand from comment #20)
            > (In reply to Michal Privoznik from comment #19)
            > > (In reply to David Hildenbrand from comment #17)
            > > > Getting that intended minimum alignment from the user is IMHO better than
            > > > hard-coding it in QEMU and having to deal with compat handling.
            > >
            > > But problem is whether user will know what value to put in. To sum up:
            > >
            > > QEMU knows what values are acceptable, but not which OS is running in the
            > > guest,
            > > libvirt does not know what value to pass, nor which OS is running in the
            > > guest,
            > > user does not know what value to pass, but it knows what OS is running in
            > > the guest.
            >
            > QEMU most certainly knows the least
            >
            > Again, the user already has to be aware of guest OS restrictions. While
            > hotplugging a 128 MiB DIMM to a VM running an arm64 Linux kernel with 4k
            > page size will work, it's unusable by an arm64 Linux kernel with a 64k page
            > size. Just like the minimum granularity, the alignment is guest-OS specific.
            >
            > >
            > > So I wonder whether we should:
            > > a) chose a reasonable default in QEMU, and possibly
            >
            > I'm afraid that will require compat machine changes.
            >
            > And there is no reasonable default for arm64, for example, without knowing
            > what's running inside the VM. Using an alignment of 512MiB just because the
            > guest could be running a 64k kernel fragments guest physical address space
            > when hotplugging 128 MiB DIMMs.

            BTW, I was playing with the idea of deciding the alignment based on the size.

            DIMM size is multiples of 128 MiB -> align to 128 MiB
            DIMM size is multiples of 256 MiB -> align to 256 MiB
            DIMM size is multiples of 512 MiB -> align to 512 MiB

            It's still sub-optimal, though. Hotplugging a 128 MiB DIMM first followed by a 256 MiB DIMM would unnecessarily create a hole ...
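
            For illustration only, a shell sketch of that size-based idea (a hypothetical helper, not an existing QEMU or libvirt option):

            # pick the largest of 512/256/128 MiB that evenly divides the DIMM size
            align_for_dimm_size() {
                local size=$1 a
                for a in $(( 512 << 20 )) $(( 256 << 20 )) $(( 128 << 20 )); do
                    if (( size % a == 0 )); then echo "$a"; return; fi
                done
                echo $(( 128 << 20 ))   # assumed fallback for sizes that are not a multiple of 128 MiB
            }
            align_for_dimm_size $(( 256 << 20 ))   # -> 268435456 (align to 256 MiB)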


            David Hildenbrand added a comment -

            (In reply to Michal Privoznik from comment #19)
            > (In reply to David Hildenbrand from comment #17)
            > > Getting that intended minimum alignment from the user is IMHO better than
            > > hard-coding it in QEMU and having to deal with compat handling.
            >
            > But problem is whether user will know what value to put in. To sum up:
            >
            > QEMU knows what values are acceptable, but not which OS is running in the
            > guest,
            > libvirt does not know what value to pass, nor which OS is running in the
            > guest,
            > user does not know what value to pass, but it knows what OS is running in
            > the guest.

            QEMU most certainly knows the least

            Again, the user already has to be aware of guest OS restrictions. While hotplugging a 128 MiB DIMM to a VM running an arm64 Linux kernel with 4k page size will work, it's unusable by an arm64 Linux kernel with a 64k page size. Just like the minimum granularity, the alignment is guest-OS specific.

            >
            > So I wonder whether we should:
            > a) chose a reasonable default in QEMU, and possibly

            I'm afraid that will require compat machine changes.

            And there is no reasonable default for arm64, for example, without knowing what's running inside the VM. Using an alignment of 512MiB just because the guest could be running a 64k kernel fragments guest physical address space when hotplugging 128 MiB DIMMs.


            Michal Privoznik added a comment -

            (In reply to David Hildenbrand from comment #17)
            > Getting that intended minimum alignment from the user is IMHO better than
            > hard-coding it in QEMU and having to deal with compat handling.

            But the problem is whether the user will know what value to put in. To sum up:

            QEMU knows what values are acceptable, but not which OS is running in the guest;
            libvirt does not know what value to pass, nor which OS is running in the guest;
            the user does not know what value to pass, but does know which OS is running in the guest.

            So I wonder whether we should:
            a) choose a reasonable default in QEMU, and possibly
            b) offer users a way to tweak the alignment.


            David Hildenbrand added a comment -

            (In reply to Igor Mammedov from comment #16)
            > QEMU already reserves 1G of GPA per device, so why not align every one on 1G
            > border (without adding any new options)?

            We only do that on x86 so far IIRC, and only for memory devices that require an ACPI slot (we don't know how many other devices we might have). The underlying reason IIRC, was to handle memory backends with gigantic pages that require a certain alignment in GPA. So on x86 we could eventually align only such devices (DIMMs/NVDIMMs) to 1 GiB without further changes. For everything else, we could break existing setups eventually and would require some compat handling (I recall that any such gpa layout changes might require compat handling, but at least libvirt should be able to deal with that). A user option won't require gluing that to compat machines.

            Aligning all DIMMs to 1 GiB is also not really desired IMHO. If you hotplug multiple smaller DIMMs (< 1 GiB, which apparently users do for Kata and such), you'd get quite a lot of (large) GPA holes in between, implying that PFN walkers (like compaction) inside the VM get more expensive (i.e., zones not contiguous) and that such memory can never get used for larger contiguous allocations (such as gigantic pages).

            Ideally, we don't get any holes, even when hotplugging DIMMs that are any multiples of 128 MiB (on x86), which is the common case and only doesn't work because NVDIMMs do weird stuff with the labels. But that 128 MiB alignment is both guest and arch specific.

            Getting that intended minimum alignment from the user is IMHO better than hard-coding it in QEMU and having to deal with compat handling.


              mprivozn@redhat.com Michal Privoznik
              lcong@redhat.com Liang Cong