Bug
Resolution: Won't Do
Normal
rhel-8.6.0
Important
rhel-plumbers
ssg_core_services
x86_64
Description of problem:
numad -w occasionally freezes when it is invoked from several sessions at once while the numad daemon is running.
Version-Release number of selected component (if applicable):
0.5-26.20150602git
How reproducible:
Always
Steps to Reproduce:
1. Execute numad -w 4:4096 from multiple concurrent sessions. A suitable command line for each session is: while true; do numad -w 4:4096; numad -w 4:4096; numad -w 4:4096; done
2. Wait for some time. It may help to restart the command sequence in one of the sessions to create a suitable timing skew.
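The shell loop above only shows a freeze when someone is watching the terminal. As a rough sketch of how the same steps could be automated, the Python helper below runs a command repeatedly from several threads and reports the first iteration that exceeds a time limit. The function names, the thread-per-session layout, and the timeout value are illustrative assumptions, not part of numad or of the original reproduction.

```python
import subprocess
import threading


def run_until_hang(cmd, iterations, timeout_s):
    """Run cmd up to `iterations` times in a row.

    Returns the index of the first run that did not finish within
    timeout_s seconds, or None if every run finished in time.
    """
    for i in range(iterations):
        try:
            # subprocess.run kills the child itself when the timeout expires.
            subprocess.run(
                cmd,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return i
    return None


def run_sessions(cmd, sessions, iterations, timeout_s):
    """Run `sessions` concurrent loops of cmd, one thread per session.

    Returns one entry per session: the iteration that hung, or None.
    """
    results = [None] * sessions

    def worker(slot):
        results[slot] = run_until_hang(cmd, iterations, timeout_s)

    threads = [threading.Thread(target=worker, args=(n,)) for n in range(sessions)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# Example call for this report (requires numad to be installed and running):
#   run_sessions(["numad", "-w", "4:4096"], sessions=3, iterations=1000, timeout_s=30)
```

A non-None entry in the returned list marks a session in which one invocation ran past the limit. The 30 second limit in the example call is an arbitrary choice and should be well above the normal response time of numad -w on the test system.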
Actual results:
After a while, one of the sessions freezes.
Expected results:
The sequence should keep running in every session without freezing.
Additional info:
This problem appeared during concurrent migration of several (n>3) VMs in a KVM cluster. The VMs had a vcpu and numatune configuration looking like this:
<vcpu placement='auto'>8</vcpu>
<numatune>
<memory mode='strict' placement='auto'/>
</numatune>
libvirt ran numad -w 8:4096 to determine the placement. When numad froze, libvirtd froze as well, which led to pcs rebooting the node.
Looking at the source of numad, the likely cause is that every numad instance runs init_msg_queue() at invocation. Because init_msg_queue() also calls flush_msg_queue(), the queue is emptied on every invocation. If this happens in the middle of another invocation's send/receive sequence, that other process never gets an answer and hangs.
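The suspected interleaving can be illustrated with a small in-memory model. This is only a sketch of the mechanism described above, assuming a single queue shared by all processes. It is not numad's code: the class, the function names, and the message contents are hypothetical, and the real program uses a blocking receive where this model returns None.

```python
class SharedQueue:
    """Stand-in for one message queue shared by the daemon and all clients."""

    def __init__(self):
        self.messages = []  # list of (recipient, body) pairs

    def send(self, recipient, body):
        self.messages.append((recipient, body))

    def receive(self, recipient):
        """Return the oldest message for recipient, or None if there is none.

        A blocking receive would wait here instead of returning None.
        """
        for i, (to, body) in enumerate(self.messages):
            if to == recipient:
                del self.messages[i]
                return body
        return None

    def flush(self):
        """Drop every pending message, regardless of recipient."""
        self.messages.clear()


def client_start(queue, flush_on_start):
    """Model of per-invocation queue setup; optionally flushes the queue."""
    if flush_on_start:
        queue.flush()


def simulate(flush_on_start):
    """Replay one unlucky interleaving of two client sessions, A and B."""
    queue = SharedQueue()

    # Session A starts and sends its request to the daemon.
    client_start(queue, flush_on_start)
    queue.send("daemon", "A wants 4:4096")

    # The daemon picks up the request and posts a reply for A.
    request = queue.receive("daemon")
    queue.send("A", "reply to: " + request)

    # Session B starts before A has read its reply.
    client_start(queue, flush_on_start)

    # Session A now looks for its reply.
    return queue.receive("A")
```

In this model, simulate(True) returns None because session B's start-up flush discards the reply intended for session A, while simulate(False) returns the reply. With a blocking receive, the None case corresponds to session A waiting indefinitely.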
Is duplicated by: RHEL-88104 numad freezes during multiple access (New)