RHEL-88104

numad freezes during multiple access

    • Bug
    • Resolution: Unresolved
    • rhel-8.6.0, rhel-9.4.z
    • numad
    • Important
    • rhel-systemd
    • ssg_core_services

      Description of problem:
      numad -w occasionally freezes when it is invoked from several sessions at once while the numad daemon is running.

      Version-Release number of selected component (if applicable):
      0.5-26.20150602git

      How reproducible:
      Always

      Steps to Reproduce:
      1. Run numad -w 4:4096 from multiple concurrent sessions. A convenient command line for each session is: while true; do numad -w 4:4096; numad -w 4:4096; numad -w 4:4096; done
      2. Wait for some time. If nothing freezes, restart the command sequence in one of the sessions to create a suitable timing skew.
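
The steps above can also be driven from a small harness that reports a freeze instead of relying on someone watching the terminals. This is only a sketch: the helper names, the default of three sessions, and the 10-second timeout used to decide that an invocation has frozen are assumptions, not part of the original report. Only the numad -w 4:4096 command line comes from the steps.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def invocation_hangs(cmd, timeout):
    """Run cmd once; return True if it has not finished after `timeout` seconds."""
    try:
        subprocess.run(cmd, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, timeout=timeout)
        return False
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before raising, so nothing is left behind.
        return True


def session(cmd, rounds, timeout):
    """One 'terminal session': run cmd back to back, stop at the first hang."""
    for _ in range(rounds):
        if invocation_hangs(cmd, timeout):
            return True
    return False


def any_session_hangs(cmd, sessions=3, rounds=100, timeout=10.0):
    """Run several sessions concurrently; return True if any of them saw a hang."""
    with ThreadPoolExecutor(max_workers=sessions) as pool:
        futures = [pool.submit(session, cmd, rounds, timeout)
                   for _ in range(sessions)]
        return any(future.result() for future in futures)
```

On an affected host with the numad daemon running, any_session_hangs(["numad", "-w", "4:4096"]) would be expected to return True eventually. The sessions run as independent loops rather than in lockstep so that their relative timing can drift, as it does with the manual loops in separate terminals.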

      Actual results:
      After a while, one of the sessions freezes.

      Expected results:
      The command sequence keeps running in every session without any invocation freezing.

      Additional info:

      This problem appeared during concurrent migration of several (n>3) VMs in a KVM cluster. The VMs had a numatune section looking like this:

      <vcpu placement='auto'>8</vcpu>
      <numatune>
      <memory mode='strict' placement='auto'/>
      </numatune>

      libvirt ran numad -w 8:4096 to determine the placement. When numad froze, libvirtd froze with it, which led to pcs rebooting the node.

      Looking at the numad source, the likely cause is that every instance of numad runs init_msg_queue() at invocation. Because init_msg_queue() also calls flush_msg_queue(), the queue is emptied on every invocation. If this happens in the middle of the send/receive sequence of another invocation, that other process never gets an answer and hangs.
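
The suspected interleaving can be illustrated with a small in-memory model. This is not numad's code: the MessageQueue class is a stand-in for the shared message queue, and the message types, the PID, and the reply text are hypothetical. Only the names init_msg_queue() and flush_msg_queue(), and the fact that the first calls the second, are taken from the analysis above.

```python
from collections import deque

DAEMON_TYPE = 1   # hypothetical message type for requests addressed to the daemon
CLIENT_A = 100    # hypothetical PID of the invocation that ends up waiting


class MessageQueue:
    """In-memory stand-in for the one queue shared by all numad processes."""

    def __init__(self):
        self._messages = deque()

    def send(self, mtype, body):
        self._messages.append((mtype, body))

    def receive(self, mtype):
        """Return the first message of type mtype, or None if there is none.

        A blocking receive would wait forever where this model returns None.
        """
        for index, (queued_type, body) in enumerate(self._messages):
            if queued_type == mtype:
                del self._messages[index]
                return body
        return None

    def flush(self):
        """Models flush_msg_queue(): drop every message, whoever it belongs to."""
        self._messages.clear()


def init_msg_queue(queue):
    """Models the start-up path described above: every invocation flushes."""
    queue.flush()


def daemon_step(queue):
    """The daemon answers one pending request, addressed to the sender's PID."""
    request = queue.receive(DAEMON_TYPE)
    if request is not None:
        client, args = request
        queue.send(client, f"placement for {args}")


def run_interleaving(flush_point=None):
    """Return the reply client A receives, or None if it was lost.

    flush_point is None, "request", or "reply": the moment at which a second
    invocation starts up and flushes the shared queue.
    """
    queue = MessageQueue()
    init_msg_queue(queue)                          # client A starts
    queue.send(DAEMON_TYPE, (CLIENT_A, "8:4096"))  # client A sends its request
    if flush_point == "request":
        init_msg_queue(queue)                      # client B starts, drops A's request
    daemon_step(queue)
    if flush_point == "reply":
        init_msg_queue(queue)                      # client B starts, drops A's reply
    return queue.receive(CLIENT_A)
```

In this model run_interleaving() returns the reply, while run_interleaving("request") and run_interleaving("reply") both return None. That None corresponds to the receive that never completes in the hung process. Whether the real daemon addresses replies by PID is an assumption of the model, but the outcome does not depend on it, since the flush removes every message regardless of type.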

              lnykryn@redhat.com Lukáš Nykrýn
              jira-bugzilla-migration RH Bugzilla Integration
              Lukáš Nykrýn Lukáš Nykrýn
              RHEL CS Plumbers QE Bot RHEL CS Plumbers QE Bot