Red Hat OpenStack Services on OpenShift
OSPRH-20355

System fault with over four NVMe disks using PCI passthrough


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • rhos-18.0.15
    • 2025.2 (Flamingo), rhos-18.0.11
    • openstack-placement
    • 5
    • False
    • False
    • ?
    • rhos-workloads-compute
    • None
      .Potential scaling issue with PCI in Placement

      If many similar child resource providers are defined under the same root provider, the allocation candidate generation algorithm in the Placement service scales poorly with the default Placement configuration.

      For example, if a Compute node has 8 or more child resource providers, each providing one unit of a resource, and an instance requests 8 or more such resources, each in an independent request group, then without further optimization enabled the `GET /allocation_candidates` query takes too long to complete and scheduling of the instance fails.

      In this situation, make the following configuration changes in the OpenStackControlPlane CR:

      spec:
        placement:
          template:
            customServiceConfig: |
              [workarounds]
              optimize_for_wide_provider_trees = True
              [placement]
              max_allocation_candidates = 1000
              allocation_candidates_generation_strategy = breadth-first
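
      For context, with PCI in Placement enabled, a flavor that requests eight passthrough devices through a PCI alias (as in this report) is translated into a GET /allocation_candidates request with eight independent numbered request groups, roughly of the following shape. This is an illustrative sketch only: the group suffixes, the CUSTOM_NVME resource class name, and the group_policy value are assumptions, not values taken from this deployment.

      GET /allocation_candidates
          ?resources=VCPU:64,MEMORY_MB:524288
          &resources_pci_1=CUSTOM_NVME:1
          &resources_pci_2=CUSTOM_NVME:1
          ...
          &resources_pci_8=CUSTOM_NVME:1
          &group_policy=none

      Each of the eight groups can be satisfied by any of the NVMe child providers on the node, which is the wide-provider-tree pattern that the configuration above targets.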
    • Known Issue
    • Done
    • Rejected
    • Sprint 5 Quasar & Pulsar, Sprint 6 Quasar & Pulsar, Sprint 9 Quasar & Pulsar
    • 3
    • Critical

      Steps to reproduce the behavior:

      When attempting to create a virtual instance with more than four NVMe disks using PCI passthrough, a system fault occurs.

      The following flavor is being used:

      $ openstack flavor show s4a.64x512.NVMEx8
      +----------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+
      | Field                      | Value                                                                                                                                                   |
      +----------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+
      | OS-FLV-DISABLED:disabled   | False                                                                                                                                                   |
      | OS-FLV-EXT-DATA:ephemeral  | 0                                                                                                                                                       |
      | access_project_ids         | None                                                                                                                                                    |
      | description                | None                                                                                                                                                    |
      | disk                       | 0                                                                                                                                                       |
      | id                         | 2e900e83-382a-48d5-a4e2-b20ea4a6cd0c                                                                                                                    |
      | name                       | s4a.64x512.NVMEx8                                                                                                                                       |
      | os-flavor-access:is_public | True                                                                                                                                                    |
      | properties                 | aggregate_instance_extra_specs:type='amd48nvme', hw:cpu_policy='dedicated', hw:mem_page_size='large', hw:numa_nodes='2', pci_passthrough:alias='nvme:8' |
      | ram                        | 524288                                                                                                                                                  |
      | rxtx_factor                | 1.0                                                                                                                                                     |
      | swap                       | 0                                                                                                                                                       |
      | vcpus                      | 64                                                                                                                                                      |
      +----------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+
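
      For reference, a boot request of roughly the following shape triggers the fault. The image, network, and server names below are placeholders, not values from this environment.

      $ openstack server create \
          --flavor s4a.64x512.NVMEx8 \
          --image <image> \
          --network <network> \
          nvme-x8-test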

      With the following flavor (four NVMe devices), it functions flawlessly with or without hw:numa_nodes:

      $ openstack flavor show s4a.32x256.NVMEx4
      +----------------------------+--------------------------------------------------------------------------------------------------------------------------------------+
      | Field                      | Value                                                                                                                                |
      +----------------------------+--------------------------------------------------------------------------------------------------------------------------------------+
      | OS-FLV-DISABLED:disabled   | False                                                                                                                                |
      | OS-FLV-EXT-DATA:ephemeral  | 0                                                                                                                                    |
      | access_project_ids         | None                                                                                                                                 |
      | description                | None                                                                                                                                 |
      | disk                       | 0                                                                                                                                    |
      | id                         | 742a1af3-2eed-4a27-b8e6-21a383b57288                                                                                                 |
      | name                       | s4a.32x256.NVMEx4                                                                                                                    |
      | os-flavor-access:is_public | True                                                                                                                                 |
      | properties                 | aggregate_instance_extra_specs:type='amd48nvme', hw:cpu_policy='dedicated', hw:mem_page_size='large', pci_passthrough:alias='nvme:4' |
      | ram                        | 262144                                                                                                                               |
      | rxtx_factor                | 1.0                                                                                                                                  |
      | swap                       | 0                                                                                                                                    |
      | vcpus                      | 32                                                                                                                                   |
      +----------------------------+--------------------------------------------------------------------------------------------------------------------------------------+

       

      Device Info:

      $ oc get openstackversions.core.openstack.org
      NAME                    TARGET VERSION       AVAILABLE VERSION    DEPLOYED VERSION
      openstackcontrolplane   18.0.11-20250812.2   18.0.11-20250812.2   18.0.11-20250812.2

       

      02:00.0 Non-Volatile memory controller: SK hynix PE81x0 U.2/3 NVMe Solid State Drive (rev 21)
      03:00.0 Non-Volatile memory controller: SK hynix PE81x0 U.2/3 NVMe Solid State Drive (rev 21)
      04:00.0 Non-Volatile memory controller: SK hynix PE81x0 U.2/3 NVMe Solid State Drive (rev 21)
      05:00.0 Non-Volatile memory controller: SK hynix PE81x0 U.2/3 NVMe Solid State Drive (rev 21)
      24:00.0 Non-Volatile memory controller: SK hynix PE81x0 U.2/3 NVMe Solid State Drive (rev 21)
      25:00.0 Non-Volatile memory controller: SK hynix PE81x0 U.2/3 NVMe Solid State Drive (rev 21)
      26:00.0 Non-Volatile memory controller: SK hynix PE81x0 U.2/3 NVMe Solid State Drive (rev 21)
      27:00.0 Non-Volatile memory controller: SK hynix PE81x0 U.2/3 NVMe Solid State Drive (rev 21)
      64:00.0 Non-Volatile memory controller: SK hynix PE81x0 U.2/3 NVMe Solid State Drive (rev 21)
      65:00.0 Non-Volatile memory controller: SK hynix PE81x0 U.2/3 NVMe Solid State Drive (rev 21)
      66:00.0 Non-Volatile memory controller: SK hynix PE81x0 U.2/3 NVMe Solid State Drive (rev 21)
      67:00.0 Non-Volatile memory controller: SK hynix PE81x0 U.2/3 NVMe Solid State Drive (rev 21)
      84:00.0 Non-Volatile memory controller: SK hynix PE81x0 U.2/3 NVMe Solid State Drive (rev 21)
      85:00.0 Non-Volatile memory controller: SK hynix PE81x0 U.2/3 NVMe Solid State Drive (rev 21)
      c3:00.0 Non-Volatile memory controller: SK hynix PE81x0 U.2/3 NVMe Solid State Drive (rev 21)
      c4:00.0 Non-Volatile memory controller: SK hynix PE81x0 U.2/3 NVMe Solid State Drive (rev 21)
      c5:00.0 Non-Volatile memory controller: SK hynix PE81x0 U.2/3 NVMe Solid State Drive (rev 21)
      c6:00.0 Non-Volatile memory controller: SK hynix PE81x0 U.2/3 NVMe Solid State Drive (rev 21)
      e3:00.0 Non-Volatile memory controller: SK hynix PE81x0 U.2/3 NVMe Solid State Drive (rev 21)
      e4:00.0 Non-Volatile memory controller: SK hynix PE81x0 U.2/3 NVMe Solid State Drive (rev 21)
      e5:00.0 Non-Volatile memory controller: SK hynix PE81x0 U.2/3 NVMe Solid State Drive (rev 21)
      e6:00.0 Non-Volatile memory controller: SK hynix PE81x0 U.2/3 NVMe Solid State Drive (rev 21)
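
      For reference, the compute-side PCI configuration for these controllers would look roughly like the sketch below when PCI in Placement is used. This is an assumption-laden example: the product_id and resource_class values are placeholders, and 1c5c is only the vendor ID commonly reported for SK hynix, so both should be confirmed with lspci -nn.

      [pci]
      report_in_placement = true
      device_spec = { "vendor_id": "1c5c", "product_id": "<product_id>", "resource_class": "CUSTOM_NVME" }
      alias = { "vendor_id": "1c5c", "product_id": "<product_id>", "device_type": "type-PCI", "name": "nvme" }

      On the scheduler side, [filter_scheduler] pci_in_placement = True is what causes such alias requests to be scheduled through Placement request groups in the first place.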

       

      I have unsuccessfully attempted to mitigate the issue with several customServiceConfig (extraconfig) variations:

      1st attempt:

      $ oc get openstackcontrolplanes.core.openstack.org openstackcontrolplane -oyaml | yq -rC '.spec.placement.template.customServiceConfig' | sed 's/\\n/\n/g'
      [placement]
      randomize_allocation_candidates = True
      max_allocation_candidates = 100000
      allocation_candidates_generation_strategy = breadth-first
      
      $ oc exec -c placement-api pod/placement-6484d89798-wflxw -- cat /etc/placement/placement.conf.d/custom.conf
      [placement]
      randomize_allocation_candidates = True
      max_allocation_candidates = 100000
      allocation_candidates_generation_strategy = breadth-first

      2nd attempt:

      $ oc get openstackcontrolplanes.core.openstack.org openstackcontrolplane -oyaml | yq -rC '.spec.placement.template.customServiceConfig' | sed 's/\\n/\n/g'
      [placement]
      max_allocation_candidates = -1
      allocation_candidates_generation_strategy = depth-first
      
      $ oc exec -c placement-api pod/placement-7fbb5fcd-h58zf -- cat /etc/placement/placement.conf.d/custom.conf
      [placement]
      max_allocation_candidates = -1
      allocation_candidates_generation_strategy = depth-first

      3rd attempt:

      $ oc get openstackcontrolplanes.core.openstack.org openstackcontrolplane -oyaml | yq -rC '.spec.placement.template.customServiceConfig' | sed 's/\\n/\n/g'
      [placement]
      max_allocation_candidates = 50
      allocation_candidates_generation_strategy = breadth-first
      
      
      $ oc exec -c placement-api pod/placement-6fdbdb464-5lcfp -- cat /etc/placement/placement.conf.d/custom.conf
      [placement]
      max_allocation_candidates = 50
      allocation_candidates_generation_strategy = breadth-first
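
      For scale context on this node, a rough upper-bound illustration, assuming the candidate generator has to consider every ordered assignment of the eight PCI request groups onto the 22 NVMe child providers listed above (an assumption about the worst case, not a measured figure):

      $ python3 -c 'import math; print(math.perm(22, 8))'
      12893126400

      This is the kind of search space that the max_allocation_candidates cap and the generation-strategy setting above are meant to make tractable.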

       

      Bug impact

      This impacts the platform release and customer on-boarding, both scheduled in the next few days.

      Attachments:

        1. nova-scheduler-extract.log
          66 kB
          Matteo Piccinini
        2. placement-77747679fd-8cv8w.log
          9.93 MB
          Matteo Piccinini
        3. placement-77747679fd-lv67w.log
          44.10 MB
          Matteo Piccinini
        4. placement-77747679fd-nn47m.log
          33.68 MB
          Matteo Piccinini
        5. placement-api-containers-live.log
          27.50 MB
          Matteo Piccinini
        6. placement-55c9549974-v2qwn.log
          41.48 MB
          Matteo Piccinini
        7. placement-55c9549974-5dxtw.log
          38.58 MB
          Matteo Piccinini
        8. placement-55c9549974-sg9vs.log
          35.85 MB
          Matteo Piccinini
        9. scaled_pci_devices_system_fault
          44 kB
          James Parker

              rhn-gps-jparker James Parker
              rh-ee-mapiccin Matteo Piccinini
              James Parker James Parker
              rhos-workloads-compute