Red Hat OpenStack Services on OpenShift / OSPRH-20355

System fault with over four NVMe disks using PCI passthrough

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • rhos-18.0.15
    • 2025.2 (Flamingo), rhos-18.0.11
    • openstack-placement
    • 5
    • False
    • False
    • ?
    • rhos-workloads-compute
    • None
      .Potential scaling issue in PCI in Placement

      If many similar child resource providers are defined under the same root provider, the allocation candidate generation algorithm in the Placement service scales poorly with the default Placement configuration.

      For example, if a Compute node has 8 or more child resource providers, each providing one resource, and an instance requests 8 or more such resources, each in an independent request group, then without further optimization enabled, the `GET /allocation_candidates` request takes too long to compute and scheduling the instance fails.

      In this case, make the following configuration changes in the OpenStackControlPlane custom resource (CR):

      spec:
        placement:
          template:
            customServiceConfig: |
              [workarounds]
              optimize_for_wide_provider_trees = True
              [placement]
              max_allocation_candidates = 1000
              allocation_candidates_generation_strategy = breadth-first
    • Known Issue
    • Done
    • Rejected
    • Sprint 5 Quasar & Pulsar, Sprint 6 Quasar & Pulsar, Sprint 9 Quasar & Pulsar
    • 3
    • Critical

      To reproduce:

      Create an instance with a flavor that requests more than four NVMe disks through PCI passthrough. A system fault occurs.

      The failing flavor requests eight NVMe devices (pci_passthrough:alias='nvme:8'):

      $ openstack flavor show s4a.64x512.NVMEx8
      +----------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+
      | Field                      | Value                                                                                                                                                   |
      +----------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+
      | OS-FLV-DISABLED:disabled   | False                                                                                                                                                   |
      | OS-FLV-EXT-DATA:ephemeral  | 0                                                                                                                                                       |
      | access_project_ids         | None                                                                                                                                                    |
      | description                | None                                                                                                                                                    |
      | disk                       | 0                                                                                                                                                       |
      | id                         | 2e900e83-382a-48d5-a4e2-b20ea4a6cd0c                                                                                                                    |
      | name                       | s4a.64x512.NVMEx8                                                                                                                                       |
      | os-flavor-access:is_public | True                                                                                                                                                    |
      | properties                 | aggregate_instance_extra_specs:type='amd48nvme', hw:cpu_policy='dedicated', hw:mem_page_size='large', hw:numa_nodes='2', pci_passthrough:alias='nvme:8' |
      | ram                        | 524288                                                                                                                                                  |
      | rxtx_factor                | 1.0                                                                                                                                                     |
      | swap                       | 0                                                                                                                                                       |
      | vcpus                      | 64                                                                                                                                                      |
      +----------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+

      The following flavor, which requests four NVMe devices, works without issue, both with and without hw:numa_nodes set:

      $ openstack flavor show s4a.32x256.NVMEx4
      +----------------------------+--------------------------------------------------------------------------------------------------------------------------------------+
      | Field                      | Value                                                                                                                                |
      +----------------------------+--------------------------------------------------------------------------------------------------------------------------------------+
      | OS-FLV-DISABLED:disabled   | False                                                                                                                                |
      | OS-FLV-EXT-DATA:ephemeral  | 0                                                                                                                                    |
      | access_project_ids         | None                                                                                                                                 |
      | description                | None                                                                                                                                 |
      | disk                       | 0                                                                                                                                    |
      | id                         | 742a1af3-2eed-4a27-b8e6-21a383b57288                                                                                                 |
      | name                       | s4a.32x256.NVMEx4                                                                                                                    |
      | os-flavor-access:is_public | True                                                                                                                                 |
      | properties                 | aggregate_instance_extra_specs:type='amd48nvme', hw:cpu_policy='dedicated', hw:mem_page_size='large', pci_passthrough:alias='nvme:4' |
      | ram                        | 262144                                                                                                                               |
      | rxtx_factor                | 1.0                                                                                                                                  |
      | swap                       | 0                                                                                                                                    |
      | vcpus                      | 32                                                                                                                                   |
      +----------------------------+--------------------------------------------------------------------------------------------------------------------------------------+

       

      Environment and device information:

      $ oc get openstackversions.core.openstack.org
      NAME                    TARGET VERSION       AVAILABLE VERSION    DEPLOYED VERSION
      openstackcontrolplane   18.0.11-20250812.2   18.0.11-20250812.2   18.0.11-20250812.2

       

      02:00.0 Non-Volatile memory controller: SK hynix PE81x0 U.2/3 NVMe Solid State Drive (rev 21)
      03:00.0 Non-Volatile memory controller: SK hynix PE81x0 U.2/3 NVMe Solid State Drive (rev 21)
      04:00.0 Non-Volatile memory controller: SK hynix PE81x0 U.2/3 NVMe Solid State Drive (rev 21)
      05:00.0 Non-Volatile memory controller: SK hynix PE81x0 U.2/3 NVMe Solid State Drive (rev 21)
      24:00.0 Non-Volatile memory controller: SK hynix PE81x0 U.2/3 NVMe Solid State Drive (rev 21)
      25:00.0 Non-Volatile memory controller: SK hynix PE81x0 U.2/3 NVMe Solid State Drive (rev 21)
      26:00.0 Non-Volatile memory controller: SK hynix PE81x0 U.2/3 NVMe Solid State Drive (rev 21)
      27:00.0 Non-Volatile memory controller: SK hynix PE81x0 U.2/3 NVMe Solid State Drive (rev 21)
      64:00.0 Non-Volatile memory controller: SK hynix PE81x0 U.2/3 NVMe Solid State Drive (rev 21)
      65:00.0 Non-Volatile memory controller: SK hynix PE81x0 U.2/3 NVMe Solid State Drive (rev 21)
      66:00.0 Non-Volatile memory controller: SK hynix PE81x0 U.2/3 NVMe Solid State Drive (rev 21)
      67:00.0 Non-Volatile memory controller: SK hynix PE81x0 U.2/3 NVMe Solid State Drive (rev 21)
      84:00.0 Non-Volatile memory controller: SK hynix PE81x0 U.2/3 NVMe Solid State Drive (rev 21)
      85:00.0 Non-Volatile memory controller: SK hynix PE81x0 U.2/3 NVMe Solid State Drive (rev 21)
      c3:00.0 Non-Volatile memory controller: SK hynix PE81x0 U.2/3 NVMe Solid State Drive (rev 21)
      c4:00.0 Non-Volatile memory controller: SK hynix PE81x0 U.2/3 NVMe Solid State Drive (rev 21)
      c5:00.0 Non-Volatile memory controller: SK hynix PE81x0 U.2/3 NVMe Solid State Drive (rev 21)
      c6:00.0 Non-Volatile memory controller: SK hynix PE81x0 U.2/3 NVMe Solid State Drive (rev 21)
      e3:00.0 Non-Volatile memory controller: SK hynix PE81x0 U.2/3 NVMe Solid State Drive (rev 21)
      e4:00.0 Non-Volatile memory controller: SK hynix PE81x0 U.2/3 NVMe Solid State Drive (rev 21)
      e5:00.0 Non-Volatile memory controller: SK hynix PE81x0 U.2/3 NVMe Solid State Drive (rev 21)
      e6:00.0 Non-Volatile memory controller: SK hynix PE81x0 U.2/3 NVMe Solid State Drive (rev 21)
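      The known-issue text attributes this kind of failure to how allocation candidate generation scales when an instance requests many single-unit resources from many sibling child providers. The POSIX sh sketch below gives a rough sense of that growth. It assumes, for illustration only, that each of the 22 NVMe controllers listed above is a separate one-device child provider matching the nvme alias, that each requested device is its own request group, and that no two groups may share a provider. The perm and comb function names are hypothetical, and the counts describe the size of the search space, not Placement's actual algorithm.

```sh
#!/bin/sh
# Rough size of the search space when N single-device request groups must
# be placed on K single-device child resource providers.
# Illustrative assumption: any group may use any matching provider, and no
# two groups may share a provider. This counts possibilities; it does not
# reproduce Placement's actual candidate generation code.

# perm K N: ordered assignments of N groups to distinct providers, K!/(K-N)!
perm() {
  k=$1 n=$2 result=1 i=0
  while [ "$i" -lt "$n" ]; do
    result=$((result * (k - i)))
    i=$((i + 1))
  done
  echo "$result"
}

# comb K N: distinct unordered sets of N providers, K!/(N!(K-N)!)
comb() {
  k=$1 n=$2 result=1 i=1
  while [ "$i" -le "$n" ]; do
    # Multiplying before dividing keeps every intermediate value an exact integer.
    result=$((result * (k - n + i) / i))
    i=$((i + 1))
  done
  echo "$result"
}

# Assumed: all 22 listed NVMe controllers match the alias. Compare 4 vs 8 devices.
for n in 4 8; do
  printf 'K=22 N=%s ordered=%s unordered=%s\n' "$n" "$(perm 22 "$n")" "$(comb 22 "$n")"
done
```

      Under those assumptions, moving from four to eight requested devices grows the number of ordered assignments from 175,560 to 12,893,126,400, while the number of distinct device sets grows only from 7,315 to 319,770. Any enumeration that visits ordered assignments therefore does far more work at eight devices than at four, which is consistent with, but does not prove, the 4-device flavor scheduling and the 8-device flavor failing.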

       

      I attempted to mitigate the issue with several custom Placement configurations, none of which succeeded:

      1st attempt:

      $ oc get openstackcontrolplanes.core.openstack.org openstackcontrolplane -oyaml | yq -rC '.spec.placement.template.customServiceConfig' | sed 's/\\n/\n/g'
      [placement]
      randomize_allocation_candidates = True
      max_allocation_candidates = 100000
      allocation_candidates_generation_strategy = breadth-first
      
      $ oc exec -c placement-api pod/placement-6484d89798-wflxw -- cat /etc/placement/placement.conf.d/custom.conf
      [placement]
      randomize_allocation_candidates = True
      max_allocation_candidates = 100000
      allocation_candidates_generation_strategy = breadth-first

      2nd attempt:

      $ oc get openstackcontrolplanes.core.openstack.org openstackcontrolplane -oyaml | yq -rC '.spec.placement.template.customServiceConfig' | sed 's/\\n/\n/g'
      [placement]
      max_allocation_candidates = -1
      allocation_candidates_generation_strategy = depth-first
      
      $ oc exec -c placement-api pod/placement-7fbb5fcd-h58zf -- cat /etc/placement/placement.conf.d/custom.conf
      [placement]
      max_allocation_candidates = -1
      allocation_candidates_generation_strategy = depth-first

      3rd attempt:

      $ oc get openstackcontrolplanes.core.openstack.org openstackcontrolplane -oyaml | yq -rC '.spec.placement.template.customServiceConfig' | sed 's/\\n/\n/g'
      [placement]
      max_allocation_candidates = 50
      allocation_candidates_generation_strategy = breadth-first
      
      
      $ oc exec -c placement-api pod/placement-6fdbdb464-5lcfp -- cat /etc/placement/placement.conf.d/custom.conf
      [placement]
      max_allocation_candidates = 50
      allocation_candidates_generation_strategy = breadth-first
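      None of the three attempts above sets optimize_for_wide_provider_trees, which the known-issue text adds under [workarounds] in addition to the [placement] settings max_allocation_candidates and allocation_candidates_generation_strategy. A quick way to compare a rendered custom.conf against those recommended keys is a small POSIX sh check like the one below. The check_workaround function is hypothetical; it only looks for each key by name and does not verify the INI section or the value.

```sh
#!/bin/sh
# Hypothetical helper: report which settings recommended in the known-issue
# text appear in a Placement custom.conf. It matches keys as plain text and
# does not check which INI section a key belongs to or whether the value
# matches the recommendation.
check_workaround() {
  conf=$1 rc=0
  for key in optimize_for_wide_provider_trees \
             max_allocation_candidates \
             allocation_candidates_generation_strategy; do
    # Print the last line in the file that assigns this key, if any.
    line=$(grep -E "^[[:space:]]*${key}[[:space:]]*=" "$conf" | tail -n 1)
    if [ -n "$line" ]; then
      echo "present: $line"
    else
      echo "missing: $key"
      rc=1
    fi
  done
  # Non-zero when at least one recommended key is absent.
  return "$rc"
}
```

      For example, save the file with the same oc exec command used above, substituting the current Placement pod name (shown here as the placeholder <placement-pod>): oc exec -c placement-api pod/<placement-pod> -- cat /etc/placement/placement.conf.d/custom.conf > custom.conf, then run check_workaround custom.conf. For the third attempt's file, it reports optimize_for_wide_provider_trees as missing and returns a non-zero status.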

       

      Bug impact:

      This affects the platform release and customer onboarding, both of which are scheduled within the next few days.

      Attachments:

        1. nova-scheduler-extract.log
          66 kB
        2. placement-77747679fd-8cv8w.log
          9.93 MB
        3. placement-77747679fd-lv67w.log
          44.10 MB
        4. placement-77747679fd-nn47m.log
          33.68 MB
        5. placement-api-containers-live.log
          27.50 MB
        6. placement-55c9549974-v2qwn.log
          41.48 MB
        7. placement-55c9549974-5dxtw.log
          38.58 MB
        8. placement-55c9549974-sg9vs.log
          35.85 MB
        9. scaled_pci_devices_system_fault
          44 kB

              rhn-gps-jparker James Parker
              rh-ee-mapiccin Matteo Piccinini
              James Parker James Parker
              rhos-workloads-compute