Red Hat OpenStack Services on OpenShift · OSPRH-13064

Provide a workaround for NVMe disk cleanup in 18.0

    • Type: Spike
    • Resolution: Done
      Goal

      Describe existing Nova interfaces that can be used to implement an external service handling the cleanup.

      Assumptions

      • NVMe disks are passed through to the VMs by passing the NVMe controller PCI device through from the hypervisor.
      • RHOSO 18.0 is deployed and PCI passthrough is configured for the NVMe controller PCI devices.
      • The PCI in Placement feature is enabled in RHOSO (in Tech Preview status as of 18.0-FR1).
      • The external cleanup tool has access to the hypervisor and to the OpenStack APIs as admin.
      • Only VM create and delete need to be handled for now; migrations, resize, evacuation, etc. are out of scope.

      Suggested workaround

      High level steps:

      • Detect the creation and deletion of VMs using NVMe device(s) via Nova notifications
      • Reserve the NVMe PCI device via the Placement API when a VM that allocates the device is created
      • Wipe the NVMe device after the VM is deleted and then unreserve the PCI device via the Placement API

      Note that the doc below uses pure Nova config options for simplicity. These need to be translated to RHOSO 18.0 configuration.

      Recap for PCI passthrough configuration

      nova compute conf:

        [pci]
        device_spec = { "vendor_id":"2646", "product_id":"5013", "device_type": "type-PCI"}
        alias = { "name": "nvme-type-1", "vendor_id":"2646", "product_id":"5013", "device_type": "type-PCI"}
      

      nova api conf

        [pci]
        alias = { "name": "nvme-type-1", "vendor_id":"2646", "product_id":"5013", "device_type": "type-PCI"}
      

      nova flavor:

        $ openstack flavor show m1.nvme1 | grep properties
        | properties                 | pci_passthrough:alias='nvme-type-1:1' |
      

      Recap for enabling PCI in Placement

      Follow the documentation in https://docs.openstack.org/nova/latest/admin/pci-passthrough.html#pci-tracking-in-placement

      nova compute conf:

      [pci]
      report_in_placement = True
      

      nova api conf:

      [filter_scheduler]
      pci_in_placement = True
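
      Once enabled, the external tool can verify the setup by querying the Placement API for the resource providers that represent the NVMe PCI devices. Below is a minimal sketch, assuming admin credentials in the usual OS_* environment variables and the default CUSTOM_PCI_<vendor_id>_<product_id> resource class naming (CUSTOM_PCI_2646_5013 for the device above); the later sketches reuse this placement session:

        import os

        from keystoneauth1 import session
        from keystoneauth1.identity import v3

        # Assumption: admin credentials are provided via the environment.
        auth = v3.Password(
            auth_url=os.environ["OS_AUTH_URL"],
            username=os.environ["OS_USERNAME"],
            password=os.environ["OS_PASSWORD"],
            project_name=os.environ["OS_PROJECT_NAME"],
            user_domain_name=os.environ.get("OS_USER_DOMAIN_NAME", "Default"),
            project_domain_name=os.environ.get("OS_PROJECT_DOMAIN_NAME", "Default"),
        )
        placement = session.Session(auth=auth)
        # Placement microversion 1.26 is needed later to allow reserved == total.
        HEADERS = {"OpenStack-API-Version": "placement 1.26"}
        PLACEMENT = {"service_type": "placement"}

        # List the resource providers exposing the NVMe PCI resource class.
        resp = placement.get(
            "/resource_providers?resources=CUSTOM_PCI_2646_5013:1",
            endpoint_filter=PLACEMENT, headers=HEADERS)
        for rp in resp.json()["resource_providers"]:
            # The provider name encodes the compute hostname and the PCI address.
            print(rp["uuid"], rp["name"])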
      

      Detect VM creation and deletion via Nova notifications

      Configure Nova to emit notifications to the message bus

      nova compute conf:

        [oslo_messaging_notifications]
        driver = messagingv2
        transport_url = <rabbitmq address>
        notification_format = versioned   # or both if other tools are also using the notifications and cannot use the new versioned format
      

      Listen on the message bus for notifications

      Example Python code to connect and listen for notifications (a minimal sketch using oslo.messaging; the transport URL is an assumption that must match the [oslo_messaging_notifications] configuration above, and versioned_notifications is the default topic for versioned notifications):
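
        # A minimal sketch: the handler only picks out the two events the
        # external tool cares about; error handling is omitted.
        from oslo_config import cfg
        import oslo_messaging

        # Assumption: replace with the real message bus address.
        TRANSPORT_URL = "rabbit://<user>:<password>@<rabbitmq address>:5672/"

        class NotificationEndpoint(object):
            def info(self, ctxt, publisher_id, event_type, payload, metadata):
                # payload is the versioned notification payload, see the
                # example instance.delete.end payload below.
                if event_type in ("instance.create.end", "instance.delete.end"):
                    data = payload["nova_object.data"]
                    print(event_type, data["uuid"], data["node"])
                return oslo_messaging.NotificationResult.HANDLED

        transport = oslo_messaging.get_notification_transport(
            cfg.CONF, url=TRANSPORT_URL)
        targets = [oslo_messaging.Target(topic="versioned_notifications")]
        listener = oslo_messaging.get_notification_listener(
            transport, targets, [NotificationEndpoint()], executor="threading")
        listener.start()
        listener.wait()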

      • the instance.create.end notification is sent after the VM is scheduled to a compute host (hypervisor) and Nova has allocated the requested resources for the VM in the Placement API. After this, Nova starts the VM on the compute host.
      • the instance.delete.end notification is sent after the VM is stopped and deleted from the hypervisor and Nova has removed the resource allocation of the VM from the Placement API.

      For example, an instance.delete.end notification will look like this:

        {
          "message_id": "1323c4b9-bca0-453c-b700-c226769552d1",
          "publisher_id": "nova-compute:aio",
          "event_type": "instance.delete.end",
          "priority": "INFO",
          "payload": {
            "nova_object.name": "InstanceActionPayload",
            "nova_object.namespace": "nova",
            "nova_object.version": "1.8",
            "nova_object.data": {
              "fault": null,
              "request_id": "req-5847f682-e4dc-44ed-8473-71403424d114",
              "uuid": "a81880d0-e1f3-4195-8785-9078c899f69e",
              "user_id": "7e9f6361d07d41b8bd0d2a133c1d5d48",
              "tenant_id": "82cec4de18334e79b39916d53c3fdaab",
              "reservation_id": "r-9uplxpf3",
              "display_name": "vm1",
              "display_description": null,
              "host_name": "vm1",
              "host": "aio",
              "node": "aio",
              "os_type": null,
              "architecture": null,
              "availability_zone": "nova",
              "flavor": {
                "nova_object.name": "FlavorPayload",
                "nova_object.namespace": "nova",
                "nova_object.version": "1.4",
                "nova_object.data": {
                  "flavorid": "9e4b2a6d-5239-4269-a86d-febeb6400505",
                  "memory_mb": 2048,
                  "vcpus": 1,
                  "root_gb": 4,
                  "ephemeral_gb": 0,
                  "name": "m1.nvme1",
                  "swap": 0,
                  "rxtx_factor": 1.0,
                  "vcpu_weight": 0,
                  "disabled": false,
                  "is_public": true,
                  "extra_specs": {"pci_passthrough:alias": "nvme-type-1:1"},
                  "projects": null,
                  "description": null
                }
              },
              "image_uuid": "505d3021-b162-4ca2-a83c-f86637de2d31",
              "key_name": null,
              "kernel_id": "",
              "ramdisk_id": "",
              "created_at": "2024-11-13T13:34:05Z",
              "launched_at": "2024-11-13T13:34:18Z",
              "terminated_at": "2024-11-13T14:34:45Z",
              "deleted_at": "2024-11-13T14:34:48Z",
              "updated_at": "2024-11-13T14:34:46Z",
              "state": "deleted",
              "power_state": "pending",
              "task_state": null,
              "progress": 0,
              "ip_addresses": [],
              "block_devices": [],
              "metadata": {},
              "locked": false,
              "auto_disk_config": "MANUAL",
              "action_initiator_user": "7e9f6361d07d41b8bd0d2a133c1d5d48",
              "action_initiator_project": "82cec4de18334e79b39916d53c3fdaab",
              "locked_reason": null
            }
          },
          "timestamp": "2024-11-13 14:34:49.430346"
        }
      

      The interesting bits are:

      • The FlavorPayload with the extra spec "pci_passthrough:alias": "nvme-type-1:1": from this the external tool can detect whether the VM was using an NVMe device type (PCI alias); see the helper sketch after this list.
      • The field "node": "aio" tells the external tool which compute host the VM was running on.
      • The field "uuid": "a81880d0-e1f3-4195-8785-9078c899f69e" tells which VM was deleted.
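
      For illustration, a small hypothetical helper that pulls these bits out of the payload dict delivered to the listener above:

        # Illustrative only: "payload" is the versioned notification payload
        # dict shown above; "nvme-type-1" is the NVMe PCI alias configured
        # earlier.
        def nvme_vm_info(payload, alias_name="nvme-type-1"):
            data = payload["nova_object.data"]
            flavor = data["flavor"]["nova_object.data"]
            extra_specs = flavor.get("extra_specs") or {}
            aliases = extra_specs.get("pci_passthrough:alias", "")
            # The alias spec format is "<name>:<count>[,<name>:<count>...]".
            uses_nvme = any(
                spec.split(":")[0] == alias_name
                for spec in aliases.split(",") if spec)
            if not uses_nvme:
                return None
            # (VM uuid, compute node) identify what to clean and where.
            return data["uuid"], data["node"]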

      Reserve the PCI device in Placement

      To prevent Nova from re-assigning a PCI device to the next VM before the cleanup can happen, the external tool needs to reserve the PCI resource in Placement.

      • Based on the instance.create.end notification the external tool can detect whether the VM uses a flavor that has a PCI alias matching an NVMe PCI device.
      • If so, the external tool can look up the allocations of the VM in Placement. The Placement consumer UUID is the VM UUID from the notification. Based on the resource class, the external tool can find which resource providers represent NVMe PCI devices allocated to the VM. The name of such a resource provider encodes both the hostname of the compute host the VM is scheduled to and the PCI address of the NVMe device.
      • On each of these resource providers the external tool needs to change the reserved value of the Placement resource inventory from 0 to 1, as shown in the sketch after this list.
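
      A minimal sketch of the reservation step, reusing the placement session, PLACEMENT and HEADERS defined in the verification sketch above; the CUSTOM_PCI_2646_5013 resource class name is an assumption based on the default naming:

        # Sketch only: no error handling and no retry on generation conflicts
        # (Placement returns 409 if the resource provider generation is stale).
        def set_reserved(rp_uuid, reserved, rc="CUSTOM_PCI_2646_5013"):
            url = "/resource_providers/%s/inventories/%s" % (rp_uuid, rc)
            inv = placement.get(
                url, endpoint_filter=PLACEMENT, headers=HEADERS).json()
            # reserved == total is only accepted from placement 1.26 on.
            inv["reserved"] = reserved
            placement.put(url, json=inv, endpoint_filter=PLACEMENT, headers=HEADERS)

        def reserve_nvme_devices(vm_uuid, rc="CUSTOM_PCI_2646_5013"):
            # The VM uuid from the notification is the Placement consumer uuid.
            allocs = placement.get(
                "/allocations/%s" % vm_uuid,
                endpoint_filter=PLACEMENT, headers=HEADERS).json()["allocations"]
            reserved_rps = []
            for rp_uuid, alloc in allocs.items():
                if rc in alloc["resources"]:
                    set_reserved(rp_uuid, 1)
                    reserved_rps.append(rp_uuid)
            # Persist vm_uuid -> reserved_rps for the delete step.
            return reserved_rps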

      Wipe the device at VM deletion and unreserve the device

      The external tool can detect when a VM is deleted via the instance.delete.end notification. It can use its existing information about the devices reserved in Placement for this VM to know which devices on which host need to be cleaned.

      After the tool has finished cleaning the device, it needs to go back to the Placement API and change the reserved value in the inventory from 1 to 0 to signal that the device can be assigned to the next VM.
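
      A minimal sketch of the cleanup step, reusing set_reserved and the placement session from above. How the tool reaches the hypervisor and which wipe command it runs are assumptions (ssh plus nvme-cli secure erase are used as placeholders), and resolve_device is a hypothetical helper, since mapping the resource provider name to a host and block device is deployment specific:

        import subprocess

        def cleanup_vm(vm_uuid, reserved_rps):
            for rp_uuid in reserved_rps:
                rp = placement.get(
                    "/resource_providers/%s" % rp_uuid,
                    endpoint_filter=PLACEMENT, headers=HEADERS).json()
                # The provider name encodes "<compute hostname>_<pci address>";
                # resolve_device (hypothetical) turns it into a reachable host
                # and a block device such as /dev/nvme0n1.
                hostname, device = resolve_device(rp["name"])
                # Assumption: the tool has ssh access to the hypervisor and
                # uses nvme-cli to secure-erase the disk (--ses=1).
                subprocess.run(
                    ["ssh", "root@" + hostname, "nvme", "format", device, "--ses=1"],
                    check=True)
                # Signal that the device can be assigned to the next VM.
                set_reserved(rp_uuid, 0)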

      Dependencies

      • RHOSO 18.0.5 is needed to have the workaround supporting many PCI devices with PCI in Placement: OSPRH-12962
      • RHOSO 18.0-FR3 is planned to graduate PCI in Placement from Tech preview to Supported: OSPRH-13106
      • RHOSO 18.0-FR3 is planned to support configuring Nova notification message bus via the standard OpenStackControlPlane CR interface: OSPRH-230

      Not covered aspects

      • how to deploy the external service on top of RHOSO 18.0
      • what the exact RHOSO 18.0 configuration procedure is to enable PCI in Placement and notifications (only pure Nova config options are provided above)
