Red Hat OpenStack Services on OpenShift / OSPRH-16536

[RHOSO 18] - missing dns domain in edpm network_config will cause live migration failures


    • EDPM Sprint 2, EDPM Sprint 3, EDPM Sprint 4, EDPM Sprint 5
    • 4
    • Important

      This is kind of an ugly issue, and I'm not sure of a fix other than redeploying the EDPM nodes (or DB surgery). Hopefully there is an easier solution.

      A new RHOSO 18 environment deploys successfully and runs VM instances fine. However, live migration fails with the following error:

      2025-05-05T14:40:01.747623000Z libvirt.libvirtError: operation failed: job 'migration out' failed: address resolution failed for compute-node5:61152: Name or service not known
      2025-05-05T14:40:01.810586000Z 2025-05-05 14:40:01.810 2 DEBUG nova.virt.libvirt.driver [None req-6ea75b73-61e0-4068-80e1-3aca268504b5 121d3d39bd634def98c0f3e824b52570 7df579c77c2e4141acd10edca9045c97 - - default default] [instance: 7c874c2e-200d-4c94-94f1-a191330a41ff] Live migration monitoring is all done _live_migration /usr/lib/python3.9/site-packages/nova/virt/libvirt/driver.py:10947

      Upon further inspection, the compute nodes are registered with short hostnames, whereas normally they would be registered as <name>.ctlplane.<domain>.

      $ oc rsh openstackclient openstack hypervisor list
        +--------------------------------------+---------------------+-----------------+---------+-------+
        | ID                                   | Hypervisor Hostname | Hypervisor Type | Host IP | State |
        +--------------------------------------+---------------------+-----------------+---------+-------+
        | 27e040bd-7907-4475-9fbb-89852bd27ccc | compute-node1       | QEMU            | x.x.x.x | up    |
        | 813323dd-dacc-42c9-b04e-367fb46ac0e6 | compute-node2       | QEMU            | x.x.x.x | down  |
        | efce87fb-c458-40fc-acda-b49d5c4ac800 | compute-node4       | QEMU            | x.x.x.x | down  |
        | 41b167fc-988f-4660-8f09-d77ef635476d | compute-node6       | QEMU            | x.x.x.x | down  |
        | 2ed12ccc-3301-4d05-888f-61a9330419ac | compute-node5       | QEMU            | x.x.x.x | up    |
        | 88a64875-6de1-4ed5-b6d3-dca33369ae9b | compute-node3       | QEMU            | x.x.x.x | up    |
        | 00b64641-4951-4c69-accf-3ae21b20de6f | compute-node7       | QEMU            | x.x.x.x | down  |
        +--------------------------------------+---------------------+-----------------+---------+-------+
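      A quick way to spot affected nodes is to look for hypervisor hostnames that have no domain part. The Python sketch below is a hypothetical helper, not part of RHOSO or the openstack CLI. It assumes the hostnames are passed as command-line arguments (for example, taken from "openstack hypervisor list -f value -c 'Hypervisor Hostname'") and treats any name without a dot as a short name.

```python
"""Hypothetical helper: flag hypervisor hostnames that are not fully qualified.

Assumption: a hostname without a dot (e.g. "compute-node5") has no DNS
domain attached, while "<name>.ctlplane.<domain>" does.
"""
import sys


def short_hostnames(names):
    """Return the hostnames that have no domain component."""
    # strip() also removes stray whitespace or carriage returns from CLI output.
    cleaned = (name.strip() for name in names)
    # Skip empty entries; a remaining name without a dot is a short name.
    return [name for name in cleaned if name and "." not in name]


if __name__ == "__main__":
    # Hostnames are passed as command-line arguments.
    for name in short_hostnames(sys.argv[1:]):
        print(f"not fully qualified: {name}")
```

      For the environment above, every compute node would be reported, since all of them are registered as compute-nodeN without a domain.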

      /etc/resolv.conf has no DNS domain, and the system has no FQDN (hostname -f returns only the short name).

      compute-node6]$ cat etc/resolv.conf

      # Generated by NetworkManager
      nameserver x.x.x.x

      $ cat sos_commands/host/hostname_-f
      compute-node6
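      Both symptoms (no domain or search entry in resolv.conf, no FQDN) can be checked on a node with a short script. This is a sketch under stated assumptions: it only inspects the standard "domain" and "search" resolv.conf keywords, it uses Python's socket.getfqdn() as a stand-in for hostname -f (the two may resolve differently), and it treats a hostname containing a dot as fully qualified.

```python
"""Sketch: check a node for a DNS domain in resolv.conf and for an FQDN.

Assumptions: only the standard "domain" and "search" resolv.conf keywords
are inspected, and a hostname counts as fully qualified if it has a dot.
"""
import os
import socket


def resolv_conf_domains(text):
    """Return the domains named on 'domain' or 'search' lines of resolv.conf text."""
    domains = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and comment lines ('#' or ';').
        if not fields or fields[0].startswith(("#", ";")):
            continue
        if fields[0] in ("domain", "search"):
            domains.extend(fields[1:])
    return domains


def has_fqdn(name):
    """Return True if the hostname carries a domain part."""
    return "." in name.rstrip(".")


if __name__ == "__main__":
    path = "/etc/resolv.conf"
    if os.path.exists(path):
        with open(path) as f:
            print("resolv.conf domains:", resolv_conf_domains(f.read()) or "none")
    # socket.getfqdn() only approximates `hostname -f`.
    fqdn = socket.getfqdn()
    print("FQDN:", fqdn, "(ok)" if has_fqdn(fqdn) else "(short name only)")
```

      Run against the resolv.conf shown above, resolv_conf_domains() returns an empty list, because the file contains only a comment and a nameserver line.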

      The underlying cause is a missing DNS domain setting in the OpenStackDataPlaneNodeSet network_config (based on the example in step 9 of this procedure: https://docs.redhat.com/en/documentation/red_hat_openstack_services_on_openshift/18.0/html/deploying_red_hat_openstack_services_on_openshift/assembly_creating-the-data-plane#proc_creating-an-OpenStackDataPlaneNodeSet-CR-with-preprovisioned-nodes_dataplane):

      network_config:
        - type: ovs_bridge
          name: {{ neutron_physical_bridge_name }}
          mtu: {{ min_viable_mtu }}
          use_dhcp: false
          dns_servers: {{ ctlplane_dns_nameservers }}
          domain: {{ dns_search_domains }}  #<<<<<<MISSING
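      A validation along the following lines could reject a network_config that sets dns_servers without a domain, so the omission is caught before deployment rather than after. This is a hypothetical sketch, not an existing EDPM or os-net-config validation. It assumes the config has already been rendered from its Jinja template and parsed from YAML into a list of dicts; the bridge name and address in the example are placeholders.

```python
"""Hypothetical pre-deployment check for a missing 'domain' key.

Assumed rule (for illustration only): any network_config entry that sets
dns_servers must also set a non-empty domain.
"""


def entries_missing_domain(network_config):
    """Return the names of entries that configure DNS servers but no domain."""
    missing = []
    for entry in network_config:
        if entry.get("dns_servers") and not entry.get("domain"):
            missing.append(entry.get("name", "<unnamed>"))
    return missing


if __name__ == "__main__":
    # Placeholder values standing in for a rendered template.
    rendered = [
        {
            "type": "ovs_bridge",
            "name": "br-ex",
            "mtu": 1500,
            "use_dhcp": False,
            "dns_servers": ["192.0.2.53"],
            # "domain" is absent, as in the affected deployment.
        }
    ]
    problems = entries_missing_domain(rendered)
    if problems:
        print(f"network_config entries without a DNS domain: {problems}")
```

      The check is deliberately narrow: it only flags entries that already configure DNS servers, since those are the ones where a forgotten domain line is most likely.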

      After adding this config back, nova_compute fails to restart with this error:

      May 8 09:21:12 compute-node6 nova_compute[94947]: 2025-05-08 09:21:12.462 2 ERROR oslo_service.service nova.exception.InvalidConfiguration: My compute node 41b167fc-988f-4660-8f09-d77ef635476d has hypervisor_hostname compute-node6 but virt driver reports it should be compute-node6.ctlplane.<DOMAIN>. Possible rename detected, refusing to start!

      Removing and re-adding the compute node in nova does not work either; it seems there are protections that prevent hostname changes and the re-registration of a compute node with the same UUID.

      1. Is there a way to fix this without redeploying the EDPM node or a manual DB update?

      2. A user customizing network_config may not realize how important this setting is, given that it cannot be fixed after deployment. IMO, the missing config should cause the deployment to fail.

      Thanks for looking at this issue.

              jslagle@redhat.com James Slagle
              mflusche@redhat.com Mathew Flusche
              rhos-dfg-df