  1. Red Hat OpenStack Services on OpenShift
  2. OSPRH-6335

Certain 17 compute hostname configurations cannot be adopted into 18


    • Bug
    • Resolution: Done-Errata
    • Undefined
    • rhos-18.0.0
    • None
    • edpm-ansible
    • None
    • Important

      The current 18 logic uses the canonical_hostname Ansible variable as the source of truth for the name of both the compute service host (the machine where the nova-compute service runs) and the compute node (the hypervisor). See https://github.com/openstack-k8s-operators/edpm-ansible/pull/587
      The 18 deployment ensures that libvirt and nova-compute see the same hostname even though they obtain it differently, so the compute service hostname and the nodename are the same. Libvirt relies on hostname -f, while nova-compute is configured directly with host=<canonical_hostname>.
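
To make the two name sources concrete, here is a minimal Python sketch that reads the host option from the [DEFAULT] section of a nova.conf and compares it with the name libvirt would report. The function names nova_host_from_conf and names_match are hypothetical and are not part of edpm-ansible; the libvirt name is passed in as a plain string (for example the output of hostname -f or virsh hostname) rather than queried here.

```python
import configparser


def nova_host_from_conf(conf_text):
    """Return the [DEFAULT] host value from nova.conf text, or None if unset.

    Hypothetical helper for illustration only.
    """
    # strict=False tolerates repeated options, which oslo.config-style
    # files may contain; interpolation=None keeps '%' characters literal.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    return parser.get("DEFAULT", "host", fallback=None)


def names_match(conf_text, libvirt_hostname):
    """True when nova's configured host equals the name libvirt reports."""
    host = nova_host_from_conf(conf_text)
    return host is not None and host == libvirt_hostname
```

With host=<canonical_hostname> in nova.conf and the same value returned by hostname -f, names_match returns True, which is the state the 18 deployment aims for.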

      There was an assumption that the situation is the same in 17, so that nova and libvirt both use the same name. See https://issues.redhat.com/browse/OSPRH-5197

      However, we have recently seen 17 configurations in CI where the hostname and the nodename are configured differently:

      [core@crc-pjmnl-master-0 ~]$ oc rsh openstackclient
      sh-5.1$ openstack compute service list
      +--------------------------------------+----------------+--------------------------+----------+---------+-------+----------------------------+
      | ID                                   | Binary         | Host                     | Zone     | Status  | State | Updated At                 |
      +--------------------------------------+----------------+--------------------------+----------+---------+-------+----------------------------+
      | 0d007ee0-5ec1-4ea2-89ab-3d65fd92d51e | nova-conductor | nova-cell0-conductor-0   | internal | enabled | up    | 2024-04-16T08:22:48.000000 |
      | 320f339c-e481-46d6-b89e-56320aa26b51 | nova-scheduler | nova-scheduler-0         | internal | enabled | up    | 2024-04-16T08:22:43.000000 |
      | f0e967b7-9eb2-421e-afab-83d1e1118a10 | nova-compute   | np0004604836.localdomain | nova     | enabled | down  | 2024-04-15T12:01:42.000000 |
      | 075eaab0-ec50-4b44-9f38-7929172e54cc | nova-conductor | nova-cell1-conductor-0   | internal | enabled | up    | 2024-04-16T08:22:51.000000 |
      +--------------------------------------+----------------+--------------------------+----------+---------+-------+----------------------------+
      sh-5.1$ openstack hypervisor list
      +--------------------------------------+-----------------------------------+-----------------+-----------------+-------+
      | ID                                   | Hypervisor Hostname               | Hypervisor Type | Host IP         | State |
      +--------------------------------------+-----------------------------------+-----------------+-----------------+-------+
      | 2f79093b-aad3-456d-84e8-eeab5f657cd2 | np0004604836.ctlplane.localdomain | QEMU            | 192.168.122.106 | down  |
      +--------------------------------------+-----------------------------------+-----------------+-----------------+-------+
      
      [root@np0004604836 ~]# hostname
      np0004604836.ctlplane.localdomain
      [root@np0004604836 ~]# cat /etc/hostname
      np0004604836.ctlplane.localdomain
      [root@np0004604836 ~]# podman exec -it nova_compute /bin/bash
      bash-5.1$ virsh hostname
      Error registering authentication agent: GDBus.Error:org.freedesktop.PolicyKit1.Error.Failed: Cannot determine user of subject (polkit-error-quark, 0)
      np0004604836.ctlplane.localdomain
      bash-5.1$ egrep -e "^host" /etc/nova/nova.conf 
      host=np0004604836.localdomain
      [root@np0004604836 ~]# podman exec -it ovn_metadata_agent /bin/bash
      bash-5.1$ egrep -e '^host' /etc/neutron/ -R
      /etc/neutron/neutron.conf:host=np0004604836.localdomain
      [root@np0004604836 ~]# egrep -e '^host' /var/lib/config-data/puppet-generated/nova_libvirt/etc/nova/nova.conf 
      host=np0004604836.localdomain
      

      The current pre-adoption validation code detects that our assumption about the names does not hold for this compute and prevents the adoption.
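
The kind of mismatch shown in the output above can be expressed as a small check. This is only a sketch and not the actual pre-adoption validation code: the function name find_mismatched_computes is hypothetical, and it assumes the rows come from openstack compute service list -f json and openstack hypervisor list -f json, with the same column names as the tables above.

```python
def find_mismatched_computes(services, hypervisors):
    """Return nova-compute service hosts with no identically named hypervisor.

    Hypothetical helper: both arguments are lists of dicts keyed by the
    column names shown in the CLI tables ("Binary", "Host",
    "Hypervisor Hostname").
    """
    hypervisor_names = {h["Hypervisor Hostname"] for h in hypervisors}
    return sorted(
        s["Host"]
        for s in services
        if s["Binary"] == "nova-compute" and s["Host"] not in hypervisor_names
    )


# Values taken from the CI environment described in this issue.
services = [
    {"Binary": "nova-conductor", "Host": "nova-cell0-conductor-0"},
    {"Binary": "nova-scheduler", "Host": "nova-scheduler-0"},
    {"Binary": "nova-compute", "Host": "np0004604836.localdomain"},
    {"Binary": "nova-conductor", "Host": "nova-cell1-conductor-0"},
]
hypervisors = [
    {"Hypervisor Hostname": "np0004604836.ctlplane.localdomain"},
]

print(find_mismatched_computes(services, hypervisors))
```

For the environment above, the compute service host np0004604836.localdomain has no hypervisor with the same name, so it is reported as mismatched.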

      Is this a valid 17 configuration that we should be able to adopt? If yes, then we cannot simply use canonical_hostname as the source of truth for both the hostname and the nodename during adoption, and both the 18 deployment code and the pre-adoption validation code need to be changed.

              rh-ee-bgibizer Balazs Gibizer
              rhos-dfg-upgrades