OSPRH-364 (Red Hat OpenStack Services on OpenShift)

Support VM migration between compute nodes

    • 2023Q4

      Cold migration, live migration and resize support

      High level requirements

      • ssh access between compute hosts
        • qemu needs it to transfer guest memory state during live migration
        • nova needs it to transfer local disk content during cold migration (and resize)
      • a way to ensure proper file access rights for the transferred data
        • when nova-compute copies local disk content from the source node to the target node, the nova-compute service and libvirtd on the target host need access rights to the copied disk.
      • limit the ssh access of the nova user on the compute node
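One way to limit the nova user's SSH access is to prefix its authorized_keys entry with OpenSSH key options. The following is a minimal sketch, not the agreed design: the function name, the example key and the source network are hypothetical, and which options remain compatible with live migration and cold migration is not settled by this story.

```python
from typing import List, Optional


def authorized_keys_entry(
    public_key: str,
    allowed_sources: List[str],
    forced_command: Optional[str] = None,
) -> str:
    """Build one authorized_keys line with restricting options before the key."""
    if not allowed_sources:
        raise ValueError("at least one allowed source is required")
    # "restrict" disables port, agent and X11 forwarding and PTY allocation.
    # "from" limits the hosts that may use this key (patterns or CIDR ranges).
    options = ["restrict", 'from="{}"'.format(",".join(allowed_sources))]
    if forced_command is not None:
        if '"' in forced_command:
            raise ValueError("double quotes are not supported in this sketch")
        # "command" forces this command regardless of what the client requests.
        options.append('command="{}"'.format(forced_command))
    return "{} {}".format(",".join(options), public_key)


# Hypothetical key material and network, for illustration only.
print(authorized_keys_entry("ssh-ed25519 AAAAexample nova-migration", ["192.0.2.0/24"]))
```

The forced command is left optional because a fixed command would have to allow everything that nova and libvirt run on the target host, which is the job the tripleo migration helper did.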

      Greenfield deployment

      • We need a known_hosts file so that the remote host key is already known when a connection is initiated, and therefore no human interaction is needed to accept it. The edpm_ssh_known_hosts ansible role is already part of the configure_os service, but it has two limitations:
        • the role only distributes ssh-rsa host keys, while the ssh client prefers ecdsa host keys. We need to extend the role to distribute that key type as well.
        • the role only runs on a given DataPlaneNodeSet, so nodes in different NodeSets will not know each other's identity. We need to add a feature to the dataplane-operator that allows executing specific service deployments on a merged inventory of all the existing NodeSets. This is tracked in OSPRH-2492.
      • We need the nova user to exist on the EDPM node with the proper access rights (user/group) to be able to copy data to the /etc/nova/instances directory
      • We need an SSH key-pair for the nova user
      • We need to restrict the nova user's SSH access on the EDPM node as much as possible, while avoiding replicating the nova_migration_target container and the nova migration helper from tripleo
      • The public key of the nova key-pair needs to be added to the authorized_keys file of the nova user on the EDPM node, while the private key needs to be accessible in the nova user's $HOME/.ssh directory.
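The known_hosts requirement above (every host key type, for every node in every NodeSet) can be sketched as follows. This is an illustration only, not the edpm_ssh_known_hosts role: the data model and the function name are hypothetical, and the host keys are assumed to have been collected already.

```python
from typing import Dict, List

# Hypothetical data model: NodeSet name -> host name -> list of
# "<keytype> <base64 key>" strings, one per host key type.
NodeSets = Dict[str, Dict[str, List[str]]]


def render_known_hosts(nodesets: NodeSets) -> str:
    """Merge all NodeSets and render the content of one known_hosts file."""
    lines = []
    for nodeset in sorted(nodesets):
        for host in sorted(nodesets[nodeset]):
            for key in nodesets[nodeset][host]:
                # known_hosts line format: "<host> <keytype> <base64 key>"
                lines.append("{} {}".format(host, key))
    return "\n".join(lines) + "\n"


# Placeholder key data, for illustration only.
example = {
    "nodeset-a": {"compute-0": ["ssh-rsa AAAArsa0", "ecdsa-sha2-nistp256 AAAAecdsa0"]},
    "nodeset-b": {"compute-1": ["ssh-rsa AAAArsa1", "ecdsa-sha2-nistp256 AAAAecdsa1"]},
}
print(render_known_hosts(example), end="")
```

Because the file is rendered from the merged inventory, the same content can be written to every node, regardless of which NodeSet it belongs to.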

      Adoption

      • The known_hosts file will be regenerated during compute adoption, so if the host identity changes for any reason (e.g. switching from RSA to ED25519 host keys due to FIPS), adoption will handle that.
      • There is a nova migration key-pair used by tripleo in the source deployment. We can extract it from the tripleo configuration and reuse it as the nova key-pair during adoption. Alternatively we can choose to regenerate the key, but this is not strictly needed (enabling FIPS may require new keys due to key type changes).
      • The nova user creation, SSH access restriction and key distribution are expected to be the same as in the greenfield case
      • After adoption the nova_migration_target container needs to be deleted from the EDPM node.

      Scale out

      • The edpm_ssh_known_hosts role needs to handle scale out by updating the known_hosts file on every existing EDPM node
      • The new node needs to use the existing nova key-pair in the same way as in the greenfield case.

      Key rotation

      • We only consider the key rotation of the nova user. The key rotation of the provisioning key is out of scope.
      • A new key-pair can be generated and configured for the deployment in k8s and a new ansible run (triggered by a new Deployment CR) will add the new public key to the nova user's authorized_keys file and the private key to the $HOME/.ssh directory.
      • Special care should be taken to remove the old key-pair from the EDPM nodes after the new key-pair is distributed to every node.
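The ordering constraint in the last point can be sketched as a small in-memory model. This is an assumption-laden illustration, not the ansible implementation: the Node class, the function names and the key strings are hypothetical stand-ins for the files in the nova user's $HOME/.ssh directory.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set


@dataclass
class Node:
    """In-memory stand-in for the nova user's SSH state on one EDPM node."""
    authorized_keys: Set[str] = field(default_factory=set)
    private_key: str = ""


def distribute(nodes: Dict[str, Node], reachable: Iterable[str],
               new_pub: str, new_priv: str) -> None:
    """Add the new public key and install the new private key on reachable nodes."""
    for name in reachable:
        nodes[name].authorized_keys.add(new_pub)
        nodes[name].private_key = new_priv


def remove_old_key(nodes: Dict[str, Node], old_pub: str, new_pub: str) -> bool:
    """Remove the old public key only if every node already accepts the new one."""
    if not all(new_pub in node.authorized_keys for node in nodes.values()):
        return False  # at least one node was missed; keep the old key for now
    for node in nodes.values():
        node.authorized_keys.discard(old_pub)
    return True


nodes = {name: Node({"old-pub"}, "old-priv") for name in ("compute-0", "compute-1")}
distribute(nodes, ["compute-0"], "new-pub", "new-priv")   # compute-1 was unreachable
print(remove_old_key(nodes, "old-pub", "new-pub"))        # removal is refused
distribute(nodes, ["compute-1"], "new-pub", "new-priv")
print(remove_old_key(nodes, "old-pub", "new-pub"))        # removal now proceeds
```

Keeping the removal as a separate, gated step means that a partially failed run leaves the old key-pair usable on the nodes that were not updated.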

      Future improvements

      Unique nova key-pair per EDPM node

      The current proposal uses a single key-pair for the nova users on every EDPM compute node. This is bad security practice. A better solution would be to use a unique key-pair per EDPM compute node. However, this requires complicated key management when a new compute node is provisioned, as the new nova public key of that node needs to be distributed to every existing EDPM compute node.

      Moreover tripleo today uses a single key-pair for nova migration support too, so the current proposal is not a regression compared to 17.1.

      There is a separate discussion to change the SSH access scheme to use certificates instead of key-pairs. If that is implemented then public key redistribution at scale out might be avoidable.

      Out of Scope: Ensure chain of trust when collecting host identities

      There is a need to trust the host keys of the nodes. Today we collect them via the provisioning ssh key-pair while host key verification is turned off. This is not secure. However fixing this is out of scope of this story.

      During the discussion we foresaw that this could be fixed by using a pre-generated host key in baremetal-operator and using secure facilities in ironic to transfer the pre-generated host key via SSL to the node being deployed.

      Out of Scope: Use certificate based host verification and user authentication for SSH

      This could simplify the host key verification and key rotation use cases, but it needs a separate study to identify the impact on various areas of the system (bare metal greenfield deployment, adoption, cert rotation).

              rh-ee-bgibizer Balazs Gibizer
              rhos-dfg-compute