RHEL-7047

[RFE] expose public interface to trigger qemu announce-self

    • Normal
    • sst_virtualization
    • ssg_virtualization
    • 5
    • False
    • Red Hat OpenStack Platform

      Description of problem:

      Currently, when libvirt/QEMU are used with OVN and OpenStack, there is excessive packet loss after a live migration.
      https://bugzilla.redhat.com/show_bug.cgi?id=1903653

      This happens because OpenStack programs the requested chassis in OVN, which limits OVN to installing OpenFlow rules only on the host listed in the requested chassis. Without the requested chassis set, the ovn southd on both the source and destination hosts will fight over which chassis the port is currently on (it is actually on both during the migration). That puts load on the OVN database and causes flows to be installed and reinstalled every time it changes.

      As a result, we cannot remove the use of requested chassis to allow the flow rules to be installed on the destination host, and we can't update the requested chassis before we start the migration, as that would remove the flow rules, disconnecting the VM.

      Consequently, the RARP packets sent by QEMU when libvirt unpauses the VM on the destination host are currently lost, and the VM's MAC is not updated on top-of-rack switches until the VM sends a packet.

      One mitigation, until OVN can be enhanced to support live migration (https://bugzilla.redhat.com/show_bug.cgi?id=2012179), is to use the announce_self QEMU monitor command to trigger the sending of RARP packets after OVN has installed the flows. However, that will taint the VM.

      To that end, we would like this to be exposed via a public libvirt API instead of relying on the internal QEMU monitor command interface.
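
      For illustration only, the mitigation described above could look roughly like the sketch below. It waits until the flows are reported as installed and then sends the announce_self HMP command through a monitor passthrough callable. The function names, the flows_installed predicate, and the timeout values are assumptions made for this sketch. With libvirt-python the callable would typically be libvirt_qemu.qemuMonitorCommand, and using that passthrough is what taints the VM.

```python
import time

# Numeric value of VIR_DOMAIN_QEMU_MONITOR_COMMAND_HMP in libvirt, hard-coded
# so that this sketch does not need libvirt installed.
HMP_FLAG = 1


def announce_self(domain, monitor_command):
    """Ask QEMU to resend its self-announcement (RARP) packets.

    monitor_command is any callable with the shape
    (domain, command_string, flags), for example
    libvirt_qemu.qemuMonitorCommand from libvirt-python.
    """
    return monitor_command(domain, "announce_self", HMP_FLAG)


def announce_when_flows_ready(domain, monitor_command, flows_installed,
                              timeout_s=30.0, poll_s=0.5):
    """Poll a hypothetical flows_installed() predicate, then announce.

    Returns True if the announcement was sent, False on timeout.
    The timeout and poll interval are illustrative assumptions.
    """
    deadline = time.monotonic() + timeout_s
    while not flows_installed():
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
    announce_self(domain, monitor_command)
    return True
```

      How the caller learns that the flows are installed is left open here; in an OpenStack deployment that signal would have to come from the networking side.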

      Version-Release number of selected component (if applicable):

      How reproducible:

      This is caused by a complex race between nova notifying OVN via neutron that the VM is now running on the destination host and QEMU sending the 3 RARP packets on VM unpause at the completion of the live migration.

      Nova only starts this chain of notifications once it has received the migration_complete or post_copy_pause event from libvirt.

      The former always happens after the VM has started running on the destination, and the latter event only gives nova a slight advantage in the race, so OpenStack will typically lose the race to install the flows before the final RARP packet is sent unless the system is effectively entirely idle.

      This means the race will not always cause packet loss in development/test environments but will happen much more often in production deployments.
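
      The race can be pictured with a small model: any RARP packet sent before the flows exist is lost, and the race is lost outright when the flows arrive after the final packet. All timings below are made-up numbers chosen for illustration, not measurements.

```python
def lost_announcements(rarp_times_ms, flows_installed_ms):
    """Return the send times of RARP packets emitted before the flows exist."""
    return [t for t in rarp_times_ms if t < flows_installed_ms]


def race_lost(rarp_times_ms, flows_installed_ms):
    """The race is lost when even the final RARP packet precedes the flows."""
    return flows_installed_ms > max(rarp_times_ms)


# Hypothetical send times (ms after unpause) for the 3 RARP packets.
rarp_times = [0, 150, 400]

# Idle test system: flows land before the last packet, so one RARP gets through.
idle_lost = lost_announcements(rarp_times, flows_installed_ms=300)

# Loaded production system: flows land after the last packet, all RARPs are lost.
busy_lost = lost_announcements(rarp_times, flows_installed_ms=900)
```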

      Steps to Reproduce:

      1. Boot a VM with OVS networking and OpenFlow rules programmed by OVN.
      2. Start a tcpdump on the destination host to track the sending of the RARP packets.
      3. Start a live migration in libvirt and wait for it to complete.
      4. Update the requested chassis with the destination host name.

      ^ Realistically this is not relevant to this RFE, but that is how you would simulate the race without actually deploying OpenStack, OVN, etc.
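
      Steps 2 to 4 above might translate into commands along the following lines. This is a sketch that only builds the argument lists and runs nothing. The interface, domain, destination URI, port, and host names are placeholders, and the exact options will depend on the environment.

```python
def tcpdump_rarp_cmd(interface):
    """Step 2: capture RARP frames on the destination host."""
    return ["tcpdump", "-n", "-e", "-i", interface, "rarp"]


def live_migrate_cmd(domain, dest_uri):
    """Step 3: start a live migration through virsh."""
    return ["virsh", "migrate", "--live", domain, dest_uri]


def set_requested_chassis_cmd(port, dest_host):
    """Step 4: point the logical switch port at the destination chassis."""
    return ["ovn-nbctl", "set", "Logical_Switch_Port", port,
            "options:requested-chassis=" + dest_host]


if __name__ == "__main__":
    # Placeholder names, for illustration only.
    for cmd in (
        tcpdump_rarp_cmd("eth0"),
        live_migrate_cmd("test-vm", "qemu+ssh://dest-host/system"),
        set_requested_chassis_cmd("test-port", "dest-host"),
    ):
        print(" ".join(cmd))
```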

      Actual results:

      The RARP packets are lost because the flow rules are not installed at the time they are sent.

      Expected results:

      The RARP packets are lost because the flow rules are not installed at the time they are sent.

      We expect this to fail because we have not told OVN to install the flows until after the VM is running, so we need a way to resend the RARP packets once the flows are installed.

      Additional info:

      This is clearly an RFE to work around a missing feature in OVN.
      We have customers that will be impacted by this on OSP 13 and 16, which are based on RHEL 7 and 8.2/8.4 respectively.

      While it would be ideal to backport this public API to RHEL 7, we would like to use the QEMU monitor command directly on older versions of libvirt instead, to avoid the need for a libvirt backport, and we can adapt to the new public interface when it is available.
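
      One way a caller could structure that fallback is sketched below: prefer a public announce callable when one is supplied, otherwise fall back to the monitor passthrough. No public libvirt API for this exists at the time of this request, so public_announce is purely a hypothetical placeholder, as are the function names.

```python
# Numeric value of VIR_DOMAIN_QEMU_MONITOR_COMMAND_HMP in libvirt.
HMP_FLAG = 1


def trigger_announce(domain, monitor_command, public_announce=None):
    """Trigger a self-announcement, preferring a public interface.

    public_announce is a hypothetical callable taking the domain; it stands
    in for whatever public API libvirt may eventually provide.
    monitor_command has the shape (domain, command_string, flags), for
    example libvirt_qemu.qemuMonitorCommand from libvirt-python.
    Returns the name of the path that was used.
    """
    if public_announce is not None:
        public_announce(domain)
        return "public"
    # Fallback for older libvirt: internal monitor command, which taints the VM.
    monitor_command(domain, "announce_self", HMP_FLAG)
    return "monitor"
```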

            laine@redhat.com laine@redhat.com
            smooney@redhat.com Sean Mooney
            laine@redhat.com laine@redhat.com
            Yanqiu Zhang Yanqiu Zhang