Red Hat OpenStack Services on OpenShift / OSPRH-13009

BZ#2312197 Conflict with default values for live_migration_permit_post_copy and live_migration_permit_auto_converge

      Description of problem:

      The default values that THT sets for live_migration_permit_post_copy and live_migration_permit_auto_converge conflict with each other.

      Version-Release number of selected component (if applicable):
      17.1
      16.2

      How reproducible:
      Default behavior

      Steps to Reproduce:
      1. Deploy RHOSP
      2. Check values of both parameters in any compute host:
      ~~~
      $ sudo egrep "^\[libvirt|^live_migration_permit" /var/lib/config-data/puppet-generated/nova_libvirt/etc/nova/nova.conf
      [libvirt]
      live_migration_permit_post_copy=True
      live_migration_permit_auto_converge=True
      ~~~
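
The same check can be scripted. The following is a minimal sketch, assuming the file uses standard INI syntax as in the output above; the helper name `both_enabled` and the inline sample are hypothetical, and an option that is not set is treated as disabled.

```python
import configparser


def both_enabled(conf_text: str) -> bool:
    """Return True when both live-migration options are enabled in [libvirt]."""
    # strict=False tolerates duplicate keys; interpolation=None keeps '%' literal.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(conf_text)
    # Assumption: an option that is missing from the file counts as disabled.
    post_copy = parser.getboolean(
        "libvirt", "live_migration_permit_post_copy", fallback=False)
    auto_converge = parser.getboolean(
        "libvirt", "live_migration_permit_auto_converge", fallback=False)
    return post_copy and auto_converge


# Sample mirroring the values reported in this issue.
SAMPLE = """\
[libvirt]
live_migration_permit_post_copy=True
live_migration_permit_auto_converge=True
"""

if __name__ == "__main__":
    print("both enabled:", both_enabled(SAMPLE))
```

To check a real host, read the contents of the nova.conf path shown above and pass the text to `both_enabled`.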

      Actual results:
      Both parameters are set to True by default:

      ~~~
      $ egrep -nA9 "NovaLiveMigrationPermitPostCopy:$|NovaLiveMigrationPermitAutoConverge:$" /usr/share/openstack-tripleo-heat-templates/deployment/nova/nova-compute-container-puppet.yaml
      347: NovaLiveMigrationPermitPostCopy:
      348- description: >
      349- If "True" activates the instance on the destination node before migration is complete,
      350- and to set an upper bound on the memory that needs to be transferred. Post copy
      351- gets enabled per default if the compute roles is not a realtime role or disabled
      352- by this parameter.
      353- default: true
      354- type: boolean
      355- tags:
      356- - role_specific
      357: NovaLiveMigrationPermitAutoConverge:
      358- description: >
      359- Defaults to "True" to slow down the instance CPU until the memory copy process is faster than
      360- the instance's memory writes when the migration performance is slow and might not complete.
      361- Auto converge will only be used if this flag is set to True and post copy is not permitted
      362- or post copy is unavailable due to the version of libvirt and QEMU.
      363- default: true
      364- type: boolean
      365- tags:
      366- - role_specific
      ~~~

      Note, however, that auto_converge is used only when post copy is not permitted or is unavailable. Because post copy is permitted by default, auto_converge is effectively not used in the default configuration, even though it is set to True.
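
The precedence described in the parameter descriptions can be illustrated with a short sketch. This is not the nova driver code; the function name `effective_mode`, its arguments, and the returned strings are hypothetical and only model the documented behavior.

```python
def effective_mode(permit_post_copy: bool,
                   permit_auto_converge: bool,
                   post_copy_available: bool = True) -> str:
    """Model which mechanism applies to a slow live migration.

    post_copy_available stands in for the libvirt/QEMU version check
    mentioned in the parameter description (an assumption of this sketch).
    """
    # Post copy takes precedence whenever it is permitted and available.
    if permit_post_copy and post_copy_available:
        return "post_copy"
    # Auto converge is considered only when post copy is not in use.
    if permit_auto_converge:
        return "auto_converge"
    return "none"


if __name__ == "__main__":
    # Default THT values: both True, so auto converge is shadowed.
    print(effective_mode(True, True))
    # Post copy disabled: auto converge becomes effective.
    print(effective_mode(False, True))
```

With both flags set to True, the sketch returns "post_copy", which is the situation this issue reports: the auto_converge default has no effect unless post copy is disabled or unavailable.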

      Expected results:

      Either post_copy or auto_converge should be enabled by default, not both at the same time. Based on the findings in the related BZ#2312196, I'm inclined to think that post_copy should be disabled by default, because auto_converge performs considerably better, especially for workloads under heavy memory pressure.

      Additional info:

      https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.live_migration_permit_auto_converge
      https://github.com/openstack/nova/blob/8a24acd9240f2a2705ccd979577e0e2338a238ef/nova/virt/libvirt/driver.py#L1022-L1029

              joflynn@redhat.com Joanne O'Flynn
              jira-bugzilla-migration RH Bugzilla Integration
              rhos-docs@redhat.com rhos-docs@redhat.com
              rhos-workloads-compute