Red Hat OpenStack Services on OpenShift
OSPRH-13143

confirm resize fails with CPUUnpinningInvalid when resizing to the same host


    • Type: Bug
    • Resolution: Can't Do
    • Priority: Normal
    • Component: openstack-nova
    • Severity: Important

      Description of problem:

      One of our customers in Japan hit the same problem as the upstream bug [1].
      The customer asked us to fix this problem in OSP 17.1 because they need to set `allow_resize_to_same_host = True` in their environment.
      Would it be possible to fix this in OSP 17.1?

      The following description is a copy from [1].

      ~~~
      This is very similar to https://bugs.launchpad.net/nova/+bug/1944759 (which should be fixed already) but still happens when resizing to the same host.

      reproduction:

      fresh single node devstack/master (Nova commit b5029890c1c5b1b5153c9ca2fc9a8ea2437f635d)

      in nova-cpu.conf I set (have 4 vcpus in my devstack VM)

      [DEFAULT]
      allow_resize_to_same_host = True # already set by default on a single node devstack
      update_resources_interval = 20 # to increase chances of a race

      [compute]
      cpu_shared_set = 0
      cpu_dedicated_set = 1-3

      create two flavors with 1 and 2 pinned CPUs respectively, and resize (and confirm) a cirros-based instance back and forth between them.

      Sometimes the resize confirm fails with:

      Feb 16 13:41:59 master-dsvm nova-compute[136855]: DEBUG oslo_concurrency.lockutils [None req-aab2e13d-e12b-47b7-9fa9-5c06343efbe6 admin admin] Lock "a3b3ecbe-2039-42fb-8365-da12e3c93bae" acquired by "nova.compute.manager.ComputeManager.confirm_resize.<locals>.do_confirm_resize" :: waited 0.000s (pid=136855) inner /usr/local/lib/python3.8/dist-packages/oslo_concurrency/lockutils.py:386
      Feb 16 13:41:59 master-dsvm nova-compute[136855]: DEBUG nova.compute.manager [None req-aab2e13d-e12b-47b7-9fa9-5c06343efbe6 admin admin] [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] Going to confirm migration 33 {{(pid=136855) do_confirm_resize /opt/stack/nova/nova/compute/manager.py:4287}}
      Feb 16 13:41:59 master-dsvm nova-compute[136855]: DEBUG oslo_concurrency.lockutils [None req-aab2e13d-e12b-47b7-9fa9-5c06343efbe6 admin admin] Acquired lock "refresh_cache-a3b3ecbe-2039-42fb-8365-da12e3c93bae" {{(pid=136855) lock /usr/local/lib/python3.8/dist-packages/oslo_concurrency/lockutils.py:294}}
      Feb 16 13:41:59 master-dsvm nova-compute[136855]: DEBUG nova.network.neutron [None req-aab2e13d-e12b-47b7-9fa9-5c06343efbe6 admin admin] [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] Building network info cache for instance {{(pid=136855) _get_instance_nw_info /opt/stack/nova/nova/network/neutron.py:1997}}
      Feb 16 13:41:59 master-dsvm nova-compute[136855]: DEBUG nova.objects.instance [None req-aab2e13d-e12b-47b7-9fa9-5c06343efbe6 admin admin] Lazy-loading 'info_cache' on Instance uuid a3b3ecbe-2039-42fb-8365-da12e3c93bae {{(pid=136855) obj_load_attr /opt/stack/nova/nova/objects/instance.py:1099}}
      Feb 16 13:41:59 master-dsvm nova-compute[136855]: DEBUG nova.network.neutron [None req-aab2e13d-e12b-47b7-9fa9-5c06343efbe6 admin admin] [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] Instance cache missing network info. {{(pid=136855) _get_preexisting_port_ids /opt/stack/nova/nova/network/neutron.py:3300}}
      Feb 16 13:42:00 master-dsvm nova-compute[136855]: DEBUG nova.network.neutron [None req-aab2e13d-e12b-47b7-9fa9-5c06343efbe6 admin admin] [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] Updating instance_info_cache with network_info: [] {{(pid=136855) update_instance_cache_with_nw_info /opt/stack/nova/nova/network/neutron.py:117}}
      Feb 16 13:42:00 master-dsvm nova-compute[136855]: DEBUG oslo_concurrency.lockutils [None req-aab2e13d-e12b-47b7-9fa9-5c06343efbe6 admin admin] Releasing lock "refresh_cache-a3b3ecbe-2039-42fb-8365-da12e3c93bae" {{(pid=136855) lock /usr/local/lib/python3.8/dist-packages/oslo_concurrency/lockutils.py:312}}
      Feb 16 13:42:00 master-dsvm nova-compute[136855]: DEBUG nova.objects.instance [None req-aab2e13d-e12b-47b7-9fa9-5c06343efbe6 admin admin] Lazy-loading 'migration_context' on Instance uuid a3b3ecbe-2039-42fb-8365-da12e3c93bae {{(pid=136855) obj_load_attr /opt/stack/nova/nova/objects/instance.py:1099}}
      Feb 16 13:42:00 master-dsvm nova-compute[136855]: DEBUG oslo_concurrency.lockutils [None req-aab2e13d-e12b-47b7-9fa9-5c06343efbe6 admin admin] Lock "compute_resources" acquired by "nova.compute.resource_tracker.ResourceTracker.drop_move_claim_at_source" :: waited 0.000s (pid=136855) inner /usr/local/lib/python3.8/dist-packages/oslo_concurrency/lockutils.py:386
      Feb 16 13:42:00 master-dsvm nova-compute[136855]: DEBUG oslo_concurrency.lockutils [None req-aab2e13d-e12b-47b7-9fa9-5c06343efbe6 admin admin] Lock "compute_resources" "released" by "nova.compute.resource_tracker.ResourceTracker.drop_move_claim_at_source" :: held 0.037s (pid=136855) inner /usr/local/lib/python3.8/dist-packages/oslo_concurrency/lockutils.py:400
      Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [None req-aab2e13d-e12b-47b7-9fa9-5c06343efbe6 admin admin] [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] Confirm resize failed on source host master-dsvm. Resource allocations in the placement service will be removed regardless because the instance is now on the destination host master-dsvm. You can try hard rebooting the instance to correct its state.: nova.exception.CPUUnpinningInvalid: CPU set to unpin [1] must be a subset of pinned CPU set [2, 3]
      Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] Traceback (most recent call last):
      Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] File "/opt/stack/nova/nova/compute/manager.py", line 4316, in do_confirm_resize
      Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] self._confirm_resize(
      Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] File "/opt/stack/nova/nova/compute/manager.py", line 4401, in _confirm_resize
      Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] self.rt.drop_move_claim_at_source(context, instance, migration)
      Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] File "/usr/local/lib/python3.8/dist-packages/oslo_concurrency/lockutils.py", line 391, in inner
      Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] return f(*args, **kwargs)
      Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] File "/opt/stack/nova/nova/compute/resource_tracker.py", line 563, in drop_move_claim_at_source
      Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] self._drop_move_claim(
      Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] File "/opt/stack/nova/nova/compute/resource_tracker.py", line 638, in _drop_move_claim
      Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] self._update_usage(usage, nodename, sign=-1)
      Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] File "/opt/stack/nova/nova/compute/resource_tracker.py", line 1321, in _update_usage
      Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] cn.numa_topology = hardware.numa_usage_from_instance_numa(
      Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] File "/opt/stack/nova/nova/virt/hardware.py", line 2476, in numa_usage_from_instance_numa
      Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] new_cell.unpin_cpus(pinned_cpus)
      Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] File "/opt/stack/nova/nova/objects/numa.py", line 106, in unpin_cpus
      Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] raise exception.CPUUnpinningInvalid(requested=list(cpus),
      Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] nova.exception.CPUUnpinningInvalid: CPU set to unpin [1] must be a subset of pinned CPU set [2, 3]

      full log snippet is at https://paste.opendev.org/show/biKlHnGI4PPt451riHXn/
      ~~~

      [1] https://bugs.launchpad.net/nova/+bug/1961188
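For reference, the `CPUUnpinningInvalid` error above comes down to a plain subset test: the CPUs being unpinned must all still be tracked as pinned in the host NUMA cell. The sketch below is illustrative only (not Nova's actual code, which lives in `nova/objects/numa.py`); it reproduces the failing case from the log, where the tracker believes CPUs [2, 3] are pinned while the confirm path tries to unpin CPU [1]:

```python
# Minimal sketch of the subset check behind CPUUnpinningInvalid.
# Illustrative only; names mirror Nova's but this is not Nova code.

class CPUUnpinningInvalid(Exception):
    pass

def unpin_cpus(pinned, to_unpin):
    # Unpinning is only valid for CPUs the tracker currently has pinned.
    if not to_unpin <= pinned:
        raise CPUUnpinningInvalid(
            "CPU set to unpin %s must be a subset of pinned CPU set %s"
            % (sorted(to_unpin), sorted(pinned)))
    return pinned - to_unpin

# The situation in the log: the tracked host state already reflects the
# *new* pinning ([2, 3]), so dropping the old claim for CPU [1] fails.
try:
    unpin_cpus({2, 3}, {1})
except CPUUnpinningInvalid as e:
    print(e)  # CPU set to unpin [1] must be a subset of pinned CPU set [2, 3]
```

This illustrates why the race matters: once a periodic resource update rebuilds the host NUMA usage from the instance's post-resize pinning, the stale source-side claim no longer intersects the tracked pinned set, and `drop_move_claim_at_source` blows up.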

              Assignee: Unassigned
              Reporter: jira-bugzilla-migration (RH Bugzilla Integration)
              rhos-dfg-compute