Red Hat OpenStack Services on OpenShift
OSPRH-13148

Cinder volume stuck in detaching on error exception

Type: Bug
Resolution: Done
Priority: Normal
Component: openstack-nova

      Description of problem:

      Hi,

      At a customer site, we are running into a complex chain of timeouts upon volume attachment deletion. For their backup solution, the customer creates an instance, then creates volume snapshots and attaches these snapshots to that backup instance. After the backups complete, they proceed with snapshot detachment and deletion.

      The volume detachment regularly times out in the customer environment, leaving their backup instances in an inconsistent state. The problem is that the attachment delete operation takes longer than the various timeouts in OpenStack.

      On the nova compute nodes, we see 504 Gateway Timeout errors, which come from haproxy or from the httpd/WSGI integration, depending on the configured timeouts:
      ~~~
      2021-01-19 01:02:09.674 7 ERROR nova.volume.cinder [req-9febe6f3-d031-4ff2-afb8-0be032b79808 aaaa63afeb224124b389f03b1b1d2f47 0c72f838568c43ae870177a247bcae0a - default default] Delete attachment failed for attachment 81cb9dd7-b20d-4ee8-b41d-65fbb81d3aa3. Error: Gateway Timeout (HTTP 504) Code: 504: cinderclient.exceptions.ClientException: Gateway Timeout (HTTP 504)
      2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server [req-9febe6f3-d031-4ff2-afb8-0be032b79808 aaaa63afeb224124b389f03b1b1d2f47 0c72f838568c43ae870177a247bcae0a - default default] Exception during message handling: cinderclient.exceptions.ClientException: Gateway Timeout (HTTP 504)
      2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
      2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/server.py", line 166, in _process_incoming
      2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message)
      2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line 265, in dispatch
      2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args)
      2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line 194, in _do_dispatch
      2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args)
      2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/exception_wrapper.py", line 79, in wrapped
      2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server function_name, call_dict, binary, tb)
      2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in _exit_
      2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server self.force_reraise()
      2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
      2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
      2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
      2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server raise value
      2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/exception_wrapper.py", line 69, in wrapped
      2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server return f(self, context, *args, **kw)
      2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/compute/utils.py", line 1346, in decorated_function
      2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server return function(self, context, *args, **kwargs)
      2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 215, in decorated_function
      2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server kwargs['instance'], e, sys.exc_info())
      2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in _exit_
      2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server self.force_reraise()
      2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
      2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
      2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
      2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server raise value
      2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 203, in decorated_function
      2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server return function(self, context, *args, **kwargs)
      2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 6002, in detach_volume
      2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server do_detach_volume(context, volume_id, instance, attachment_id)
      2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py", line 328, in inner
      2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server return f(*args, **kwargs)
      2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 6000, in do_detach_volume
      2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server attachment_id=attachment_id)
      2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 5951, in _detach_volume
      2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server attachment_id=attachment_id, destroy_bdm=destroy_bdm)
      2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/virt/block_device.py", line 446, in detach
      2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server attachment_id, destroy_bdm)
      2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/virt/block_device.py", line 427, in _do_detach
      2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server volume_api.attachment_delete(context, self['attachment_id'])
      2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/volume/cinder.py", line 398, in wrapper
      2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server res = method(self, ctx, *args, **kwargs)
      2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/volume/cinder.py", line 448, in wrapper
      2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server res = method(self, ctx, attachment_id, *args, **kwargs)
      2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/volume/cinder.py", line 882, in attachment_delete
      2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server 'code': getattr(ex, 'code', None)})
      2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in _exit_
      2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server self.force_reraise()
      2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
      2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
      2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
      2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server raise value
      2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/volume/cinder.py", line 875, in attachment_delete
      2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server attachment_id)
      2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/cinderclient/api_versions.py", line 407, in substitution
      2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server return method.func(obj, *args, **kwargs)
      2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/cinderclient/v3/attachments.py", line 44, in delete
      2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server return self._delete("/attachments/%s" % base.getid(attachment))
      2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/cinderclient/base.py", line 336, in _delete
      2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server resp, body = self.api.client.delete(url)
      2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/cinderclient/client.py", line 209, in delete
      2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server return self._cs_request(url, 'DELETE', **kwargs)
      2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/cinderclient/client.py", line 191, in _cs_request
      2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server return self.request(url, method, **kwargs)
      2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/cinderclient/client.py", line 177, in request
      2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server raise exceptions.from_response(resp, body)
      2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server cinderclient.exceptions.ClientException: Gateway Timeout (HTTP 504)
      2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server
      ~~~

      We are currently trying to work around this with the following steps.
      -----------------------------------------------------------------------------------------------------------------------------------------

      Summary of all actions so far. If these changes help, we will also have to make them persistent via OpenStack Director.
      =====================================================================================================================================

      i) Increasing the RPC (RabbitMQ) response timeout for cinder from 60 seconds to 300 seconds or longer would be a possible avenue.
      ~~~
      $ egrep '^\[|rpc_response_timeout' /var/lib/config-data/puppet-generated/cinder/etc/cinder/cinder.conf | grep rpc_resp -B1
      [DEFAULT]
      #rpc_response_timeout = 60
      ~~~

      Can you run the following on all controllers:
      ~~~
      crudini --set /var/lib/config-data/puppet-generated/cinder/etc/cinder/cinder.conf DEFAULT rpc_response_timeout 300
      podman ps | grep cinder | awk '{print $NF}' | xargs podman restart
      ~~~
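
      If it helps, the same change can be pushed to all controllers in one go. The following is only a sketch: it assumes SSH access as the heat-admin user and the usual controller-0/1/2 hostnames, so adjust names and credentials to the actual environment.
      ~~~
      # Hypothetical convenience loop (not required): apply the rpc_response_timeout
      # change and restart the cinder containers on every controller over SSH.
      for host in controller-0 controller-1 controller-2; do
        ssh heat-admin@"$host" sudo bash -s <<'EOF'
      crudini --set /var/lib/config-data/puppet-generated/cinder/etc/cinder/cinder.conf DEFAULT rpc_response_timeout 300
      podman ps | grep cinder | awk '{print $NF}' | xargs podman restart
      EOF
      done
      ~~~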

      ii)
      On all controllers, edit the file /var/lib/config-data/puppet-generated/haproxy/etc/haproxy/haproxy.cfg.
      Increase the default client and server timeouts from 2 minutes to 10 minutes:
      ~~~
      defaults
      log global
      maxconn 4096
      mode tcp
      retries 3
      timeout http-request 10s
      timeout queue 2m
      timeout connect 10s
      timeout client 10m
      timeout server 10m
      timeout check 10s
      ~~~
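
      For reference, the same edit can be made non-interactively. This is only a sketch: it assumes GNU sed and that `timeout client`/`timeout server` only appear in the defaults section, so review the result before restarting.
      ~~~
      # Back up haproxy.cfg, bump the defaults client/server timeouts to 10m,
      # then show the resulting lines for a sanity check.
      cfg=/var/lib/config-data/puppet-generated/haproxy/etc/haproxy/haproxy.cfg
      sudo cp "$cfg" "$cfg.bak"
      sudo sed -i -e 's/^\(\s*timeout client\s\+\).*/\110m/' \
                  -e 's/^\(\s*timeout server\s\+\).*/\110m/' "$cfg"
      sudo grep -E 'timeout (client|server)' "$cfg"
      ~~~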

      Then, restart the haproxy container:
      ~~~
      podman ps | grep haproxy-bundle | awk '{print $NF}' | xargs podman restart
      ~~~

      iii) Raise the httpd / wsgi timeouts to 300 seconds for cinder:

      Edit:
      /var/lib/config-data/puppet-generated/cinder/etc/httpd/conf/httpd.conf

      Set:
      ~~~
      Timeout 300
      ~~~
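
      As a non-interactive alternative (a sketch; it assumes a single global Timeout directive in this file, so double-check before restarting):
      ~~~
      # Back up the cinder httpd.conf and set the global Timeout to 300 seconds.
      conf=/var/lib/config-data/puppet-generated/cinder/etc/httpd/conf/httpd.conf
      sudo cp "$conf" "$conf.bak"
      sudo sed -i 's/^Timeout .*/Timeout 300/' "$conf"
      sudo grep -n '^Timeout' "$conf"
      ~~~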

      Then, restart all cinder services:
      ~~~
      podman ps | grep cinder | awk '{print $NF}' | xargs podman restart
      ~~~
      -----------------------------------------------------------------------------------------------------------------------------------------

      Our goal is to fix the actual timeout issue with the above steps (or some further steps).

      However, hitting an exception in volume_api.attachment_delete has a nasty side effect, and I'm filing this BZ to have that side effect addressed. Nova does not handle these exceptions correctly, and nova's DB ends up in a state that is inconsistent with the instance XML. Any new volume attachment operation on the affected instance will then fail because of this inconsistency.
      Nova should reconcile its database and the instance XML even when the aforementioned exception is thrown due to timeout issues. An upstream bug has already been reported for this: https://bugs.launchpad.net/nova/+bug/1888665

      I acknowledge that the root problem must be fixed by adjusting the timeouts of all involved OpenStack components so that they account for potentially slow volume attachment deletion. However, the customer's database is regularly left in a messy state when these timeouts are hit, because nova simply gives up during the detach and does not reconcile the instance XML and the nova DB after the exception is thrown.
      By the way, the detach eventually succeeds on the backend. I don't remember the exact mechanism, but the instance XML and nova's database end up not matching.
      Subsequently, when the backup solution wants to attach new volumes to the affected instance, nova tries to attach them with device names that already exist in the libvirt domain. The instance XML still contains stale disk entries for volumes whose attachment deletion succeeded on the volume backend but whose detach operation timed out and threw that exception in nova-compute.

      Here is one of the symptoms that we see further down the road in the customer's environment, and our customer must manually clean up the broken instance state to fix this:

      ------------------------------------------------------------------------------------------------------------------------------------------

      VSA VM hitting the issue when re-attaching volumes:

      ~~~
      Field                                 Value
      ------------------------------------------------------------------------------------
      OS-DCF:diskConfig AUTO
      OS-EXT-AZ:availability_zone nova
      OS-EXT-SRV-ATTR:host pod-compute-017.subdomain.example.com
      OS-EXT-SRV-ATTR:hypervisor_hostname pod-compute-017.subdomain.example.com
      OS-EXT-SRV-ATTR:instance_name instance-00001f2a
      OS-EXT-STS:power_state Running
      OS-EXT-STS:task_state None
      OS-EXT-STS:vm_state active
      OS-SRV-USG:launched_at 2020-09-16T11:29:40.000000
      OS-SRV-USG:terminated_at None
      accessIPv4  
      accessIPv6  
      addresses backupprovider-Backup=10.80.123.102
      config_drive  
      created 2020-09-16T11:29:14Z
      flavor subdomain-16vCPU-64GBMemory (d91b618d-61bf-43e6-8ff5-19eafeb06a81)
      hostId e0d387cb30bafa2993744a5ab6c98f7d4ddf6a4a4411f086dcf2f31a
      id 935cc10d-604f-4f19-9222-d217f0e2fcbc
      image N/A (booted from volume)
      key_name keyname
      name instancename
      progress 0
      project_id 0c72f838568c43ae870177a247bcae0a
      properties  
      security_groups name='All_Traffic'
      status ACTIVE
      updated 2021-01-22T01:40:27Z
      user_id 8e139a98904248cdbec36449d94184a4
      volumes_attached id='9e3591b8-2e02-43a7-bdd4-5ce14c77f8ff'
        id='63605a94-607e-470c-9951-eaf80875d618'
        id='26425f2f-4227-416a-a365-4a086882240e'
        id='12525859-32ba-4666-b875-e01304634c1a'
        id='e5015855-bedc-4189-bb2a-e3bba68968b0'
        id='fc7dd0d9-8919-41e7-8c86-e40e68915dff'
        id='23e443fd-2e14-44bf-bbde-c26814b479b0'
        id='d23f1fc8-fe84-4538-baa4-dc3d486e195b'
        id='210b5d5d-2575-4ff2-8991-95dddb18e306'
        id='b090ce10-149a-4f44-872d-789f5eb58280'
        id='370a06e8-ecf4-4b2f-a178-17369c6969a8'
        id='58a6e872-7b80-475d-a822-0879faca1ccf'
        id='1304e9bc-8846-4a73-bc14-ac9fb60a6b9c'
        id='44f4d0a4-a1af-4d12-8be5-a6ab7fd44f3a'
        id='ea507fe9-af11-4e22-bf97-ce71ab8a2698'

      ------------------------------------------------------------------------------------
      ~~~

      ~~~
      [akaris@supportshell casenumber]$ find . -name 'nova*api.log' | xargs grep 'is not attached to volume'
      ./0010-sosreport-pod-controller-02-2021-01-26-eiuaywv.tar.xz/sosreport-pod-controller-02-2021-01-26-eiuaywv/var/log/containers/nova/nova-api.log:2021-01-26 19:14:06.807 23 INFO nova.api.openstack.wsgi [req-5a62931d-e7d5-44c3-b796-76158de53a51 aaaa63afeb224124b389f03b1b1d2f47 0c72f838568c43ae870177a247bcae0a - default default] HTTP exception thrown: Instance 935cc10d-604f-4f19-9222-d217f0e2fcbc is not attached to volume 335c3afe-cd7a-48de-939c-20ca539ac9ec
      ./0010-sosreport-pod-controller-02-2021-01-26-eiuaywv.tar.xz/sosreport-pod-controller-02-2021-01-26-eiuaywv/var/log/containers/nova/nova-api.log:2021-01-26 19:14:06.807 23 DEBUG nova.api.openstack.wsgi [req-5a62931d-e7d5-44c3-b796-76158de53a51 aaaa63afeb224124b389f03b1b1d2f47 0c72f838568c43ae870177a247bcae0a - default default] Returning 404 to user: Instance 935cc10d-604f-4f19-9222-d217f0e2fcbc is not attached to volume 335c3afe-cd7a-48de-939c-20ca539ac9ec __call__ /usr/lib/python3.6/site-packages/nova/api/openstack/wsgi.py:943
      ./0010-sosreport-pod-controller-02-2021-01-26-eiuaywv.tar.xz/sosreport-pod-controller-02-2021-01-26-eiuaywv/var/log/containers/nova/nova-api.log:2021-01-26 19:18:01.848 27 INFO nova.api.openstack.wsgi [req-d47a537e-241f-43f4-a975-826e047a3674 aaaa63afeb224124b389f03b1b1d2f47 0c72f838568c43ae870177a247bcae0a - default default] HTTP exception thrown: Instance 38dbc2ea-8f2a-4514-b2af-c77f572cf62f is not attached to volume ded9d2b4-38ae-4fcc-b1ca-aaf61bfcccb7
      ./0010-sosreport-pod-controller-02-2021-01-26-eiuaywv.tar.xz/sosreport-pod-controller-02-2021-01-26-eiuaywv/var/log/containers/nova/nova-api.log:2021-01-26 19:18:01.849 27 DEBUG nova.api.openstack.wsgi [req-d47a537e-241f-43f4-a975-826e047a3674 aaaa63afeb224124b389f03b1b1d2f47 0c72f838568c43ae870177a247bcae0a - default default] Returning 404 to user: Instance 38dbc2ea-8f2a-4514-b2af-c77f572cf62f is not attached to volume ded9d2b4-38ae-4fcc-b1ca-aaf61bfcccb7 __call__ /usr/lib/python3.6/site-packages/nova/api/openstack/wsgi.py:943
      ~~~
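
      To get a quick overview of every instance/volume pair that hit this, something like the following should work against the sosreports (a sketch, using the same find/xargs style as above):
      ~~~
      # List the unique "is not attached to volume" pairs across all nova-api logs.
      find . -name 'nova*api.log*' | xargs grep -hoE \
        'Instance [0-9a-f-]{36} is not attached to volume [0-9a-f-]{36}' | sort -u
      ~~~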

      ~~~
      [akaris@supportshell casenumber]$ grep req-d47a537e-241f-43f4-a975-826e047a3674 ./0010-sosreport-pod-controller-02-2021-01-26-eiuaywv.tar.xz/sosreport-pod-controller-02-2021-01-26-eiuaywv/var/log/containers/nova/nova-api.log
      2021-01-26 19:18:01.799 27 DEBUG nova.api.openstack.wsgi [req-d47a537e-241f-43f4-a975-826e047a3674 aaaa63afeb224124b389f03b1b1d2f47 0c72f838568c43ae870177a247bcae0a - default default] Calling method '<bound method VolumeAttachmentController.show of <nova.api.openstack.compute.volumes.VolumeAttachmentController object at 0x7f7fa41f8518>>' _process_stack /usr/lib/python3.6/site-packages/nova/api/openstack/wsgi.py:523
      2021-01-26 19:18:01.800 27 DEBUG nova.compute.api [req-d47a537e-241f-43f4-a975-826e047a3674 aaaa63afeb224124b389f03b1b1d2f47 0c72f838568c43ae870177a247bcae0a - default default] [instance: 38dbc2ea-8f2a-4514-b2af-c77f572cf62f] Fetching instance by UUID get /usr/lib/python3.6/site-packages/nova/compute/api.py:2633
      2021-01-26 19:18:01.806 27 DEBUG oslo_concurrency.lockutils [req-d47a537e-241f-43f4-a975-826e047a3674 aaaa63afeb224124b389f03b1b1d2f47 0c72f838568c43ae870177a247bcae0a - default default] Lock "d5b009bd-2183-472f-b385-9f88d0c726ad" acquired by "nova.context.set_target_cell.<locals>.get_or_set_cached_cell_and_set_connections" :: waited 0.000s inner /usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:327
      2021-01-26 19:18:01.806 27 DEBUG oslo_concurrency.lockutils [req-d47a537e-241f-43f4-a975-826e047a3674 aaaa63afeb224124b389f03b1b1d2f47 0c72f838568c43ae870177a247bcae0a - default default] Lock "d5b009bd-2183-472f-b385-9f88d0c726ad" released by "nova.context.set_target_cell.<locals>.get_or_set_cached_cell_and_set_connections" :: held 0.000s inner /usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:339
      2021-01-26 19:18:01.807 27 DEBUG oslo_concurrency.lockutils [req-d47a537e-241f-43f4-a975-826e047a3674 aaaa63afeb224124b389f03b1b1d2f47 0c72f838568c43ae870177a247bcae0a - default default] Lock "d5b009bd-2183-472f-b385-9f88d0c726ad" acquired by "nova.context.set_target_cell.<locals>.get_or_set_cached_cell_and_set_connections" :: waited 0.000s inner /usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:327
      2021-01-26 19:18:01.807 27 DEBUG oslo_concurrency.lockutils [req-d47a537e-241f-43f4-a975-826e047a3674 aaaa63afeb224124b389f03b1b1d2f47 0c72f838568c43ae870177a247bcae0a - default default] Lock "d5b009bd-2183-472f-b385-9f88d0c726ad" released by "nova.context.set_target_cell.<locals>.get_or_set_cached_cell_and_set_connections" :: held 0.000s inner /usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:339
      2021-01-26 19:18:01.848 27 INFO nova.api.openstack.wsgi [req-d47a537e-241f-43f4-a975-826e047a3674 aaaa63afeb224124b389f03b1b1d2f47 0c72f838568c43ae870177a247bcae0a - default default] HTTP exception thrown: Instance 38dbc2ea-8f2a-4514-b2af-c77f572cf62f is not attached to volume ded9d2b4-38ae-4fcc-b1ca-aaf61bfcccb7
      2021-01-26 19:18:01.849 27 DEBUG nova.api.openstack.wsgi [req-d47a537e-241f-43f4-a975-826e047a3674 aaaa63afeb224124b389f03b1b1d2f47 0c72f838568c43ae870177a247bcae0a - default default] Returning 404 to user: Instance 38dbc2ea-8f2a-4514-b2af-c77f572cf62f is not attached to volume ded9d2b4-38ae-4fcc-b1ca-aaf61bfcccb7 __call__ /usr/lib/python3.6/site-packages/nova/api/openstack/wsgi.py:943
      2021-01-26 19:18:01.849 27 INFO nova.api.openstack.requestlog [req-d47a537e-241f-43f4-a975-826e047a3674 aaaa63afeb224124b389f03b1b1d2f47 0c72f838568c43ae870177a247bcae0a - default default] 10.133.0.136 "GET /v2.1/0c72f838568c43ae870177a247bcae0a/servers/38dbc2ea-8f2a-4514-b2af-c77f572cf62f/os-volume_attachments/ded9d2b4-38ae-4fcc-b1ca-aaf61bfcccb7" status: 404 len: 154 microversion: 2.1 time: 0.053621
      ~~~

      So, let's look into why instance 935cc10d-604f-4f19-9222-d217f0e2fcbc is reported as not attached to volume 335c3afe-cd7a-48de-939c-20ca539ac9ec:
      ~~~
      $ find . -ipath '*var/log/containers/nova/*.log*' | xargs egrep '935cc10d-604f-4f19-9222-d217f0e2fcbc|335c3afe-cd7a-48de-939c-20ca539ac9ec' -h | sort
      $ find . -ipath '*var/log/containers/cinder/*.log*' | xargs egrep '935cc10d-604f-4f19-9222-d217f0e2fcbc|335c3afe-cd7a-48de-939c-20ca539ac9ec' -h | sort
      ~~~

      ~~~
      nova-compute.log.1:2021-01-26 19:13:55.123 7 INFO nova.compute.manager [req-9051f276-94bb-4e63-b844-8eed83164575 aaaa63afeb224124b389f03b1b1d2f47 0c72f838568c43ae870177a247bcae0a - default default] [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] Attaching volume 335c3afe-cd7a-48de-939c-20ca539ac9ec to /dev/vdp
      nova-compute.log.1:2021-01-26 19:13:55.472 7 DEBUG nova.virt.block_device [req-9051f276-94bb-4e63-b844-8eed83164575 aaaa63afeb224124b389f03b1b1d2f47 0c72f838568c43ae870177a247bcae0a - default default] [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] Updating existing volume attachment record: 861fb8fa-e66d-4092-ad16-0435ee8aeab7 _volume_attach /usr/lib/python3.6/site-packages/nova/virt/block_device.py:539
      nova-compute.log.1:2021-01-26 19:13:56.590 7 DEBUG nova.virt.libvirt.driver [req-9051f276-94bb-4e63-b844-8eed83164575 aaaa63afeb224124b389f03b1b1d2f47 0c72f838568c43ae870177a247bcae0a - default default] [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] Attempting to attach volume 335c3afe-cd7a-48de-939c-20ca539ac9ec with discard support enabled to an instance using an unsupported configuration. target_bus = virtio. Trim commands will not be issued to the storage device. _check_discard_for_attach_volume /usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py:1499
      nova-compute.log.1: <source protocol="rbd" name="volumes/volume-335c3afe-cd7a-48de-939c-20ca539ac9ec">
      nova-compute.log.1: <serial>335c3afe-cd7a-48de-939c-20ca539ac9ec</serial>
      nova-compute.log.1:2021-01-26 19:13:56.595 7 ERROR nova.virt.libvirt.driver [req-9051f276-94bb-4e63-b844-8eed83164575 aaaa63afeb224124b389f03b1b1d2f47 0c72f838568c43ae870177a247bcae0a - default default] [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] Failed to attach volume at mountpoint: /dev/vdp: libvirt.libvirtError: Requested operation is not valid: target vdp already exists
      nova-compute.log.1:2021-01-26 19:13:56.595 7 ERROR nova.virt.libvirt.driver [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] Traceback (most recent call last):
      nova-compute.log.1:2021-01-26 19:13:56.595 7 ERROR nova.virt.libvirt.driver [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 1541, in attach_volume
      nova-compute.log.1:2021-01-26 19:13:56.595 7 ERROR nova.virt.libvirt.driver [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] guest.attach_device(conf, persistent=True, live=live)
      nova-compute.log.1:2021-01-26 19:13:56.595 7 ERROR nova.virt.libvirt.driver [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/guest.py", line 306, in attach_device
      nova-compute.log.1:2021-01-26 19:13:56.595 7 ERROR nova.virt.libvirt.driver [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] self._domain.attachDeviceFlags(device_xml, flags=flags)
      nova-compute.log.1:2021-01-26 19:13:56.595 7 ERROR nova.virt.libvirt.driver [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 190, in doit
      nova-compute.log.1:2021-01-26 19:13:56.595 7 ERROR nova.virt.libvirt.driver [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] result = proxy_call(self._autowrap, f, *args, **kwargs)
      nova-compute.log.1:2021-01-26 19:13:56.595 7 ERROR nova.virt.libvirt.driver [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 148, in proxy_call
      nova-compute.log.1:2021-01-26 19:13:56.595 7 ERROR nova.virt.libvirt.driver [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] rv = execute(f, *args, **kwargs)
      nova-compute.log.1:2021-01-26 19:13:56.595 7 ERROR nova.virt.libvirt.driver [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 129, in execute
      nova-compute.log.1:2021-01-26 19:13:56.595 7 ERROR nova.virt.libvirt.driver [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] six.reraise(c, e, tb)
      nova-compute.log.1:2021-01-26 19:13:56.595 7 ERROR nova.virt.libvirt.driver [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
      nova-compute.log.1:2021-01-26 19:13:56.595 7 ERROR nova.virt.libvirt.driver [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] raise value
      nova-compute.log.1:2021-01-26 19:13:56.595 7 ERROR nova.virt.libvirt.driver [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 83, in tworker
      nova-compute.log.1:2021-01-26 19:13:56.595 7 ERROR nova.virt.libvirt.driver [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] rv = meth(*args, **kwargs)
      nova-compute.log.1:2021-01-26 19:13:56.595 7 ERROR nova.virt.libvirt.driver [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] File "/usr/lib64/python3.6/site-packages/libvirt.py", line 630, in attachDeviceFlags
      nova-compute.log.1:2021-01-26 19:13:56.595 7 ERROR nova.virt.libvirt.driver [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] if ret == -1: raise libvirtError ('virDomainAttachDeviceFlags() failed', dom=self)
      nova-compute.log.1:2021-01-26 19:13:56.595 7 ERROR nova.virt.libvirt.driver [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] libvirt.libvirtError: Requested operation is not valid: target vdp already exists
      nova-compute.log.1:2021-01-26 19:13:56.595 7 ERROR nova.virt.libvirt.driver [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc]
      nova-compute.log.1:2021-01-26 19:13:56.597 7 ERROR nova.virt.block_device [req-9051f276-94bb-4e63-b844-8eed83164575 aaaa63afeb224124b389f03b1b1d2f47 0c72f838568c43ae870177a247bcae0a - default default] [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] Driver failed to attach volume 335c3afe-cd7a-48de-939c-20ca539ac9ec at /dev/vdp: libvirt.libvirtError: Requested operation is not valid: target vdp already exists
      nova-compute.log.1:2021-01-26 19:13:56.597 7 ERROR nova.virt.block_device [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] Traceback (most recent call last):
      nova-compute.log.1:2021-01-26 19:13:56.597 7 ERROR nova.virt.block_device [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] File "/usr/lib/python3.6/site-packages/nova/virt/block_device.py", line 567, in _volume_attach
      nova-compute.log.1:2021-01-26 19:13:56.597 7 ERROR nova.virt.block_device [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] device_type=self['device_type'], encryption=encryption)
      nova-compute.log.1:2021-01-26 19:13:56.597 7 ERROR nova.virt.block_device [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 1568, in attach_volume
      nova-compute.log.1:2021-01-26 19:13:56.597 7 ERROR nova.virt.block_device [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] encryption=encryption)
      nova-compute.log.1:2021-01-26 19:13:56.597 7 ERROR nova.virt.block_device [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__
      nova-compute.log.1:2021-01-26 19:13:56.597 7 ERROR nova.virt.block_device [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] self.force_reraise()
      nova-compute.log.1:2021-01-26 19:13:56.597 7 ERROR nova.virt.block_device [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
      nova-compute.log.1:2021-01-26 19:13:56.597 7 ERROR nova.virt.block_device [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] six.reraise(self.type_, self.value, self.tb)
      nova-compute.log.1:2021-01-26 19:13:56.597 7 ERROR nova.virt.block_device [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
      nova-compute.log.1:2021-01-26 19:13:56.597 7 ERROR nova.virt.block_device [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] raise value
      nova-compute.log.1:2021-01-26 19:13:56.597 7 ERROR nova.virt.block_device [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 1541, in attach_volume
      nova-compute.log.1:2021-01-26 19:13:56.597 7 ERROR nova.virt.block_device [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] guest.attach_device(conf, persistent=True, live=live)
      nova-compute.log.1:2021-01-26 19:13:56.597 7 ERROR nova.virt.block_device [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/guest.py", line 306, in attach_device
      nova-compute.log.1:2021-01-26 19:13:56.597 7 ERROR nova.virt.block_device [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] self._domain.attachDeviceFlags(device_xml, flags=flags)
      nova-compute.log.1:2021-01-26 19:13:56.597 7 ERROR nova.virt.block_device [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 190, in doit
      nova-compute.log.1:2021-01-26 19:13:56.597 7 ERROR nova.virt.block_device [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] result = proxy_call(self._autowrap, f, *args, **kwargs)
      nova-compute.log.1:2021-01-26 19:13:56.597 7 ERROR nova.virt.block_device [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 148, in proxy_call
      nova-compute.log.1:2021-01-26 19:13:56.597 7 ERROR nova.virt.block_device [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] rv = execute(f, *args, **kwargs)
      nova-compute.log.1:2021-01-26 19:13:56.597 7 ERROR nova.virt.block_device [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 129, in execute
      nova-compute.log.1:2021-01-26 19:13:56.597 7 ERROR nova.virt.block_device [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] six.reraise(c, e, tb)
      nova-compute.log.1:2021-01-26 19:13:56.597 7 ERROR nova.virt.block_device [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
      nova-compute.log.1:2021-01-26 19:13:56.597 7 ERROR nova.virt.block_device [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] raise value
      nova-compute.log.1:2021-01-26 19:13:56.597 7 ERROR nova.virt.block_device [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 83, in tworker
      nova-compute.log.1:2021-01-26 19:13:56.597 7 ERROR nova.virt.block_device [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] rv = meth(*args, **kwargs)
      nova-compute.log.1:2021-01-26 19:13:56.597 7 ERROR nova.virt.block_device [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 83, in tworker
      nova-compute.log.1:2021-01-26 19:13:56.597 7 ERROR nova.virt.block_device [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] rv = meth(*args, **kwargs)
      nova-compute.log.1:2021-01-26 19:13:56.597 7 ERROR nova.virt.block_device [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] File "/usr/lib64/python3.6/site-packages/libvirt.py", line 630, in attachDeviceFlags
      nova-compute.log.1:2021-01-26 19:13:56.597 7 ERROR nova.virt.block_device [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] if ret == -1: raise libvirtError ('virDomainAttachDeviceFlags() failed', dom=self)
      nova-compute.log.1:2021-01-26 19:13:56.597 7 ERROR nova.virt.block_device [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] libvirt.libvirtError: Requested operation is not valid: target vdp already exists
      nova-compute.log.1:2021-01-26 19:13:56.597 7 ERROR nova.virt.block_device [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc]
      ~~~

      Aha, so in nova-compute.log.1 on compute-017 we see:
      ~~~
      2021-01-26 19:13:56.590 7 DEBUG nova.virt.libvirt.driver [req-9051f276-94bb-4e63-b844-8eed83164575 aaaa63afeb224124b389f03b1b1d2f47 0c72f838568c43ae870177a247bcae0a - default default] [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] Attempting to attach volume 335c3afe-cd7a-48de-939c-20ca539ac9ec with discard support enabled to an instance using an unsupported configuration. target_bus = virtio. Trim commands will not be issued to the storage device. _check_discard_for_attach_volume /usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py:1499
      2021-01-26 19:13:56.592 7 DEBUG nova.virt.libvirt.guest [req-9051f276-94bb-4e63-b844-8eed83164575 aaaa63afeb224124b389f03b1b1d2f47 0c72f838568c43ae870177a247bcae0a - default default] attach device xml: <disk type="network" device="disk">
      <driver name="qemu" type="raw" cache="writeback" discard="unmap"/>
      <source protocol="rbd" name="volumes/volume-335c3afe-cd7a-48de-939c-20ca539ac9ec">
      <host name="10.134.0.1" port="6789"/>
      <host name="10.134.0.2" port="6789"/>
      <host name="10.134.0.3" port="6789"/>
      </source>
      <auth username="openstack">
      <secret type="ceph" uuid="f4c52b7e-b23d-41ed-9bc3-6bef9d133b4a"/>
      </auth>
      <target bus="virtio" dev="vdp"/>
      <serial>335c3afe-cd7a-48de-939c-20ca539ac9ec</serial>
      </disk>
      attach_device /usr/lib/python3.6/site-packages/nova/virt/libvirt/guest.py:305
      2021-01-26 19:13:56.595 7 ERROR nova.virt.libvirt.driver [req-9051f276-94bb-4e63-b844-8eed83164575 aaaa63afeb224124b389f03b1b1d2f47 0c72f838568c43ae870177a247bcae0a - default default] [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] Failed to attach volume at mountpoint: /dev/vdp: libvirt.libvirtError: Requested operation is not valid: target vdp already exists
      ~~~
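
      When manually cleaning this up, one way to see the mismatch is to compare what libvirt still carries in the domain XML against what nova believes is attached. This is only a sketch: the instance and server names come from the output above, and it assumes libvirt runs inside the nova_libvirt container on the compute node in this release.
      ~~~
      # On pod-compute-017: show the disk targets and volume serials that libvirt
      # still has for the guest (stale vdX entries are leftovers of the timed-out detach).
      sudo podman exec nova_libvirt virsh dumpxml instance-00001f2a | grep -E 'target dev=|<serial>'

      # For comparison, what nova's database believes is attached:
      openstack server show 935cc10d-604f-4f19-9222-d217f0e2fcbc -c volumes_attached
      ~~~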
