Bug
Resolution: Done
Normal
Description of problem:
Hi,
At a customer site, we are running into a complex chain of timeouts upon volume attachment deletion. For their backup solution, the customer creates an instance, creates volume snapshots, and attaches these snapshots to the backup instance. After their backups complete, they proceed with snapshot detachment and deletion.
The volume detachment regularly times out in the customer environment, leaving their backup instances in an inconsistent state. The problem is that the attachment delete operation takes longer than the various timeouts in OpenStack.
On the nova compute nodes, we see 504 errors, which come either from haproxy or from the httpd/mod_wsgi layer, depending on which of the configured timeouts is hit first:
~~~
2021-01-19 01:02:09.674 7 ERROR nova.volume.cinder [req-9febe6f3-d031-4ff2-afb8-0be032b79808 aaaa63afeb224124b389f03b1b1d2f47 0c72f838568c43ae870177a247bcae0a - default default] Delete attachment failed for attachment 81cb9dd7-b20d-4ee8-b41d-65fbb81d3aa3. Error: Gateway Timeout (HTTP 504) Code: 504: cinderclient.exceptions.ClientException: Gateway Timeout (HTTP 504)
2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server [req-9febe6f3-d031-4ff2-afb8-0be032b79808 aaaa63afeb224124b389f03b1b1d2f47 0c72f838568c43ae870177a247bcae0a - default default] Exception during message handling: cinderclient.exceptions.ClientException: Gateway Timeout (HTTP 504)
2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/server.py", line 166, in _process_incoming
2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message)
2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line 265, in dispatch
2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args)
2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line 194, in _do_dispatch
2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args)
2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/exception_wrapper.py", line 79, in wrapped
2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server function_name, call_dict, binary, tb)
2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server self.force_reraise()
2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server raise value
2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/exception_wrapper.py", line 69, in wrapped
2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server return f(self, context, *args, **kw)
2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/compute/utils.py", line 1346, in decorated_function
2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server return function(self, context, *args, **kwargs)
2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 215, in decorated_function
2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server kwargs['instance'], e, sys.exc_info())
2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server self.force_reraise()
2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server raise value
2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 203, in decorated_function
2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server return function(self, context, *args, **kwargs)
2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 6002, in detach_volume
2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server do_detach_volume(context, volume_id, instance, attachment_id)
2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py", line 328, in inner
2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server return f(*args, **kwargs)
2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 6000, in do_detach_volume
2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server attachment_id=attachment_id)
2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 5951, in _detach_volume
2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server attachment_id=attachment_id, destroy_bdm=destroy_bdm)
2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/virt/block_device.py", line 446, in detach
2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server attachment_id, destroy_bdm)
2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/virt/block_device.py", line 427, in _do_detach
2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server volume_api.attachment_delete(context, self['attachment_id'])
2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/volume/cinder.py", line 398, in wrapper
2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server res = method(self, ctx, *args, **kwargs)
2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/volume/cinder.py", line 448, in wrapper
2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server res = method(self, ctx, attachment_id, *args, **kwargs)
2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/volume/cinder.py", line 882, in attachment_delete
2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server 'code': getattr(ex, 'code', None)})
2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server self.force_reraise()
2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server raise value
2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/volume/cinder.py", line 875, in attachment_delete
2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server attachment_id)
2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/cinderclient/api_versions.py", line 407, in substitution
2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server return method.func(obj, *args, **kwargs)
2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/cinderclient/v3/attachments.py", line 44, in delete
2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server return self._delete("/attachments/%s" % base.getid(attachment))
2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/cinderclient/base.py", line 336, in _delete
2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server resp, body = self.api.client.delete(url)
2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/cinderclient/client.py", line 209, in delete
2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server return self._cs_request(url, 'DELETE', **kwargs)
2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/cinderclient/client.py", line 191, in _cs_request
2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server return self.request(url, method, **kwargs)
2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/cinderclient/client.py", line 177, in request
2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server raise exceptions.from_response(resp, body)
2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server cinderclient.exceptions.ClientException: Gateway Timeout (HTTP 504)
2021-01-19 01:02:09.719 7 ERROR oslo_messaging.rpc.server
~~~
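For reference, the slow part is the backend-side attachment delete itself. One way to gauge how long it actually takes (and therefore how large the various timeouts need to be) is to time the same call directly against the cinder API from a controller. This is only a rough sketch - the attachment UUID is a placeholder, and since this really deletes the attachment, it should only be run against an attachment that is meant to go away anyway:
~~~
# Sketch: <attachment-uuid> is a placeholder. The attachments API needs volume API
# microversion 3.27 or newer; 3.44 is used here.
time cinder --os-volume-api-version 3.44 attachment-delete <attachment-uuid>
~~~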
We are currently trying to work around this with the following steps.
-----------------------------------------------------------------------------------------------------------------------------------------
Summary of all actions so far - if this proves successful, we will also have to make these changes persistent through OpenStack Director
=====================================================================================================================================
i) Increasing the RabbitMQ RPC response timeout for cinder-api from 60 seconds to 300 seconds or longer would be a possible avenue.
~~~
$ egrep '^\[|rpc_response_timeout' /var/lib/config-data/puppet-generated/cinder/etc/cinder/cinder.conf | grep rpc_resp -B1
[DEFAULT]
#rpc_response_timeout = 60
~~~
Can you run the following on all controllers:
~~~
crudini --set /var/lib/config-data/puppet-generated/cinder/etc/cinder/cinder.conf DEFAULT rpc_response_timeout 300
podman ps | grep cinder | awk '{print $NF}' | xargs podman restart
~~~
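For the Director follow-up mentioned above, my current thinking (untested; the parameter names are assumptions on my side) is to carry the same value as hieradata in an environment file, since puppet-cinder exposes an rpc_response_timeout parameter:
~~~
# Untested sketch: make the change persistent via Director instead of editing the
# puppet-generated file by hand. ControllerExtraConfig and the hieradata key are
# assumptions to be verified against the installed puppet-cinder version.
cat > /home/stack/templates/cinder-rpc-timeout.yaml <<'EOF'
parameter_defaults:
  ControllerExtraConfig:
    cinder::rpc_response_timeout: 300
EOF
# ...and include it in the next overcloud deploy run with
# -e /home/stack/templates/cinder-rpc-timeout.yaml
~~~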
ii) On all controllers, edit /var/lib/config-data/puppet-generated/haproxy/etc/haproxy/haproxy.cfg and increase the default timeout from 2 minutes to 10 minutes for client and server:
~~~
defaults
log global
maxconn 4096
mode tcp
retries 3
timeout http-request 10s
timeout queue 2m
timeout connect 10s
timeout client 10m
timeout server 10m
timeout check 10s
~~~
Then, restart the haproxy container:
~~~
podman ps | grep haproxy-bundle | awk '{print $NF}' | xargs podman restart
~~~
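To double-check that the restarted container actually picked up the new values and that the file still parses, something along these lines should do (the name filter is an assumption about the container name on these controllers):
~~~
# Sanity-check sketch: show the client/server timeouts as seen inside the running
# container and validate the configuration (haproxy -c only checks syntax, it does
# not reload anything).
podman exec $(podman ps -q --filter name=haproxy) grep -E 'timeout +(client|server)' /etc/haproxy/haproxy.cfg
podman exec $(podman ps -q --filter name=haproxy) haproxy -c -f /etc/haproxy/haproxy.cfg
~~~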
iii) Raise the httpd / wsgi timeouts to 300 seconds for cinder:
Edit:
/var/lib/config-data/puppet-generated/cinder/etc/httpd/conf/httpd.conf
Set:
~~~
Timeout 300
~~~
Then, restart all cinder services:
~~~
podman ps | grep cinder | awk '{print $NF}' | xargs podman restart
~~~
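And a quick way to verify that the cinder side now carries both new values (the cinder_api container name is an assumption; adjust to whatever `podman ps` shows):
~~~
# Verification sketch: read the values back from the puppet-generated config on the
# host and from inside the cinder_api container.
crudini --get /var/lib/config-data/puppet-generated/cinder/etc/cinder/cinder.conf DEFAULT rpc_response_timeout
podman exec $(podman ps -q --filter name=cinder_api) grep -i '^Timeout' /etc/httpd/conf/httpd.conf
~~~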
-----------------------------------------------------------------------------------------------------------------------------------------
Our goal is to fix the actual timeout issue with the above steps (or some further steps).
However, hitting an exception in volume_api.attachment_delete has a nasty side effect, and I'm filing this BZ to have that side effect addressed. Nova does not handle these exceptions correctly, and nova's DB ends up in a state that is inconsistent with the instance XML. Any new volume attachment operation on the affected instance will then fail because of this inconsistency.
Nova should reconcile its database and the instance XML even when the aforementioned exception is thrown due to timeout issues. An upstream bug was already reported for this: https://bugs.launchpad.net/nova/+bug/1888665
I acknowledge that the timeouts of all involved OpenStack components must ultimately be adjusted to account for potentially slow volume attachment deletion. However, the customer's database is regularly left in a messy state when these timeouts are hit, because nova simply gives up during the detach and does not reconcile the instance XML and the nova DB after the exception is thrown.
Note that the detach does eventually succeed on the backend; nevertheless, the instance XML and nova's database no longer match.
Subsequently, when the backup solution wants to attach new volumes to the affected instance, nova tries to attach them with device names that already exist in the libvirt instance. The instance XML still contains stale entries for the volumes whose attachment deletion succeeded on the volume backend but whose detach operation timed out and threw that exception in nova-compute.
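To see the mismatch on an affected instance, it is usually enough to compare what nova reports as attached with what the libvirt domain still carries on the compute node. A sketch, using the instance shown below (the nova_libvirt container name is the usual TripleO one and is an assumption here):
~~~
# Sketch: nova's view of the attachments vs. the disks still present in the libvirt
# domain - stale target devices only show up on the libvirt side.
openstack server show 935cc10d-604f-4f19-9222-d217f0e2fcbc -c volumes_attached
podman exec nova_libvirt virsh domblklist instance-00001f2a
~~~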
Here is one of the symptoms that we see further down the road in the customer's environment, and our customer must manually clean up the broken instance state to fix this:
------------------------------------------------------------------------------------------------------------------------------------------
VSA VM hitting issue of reattaching volumes:
~~~
+--------------------------------------+----------------------------------------------------------------------+
| Field                                | Value                                                                |
+--------------------------------------+----------------------------------------------------------------------+
| OS-DCF:diskConfig                    | AUTO                                                                 |
| OS-EXT-AZ:availability_zone          | nova                                                                 |
| OS-EXT-SRV-ATTR:host                 | pod-compute-017.subdomain.example.com                                |
| OS-EXT-SRV-ATTR:hypervisor_hostname  | pod-compute-017.subdomain.example.com                                |
| OS-EXT-SRV-ATTR:instance_name        | instance-00001f2a                                                    |
| OS-EXT-STS:power_state               | Running                                                              |
| OS-EXT-STS:task_state                | None                                                                 |
| OS-EXT-STS:vm_state                  | active                                                               |
| OS-SRV-USG:launched_at               | 2020-09-16T11:29:40.000000                                           |
| OS-SRV-USG:terminated_at             | None                                                                 |
| accessIPv4                           |                                                                      |
| accessIPv6                           |                                                                      |
| addresses                            | backupprovider-Backup=10.80.123.102                                  |
| config_drive                         |                                                                      |
| created                              | 2020-09-16T11:29:14Z                                                 |
| flavor                               | subdomain-16vCPU-64GBMemory (d91b618d-61bf-43e6-8ff5-19eafeb06a81)   |
| hostId                               | e0d387cb30bafa2993744a5ab6c98f7d4ddf6a4a4411f086dcf2f31a             |
| id                                   | 935cc10d-604f-4f19-9222-d217f0e2fcbc                                 |
| image                                | N/A (booted from volume)                                             |
| key_name                             | keyname                                                              |
| name                                 | instancename                                                         |
| progress                             | 0                                                                    |
| project_id                           | 0c72f838568c43ae870177a247bcae0a                                     |
| properties                           |                                                                      |
| security_groups                      | name='All_Traffic'                                                   |
| status                               | ACTIVE                                                               |
| updated                              | 2021-01-22T01:40:27Z                                                 |
| user_id                              | 8e139a98904248cdbec36449d94184a4                                     |
| volumes_attached                     | id='9e3591b8-2e02-43a7-bdd4-5ce14c77f8ff'                            |
|                                      | id='63605a94-607e-470c-9951-eaf80875d618'                            |
|                                      | id='26425f2f-4227-416a-a365-4a086882240e'                            |
|                                      | id='12525859-32ba-4666-b875-e01304634c1a'                            |
|                                      | id='e5015855-bedc-4189-bb2a-e3bba68968b0'                            |
|                                      | id='fc7dd0d9-8919-41e7-8c86-e40e68915dff'                            |
|                                      | id='23e443fd-2e14-44bf-bbde-c26814b479b0'                            |
|                                      | id='d23f1fc8-fe84-4538-baa4-dc3d486e195b'                            |
|                                      | id='210b5d5d-2575-4ff2-8991-95dddb18e306'                            |
|                                      | id='b090ce10-149a-4f44-872d-789f5eb58280'                            |
|                                      | id='370a06e8-ecf4-4b2f-a178-17369c6969a8'                            |
|                                      | id='58a6e872-7b80-475d-a822-0879faca1ccf'                            |
|                                      | id='1304e9bc-8846-4a73-bc14-ac9fb60a6b9c'                            |
|                                      | id='44f4d0a4-a1af-4d12-8be5-a6ab7fd44f3a'                            |
|                                      | id='ea507fe9-af11-4e22-bf97-ce71ab8a2698'                            |
+--------------------------------------+----------------------------------------------------------------------+
~~~
~~~
[akaris@supportshell casenumber]$ find . -name 'nova*api.log' | xargs grep 'is not attached to volume'
./0010-sosreport-pod-controller-02-2021-01-26-eiuaywv.tar.xz/sosreport-pod-controller-02-2021-01-26-eiuaywv/var/log/containers/nova/nova-api.log:2021-01-26 19:14:06.807 23 INFO nova.api.openstack.wsgi [req-5a62931d-e7d5-44c3-b796-76158de53a51 aaaa63afeb224124b389f03b1b1d2f47 0c72f838568c43ae870177a247bcae0a - default default] HTTP exception thrown: Instance 935cc10d-604f-4f19-9222-d217f0e2fcbc is not attached to volume 335c3afe-cd7a-48de-939c-20ca539ac9ec
./0010-sosreport-pod-controller-02-2021-01-26-eiuaywv.tar.xz/sosreport-pod-controller-02-2021-01-26-eiuaywv/var/log/containers/nova/nova-api.log:2021-01-26 19:14:06.807 23 DEBUG nova.api.openstack.wsgi [req-5a62931d-e7d5-44c3-b796-76158de53a51 aaaa63afeb224124b389f03b1b1d2f47 0c72f838568c43ae870177a247bcae0a - default default] Returning 404 to user: Instance 935cc10d-604f-4f19-9222-d217f0e2fcbc is not attached to volume 335c3afe-cd7a-48de-939c-20ca539ac9ec __call__ /usr/lib/python3.6/site-packages/nova/api/openstack/wsgi.py:943
./0010-sosreport-pod-controller-02-2021-01-26-eiuaywv.tar.xz/sosreport-pod-controller-02-2021-01-26-eiuaywv/var/log/containers/nova/nova-api.log:2021-01-26 19:18:01.848 27 INFO nova.api.openstack.wsgi [req-d47a537e-241f-43f4-a975-826e047a3674 aaaa63afeb224124b389f03b1b1d2f47 0c72f838568c43ae870177a247bcae0a - default default] HTTP exception thrown: Instance 38dbc2ea-8f2a-4514-b2af-c77f572cf62f is not attached to volume ded9d2b4-38ae-4fcc-b1ca-aaf61bfcccb7
./0010-sosreport-pod-controller-02-2021-01-26-eiuaywv.tar.xz/sosreport-pod-controller-02-2021-01-26-eiuaywv/var/log/containers/nova/nova-api.log:2021-01-26 19:18:01.849 27 DEBUG nova.api.openstack.wsgi [req-d47a537e-241f-43f4-a975-826e047a3674 aaaa63afeb224124b389f03b1b1d2f47 0c72f838568c43ae870177a247bcae0a - default default] Returning 404 to user: Instance 38dbc2ea-8f2a-4514-b2af-c77f572cf62f is not attached to volume ded9d2b4-38ae-4fcc-b1ca-aaf61bfcccb7 _call_ /usr/lib/python3.6/site-packages/nova/api/openstack/wsgi.py:943
~~~
~~~
[akaris@supportshell casenumber]$ grep req-d47a537e-241f-43f4-a975-826e047a3674 ./0010-sosreport-pod-controller-02-2021-01-26-eiuaywv.tar.xz/sosreport-pod-controller-02-2021-01-26-eiuaywv/var/log/containers/nova/nova-api.log
2021-01-26 19:18:01.799 27 DEBUG nova.api.openstack.wsgi [req-d47a537e-241f-43f4-a975-826e047a3674 aaaa63afeb224124b389f03b1b1d2f47 0c72f838568c43ae870177a247bcae0a - default default] Calling method '<bound method VolumeAttachmentController.show of <nova.api.openstack.compute.volumes.VolumeAttachmentController object at 0x7f7fa41f8518>>' _process_stack /usr/lib/python3.6/site-packages/nova/api/openstack/wsgi.py:523
2021-01-26 19:18:01.800 27 DEBUG nova.compute.api [req-d47a537e-241f-43f4-a975-826e047a3674 aaaa63afeb224124b389f03b1b1d2f47 0c72f838568c43ae870177a247bcae0a - default default] [instance: 38dbc2ea-8f2a-4514-b2af-c77f572cf62f] Fetching instance by UUID get /usr/lib/python3.6/site-packages/nova/compute/api.py:2633
2021-01-26 19:18:01.806 27 DEBUG oslo_concurrency.lockutils [req-d47a537e-241f-43f4-a975-826e047a3674 aaaa63afeb224124b389f03b1b1d2f47 0c72f838568c43ae870177a247bcae0a - default default] Lock "d5b009bd-2183-472f-b385-9f88d0c726ad" acquired by "nova.context.set_target_cell.<locals>.get_or_set_cached_cell_and_set_connections" :: waited 0.000s inner /usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:327
2021-01-26 19:18:01.806 27 DEBUG oslo_concurrency.lockutils [req-d47a537e-241f-43f4-a975-826e047a3674 aaaa63afeb224124b389f03b1b1d2f47 0c72f838568c43ae870177a247bcae0a - default default] Lock "d5b009bd-2183-472f-b385-9f88d0c726ad" released by "nova.context.set_target_cell.<locals>.get_or_set_cached_cell_and_set_connections" :: held 0.000s inner /usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:339
2021-01-26 19:18:01.807 27 DEBUG oslo_concurrency.lockutils [req-d47a537e-241f-43f4-a975-826e047a3674 aaaa63afeb224124b389f03b1b1d2f47 0c72f838568c43ae870177a247bcae0a - default default] Lock "d5b009bd-2183-472f-b385-9f88d0c726ad" acquired by "nova.context.set_target_cell.<locals>.get_or_set_cached_cell_and_set_connections" :: waited 0.000s inner /usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:327
2021-01-26 19:18:01.807 27 DEBUG oslo_concurrency.lockutils [req-d47a537e-241f-43f4-a975-826e047a3674 aaaa63afeb224124b389f03b1b1d2f47 0c72f838568c43ae870177a247bcae0a - default default] Lock "d5b009bd-2183-472f-b385-9f88d0c726ad" released by "nova.context.set_target_cell.<locals>.get_or_set_cached_cell_and_set_connections" :: held 0.000s inner /usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:339
2021-01-26 19:18:01.848 27 INFO nova.api.openstack.wsgi [req-d47a537e-241f-43f4-a975-826e047a3674 aaaa63afeb224124b389f03b1b1d2f47 0c72f838568c43ae870177a247bcae0a - default default] HTTP exception thrown: Instance 38dbc2ea-8f2a-4514-b2af-c77f572cf62f is not attached to volume ded9d2b4-38ae-4fcc-b1ca-aaf61bfcccb7
2021-01-26 19:18:01.849 27 DEBUG nova.api.openstack.wsgi [req-d47a537e-241f-43f4-a975-826e047a3674 aaaa63afeb224124b389f03b1b1d2f47 0c72f838568c43ae870177a247bcae0a - default default] Returning 404 to user: Instance 38dbc2ea-8f2a-4514-b2af-c77f572cf62f is not attached to volume ded9d2b4-38ae-4fcc-b1ca-aaf61bfcccb7 __call__ /usr/lib/python3.6/site-packages/nova/api/openstack/wsgi.py:943
2021-01-26 19:18:01.849 27 INFO nova.api.openstack.requestlog [req-d47a537e-241f-43f4-a975-826e047a3674 aaaa63afeb224124b389f03b1b1d2f47 0c72f838568c43ae870177a247bcae0a - default default] 10.133.0.136 "GET /v2.1/0c72f838568c43ae870177a247bcae0a/servers/38dbc2ea-8f2a-4514-b2af-c77f572cf62f/os-volume_attachments/ded9d2b4-38ae-4fcc-b1ca-aaf61bfcccb7" status: 404 len: 154 microversion: 2.1 time: 0.053621
~~~
So, let's look into why instance 935cc10d-604f-4f19-9222-d217f0e2fcbc is reported as not attached to volume 335c3afe-cd7a-48de-939c-20ca539ac9ec:
~~~
$ find . -ipath '*var/log/containers/nova/*.log*' | xargs egrep '935cc10d-604f-4f19-9222-d217f0e2fcbc|335c3afe-cd7a-48de-939c-20ca539ac9ec' -h | sort
$ find . -ipath '*var/log/containers/cinder/*.log*' | xargs egrep '935cc10d-604f-4f19-9222-d217f0e2fcbc|335c3afe-cd7a-48de-939c-20ca539ac9ec' -h | sort
~~~
~~~
nova-compute.log.1:2021-01-26 19:13:55.123 7 INFO nova.compute.manager [req-9051f276-94bb-4e63-b844-8eed83164575 aaaa63afeb224124b389f03b1b1d2f47 0c72f838568c43ae870177a247bcae0a - default default] [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] Attaching volume 335c3afe-cd7a-48de-939c-20ca539ac9ec to /dev/vdp
nova-compute.log.1:2021-01-26 19:13:55.472 7 DEBUG nova.virt.block_device [req-9051f276-94bb-4e63-b844-8eed83164575 aaaa63afeb224124b389f03b1b1d2f47 0c72f838568c43ae870177a247bcae0a - default default] [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] Updating existing volume attachment record: 861fb8fa-e66d-4092-ad16-0435ee8aeab7 _volume_attach /usr/lib/python3.6/site-packages/nova/virt/block_device.py:539
nova-compute.log.1:2021-01-26 19:13:56.590 7 DEBUG nova.virt.libvirt.driver [req-9051f276-94bb-4e63-b844-8eed83164575 aaaa63afeb224124b389f03b1b1d2f47 0c72f838568c43ae870177a247bcae0a - default default] [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] Attempting to attach volume 335c3afe-cd7a-48de-939c-20ca539ac9ec with discard support enabled to an instance using an unsupported configuration. target_bus = virtio. Trim commands will not be issued to the storage device. _check_discard_for_attach_volume /usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py:1499
nova-compute.log.1: <source protocol="rbd" name="volumes/volume-335c3afe-cd7a-48de-939c-20ca539ac9ec">
nova-compute.log.1: <serial>335c3afe-cd7a-48de-939c-20ca539ac9ec</serial>
nova-compute.log.1:2021-01-26 19:13:56.595 7 ERROR nova.virt.libvirt.driver [req-9051f276-94bb-4e63-b844-8eed83164575 aaaa63afeb224124b389f03b1b1d2f47 0c72f838568c43ae870177a247bcae0a - default default] [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] Failed to attach volume at mountpoint: /dev/vdp: libvirt.libvirtError: Requested operation is not valid: target vdp already exists
nova-compute.log.1:2021-01-26 19:13:56.595 7 ERROR nova.virt.libvirt.driver [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] Traceback (most recent call last):
nova-compute.log.1:2021-01-26 19:13:56.595 7 ERROR nova.virt.libvirt.driver [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 1541, in attach_volume
nova-compute.log.1:2021-01-26 19:13:56.595 7 ERROR nova.virt.libvirt.driver [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] guest.attach_device(conf, persistent=True, live=live)
nova-compute.log.1:2021-01-26 19:13:56.595 7 ERROR nova.virt.libvirt.driver [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/guest.py", line 306, in attach_device
nova-compute.log.1:2021-01-26 19:13:56.595 7 ERROR nova.virt.libvirt.driver [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] self._domain.attachDeviceFlags(device_xml, flags=flags)
nova-compute.log.1:2021-01-26 19:13:56.595 7 ERROR nova.virt.libvirt.driver [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 190, in doit
nova-compute.log.1:2021-01-26 19:13:56.595 7 ERROR nova.virt.libvirt.driver [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] result = proxy_call(self._autowrap, f, *args, **kwargs)
nova-compute.log.1:2021-01-26 19:13:56.595 7 ERROR nova.virt.libvirt.driver [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 148, in proxy_call
nova-compute.log.1:2021-01-26 19:13:56.595 7 ERROR nova.virt.libvirt.driver [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] rv = execute(f, *args, **kwargs)
nova-compute.log.1:2021-01-26 19:13:56.595 7 ERROR nova.virt.libvirt.driver [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 129, in execute
nova-compute.log.1:2021-01-26 19:13:56.595 7 ERROR nova.virt.libvirt.driver [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] six.reraise(c, e, tb)
nova-compute.log.1:2021-01-26 19:13:56.595 7 ERROR nova.virt.libvirt.driver [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
nova-compute.log.1:2021-01-26 19:13:56.595 7 ERROR nova.virt.libvirt.driver [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] raise value
nova-compute.log.1:2021-01-26 19:13:56.595 7 ERROR nova.virt.libvirt.driver [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 83, in tworker
nova-compute.log.1:2021-01-26 19:13:56.595 7 ERROR nova.virt.libvirt.driver [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] rv = meth(*args, **kwargs)
nova-compute.log.1:2021-01-26 19:13:56.595 7 ERROR nova.virt.libvirt.driver [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] File "/usr/lib64/python3.6/site-packages/libvirt.py", line 630, in attachDeviceFlags
nova-compute.log.1:2021-01-26 19:13:56.595 7 ERROR nova.virt.libvirt.driver [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] if ret == -1: raise libvirtError ('virDomainAttachDeviceFlags() failed', dom=self)
nova-compute.log.1:2021-01-26 19:13:56.595 7 ERROR nova.virt.libvirt.driver [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] libvirt.libvirtError: Requested operation is not valid: target vdp already exists
nova-compute.log.1:2021-01-26 19:13:56.595 7 ERROR nova.virt.libvirt.driver [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc]
nova-compute.log.1:2021-01-26 19:13:56.597 7 ERROR nova.virt.block_device [req-9051f276-94bb-4e63-b844-8eed83164575 aaaa63afeb224124b389f03b1b1d2f47 0c72f838568c43ae870177a247bcae0a - default default] [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] Driver failed to attach volume 335c3afe-cd7a-48de-939c-20ca539ac9ec at /dev/vdp: libvirt.libvirtError: Requested operation is not valid: target vdp already exists
nova-compute.log.1:2021-01-26 19:13:56.597 7 ERROR nova.virt.block_device [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] Traceback (most recent call last):
nova-compute.log.1:2021-01-26 19:13:56.597 7 ERROR nova.virt.block_device [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] File "/usr/lib/python3.6/site-packages/nova/virt/block_device.py", line 567, in _volume_attach
nova-compute.log.1:2021-01-26 19:13:56.597 7 ERROR nova.virt.block_device [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] device_type=self['device_type'], encryption=encryption)
nova-compute.log.1:2021-01-26 19:13:56.597 7 ERROR nova.virt.block_device [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 1568, in attach_volume
nova-compute.log.1:2021-01-26 19:13:56.597 7 ERROR nova.virt.block_device [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] encryption=encryption)
nova-compute.log.1:2021-01-26 19:13:56.597 7 ERROR nova.virt.block_device [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__
nova-compute.log.1:2021-01-26 19:13:56.597 7 ERROR nova.virt.block_device [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] self.force_reraise()
nova-compute.log.1:2021-01-26 19:13:56.597 7 ERROR nova.virt.block_device [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
nova-compute.log.1:2021-01-26 19:13:56.597 7 ERROR nova.virt.block_device [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] six.reraise(self.type_, self.value, self.tb)
nova-compute.log.1:2021-01-26 19:13:56.597 7 ERROR nova.virt.block_device [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
nova-compute.log.1:2021-01-26 19:13:56.597 7 ERROR nova.virt.block_device [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] raise value
nova-compute.log.1:2021-01-26 19:13:56.597 7 ERROR nova.virt.block_device [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 1541, in attach_volume
nova-compute.log.1:2021-01-26 19:13:56.597 7 ERROR nova.virt.block_device [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] guest.attach_device(conf, persistent=True, live=live)
nova-compute.log.1:2021-01-26 19:13:56.597 7 ERROR nova.virt.block_device [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/guest.py", line 306, in attach_device
nova-compute.log.1:2021-01-26 19:13:56.597 7 ERROR nova.virt.block_device [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] self._domain.attachDeviceFlags(device_xml, flags=flags)
nova-compute.log.1:2021-01-26 19:13:56.597 7 ERROR nova.virt.block_device [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 190, in doit
nova-compute.log.1:2021-01-26 19:13:56.597 7 ERROR nova.virt.block_device [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] result = proxy_call(self._autowrap, f, *args, **kwargs)
nova-compute.log.1:2021-01-26 19:13:56.597 7 ERROR nova.virt.block_device [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 148, in proxy_call
nova-compute.log.1:2021-01-26 19:13:56.597 7 ERROR nova.virt.block_device [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] rv = execute(f, *args, **kwargs)
nova-compute.log.1:2021-01-26 19:13:56.597 7 ERROR nova.virt.block_device [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 129, in execute
nova-compute.log.1:2021-01-26 19:13:56.597 7 ERROR nova.virt.block_device [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] six.reraise(c, e, tb)
nova-compute.log.1:2021-01-26 19:13:56.597 7 ERROR nova.virt.block_device [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
nova-compute.log.1:2021-01-26 19:13:56.597 7 ERROR nova.virt.block_device [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] raise value
nova-compute.log.1:2021-01-26 19:13:56.597 7 ERROR nova.virt.block_device [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 83, in tworker
nova-compute.log.1:2021-01-26 19:13:56.597 7 ERROR nova.virt.block_device [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] rv = meth(*args, **kwargs)
nova-compute.log.1:2021-01-26 19:13:56.597 7 ERROR nova.virt.block_device [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] File "/usr/lib64/python3.6/site-packages/libvirt.py", line 630, in attachDeviceFlags
nova-compute.log.1:2021-01-26 19:13:56.597 7 ERROR nova.virt.block_device [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] if ret == -1: raise libvirtError ('virDomainAttachDeviceFlags() failed', dom=self)
nova-compute.log.1:2021-01-26 19:13:56.597 7 ERROR nova.virt.block_device [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] libvirt.libvirtError: Requested operation is not valid: target vdp already exists
nova-compute.log.1:2021-01-26 19:13:56.597 7 ERROR nova.virt.block_device [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc]
~~~
Aha, so in nova-compute.log.1 on compute 17 we see:
~~~
2021-01-26 19:13:56.590 7 DEBUG nova.virt.libvirt.driver [req-9051f276-94bb-4e63-b844-8eed83164575 aaaa63afeb224124b389f03b1b1d2f47 0c72f838568c43ae870177a247bcae0a - default default] [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] Attempting to attach volume 335c3afe-cd7a-48de-939c-20ca539ac9ec with discard support enabled to an instance using an unsupported configuration. target_bus = virtio. Trim commands will not be issued to the storage device. _check_discard_for_attach_volume /usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py:1499
2021-01-26 19:13:56.592 7 DEBUG nova.virt.libvirt.guest [req-9051f276-94bb-4e63-b844-8eed83164575 aaaa63afeb224124b389f03b1b1d2f47 0c72f838568c43ae870177a247bcae0a - default default] attach device xml: <disk type="network" device="disk">
<driver name="qemu" type="raw" cache="writeback" discard="unmap"/>
<source protocol="rbd" name="volumes/volume-335c3afe-cd7a-48de-939c-20ca539ac9ec">
<host name="10.134.0.1" port="6789"/>
<host name="10.134.0.2" port="6789"/>
<host name="10.134.0.3" port="6789"/>
</source>
<auth username="openstack">
<secret type="ceph" uuid="f4c52b7e-b23d-41ed-9bc3-6bef9d133b4a"/>
</auth>
<target bus="virtio" dev="vdp"/>
<serial>335c3afe-cd7a-48de-939c-20ca539ac9ec</serial>
</disk>
attach_device /usr/lib/python3.6/site-packages/nova/virt/libvirt/guest.py:305
2021-01-26 19:13:56.595 7 ERROR nova.virt.libvirt.driver [req-9051f276-94bb-4e63-b844-8eed83164575 aaaa63afeb224124b389f03b1b1d2f47 0c72f838568c43ae870177a247bcae0a - default default] [instance: 935cc10d-604f-4f19-9222-d217f0e2fcbc] Failed to attach volume at mountpoint: /dev/vdp: libvirt.libvirtError: Requested operation is not valid: target vdp already exists
~~~
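When it comes to manually cleaning this up, the <serial> element in the domain XML carries the cinder volume UUID, so stale targets like vdp can be mapped back to the volumes whose attachments were already deleted on the cinder side. A sketch, with the same assumption about the nova_libvirt container name:
~~~
# Sketch: list target devices together with their volume UUIDs (<serial>) so stale
# entries can be matched against volumes that cinder no longer considers attached.
podman exec nova_libvirt virsh dumpxml instance-00001f2a | grep -E '<target |<serial>'
~~~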