-
Bug
-
Resolution: Done-Errata
-
Major
-
rhos-17.1.3
-
13
-
False
-
-
False
-
?
-
openstack-cinder-18.2.2-17.1.20250701200748.f6b44fc.el9ost
-
None
-
-
-
Cinder Sprint 3, Cinder Sprint 4
-
2
-
Moderate
To Reproduce
Steps to reproduce the behavior:
Create instances in bulk. Some requests fail because a requested volume fails to be created and ends up in error status, which in turn causes instance creation to fail.
Expected behavior
Successful creation of instances.
Device Info (please complete the following information):
RHOSP-17.1
Bug impact
Bulk instance creation fails.
Known workaround
NIL
Error details.
Details below are captured from: 04042948
We analyzed VM cff0454d-3d79-4173-8e1e-4336a1370fed, which failed with the error below:
~~~
2025-01-31 09:41:56.398 2 ERROR nova.compute.manager [req-4d8d2c28-924a-4d2e-8890-d0d7f84301f2 faf6d7a32b64466b8e9c95241bcd2127 bdd6c04e96c04bcc8391b1f3b32cf6f9 - default default] [instance: cff0454d-3d79-4173-8e1e-4336a1370fed] Instance failed block device setup: nova.exception.VolumeNotCreated: Volume fa99cb3d-4277-4e99-b522-e75c458f2608 did not finish being created even after we waited 128 seconds or 41 attempts. And its status is error.
~~~
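The Nova error above has a fixed shape, so affected instances and volumes can be pulled out of a nova-compute.log with a short script. The following is a minimal sketch, assuming the message format shown in the excerpt; the function name `find_volume_not_created` is illustrative and not part of Nova.

```python
import re

# Pattern modeled on the VolumeNotCreated message in the excerpt above.
# Assumption: other occurrences in the log use the same wording.
VOLUME_NOT_CREATED = re.compile(
    r"\[instance: (?P<instance>[0-9a-f-]{36})\].*"
    r"VolumeNotCreated: Volume (?P<volume>[0-9a-f-]{36}) "
    r"did not finish being created "
    r"even after we waited (?P<seconds>\d+) seconds "
    r"or (?P<attempts>\d+) attempts\. "
    r"And its status is (?P<status>\w+)\."
)


def find_volume_not_created(lines):
    """Return one dict per log line that reports a VolumeNotCreated failure."""
    hits = []
    for line in lines:
        match = VOLUME_NOT_CREATED.search(line)
        if match is None:
            continue
        hit = match.groupdict()
        # Convert the numeric fields so they can be compared or summed.
        hit["seconds"] = int(hit["seconds"])
        hit["attempts"] = int(hit["attempts"])
        hits.append(hit)
    return hits
```

For example, calling `find_volume_not_created(open("nova-compute.log.1"))` on the log from the sosreport would list every instance/volume pair that hit this failure, which helps size the impact of a bulk request.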
- The VM's resource claim succeeded on comp7
~~~
0120-sosreport-os-noi-comp7-04042948-2025-01-31-lucldhv.tar.xz/sosreport-os-noi-comp7-04042948-2025-01-31-lucldhv/var/log/containers/nova/nova-compute.log.1
2025-01-31 09:39:46.877 2 INFO nova.compute.claims [req-4d8d2c28-924a-4d2e-8890-d0d7f84301f2 faf6d7a32b64466b8e9c95241bcd2127 bdd6c04e96c04bcc8391b1f3b32cf6f9 - default default] [instance: cff0454d-3d79-4173-8e1e-4336a1370fed] Claim successful on node os-noi-comp7.acecloud.ai
~~~
- The VM started booting from volume 8678749c-0ac3-44e6-9da9-28b730a14cf4
~~~
2025-01-31 09:39:47.168 2 INFO nova.virt.block_device [req-4d8d2c28-924a-4d2e-8890-d0d7f84301f2 faf6d7a32b64466b8e9c95241bcd2127 bdd6c04e96c04bcc8391b1f3b32cf6f9 - default default] [instance: cff0454d-3d79-4173-8e1e-4336a1370fed] Booting with volume-backed-image 8678749c-0ac3-44e6-9da9-28b730a14cf4 at /dev/sda
~~~
- An additional volume, fa99cb3d-4277-4e99-b522-e75c458f2608, was also requested for the VM, but its creation failed on the Cinder side
~~~
2025-01-31 09:41:56.349 2 WARNING nova.compute.manager [req-4d8d2c28-924a-4d2e-8890-d0d7f84301f2 faf6d7a32b64466b8e9c95241bcd2127 bdd6c04e96c04bcc8391b1f3b32cf6f9 - default default] Volume id: fa99cb3d-4277-4e99-b522-e75c458f2608 finished being created but its status is error.
~~~
- Eventually the VM failed to be created
~~~
2025-01-31 09:41:56.398 2 ERROR nova.compute.manager [req-4d8d2c28-924a-4d2e-8890-d0d7f84301f2 faf6d7a32b64466b8e9c95241bcd2127 bdd6c04e96c04bcc8391b1f3b32cf6f9 - default default] [instance: cff0454d-3d79-4173-8e1e-4336a1370fed] Instance failed block device setup: nova.exception.VolumeNotCreated: Volume fa99cb3d-4277-4e99-b522-e75c458f2608 did not finish being created even after we waited 128 seconds or 41 attempts. And its status is error.
~~~
- Checking the failed volume fa99cb3d-4277-4e99-b522-e75c458f2608 further, we see that it failed due to an issue on the NetApp side
~~~
2025-01-31 09:39:50.096 11 ERROR cinder.volume.drivers.netapp.dataontap.block_base [req-47bdc97f-478c-4c45-a0fc-659a23a73881 faf6d7a32b64466b8e9c95241bcd2127 bdd6c04e96c04bcc8391b1f3b32cf6f9 - - -] Exception cloning volume volume-fa99cb3d-4277-4e99-b522-e75c458f2608 from source volume volume-d3a6f063-a8a2-4763-9a46-104f1175cc51.: cinder.volume.drivers.netapp.dataontap.client.api.NaApiError: NetApp API failed. Reason - 16:Clone operation failed to start: Device busy..
2025-01-31 09:39:50.096 11 ERROR cinder.volume.drivers.netapp.dataontap.block_base Traceback (most recent call last):
2025-01-31 09:39:50.096 11 ERROR cinder.volume.drivers.netapp.dataontap.block_base File "/usr/lib/python3.9/site-packages/cinder/volume/drivers/netapp/dataontap/block_base.py", line 370, in _clone_source_to_destination
2025-01-31 09:39:50.096 11 ERROR cinder.volume.drivers.netapp.dataontap.block_base self._clone_lun(
2025-01-31 09:39:50.096 11 ERROR cinder.volume.drivers.netapp.dataontap.block_base File "/usr/lib/python3.9/site-packages/cinder/volume/volume_utils.py", line 1488, in trace_method_logging_wrapper
2025-01-31 09:39:50.096 11 ERROR cinder.volume.drivers.netapp.dataontap.block_base return f(*args, **kwargs)
2025-01-31 09:39:50.096 11 ERROR cinder.volume.drivers.netapp.dataontap.block_base File "/usr/lib/python3.9/site-packages/cinder/volume/volume_utils.py", line 1488, in trace_method_logging_wrapper
2025-01-31 09:39:50.096 11 ERROR cinder.volume.drivers.netapp.dataontap.block_base return f(*args, **kwargs)
2025-01-31 09:39:50.096 11 ERROR cinder.volume.drivers.netapp.dataontap.block_base File "/usr/lib/python3.9/site-packages/cinder/volume/drivers/netapp/dataontap/block_cmode.py", line 221, in _clone_lun
2025-01-31 09:39:50.096 11 ERROR cinder.volume.drivers.netapp.dataontap.block_base self.zapi_client.clone_lun(
2025-01-31 09:39:50.096 11 ERROR cinder.volume.drivers.netapp.dataontap.block_base File "/usr/lib/python3.9/site-packages/cinder/volume/volume_utils.py", line 1488, in trace_method_logging_wrapper
2025-01-31 09:39:50.096 11 ERROR cinder.volume.drivers.netapp.dataontap.block_base return f(*args, **kwargs)
2025-01-31 09:39:50.096 11 ERROR cinder.volume.drivers.netapp.dataontap.block_base File "/usr/lib/python3.9/site-packages/cinder/volume/volume_utils.py", line 1488, in trace_method_logging_wrapper
2025-01-31 09:39:50.096 11 ERROR cinder.volume.drivers.netapp.dataontap.block_base return f(*args, **kwargs)
2025-01-31 09:39:50.096 11 ERROR cinder.volume.drivers.netapp.dataontap.block_base File "/usr/lib/python3.9/site-packages/cinder/volume/drivers/netapp/dataontap/client/client_cmode.py", line 615, in clone_lun
2025-01-31 09:39:50.096 11 ERROR cinder.volume.drivers.netapp.dataontap.block_base self.connection.invoke_successfully(clone_create, True)
2025-01-31 09:39:50.096 11 ERROR cinder.volume.drivers.netapp.dataontap.block_base File "/usr/lib/python3.9/site-packages/cinder/volume/drivers/netapp/dataontap/client/api.py", line 231, in invoke_successfully
2025-01-31 09:39:50.096 11 ERROR cinder.volume.drivers.netapp.dataontap.block_base raise NaApiError(code, msg)
2025-01-31 09:39:50.096 11 ERROR cinder.volume.drivers.netapp.dataontap.block_base cinder.volume.drivers.netapp.dataontap.client.api.NaApiError: NetApp API failed. Reason - 16:Clone operation failed to start: Device busy..
2025-01-31 09:39:50.096 11 ERROR cinder.volume.drivers.netapp.dataontap.block_base
~~~
- The NetApp API responded with `Device busy` again about a minute later for the same request
~~~
2025-01-31 09:40:52.331 11 ERROR cinder.volume.drivers.netapp.dataontap.block_base [req-47bdc97f-478c-4c45-a0fc-659a23a73881 faf6d7a32b64466b8e9c95241bcd2127 bdd6c04e96c04bcc8391b1f3b32cf6f9 - - -] Exception cloning volume volume-fa99cb3d-4277-4e99-b522-e75c458f2608 from source volume volume-d3a6f063-a8a2-4763-9a46-104f1175cc51.: cinder.volume.drivers.netapp.dataontap.client.api.NaApiError: NetApp API failed. Reason - 16:Clone operation failed to start: Device busy..
~~~
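The Cinder-side failures traced above can be collected from cinder-volume logs in the same way. The following is a minimal sketch, assuming the "Exception cloning volume" message format shown in the excerpts; the function name `find_clone_failures` and the returned field names are illustrative and not part of Cinder or the NetApp driver.

```python
import re
from collections import Counter

# Pattern modeled on the block_base error header in the excerpts above.
# Traceback lines are skipped because they lack "Exception cloning volume".
CLONE_FAILURE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) .*"
    r"Exception cloning volume volume-(?P<volume>[0-9a-f-]{36}) "
    r"from source volume volume-(?P<source>[0-9a-f-]{36})\.: "
    r".*NaApiError: NetApp API failed\. "
    r"Reason - (?P<code>\d+):(?P<reason>.*?)\.*$"
)


def find_clone_failures(lines):
    """Return one dict per log line that reports a failed NetApp clone."""
    failures = []
    for line in lines:
        match = CLONE_FAILURE.search(line.rstrip())
        if match is not None:
            failures.append(match.groupdict())
    return failures


def count_by_source(failures):
    """Count clone failures per source volume to spot a contended source."""
    return Counter(failure["source"] for failure in failures)
```

Grouping by source volume is a design choice for this case: a bulk boot request clones many volumes from one source, so a high count against a single source is consistent with the `Device busy` pattern described above.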
Information.
We asked the customer to contact NetApp. NetApp shared the patch https://review.opendev.org/c/openstack/cinder/+/951485 and asked Red Hat to provide a hotfix.
We discussed this with engineering in a Slack thread [1]. A hotfix is planned for RHOSO-18; for a backport to 17, the request should come from NetApp.
[1] https://redhat-internal.slack.com/archives/C04GLFJE57Y/p1749084810914089
This issue was opened to determine how we can best assist: the TAM, Vishal Sharma, reached out to Cinder engineering for help and asked support to open a Jira.
- is blocked by
-
OSPRH-17850 [18.0][NetApp] Bulk instance creation failure "NetApp API failed. Reason - 16:Clone operation failed to start: Device busy"
-
- Closed
-
- is cloned by
-
OSPRH-17850 [18.0][NetApp] Bulk instance creation failure "NetApp API failed. Reason - 16:Clone operation failed to start: Device busy"
-
- Closed
-
-
OSPRH-17860 [NetApp] Bulk instance creation failure "NetApp API failed. Reason - 16:Clone operation failed to start: Device busy"
-
- Closed
-
- links to
-
RHBA-2025:152458 Red Hat OpenStack Platform 17.1.9 bug fix and enhancement advisory