  1. Red Hat OpenStack Services on OpenShift
  2. OSPRH-17559

[17.1][NetApp] Bulk instance creation failure "NetApp API failed. Reason - 16:Clone operation failed to start: Device busy"


    • Cinder Sprint 3, Cinder Sprint 4
    • 2
    • Moderate

      To Reproduce
      Steps to reproduce the behavior:

      Create instances in bulk. Some volume creations fail and leave the volumes in error status, which in turn causes the corresponding instance creations to fail.

      Expected behavior
      Successful creation of instances. 

       

      Device Info (please complete the following information):

      RHOSP-17.1

      Bug impact

      Bulk instance creation fails.

      Known workaround

      NIL

      Error details.
      Details below are captured from: 04042948

      We analyzed VM cff0454d-3d79-4173-8e1e-4336a1370fed, which failed with the error below:
          ~~~
          2025-01-31 09:41:56.398 2 ERROR nova.compute.manager [req-4d8d2c28-924a-4d2e-8890-d0d7f84301f2 faf6d7a32b64466b8e9c95241bcd2127 bdd6c04e96c04bcc8391b1f3b32cf6f9 - default default] [instance: cff0454d-3d79-4173-8e1e-4336a1370fed] Instance failed block device setup: nova.exception.VolumeNotCreated: Volume fa99cb3d-4277-4e99-b522-e75c458f2608 did not finish being created even after we waited 128 seconds or 41 attempts. And its status is error.
          ~~~

      • The VM's resource claim succeeded on comp7:
            ~~~
            0120-sosreport-os-noi-comp7-04042948-2025-01-31-lucldhv.tar.xz/sosreport-os-noi-comp7-04042948-2025-01-31-lucldhv/var/log/containers/nova/nova-compute.log.1

          2025-01-31 09:39:46.877 2 INFO nova.compute.claims [req-4d8d2c28-924a-4d2e-8890-d0d7f84301f2 faf6d7a32b64466b8e9c95241bcd2127 bdd6c04e96c04bcc8391b1f3b32cf6f9 - default default] [instance: cff0454d-3d79-4173-8e1e-4336a1370fed] Claim successful on node os-noi-comp7.acecloud.ai
          ~~~

      • The VM started booting from volume 8678749c-0ac3-44e6-9da9-28b730a14cf4:
            ~~~
            2025-01-31 09:39:47.168 2 INFO nova.virt.block_device [req-4d8d2c28-924a-4d2e-8890-d0d7f84301f2 faf6d7a32b64466b8e9c95241bcd2127 bdd6c04e96c04bcc8391b1f3b32cf6f9 - default default] [instance: cff0454d-3d79-4173-8e1e-4336a1370fed] Booting with volume-backed-image 8678749c-0ac3-44e6-9da9-28b730a14cf4 at /dev/sda
            ~~~
      • An additional volume, fa99cb3d-4277-4e99-b522-e75c458f2608, was requested for the VM, but its creation failed on the Cinder side:
            ~~~
            2025-01-31 09:41:56.349 2 WARNING nova.compute.manager [req-4d8d2c28-924a-4d2e-8890-d0d7f84301f2 faf6d7a32b64466b8e9c95241bcd2127 bdd6c04e96c04bcc8391b1f3b32cf6f9 - default default] Volume id: fa99cb3d-4277-4e99-b522-e75c458f2608 finished being created but its status is error.
            ~~~
      • Eventually, VM creation failed:
            ~~~
            2025-01-31 09:41:56.398 2 ERROR nova.compute.manager [req-4d8d2c28-924a-4d2e-8890-d0d7f84301f2 faf6d7a32b64466b8e9c95241bcd2127 bdd6c04e96c04bcc8391b1f3b32cf6f9 - default default] [instance: cff0454d-3d79-4173-8e1e-4336a1370fed] Instance failed block device setup: nova.exception.VolumeNotCreated: Volume fa99cb3d-4277-4e99-b522-e75c458f2608 did not finish being created even after we waited 128 seconds or 41 attempts. And its status is error.
            ~~~
      • Checking the failed volume fa99cb3d-4277-4e99-b522-e75c458f2608 further, we see that it failed because of an error returned by the NetApp backend:
            ~~~
            2025-01-31 09:39:50.096 11 ERROR cinder.volume.drivers.netapp.dataontap.block_base [req-47bdc97f-478c-4c45-a0fc-659a23a73881 faf6d7a32b64466b8e9c95241bcd2127 bdd6c04e96c04bcc8391b1f3b32cf6f9 - - -] Exception cloning volume volume-fa99cb3d-4277-4e99-b522-e75c458f2608 from source volume volume-d3a6f063-a8a2-4763-9a46-104f1175cc51.: cinder.volume.drivers.netapp.dataontap.client.api.NaApiError: NetApp API failed. Reason - 16:Clone operation failed to start: Device busy..
            2025-01-31 09:39:50.096 11 ERROR cinder.volume.drivers.netapp.dataontap.block_base Traceback (most recent call last):
            2025-01-31 09:39:50.096 11 ERROR cinder.volume.drivers.netapp.dataontap.block_base   File "/usr/lib/python3.9/site-packages/cinder/volume/drivers/netapp/dataontap/block_base.py", line 370, in _clone_source_to_destination
            2025-01-31 09:39:50.096 11 ERROR cinder.volume.drivers.netapp.dataontap.block_base     self._clone_lun(
            2025-01-31 09:39:50.096 11 ERROR cinder.volume.drivers.netapp.dataontap.block_base   File "/usr/lib/python3.9/site-packages/cinder/volume/volume_utils.py", line 1488, in trace_method_logging_wrapper
            2025-01-31 09:39:50.096 11 ERROR cinder.volume.drivers.netapp.dataontap.block_base     return f(*args, **kwargs)
            2025-01-31 09:39:50.096 11 ERROR cinder.volume.drivers.netapp.dataontap.block_base   File "/usr/lib/python3.9/site-packages/cinder/volume/volume_utils.py", line 1488, in trace_method_logging_wrapper
            2025-01-31 09:39:50.096 11 ERROR cinder.volume.drivers.netapp.dataontap.block_base     return f(*args, **kwargs)
            2025-01-31 09:39:50.096 11 ERROR cinder.volume.drivers.netapp.dataontap.block_base   File "/usr/lib/python3.9/site-packages/cinder/volume/drivers/netapp/dataontap/block_cmode.py", line 221, in _clone_lun
            2025-01-31 09:39:50.096 11 ERROR cinder.volume.drivers.netapp.dataontap.block_base     self.zapi_client.clone_lun(
            2025-01-31 09:39:50.096 11 ERROR cinder.volume.drivers.netapp.dataontap.block_base   File "/usr/lib/python3.9/site-packages/cinder/volume/volume_utils.py", line 1488, in trace_method_logging_wrapper
            2025-01-31 09:39:50.096 11 ERROR cinder.volume.drivers.netapp.dataontap.block_base     return f(*args, **kwargs)
            2025-01-31 09:39:50.096 11 ERROR cinder.volume.drivers.netapp.dataontap.block_base   File "/usr/lib/python3.9/site-packages/cinder/volume/volume_utils.py", line 1488, in trace_method_logging_wrapper
            2025-01-31 09:39:50.096 11 ERROR cinder.volume.drivers.netapp.dataontap.block_base     return f(*args, **kwargs)
            2025-01-31 09:39:50.096 11 ERROR cinder.volume.drivers.netapp.dataontap.block_base   File "/usr/lib/python3.9/site-packages/cinder/volume/drivers/netapp/dataontap/client/client_cmode.py", line 615, in clone_lun
            2025-01-31 09:39:50.096 11 ERROR cinder.volume.drivers.netapp.dataontap.block_base     self.connection.invoke_successfully(clone_create, True)
            2025-01-31 09:39:50.096 11 ERROR cinder.volume.drivers.netapp.dataontap.block_base   File "/usr/lib/python3.9/site-packages/cinder/volume/drivers/netapp/dataontap/client/api.py", line 231, in invoke_successfully
            2025-01-31 09:39:50.096 11 ERROR cinder.volume.drivers.netapp.dataontap.block_base     raise NaApiError(code, msg)
            2025-01-31 09:39:50.096 11 ERROR cinder.volume.drivers.netapp.dataontap.block_base cinder.volume.drivers.netapp.dataontap.client.api.NaApiError: NetApp API failed. Reason - 16:Clone operation failed to start: Device busy..
            2025-01-31 09:39:50.096 11 ERROR cinder.volume.drivers.netapp.dataontap.block_base 
            ~~~
      • About a minute later, the NetApp API again responded with `Device busy` for the same request:
            ~~~
            2025-01-31 09:40:52.331 11 ERROR cinder.volume.drivers.netapp.dataontap.block_base [req-47bdc97f-478c-4c45-a0fc-659a23a73881 faf6d7a32b64466b8e9c95241bcd2127 bdd6c04e96c04bcc8391b1f3b32cf6f9 - - -] Exception cloning volume volume-fa99cb3d-4277-4e99-b522-e75c458f2608 from source volume volume-d3a6f063-a8a2-4763-9a46-104f1175cc51.: cinder.volume.drivers.netapp.dataontap.client.api.NaApiError: NetApp API failed. Reason - 16:Clone operation failed to start: Device busy..
            ~~~
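
When many instances are created at once, it helps to see how many clone failures hit each source volume. The following is a minimal sketch, assuming the cinder-volume log lines have exactly the format shown in the excerpts above. The function name `find_clone_failures` and the regular expression are illustrative and not part of Cinder.

```python
import re
from collections import defaultdict

# Matches the first line of the driver error as it appears in the excerpts
# above. Assumption: other deployments may log a slightly different format.
CLONE_ERROR = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) .*"
    r"\[(?P<req>req-[0-9a-f-]+) .*"
    r"Exception cloning volume volume-(?P<dest>[0-9a-f-]+) "
    r"from source volume volume-(?P<src>[0-9a-f-]+)\.: .*"
    r"Reason - (?P<code>\d+):(?P<reason>.*)$"
)


def find_clone_failures(lines):
    """Group clone failures by source volume ID.

    Returns {source_id: [(timestamp, dest_id, request_id, code, reason), ...]}.
    Traceback lines are ignored because they lack the "Exception cloning" text.
    """
    failures = defaultdict(list)
    for line in lines:
        match = CLONE_ERROR.search(line.strip())
        if match:
            failures[match["src"]].append(
                (
                    match["ts"],
                    match["dest"],
                    match["req"],
                    int(match["code"]),
                    match["reason"].strip(),
                )
            )
    return dict(failures)
```

For example, `find_clone_failures(open("cinder-volume.log"))` (the file name here is a placeholder) returns one entry per source volume, so a source volume with many failed clones in a short time window stands out.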

      Information.

      We asked the customer to reach out to NetApp. NetApp shared the patch https://review.opendev.org/c/openstack/cinder/+/951485 and asked Red Hat to provide a hotfix.

       

      We discussed this with engineering in a Slack thread [1]: a hotfix is planned for RHOSO-18, and for a backport to 17 the request should come from NetApp.
      [1] https://redhat-internal.slack.com/archives/C04GLFJE57Y/p1749084810914089
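
This ticket does not describe what the linked patch changes. As a generic illustration only, a transient "Device busy" response is often handled by retrying the operation with a growing delay. The sketch below shows that pattern in isolation. `DeviceBusyError` and `clone_with_retry` are hypothetical names, not Cinder or NetApp driver APIs, and this is not the content of the patch.

```python
import time


class DeviceBusyError(Exception):
    """Hypothetical stand-in for a backend error whose reason is 'Device busy'."""


def clone_with_retry(clone_fn, attempts=5, initial_delay=1.0, backoff=2.0,
                     sleep=time.sleep):
    """Call clone_fn, retrying with exponential backoff on DeviceBusyError.

    clone_fn is any zero-argument callable that performs the clone. The last
    failure is re-raised once the attempts are exhausted. The sleep function
    is injectable so the behavior can be exercised without real waiting.
    """
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return clone_fn()
        except DeviceBusyError:
            if attempt == attempts:
                raise
            sleep(delay)
            delay *= backoff
```

Whether retrying is appropriate here, and with what limits, depends on why the backend reports the device as busy, which is a question for the NetApp patch review rather than this sketch.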

       

      This issue was opened to determine how we can best assist: the TAM, Vishal Sharma, reached out to Cinder engineering for help and asked Support to open a Jira.

              brosmait@redhat.com Brian Rosmaita
              rhn-support-pgodwin Paul Godwin
              Eric Harney, Fernando Silva, Rajat Dhasmana
              rhos-storage-cinder