Red Hat OpenStack Services on OpenShift
OSPRH-13983

[17.1.4][Hitachi FC storage] os_brick failed to attach volumes in available state.


    • Type: Bug
    • Resolution: Done-Errata
    • Priority: Major
    • Fix Version: rhos-17.1.5
    • Component: openstack-cinder
    • Severity: Important

      To Reproduce

      Attaching volumes [1] that are in the available state to an instance fails.

      [1]
      8eaee081-f606-471f-a231-ea33223a2043
      02e40e88-1710-4384-be2a-604b90df0f35
      
      Instance_uuid: 1c9c7d87-ad65-40bd-b68d-9a578ac205c3
      Sos file: 0160-sosreport-bbs-p-rhosc-compute19-2025-02-04-vrscpps.tar.xz/sosreport-bbs-p-rhosc-compute19-2025-02-04-vrscpps/var/log/containers/nova/nova-compute.log.5 
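
      For reference, a hedged sketch of the attach operation follows. The use of openstacksdk and the cloud name 'overcloud' are assumptions (the report does not say which client was used); the server and volume UUIDs are the ones listed above.

      # Hedged reproduction sketch; openstacksdk and the cloud name are assumptions.
      import openstack

      conn = openstack.connect(cloud='overcloud')  # placeholder clouds.yaml entry
      server_id = '1c9c7d87-ad65-40bd-b68d-9a578ac205c3'
      for volume_id in ('8eaee081-f606-471f-a231-ea33223a2043',
                        '02e40e88-1710-4384-be2a-604b90df0f35'):
          # Equivalent to `openstack server add volume <server> <volume>`; on the
          # affected compute node the attach then fails as shown in the logs below.
          conn.compute.create_volume_attachment(server_id, volume_id=volume_id)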
      
      2025-02-03 18:20:22.593 2 ERROR nova.virt.block_device [req-6be5bcb4-3f91-409b-8ba8-98fbdf2220fc fa99d7dc5aa747e196dcee1d8b6cd114 119ad3bc77f84d2ca237bd82d40b86e6 - default default] [instance: 1c9c7d87-ad65-40bd-b68d-9a578ac205c3] Driver failed to attach volume 8eaee081-f606-471f-a231-ea33223a2043 at /dev/vdb: oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
      Command: multipath -C None
      Exit code: 1
      Stdout: ''
      Stderr: '4081361.023737 | get_udev_device: failed to look up None with type 3\n'
       
      ----------
      
      2025-02-03 19:33:25.283 2 ERROR nova.virt.block_device [req-9d14a11f-fa49-484c-a1f5-a6dc909f9e5c fa99d7dc5aa747e196dcee1d8b6cd114 119ad3bc77f84d2ca237bd82d40b86e6 - default default] [instance: 1c9c7d87-ad65-40bd-b68d-9a578ac205c3] Driver failed to attach volume 02e40e88-1710-4384-be2a-604b90df0f35 at /dev/vdb: oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
      Command: multipath -C None
      Exit code: 1
      Stdout: ''
      Stderr: '4085743.714703 | get_udev_device: failed to look up None with type 3\n'
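
      The failing command shows that the device path handed to `multipath -C` was None. A minimal illustration with oslo.concurrency (not the actual os_brick code path) of how a None path produces exactly this stderr:

      # Illustration only: a None device path is stringified to "None" on the
      # multipath command line, matching the error in the nova-compute log.
      # Requires multipath-tools installed; run as root for a comparable error.
      from oslo_concurrency import processutils

      device_path = None  # what the connector is left with when no multipath map appears
      try:
          processutils.execute('multipath', '-C', device_path)  # runs: multipath -C None
      except processutils.ProcessExecutionError as exc:
          print(exc.exit_code, exc.stderr)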
      
      

      Expected behavior

      • Volumes in the available state should attach to the instance seamlessly.

      Device Info:

      RHOSP 17.1.4
      Hitachi Fibre Channel storage

      Known workaround

      • Because the instance had booted from its boot volume, we were able to migrate it to a different compute node. After that migration, attaching the additional volumes succeeded.
      • We came across OSPRH-12848, downgraded nova-compute to the version corresponding to 17.1.3, and observed no further issues (see the version-check sketch after this list).
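
      A small sketch (an assumption: run inside the nova_compute container's Python environment) to confirm which nova and os-brick builds are installed, which helps when comparing against the 17.1.3 packages mentioned above:

      # Hedged sketch: print installed versions. The distribution names below are
      # the upstream Python names; the RPM names (openstack-nova-compute,
      # python3-os-brick) differ.
      from importlib.metadata import PackageNotFoundError, version

      for dist in ('nova', 'os-brick', 'oslo.concurrency'):
          try:
              print(dist, version(dist))
          except PackageNotFoundError:
              print(dist, 'not installed in this environment')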

      Additional context

      DB information (captured after the issue occurred):

      MariaDB [nova]> select * from block_device_mapping  where attachment_id='e8b567df-7f49-4f40-8197-4842b52b4fbd' \G
      *************************** 1. row ***************************
                 created_at: 2024-11-10 13:55:01
                 updated_at: 2025-02-01 12:44:42
                 deleted_at: NULL
                         id: 30800
                device_name: /dev/vdc
      delete_on_termination: 0
                snapshot_id: NULL
                  volume_id: 02e40e88-1710-4384-be2a-604b90df0f35
                volume_size: 70
                  no_device: NULL
            connection_info: {"driver_volume_type": "fibre_channel", "data": {"target_wwn": ["50060e80089d3c03", "50060e80089d3c13"], "target_discovered": false, "target_lun": 26, "qos_specs": null, "access_mode": "rw", "encrypted": false, "cacheable": false, "multipath_id": "360060e80089d3c0000509d3c0000034f"}, "status": "reserved", "instance": "1c9c7d87-ad65-40bd-b68d-9a578ac205c3", "attached_at": "", "detached_at": "", "volume_id": "02e40e88-1710-4384-be2a-604b90df0f35", "serial": "02e40e88-1710-4384-be2a-604b90df0f35"}
              instance_uuid: 1c9c7d87-ad65-40bd-b68d-9a578ac205c3
                    deleted: 0
                source_type: volume
           destination_type: volume
               guest_format: NULL
                device_type: disk
                   disk_bus: virtio
                 boot_index: NULL
                   image_id: NULL
                        tag: NULL
              attachment_id: e8b567df-7f49-4f40-8197-4842b52b4fbd
                       uuid: 939359ec-d229-4ddd-a4fb-ef7d04b65dd8
                volume_type: NULL
      1 row in set (0.000 sec)
      MariaDB [nova]> select * from block_device_mapping  where attachment_id='eefcecef-3f18-4f6a-949a-3bc2f2c3b747' \G
      *************************** 1. row ***************************
                 created_at: 2024-10-14 10:07:53
                 updated_at: 2025-02-01 12:42:16
                 deleted_at: NULL
                         id: 30435
                device_name: /dev/vdb
      delete_on_termination: 0
                snapshot_id: NULL
                  volume_id: 8eaee081-f606-471f-a231-ea33223a2043
                volume_size: 500
                  no_device: NULL
            connection_info: {"driver_volume_type": "fibre_channel", "data": {"target_wwn": ["50060e80089d3c03", "50060e80089d3c13"], "target_discovered": false, "target_lun": 24, "qos_specs": null, "access_mode": "rw", "encrypted": false, "cacheable": false, "multipath_id": "360060e80089d3c0000509d3c00000319"}, "status": "reserved", "instance": "1c9c7d87-ad65-40bd-b68d-9a578ac205c3", "attached_at": "", "detached_at": "", "volume_id": "8eaee081-f606-471f-a231-ea33223a2043", "serial": "8eaee081-f606-471f-a231-ea33223a2043"}
              instance_uuid: 1c9c7d87-ad65-40bd-b68d-9a578ac205c3
                    deleted: 0
                source_type: volume
           destination_type: volume
               guest_format: NULL
                device_type: disk
                   disk_bus: virtio
                 boot_index: NULL
                   image_id: NULL
                        tag: NULL
              attachment_id: eefcecef-3f18-4f6a-949a-3bc2f2c3b747
                       uuid: 20f734df-fce9-4af2-bf7b-3a2907ed6756
                volume_type: NULL
      1 row in set (0.002 sec)

      Vol_id: 02e40e88-1710-4384-be2a-604b90df0f35
      While attaching this volume, the command executed successfully, but `nova volume-attachments` did not show the attachment (see the attachment-listing sketch after the log excerpt below).

      2025-02-03 18:12:40.595 2 INFO nova.compute.manager [req-3a653602-3bd2-4a82-91ba-32f7f9cd8462 fa99d7dc5aa747e196dcee1d8b6cd114 119ad3bc77f84d2ca237bd82d40b86e6 - default default] [instance: 1c9c7d87-ad65-40bd-b68d-9a578ac205c3] Attaching volume 02e40e88-1710-4384-be2a-604b90df0f35 to /dev/vdb
      2025-02-03 18:12:40.601 2 DEBUG nova.scheduler.client.report [req-91387c75-45c0-4b0f-aa32-7126cdc4177d fa99d7dc5aa747e196dcee1d8b6cd114 119ad3bc77f84d2ca237bd82d40b86e6 - default default] Inventory has not changed for provider ee392e34-58d2-44df-a1e5-1b192ff61713 based on inventory data: {'VCPU': {'total': 128, 'reserved': 0, 'min_unit': 1, 'max_unit': 128, 'step_size': 1, 'allocation_ratio': 3.0}, 'MEMORY_MB': {'total': 1547604, 'reserved': 4096, 'min_unit': 1, 'max_unit': 1547604, 'step_size': 1, 'allocation_ratio': 1.0}, 'DISK_GB': {'total': 1488, 'reserved': 0, 'min_unit': 1, 'max_unit': 1488, 'step_size': 1, 'allocation_ratio': 1.0}} set_inventory_for_provider /usr/lib/python3.9/site-packages/nova/scheduler/client/report.py:940
      2025-02-03 18:12:40.601 2 DEBUG oslo_concurrency.lockutils [req-91387c75-45c0-4b0f-aa32-7126cdc4177d fa99d7dc5aa747e196dcee1d8b6cd114 119ad3bc77f84d2ca237bd82d40b86e6 - default default] Lock "compute_resources" released by "nova.compute.resource_tracker.ResourceTracker.update_usage" :: held 0.029s inner /usr/lib/python3.9/site-packages/oslo_concurrency/lockutils.py:367
      2025-02-03 18:12:40.602 2 INFO nova.compute.manager [req-91387c75-45c0-4b0f-aa32-7126cdc4177d fa99d7dc5aa747e196dcee1d8b6cd114 119ad3bc77f84d2ca237bd82d40b86e6 - default default] [instance: 1c9c7d87-ad65-40bd-b68d-9a578ac205c3] Successfully reverted task state from None on failure for instance.
      2025-02-03 18:12:40.606 2 ERROR oslo_messaging.rpc.server [req-91387c75-45c0-4b0f-aa32-7126cdc4177d fa99d7dc5aa747e196dcee1d8b6cd114 119ad3bc77f84d2ca237bd82d40b86e6 - default default] Exception during message handling: nova.exception.InvalidVolume: Invalid volume: volume 02e40e88-1710-4384-be2a-604b90df0f35 already attached
      2025-02-03 18:12:40.606 2 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
      2025-02-03 18:12:40.606 2 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.9/site-packages/oslo_messaging/rpc/server.py", line 165, in _process_incoming
       
      ....
      ....
      ....
      2025-02-03 19:34:40.566 2 DEBUG oslo_service.periodic_task [req-3975ca65-014c-45ce-a572-60c8b31bd083 - - - - -] Running periodic task ComputeManager._poll_rescued_instances run_periodic_tasks /usr/lib/python3.9/site-packages/oslo_service/periodic_task.py:210
      2025-02-03 19:34:40.999 2 ERROR nova.compute.manager [req-9d14a11f-fa49-484c-a1f5-a6dc909f9e5c fa99d7dc5aa747e196dcee1d8b6cd114 119ad3bc77f84d2ca237bd82d40b86e6 - default default] [instance: 1c9c7d87-ad65-40bd-b68d-9a578ac205c3] Failed to attach 02e40e88-1710-4384-be2a-604b90df0f35 at /dev/vdb: oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
      Command: multipath -C None
      Exit code: 1
      Stdout: ''
      Stderr: '4085743.714703 | get_udev_device: failed to look up None with type 3\n'
      2025-02-03 19:34:40.999 2 ERROR nova.compute.manager [instance: 1c9c7d87-ad65-40bd-b68d-9a578ac205c3] Traceback (most recent call last):
      2025-02-03 19:34:40.999 2 ERROR nova.compute.manager [instance: 1c9c7d87-ad65-40bd-b68d-9a578ac205c3]   File "/usr/lib/python3.9/site-packages/nova/compute/manager.py", line 7141, in _attach_volume
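
      Because `nova volume-attachments` showed nothing while Cinder reported the volume as already attached, listing Cinder's attachment records directly can help compare the two sides. A hedged sketch with python-cinderclient follows; the keystoneauth session built from OS_* environment variables is an assumption, and the attachments API needs block-storage microversion 3.27 or later.

      # Hedged sketch: list Cinder attachment records for the volume so they can
      # be compared with nova's block_device_mapping rows above. Auth details are
      # assumptions, not taken from the report.
      import os
      from keystoneauth1 import loading, session
      from cinderclient import client as cinder_client

      loader = loading.get_plugin_loader('password')
      auth = loader.load_from_options(
          auth_url=os.environ['OS_AUTH_URL'],
          username=os.environ['OS_USERNAME'],
          password=os.environ['OS_PASSWORD'],
          project_name=os.environ['OS_PROJECT_NAME'],
          user_domain_name=os.environ.get('OS_USER_DOMAIN_NAME', 'Default'),
          project_domain_name=os.environ.get('OS_PROJECT_DOMAIN_NAME', 'Default'))
      cinder = cinder_client.Client('3.27', session=session.Session(auth=auth))

      for att in cinder.attachments.list(
              search_opts={'volume_id': '02e40e88-1710-4384-be2a-604b90df0f35'}):
          print(att.to_dict())  # attachment id, status, instance uuid, etc.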

      Vol_id: 8eaee081-f606-471f-a231-ea33223a2043
      The volume device does not seem to be present on the host, but the attach fails with the same 'multipath -C None' error.

      2025-02-03 19:30:28.287 2 DEBUG os_brick.initiator.linuxscsi [req-bafc0c7d-9413-4b8e-8e48-8c6c781096fd fa99d7dc5aa747e196dcee1d8b6cd114 119ad3bc77f84d2ca237bd82d40b86e6 - default default] Checking to see if /dev/mapper/360060e80089d3c0000509d3c00000319 exists yet. wait_for_path /usr/lib/python3.9/site-packages/os_brick/initiator/linuxscsi.py:381
      2025-02-03 19:30:28.288 2 DEBUG os_brick.initiator.linuxscsi [req-bafc0c7d-9413-4b8e-8e48-8c6c781096fd fa99d7dc5aa747e196dcee1d8b6cd114 119ad3bc77f84d2ca237bd82d40b86e6 - default default] /dev/mapper/360060e80089d3c0000509d3c00000319 doesn't exists yet. wait_for_path /usr/lib/python3.9/site-packages/os_brick/initiator/linuxscsi.py:384
      2025-02-03 19:30:28.288 2 DEBUG os_brick.utils [req-bafc0c7d-9413-4b8e-8e48-8c6c781096fd fa99d7dc5aa747e196dcee1d8b6cd114 119ad3bc77f84d2ca237bd82d40b86e6 - default default] Retrying os_brick.initiator.linuxscsi.LinuxSCSI.wait_for_path in 2.0 seconds as it raised VolumeDeviceNotFound: Volume device not found at /dev/mapper/360060e80089d3c0000509d3c00000319.. log_it /usr/lib/python3.9/site-packages/tenacity/before_sleep.py:40 
      ...
      ...
      2025-02-03 19:43:09.299 2 DEBUG os_brick.initiator.linuxscsi [req-fd8f2240-03fa-4f74-83cf-cb14ecc3a358 fa99d7dc5aa747e196dcee1d8b6cd114 119ad3bc77f84d2ca237bd82d40b86e6 - default default] Checking to see if /dev/mapper/360060e80089d3c0000509d3c00000319 exists yet. wait_for_path /usr/lib/python3.9/site-packages/os_brick/initiator/linuxscsi.py:381
      2025-02-03 19:43:09.300 2 DEBUG os_brick.initiator.linuxscsi [req-fd8f2240-03fa-4f74-83cf-cb14ecc3a358 fa99d7dc5aa747e196dcee1d8b6cd114 119ad3bc77f84d2ca237bd82d40b86e6 - default default] /dev/mapper/360060e80089d3c0000509d3c00000319 doesn't exists yet. wait_for_path /usr/lib/python3.9/site-packages/os_brick/initiator/linuxscsi.py:384
      2025-02-03 19:43:09.300 2 WARNING os_brick.initiator.linuxscsi [req-fd8f2240-03fa-4f74-83cf-cb14ecc3a358 fa99d7dc5aa747e196dcee1d8b6cd114 119ad3bc77f84d2ca237bd82d40b86e6 - default default] couldn't find a valid multipath device path for 360060e80089d3c0000509d3c00000319
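
      For context on the retry loop in the DEBUG/WARNING lines above: os_brick keeps polling for the /dev/mapper/<WWID> node and eventually gives up without a device path. Below is a simplified, hypothetical sketch of that polling pattern using tenacity (the retry library referenced in the log); it is not the actual os_brick implementation, which lives in os_brick/initiator/linuxscsi.py.

      # Simplified, hypothetical sketch of waiting for a multipath device node;
      # the retry interval mirrors the "Retrying ... in 2.0 seconds" log line,
      # the attempt count is an assumption.
      import os
      import tenacity

      class VolumeDeviceNotFound(Exception):
          pass

      @tenacity.retry(
          retry=tenacity.retry_if_exception_type(VolumeDeviceNotFound),
          wait=tenacity.wait_fixed(2),
          stop=tenacity.stop_after_attempt(5),
          reraise=True)
      def wait_for_path(path):
          if not os.path.exists(path):
              raise VolumeDeviceNotFound(f'Volume device not found at {path}')
          return path

      # wait_for_path('/dev/mapper/360060e80089d3c0000509d3c00000319') would keep
      # raising here because the multipath map was never created on this host.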

      Background of the issue:

      • Instance HA evacuated instances after an unexpected reboot of compute-12, and this instance landed on compute-19. The disk was on compute-19, but the node/host/network records still pointed to compute-12; this was corrected with a database modification.
      • The additional disks were unmapped before that correction, using `openstack server remove volume ...`.
      • The instance booted into emergency mode because the additional disks were not attached. Subsequent attempts to attach the additional disks hit this issue, and the workaround steps above were followed to bring the instance back online.

      Additional info:

      The sosreport 0160-sosreport-bbs-p-rhosc-compute19-2025-02-04-vrscpps.tar.xz contains the errors noted above.
      File: 0080-Compute12-Instance1c9c7d87-ad65-40bd-b68d-9a578ac205c3.txt has information captured as soon as

       

              rhn-engineering-lhh Lon Hohberger
              rhn-support-pgodwin Paul Godwin
              rhos-storage-cinder