Red Hat OpenStack Services on OpenShift / OSPRH-16737

Randomly getting: "Unable to find a Fibre Channel volume device" when attaching volumes to an instance

    • python-os-brick-6.2.5-18.0.20250519184829.7ad0ed6.el9ost
      .Improved reliability for Fibre Channel volume attachments

      Before this update, Fibre Channel volume attachments failed intermittently with a `NoFibreChannelVolumeDeviceFound` error due to partial scanning of devices. With this update, a broader scan results in better discovery of devices and successful attach operations.
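      A minimal sketch of what "partial" versus "broader" scanning means here, assuming the standard Linux sysfs SCSI scan interface (the host name and channel/target values are illustrative, not the ones os-brick derives):

      import os

      def scan_host(host, channel='-', target='-', lun='-'):
          """Ask the kernel to scan an FC HBA for LUNs.

          Writing "C T L" scans a single (channel, target, lun) tuple;
          writing "- - -" is a wildcard scan of everything behind the HBA.
          """
          with open(os.path.join('/sys/class/scsi_host', host, 'scan'), 'w') as f:
              f.write('%s %s %s' % (channel, target, lun))

      scan_host('host12', '0', '3', '4')   # targeted scan: misses paths if the tuple is wrong
      scan_host('host12')                  # wildcard scan: broader, discovers the remaining paths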
    • Bug Fix
    • Done
    • Pending Verification
    • Critical

      Steps to reproduce the behavior:

      1. Create an instance
      2. Create a volume
      3. Attach the volume to the instance
      4. See the error

      Expected behavior

      • The volume is attached successfully (a scripted version of these steps is sketched below)
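      A sketch of the reproduction steps using the openstacksdk cloud layer; the cloud name, image, flavor, and network are placeholders for whatever exists in the affected environment:

      import openstack

      conn = openstack.connect(cloud='overcloud')  # placeholder cloud name

      server = conn.create_server(
          name='fc-attach-test',
          image='rhel-9',        # placeholder image
          flavor='m1.small',     # placeholder flavor
          network='private',     # placeholder network
          wait=True,
      )
      volume = conn.create_volume(size=1, name='fc-attach-test-vol', wait=True)

      # This is the step that intermittently fails with
      # NoFibreChannelVolumeDeviceFound on the compute host.
      conn.attach_volume(server, volume, wait=True)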

      Environment info:

      • Cinder backed by NetApp FC storage

      Hello, this customer is facing issues when attaching NetApp-backed volumes to instances.

      From the Nova point of view we can see:

       

      2025-05-18 11:25:05.605 2 DEBUG os_brick.initiator.connectors.fibre_channel [-] Looking for Fibre Channel dev /dev/disk/by-path/pci-0000:98:00.0-fc-0x204bd039eac514a1-lun-4 _wait_for_device_discovery /usr/lib/python3.9/site-packages/os_brick/initiator/connectors/fibre_channel.py:222
      2025-05-18 11:25:05.605 2 DEBUG os_brick.initiator.connectors.fibre_channel [-] Looking for Fibre Channel dev /dev/disk/by-path/pci-0000:98:00.0-fc-0x2056d039eac514a1-lun-4 _wait_for_device_discovery /usr/lib/python3.9/site-packages/os_brick/initiator/connectors/fibre_channel.py:222
      2025-05-18 11:25:05.605 2 DEBUG os_brick.initiator.connectors.fibre_channel [-] Looking for Fibre Channel dev /dev/disk/by-path/pci-0000:98:00.0-fc-0x204cd039eac514a1-lun-4 _wait_for_device_discovery /usr/lib/python3.9/site-packages/os_brick/initiator/connectors/fibre_channel.py:222
      
      2025-05-18 11:25:05.605 2 INFO os_brick.initiator.connectors.fibre_channel [-] Fibre Channel volume device not yet found. Will rescan & retry.  Try number: 1.
      2025-05-18 11:25:05.605 2 DEBUG os_brick.initiator.linuxfc [-] Rescaning HBAs
      
      2025-05-18 11:25:07.607 2 INFO os_brick.initiator.connectors.fibre_channel [-] Fibre Channel volume device not yet found. Will rescan & retry.  Try number: 2.
      2025-05-18 11:25:07.607 2 DEBUG os_brick.initiator.linuxfc [-] Rescaning HBAs
      
      2025-05-18 11:25:09.610 2 INFO os_brick.initiator.connectors.fibre_channel [-] Fibre Channel volume device not yet found. Will rescan & retry.  Try number: 3.
      2025-05-18 11:25:09.610 2 DEBUG os_brick.initiator.linuxfc [-] Rescaning HBAs
      
      2025-05-18 11:25:11.606 2 INFO os_brick.initiator.connectors.fibre_channel [-] Fibre Channel volume device not yet found. Will rescan & retry.  Try number: 4.
      2025-05-18 11:25:11.606 2 DEBUG os_brick.initiator.linuxfc [-] Rescaning HBAs
      
      2025-05-18 11:25:13.608 2 DEBUG os_brick.initiator.connectors.fibre_channel [-] Looking for Fibre Channel dev /dev/disk/by-path/pci-0000:98:00.0-fc-0x204bd039eac514a1-lun-4 _wait_for_device_discovery /usr/lib/python3.9/site-packages/os_brick/initiator/connectors/fibre_channel.py:222
      2025-05-18 11:25:13.609 2 DEBUG os_brick.initiator.connectors.fibre_channel [-] Looking for Fibre Channel dev /dev/disk/by-path/pci-0000:98:00.0-fc-0x2056d039eac514a1-lun-4 _wait_for_device_discovery /usr/lib/python3.9/site-packages/os_brick/initiator/connectors/fibre_channel.py:222
      2025-05-18 11:25:13.609 2 DEBUG os_brick.initiator.connectors.fibre_channel [-] Looking for Fibre Channel dev /dev/disk/by-path/pci-0000:98:00.0-fc-0x204cd039eac514a1-lun-4 _wait_for_device_discovery /usr/lib/python3.9/site-packages/os_brick/initiator/connectors/fibre_channel.py:222
      2025-05-18 11:25:13.609 2 ERROR os_brick.initiator.connectors.fibre_channel [-] Fibre Channel volume device not found.
      2025-05-18 11:25:13.609 2 ERROR oslo.service.loopingcall [-] Fixed interval looping call 'os_brick.initiator.connectors.fibre_channel.FibreChannelConnector.connect_volume.<locals>._wait_for_device_discovery' failed: os_brick.exception.NoFibreChannelVolumeDeviceFound: Unable to find a Fibre Channel volume device.
      2025-05-18 11:25:13.609 2 ERROR oslo.service.loopingcall Traceback (most recent call last):
      2025-05-18 11:25:13.609 2 ERROR oslo.service.loopingcall   File "/usr/lib/python3.9/site-packages/oslo_service/loopingcall.py", line 150, in _run_loop
      2025-05-18 11:25:13.609 2 ERROR oslo.service.loopingcall     result = func(*self.args, **self.kw)
      2025-05-18 11:25:13.609 2 ERROR oslo.service.loopingcall   File "/usr/lib/python3.9/site-packages/os_brick/initiator/connectors/fibre_channel.py", line 233, in _wait_for_device_discovery
      2025-05-18 11:25:13.609 2 ERROR oslo.service.loopingcall     raise exception.NoFibreChannelVolumeDeviceFound()
      2025-05-18 11:25:13.609 2 ERROR oslo.service.loopingcall os_brick.exception.NoFibreChannelVolumeDeviceFound: Unable to find a Fibre Channel volume device.
      2025-05-18 11:25:13.609 2 ERROR oslo.service.loopingcall 

      while from Cinder we see:

      2025-05-18 11:25:03.464 23 DEBUG cinder.volume.drivers.netapp.dataontap.block_base [req-20b6f8d2-2619-4381-8a5a-134b0505fc4c b44211db8a4341b79d73619a3dcab2c6 9294dd1a171648b4b5d51185081cc3b6 - - -] Mapped LUN volume-383016d9-265d-411f-a277-49f51f09d525 to the initiator(s) ['10:00:70:b7:e4:28:56:31', '10:00:70:b7:e4:28:56:30'] initialize_connection_fc /usr/lib/python3.9/site-packages/cinder/volume/drivers/netapp/dataontap/block_base.py:1005
      2025-05-18 11:25:03.488 23 DEBUG cinder.volume.drivers.netapp.dataontap.block_base [req-20b6f8d2-2619-4381-8a5a-134b0505fc4c b44211db8a4341b79d73619a3dcab2c6 9294dd1a171648b4b5d51185081cc3b6 - - -] Successfully fetched target details for LUN volume-383016d9-265d-411f-a277-49f51f09d525 and initiator(s) ['10:00:70:b7:e4:28:56:31', '10:00:70:b7:e4:28:56:30'] initialize_connection_fc /usr/lib/python3.9/site-packages/cinder/volume/drivers/netapp/dataontap/block_base.py:1012
      2025-05-18 11:25:03.488 23 DEBUG cinder.zonemanager.utils [req-20b6f8d2-2619-4381-8a5a-134b0505fc4c b44211db8a4341b79d73619a3dcab2c6 9294dd1a171648b4b5d51185081cc3b6 - - -] Zoning mode: None. create_zone_manager /usr/lib/python3.9/site-packages/cinder/zonemanager/utils.py:33
      2025-05-18 11:25:03.489 23 DEBUG cinder.zonemanager.utils [req-20b6f8d2-2619-4381-8a5a-134b0505fc4c b44211db8a4341b79d73619a3dcab2c6 9294dd1a171648b4b5d51185081cc3b6 - - -] FC Zone Manager not enabled in cinder.conf. create_zone_manager /usr/lib/python3.9/site-packages/cinder/zonemanager/utils.py:50
      2025-05-18 11:25:03.518 23 DEBUG cinder.volume.manager [req-20b6f8d2-2619-4381-8a5a-134b0505fc4c b44211db8a4341b79d73619a3dcab2c6 9294dd1a171648b4b5d51185081cc3b6 - - -] Connection info returned from driver {'target_discovered': True, 'target_lun': 4, 'target_wwn': ['204dd039eac514a1', '2043d039eac514a1', '204ed039eac514a1', '2044d039eac514a1', '204fd039eac514a1', '2045d039eac514a1', '2050d039eac514a1', '2046d039eac514a1', '2051d039eac514a1', '2047d039eac514a1', '2052d039eac514a1', '2048d039eac514a1', '2053d039eac514a1', '2049d039eac514a1', '2054d039eac514a1', '204ad039eac514a1', '2055d039eac514a1', '204bd039eac514a1', '2056d039eac514a1', '204cd039eac514a1'], 'initiator_target_map': {'100070b7e4285631': ['204dd039eac514a1', '2043d039eac514a1', '204ed039eac514a1', '2044d039eac514a1', '204fd039eac514a1', '2045d039eac514a1', '2050d039eac514a1', '2046d039eac514a1', '2051d039eac514a1', '2047d039eac514a1', '2052d039eac514a1', '2048d039eac514a1', '2053d039eac514a1', '2049d039eac514a1', '2054d039eac514a1', '204ad039eac514a1', '2055d039eac514a1', '204bd039eac514a1', '2056d039eac514a1', '204cd039eac514a1'], '100070b7e4285630': ['204dd039eac514a1', '2043d039eac514a1', '204ed039eac514a1', '2044d039eac514a1', '204fd039eac514a1', '2045d039eac514a1', '2050d039eac514a1', '2046d039eac514a1', '2051d039eac514a1', '2047d039eac514a1', '2052d039eac514a1', '2048d039eac514a1', '2053d039eac514a1', '2049d039eac514a1', '2054d039eac514a1', '204ad039eac514a1', '2055d039eac514a1', '204bd039eac514a1', '2056d039eac514a1', '204cd039eac514a1']}, 'qos_specs': None, 'access_mode': 'rw', 'encrypted': False, 'cacheable': False, 'driver_volume_type': 'fibre_channel', 'attachment_id': '67e853b0-f3d2-462c-bf36-2f88f1b11852'} _connection_create /usr/lib/python3.9/site-packages/cinder/volume/manager.py:4694
      2025-05-18 11:25:03.584 23 INFO cinder.volume.manager [req-20b6f8d2-2619-4381-8a5a-134b0505fc4c b44211db8a4341b79d73619a3dcab2c6 9294dd1a171648b4b5d51185081cc3b6 - - -] attachment_update completed successfully.
      2025-05-18 11:25:06.461 29 DEBUG oslo_service.periodic_task [req-5b8752cc-5bf7-4c5a-bba9-7fe70ae6c208 - - - - -] Running periodic task VolumeManager.publish_service_capabilities run_periodic_tasks /usr/lib/python3.9/site-packages/oslo_service/periodic_task.py:210
      2025-05-18 11:25:06.462 29 DEBUG cinder.volume.drivers.netapp.dataontap.block_cmode [req-5b8752cc-5bf7-4c5a-bba9-7fe70ae6c208 - - - - -] Updating volume stats _update_volume_stats /usr/lib/python3.9/site-packages/cinder/volume/drivers/netapp/dataontap/block_cmode.py:266
      2025-05-18 11:25:13.516 29 DEBUG cinder.manager [req-5b8752cc-5bf7-4c5a-bba9-7fe70ae6c208 - - - - -] Notifying Schedulers of capabilities ... _publish_service_capabilities /usr/lib/python3.9/site-packages/cinder/manager.py:197
      2025-05-18 11:25:13.688 23 DEBUG cinder.volume.drivers.netapp.dataontap.block_base [req-42f4c571-6f34-4d05-aece-1db7c7cd5485 b44211db8a4341b79d73619a3dcab2c6 9294dd1a171648b4b5d51185081cc3b6 - - -] Unmapping LUN volume-383016d9-265d-411f-a277-49f51f09d525 from the initiators ['10:00:70:b7:e4:28:56:31', '10:00:70:b7:e4:28:56:30'] terminate_connection_fc /usr/lib/python3.9/site-packages/cinder/volume/drivers/netapp/dataontap/block_base.py:1049
      2025-05-18 11:25:13.786 23 INFO cinder.volume.manager [req-42f4c571-6f34-4d05-aece-1db7c7cd5485 b44211db8a4341b79d73619a3dcab2c6 9294dd1a171648b4b5d51185081cc3b6 - - -] Terminate volume connection completed successfully.
      2025-05-18 11:25:13.786 23 DEBUG cinder.volume.manager [req-42f4c571-6f34-4d05-aece-1db7c7cd5485 b44211db8a4341b79d73619a3dcab2c6 9294dd1a171648b4b5d51185081cc3b6 - - -] Deleting attachment 67e853b0-f3d2-462c-bf36-2f88f1b11852. _do_attachment_delete /usr/lib/python3.9/site-packages/cinder/volume/manager.py:4843 
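      For reference, the device paths Nova polls for are derived from this connection info: os-brick combines the compute host's HBA PCI address with each target WWN and the LUN. A hedged sketch (the PCI address is taken from the Nova log above, and only three of the twenty WWNs are shown):

      target_lun = 4
      target_wwns = ['204bd039eac514a1', '2056d039eac514a1', '204cd039eac514a1']
      pci_address = '0000:98:00.0'  # the compute host's HBA, from the Nova log

      for wwn in target_wwns:
          print('/dev/disk/by-path/pci-%s-fc-0x%s-lun-%d' % (pci_address, wwn, target_lun))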

      Volume creation always succeeds, while attachment operations fail randomly, regardless of the backend used (they have two different backends on the same NetApp) and regardless of the compute node.

       

      NetApp analysis of the issue:

       

      We have been investigating an issue related to volume attachment failures in our OpenStack environment. During this process, we manually collected the ASUP logs; however, these logs did not capture the volume-related tasks executed in OpenStack. In contrast, the cinder-volume logs clearly reflect both successful and failed volume attachment attempts.
      
      In both scenarios (successful and failed), the process begins with mapping the LUN, updating the attachment, and handing over the request to Nova Compute for multipath device discovery.
      
      In successful cases, the multipath device was discovered within approximately 3 seconds. In the failing cases, however, the system attempted discovery five times at 2-second intervals and ultimately failed to locate the multipath device.
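      A hedged sketch of that retry behaviour, mirroring the Nova log above; the names are illustrative, not the actual os-brick internals:

      import os
      import time

      def rescan_hbas():
          """Stand-in for the HBA rescan os-brick performs between tries."""

      def wait_for_device_discovery(host_devices, tries=5, interval=2):
          for attempt in range(1, tries + 1):
              found = [dev for dev in host_devices if os.path.exists(dev)]
              if found:
                  return found
              print('Fibre Channel volume device not yet found. '
                    'Will rescan & retry. Try number: %d.' % attempt)
              rescan_hbas()
              time.sleep(interval)
          raise RuntimeError('Unable to find a Fibre Channel volume device.')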
      
      We are observing intermittent failures, not consistent ones. Multipath is functioning in general, but there are cases where the LUNs do not appear. Based on our findings, the potential root causes could be:
      • Multipathd timing or synchronization issues: in some cases, multipathd does not register new devices quickly enough, causing Nova to time out (see the polling sketch after this list).
      • udev rule conflicts or delays: OpenStack relies on os-brick for device discovery via udev and multipath, and any delay in these components can result in intermittent failures.
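      For the multipathd timing hypothesis, a sketch that polls multipathd for the new map while the attach is in flight; `multipathd show maps` is the standard multipathd CLI command, and the WWID is whatever Cinder reports for the volume:

      import subprocess
      import time

      def wait_for_multipath_map(wwid, tries=10, interval=1.0):
          """Poll `multipathd show maps` until a map with the given WWID appears."""
          for _ in range(tries):
              out = subprocess.run(['multipathd', 'show', 'maps'],
                                   capture_output=True, text=True).stdout
              if wwid in out:
                  return True
              time.sleep(interval)
          return False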
      
      Next Steps:
      
      1. Increase retry count/timeouts in Nova/Cinder configuration:
      
      [libvirt]
      volume_attach_retry_count = 5
      volume_attach_retry_interval = 5
      
      
      2. Monitor udev events while attaching volumes: `udevadm monitor --kernel --udev`
      
      We can check whether /dev/dm-* appears late or fails to appear.
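      The same check scripted with pyudev (assuming the pyudev package is available on the compute host):

      import pyudev

      context = pyudev.Context()
      monitor = pyudev.Monitor.from_netlink(context)
      monitor.filter_by(subsystem='block')

      for device in iter(monitor.poll, None):
          # Watch for the device-mapper node that should follow the attach.
          if device.device_node and device.device_node.startswith('/dev/dm-'):
              print('%s: %s' % (device.action, device.device_node))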
      
      We recommend validating the above points to help mitigate the intermittent failures during volume attachments.
       
      Note: The issue does not appear to originate from the NetApp storage system. The LUN mapping and attachment processes are completing successfully, as confirmed in the cinder-volume logs. The problem occurs after the attachment, specifically during the device discovery phase on the OpenStack compute host. This failure is host-side, where the expected multipath device is not being detected in time by Nova/os-brick, despite the LUN being available. 

      sbr-storage couldn't find anything wrong with the RHEL storage components.

      Actions taken:

      • Bumped `volume_attach_retry_count` and `volume_attach_retry_interval` without any effect
      • Enabled SCSI debug logging and reproduced the issue

      All supporting data is available in the case linked to this request.

       

              rdhasman@redhat.com Rajat Dhasmana
              rhn-support-ldavidde Luca Davidde
              Yosi Ben Shimon Yosi Ben Shimon
              rhos-storage-cinder