Bug
Resolution: Done-Errata
Critical
rhos-18.0.6
python-os-brick-6.2.5-18.0.20250519184829.7ad0ed6.el9ost
Bug Fix
Done
Pending Verification
Critical
To reproduce:
- Create an instance
- Create a volume
- Attach the volume to the instance
- See the error
Expected behavior:
- The volume attaches successfully
Env info:
- Cinder backed by NetApp FC storage
Hello, this customer is facing issues while attaching NetApp-backed volumes to instances.
From Nova's point of view we see:
2025-05-18 11:25:05.605 2 DEBUG os_brick.initiator.connectors.fibre_channel [-] Looking for Fibre Channel dev /dev/disk/by-path/pci-0000:98:00.0-fc-0x204bd039eac514a1-lun-4 _wait_for_device_discovery /usr/lib/python3.9/site-packages/os_brick/initiator/connectors/fibre_channel.py:222
2025-05-18 11:25:05.605 2 DEBUG os_brick.initiator.connectors.fibre_channel [-] Looking for Fibre Channel dev /dev/disk/by-path/pci-0000:98:00.0-fc-0x2056d039eac514a1-lun-4 _wait_for_device_discovery /usr/lib/python3.9/site-packages/os_brick/initiator/connectors/fibre_channel.py:222
2025-05-18 11:25:05.605 2 DEBUG os_brick.initiator.connectors.fibre_channel [-] Looking for Fibre Channel dev /dev/disk/by-path/pci-0000:98:00.0-fc-0x204cd039eac514a1-lun-4 _wait_for_device_discovery /usr/lib/python3.9/site-packages/os_brick/initiator/connectors/fibre_channel.py:222
2025-05-18 11:25:05.605 2 INFO os_brick.initiator.connectors.fibre_channel [-] Fibre Channel volume device not yet found. Will rescan & retry. Try number: 1.
2025-05-18 11:25:05.605 2 DEBUG os_brick.initiator.linuxfc [-] Rescaning HBAs
2025-05-18 11:25:07.607 2 INFO os_brick.initiator.connectors.fibre_channel [-] Fibre Channel volume device not yet found. Will rescan & retry. Try number: 2.
2025-05-18 11:25:07.607 2 DEBUG os_brick.initiator.linuxfc [-] Rescaning HBAs
2025-05-18 11:25:09.610 2 INFO os_brick.initiator.connectors.fibre_channel [-] Fibre Channel volume device not yet found. Will rescan & retry. Try number: 3.
2025-05-18 11:25:09.610 2 DEBUG os_brick.initiator.linuxfc [-] Rescaning HBAs
2025-05-18 11:25:11.606 2 INFO os_brick.initiator.connectors.fibre_channel [-] Fibre Channel volume device not yet found. Will rescan & retry. Try number: 4.
2025-05-18 11:25:11.606 2 DEBUG os_brick.initiator.linuxfc [-] Rescaning HBAs
2025-05-18 11:25:13.608 2 DEBUG os_brick.initiator.connectors.fibre_channel [-] Looking for Fibre Channel dev /dev/disk/by-path/pci-0000:98:00.0-fc-0x204bd039eac514a1-lun-4 _wait_for_device_discovery /usr/lib/python3.9/site-packages/os_brick/initiator/connectors/fibre_channel.py:222
2025-05-18 11:25:13.609 2 DEBUG os_brick.initiator.connectors.fibre_channel [-] Looking for Fibre Channel dev /dev/disk/by-path/pci-0000:98:00.0-fc-0x2056d039eac514a1-lun-4 _wait_for_device_discovery /usr/lib/python3.9/site-packages/os_brick/initiator/connectors/fibre_channel.py:222
2025-05-18 11:25:13.609 2 DEBUG os_brick.initiator.connectors.fibre_channel [-] Looking for Fibre Channel dev /dev/disk/by-path/pci-0000:98:00.0-fc-0x204cd039eac514a1-lun-4 _wait_for_device_discovery /usr/lib/python3.9/site-packages/os_brick/initiator/connectors/fibre_channel.py:222
2025-05-18 11:25:13.609 2 ERROR os_brick.initiator.connectors.fibre_channel [-] Fibre Channel volume device not found.
2025-05-18 11:25:13.609 2 ERROR oslo.service.loopingcall [-] Fixed interval looping call 'os_brick.initiator.connectors.fibre_channel.FibreChannelConnector.connect_volume.<locals>._wait_for_device_discovery' failed: os_brick.exception.NoFibreChannelVolumeDeviceFound: Unable to find a Fibre Channel volume device.
2025-05-18 11:25:13.609 2 ERROR oslo.service.loopingcall Traceback (most recent call last):
2025-05-18 11:25:13.609 2 ERROR oslo.service.loopingcall   File "/usr/lib/python3.9/site-packages/oslo_service/loopingcall.py", line 150, in _run_loop
2025-05-18 11:25:13.609 2 ERROR oslo.service.loopingcall     result = func(*self.args, **self.kw)
2025-05-18 11:25:13.609 2 ERROR oslo.service.loopingcall   File "/usr/lib/python3.9/site-packages/os_brick/initiator/connectors/fibre_channel.py", line 233, in _wait_for_device_discovery
2025-05-18 11:25:13.609 2 ERROR oslo.service.loopingcall     raise exception.NoFibreChannelVolumeDeviceFound()
2025-05-18 11:25:13.609 2 ERROR oslo.service.loopingcall os_brick.exception.NoFibreChannelVolumeDeviceFound: Unable to find a Fibre Channel volume device.
2025-05-18 11:25:13.609 2 ERROR oslo.service.loopingcall
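The retry behavior in the log above can be sketched as follows. This is a simplified illustration of os-brick's `_wait_for_device_discovery` polling loop, not the actual implementation; the function name `wait_for_fc_device` and its signature are made up for the sketch.

```python
import os
import time

def wait_for_fc_device(by_path_candidates, tries=5, interval=2):
    """Simplified sketch of the retry loop seen in the nova-compute log.

    by_path_candidates: /dev/disk/by-path/... symlinks derived from the
    target WWNs and LUN in the connection info.  The real os-brick loop
    also issues an HBA rescan between attempts ("Rescaning HBAs").
    """
    for attempt in range(1, tries + 1):
        # Check whether any of the expected by-path symlinks showed up
        found = [p for p in by_path_candidates if os.path.exists(p)]
        if found:
            return found
        time.sleep(interval)  # the real loop rescans HBAs here, then waits
    # After the last attempt, os-brick raises NoFibreChannelVolumeDeviceFound
    raise RuntimeError("Unable to find a Fibre Channel volume device.")
```

With the defaults above this gives roughly a 10-second discovery window, which matches the ~8 s between the first "Try number: 1" and the final error in the log.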
While on the Cinder side:
2025-05-18 11:25:03.464 23 DEBUG cinder.volume.drivers.netapp.dataontap.block_base [req-20b6f8d2-2619-4381-8a5a-134b0505fc4c b44211db8a4341b79d73619a3dcab2c6 9294dd1a171648b4b5d51185081cc3b6 - - -] Mapped LUN volume-383016d9-265d-411f-a277-49f51f09d525 to the initiator(s) ['10:00:70:b7:e4:28:56:31', '10:00:70:b7:e4:28:56:30'] initialize_connection_fc /usr/lib/python3.9/site-packages/cinder/volume/drivers/netapp/dataontap/block_base.py:1005
2025-05-18 11:25:03.488 23 DEBUG cinder.volume.drivers.netapp.dataontap.block_base [req-20b6f8d2-2619-4381-8a5a-134b0505fc4c b44211db8a4341b79d73619a3dcab2c6 9294dd1a171648b4b5d51185081cc3b6 - - -] Successfully fetched target details for LUN volume-383016d9-265d-411f-a277-49f51f09d525 and initiator(s) ['10:00:70:b7:e4:28:56:31', '10:00:70:b7:e4:28:56:30'] initialize_connection_fc /usr/lib/python3.9/site-packages/cinder/volume/drivers/netapp/dataontap/block_base.py:1012
2025-05-18 11:25:03.488 23 DEBUG cinder.zonemanager.utils [req-20b6f8d2-2619-4381-8a5a-134b0505fc4c b44211db8a4341b79d73619a3dcab2c6 9294dd1a171648b4b5d51185081cc3b6 - - -] Zoning mode: None. create_zone_manager /usr/lib/python3.9/site-packages/cinder/zonemanager/utils.py:33
2025-05-18 11:25:03.489 23 DEBUG cinder.zonemanager.utils [req-20b6f8d2-2619-4381-8a5a-134b0505fc4c b44211db8a4341b79d73619a3dcab2c6 9294dd1a171648b4b5d51185081cc3b6 - - -] FC Zone Manager not enabled in cinder.conf. create_zone_manager /usr/lib/python3.9/site-packages/cinder/zonemanager/utils.py:50
2025-05-18 11:25:03.518 23 DEBUG cinder.volume.manager [req-20b6f8d2-2619-4381-8a5a-134b0505fc4c b44211db8a4341b79d73619a3dcab2c6 9294dd1a171648b4b5d51185081cc3b6 - - -] Connection info returned from driver {'target_discovered': True, 'target_lun': 4, 'target_wwn': ['204dd039eac514a1', '2043d039eac514a1', '204ed039eac514a1', '2044d039eac514a1', '204fd039eac514a1', '2045d039eac514a1', '2050d039eac514a1', '2046d039eac514a1', '2051d039eac514a1', '2047d039eac514a1', '2052d039eac514a1', '2048d039eac514a1', '2053d039eac514a1', '2049d039eac514a1', '2054d039eac514a1', '204ad039eac514a1', '2055d039eac514a1', '204bd039eac514a1', '2056d039eac514a1', '204cd039eac514a1'], 'initiator_target_map': {'100070b7e4285631': ['204dd039eac514a1', '2043d039eac514a1', '204ed039eac514a1', '2044d039eac514a1', '204fd039eac514a1', '2045d039eac514a1', '2050d039eac514a1', '2046d039eac514a1', '2051d039eac514a1', '2047d039eac514a1', '2052d039eac514a1', '2048d039eac514a1', '2053d039eac514a1', '2049d039eac514a1', '2054d039eac514a1', '204ad039eac514a1', '2055d039eac514a1', '204bd039eac514a1', '2056d039eac514a1', '204cd039eac514a1'], '100070b7e4285630': ['204dd039eac514a1', '2043d039eac514a1', '204ed039eac514a1', '2044d039eac514a1', '204fd039eac514a1', '2045d039eac514a1', '2050d039eac514a1', '2046d039eac514a1', '2051d039eac514a1', '2047d039eac514a1', '2052d039eac514a1', '2048d039eac514a1', '2053d039eac514a1', '2049d039eac514a1', '2054d039eac514a1', '204ad039eac514a1', '2055d039eac514a1', '204bd039eac514a1', '2056d039eac514a1', '204cd039eac514a1']}, 'qos_specs': None, 'access_mode': 'rw', 'encrypted': False, 'cacheable': False, 'driver_volume_type': 'fibre_channel', 'attachment_id': '67e853b0-f3d2-462c-bf36-2f88f1b11852'} _connection_create /usr/lib/python3.9/site-packages/cinder/volume/manager.py:4694
2025-05-18 11:25:03.584 23 INFO cinder.volume.manager [req-20b6f8d2-2619-4381-8a5a-134b0505fc4c b44211db8a4341b79d73619a3dcab2c6 9294dd1a171648b4b5d51185081cc3b6 - - -] attachment_update completed successfully.
2025-05-18 11:25:06.461 29 DEBUG oslo_service.periodic_task [req-5b8752cc-5bf7-4c5a-bba9-7fe70ae6c208 - - - - -] Running periodic task VolumeManager.publish_service_capabilities run_periodic_tasks /usr/lib/python3.9/site-packages/oslo_service/periodic_task.py:210
2025-05-18 11:25:06.462 29 DEBUG cinder.volume.drivers.netapp.dataontap.block_cmode [req-5b8752cc-5bf7-4c5a-bba9-7fe70ae6c208 - - - - -] Updating volume stats _update_volume_stats /usr/lib/python3.9/site-packages/cinder/volume/drivers/netapp/dataontap/block_cmode.py:266
2025-05-18 11:25:13.516 29 DEBUG cinder.manager [req-5b8752cc-5bf7-4c5a-bba9-7fe70ae6c208 - - - - -] Notifying Schedulers of capabilities ... _publish_service_capabilities /usr/lib/python3.9/site-packages/cinder/manager.py:197
2025-05-18 11:25:13.688 23 DEBUG cinder.volume.drivers.netapp.dataontap.block_base [req-42f4c571-6f34-4d05-aece-1db7c7cd5485 b44211db8a4341b79d73619a3dcab2c6 9294dd1a171648b4b5d51185081cc3b6 - - -] Unmapping LUN volume-383016d9-265d-411f-a277-49f51f09d525 from the initiators ['10:00:70:b7:e4:28:56:31', '10:00:70:b7:e4:28:56:30'] terminate_connection_fc /usr/lib/python3.9/site-packages/cinder/volume/drivers/netapp/dataontap/block_base.py:1049
2025-05-18 11:25:13.786 23 INFO cinder.volume.manager [req-42f4c571-6f34-4d05-aece-1db7c7cd5485 b44211db8a4341b79d73619a3dcab2c6 9294dd1a171648b4b5d51185081cc3b6 - - -] Terminate volume connection completed successfully.
2025-05-18 11:25:13.786 23 DEBUG cinder.volume.manager [req-42f4c571-6f34-4d05-aece-1db7c7cd5485 b44211db8a4341b79d73619a3dcab2c6 9294dd1a171648b4b5d51185081cc3b6 - - -] Deleting attachment 67e853b0-f3d2-462c-bf36-2f88f1b11852. _do_attachment_delete /usr/lib/python3.9/site-packages/cinder/volume/manager.py:4843
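For reference, the by-path device names Nova polls for can be derived from the connection info Cinder returns (the `target_wwn` list and `target_lun`), following the `pci-<addr>-fc-0x<wwn>-lun-<lun>` pattern visible in the nova-compute log. The helper below is illustrative only, not part of os-brick:

```python
def by_path_candidates(pci_addr, target_wwns, lun):
    """Build the /dev/disk/by-path names os-brick polls for, matching the
    pattern seen in the nova-compute log (illustrative helper)."""
    return [
        f"/dev/disk/by-path/pci-{pci_addr}-fc-0x{wwn}-lun-{lun}"
        for wwn in target_wwns
    ]
```

Note that Cinder returned 20 target WWNs while the log shows os-brick polling only the three ports actually visible to the HBA, so a failure here means none of the reachable paths produced a device in time.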
Volume creation always works, while attachment operations fail randomly, regardless of the backend used (they have two different backends on the same NetApp) and regardless of the compute host.
NetApp analysis of the issue:
We have been investigating an issue related to volume attachment failures in our OpenStack environment. During this process we manually collected the ASUP logs; however, these logs did not capture the volume-related tasks executed in OpenStack. In contrast, the cinder-volume logs clearly reflect both successful and failed volume attachment attempts.
In both scenarios (successful and failed), the process begins with mapping the LUN, updating the attachment, and handing the request over to Nova Compute for multipath device discovery. In successful cases the multipath device was discovered within approximately 3 seconds. In failed cases the system attempted discovery five times at 2-second intervals and ultimately failed to locate the multipath device. The failures are intermittent, not consistent: multipath is functioning in general, but there are cases where the LUNs do not appear.
Based on our findings, the potential root causes could be:
- Multipathd timing or synchronization issues: in some cases multipathd does not register new devices quickly enough, causing Nova to time out.
- udev rule conflicts or delays: OpenStack relies on os-brick for device discovery via udev and multipath. Any delay in these components can result in intermittent failures.
Next steps:
1. Increase the retry count/timeouts in the Nova/Cinder configuration:
   [libvirt]
   volume_attach_retry_count = 5
   volume_attach_retry_interval = 5
2. Monitor udev events while attaching volumes (udevadm monitor --kernel --udev) to check whether /dev/dm-* appears late or fails to appear.
We recommend validating the above points to help mitigate the intermittent failures during volume attachments.
Note: the issue does not appear to originate from the NetApp storage system. The LUN mapping and attachment processes are completing successfully, as confirmed in the cinder-volume logs.
The problem occurs after the attachment, specifically during the device-discovery phase on the OpenStack compute host. The failure is host-side: the expected multipath device is not detected in time by Nova/os-brick, despite the LUN being available.
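To act on the udev-monitoring suggestion above, the captured `udevadm monitor --kernel --udev` output can be filtered for device-mapper "add" events and their timestamps compared against the nova-compute retry window. The helper below is a hypothetical post-processing sketch; the line pattern it parses matches udevadm's standard monitor output:

```python
import re

# Matches lines emitted by `udevadm monitor --kernel --udev`, e.g.:
#   UDEV  [2745.972841] add      /devices/virtual/block/dm-3 (block)
_EVENT_RE = re.compile(
    r"^(KERNEL|UDEV)\s*\[([\d.]+)\]\s+(\w+)\s+(\S+)\s+\((\w+)\)")

def dm_add_events(monitor_output):
    """Pick out device-mapper 'add' events (a new /dev/dm-*) from captured
    udevadm monitor output.  Returns (source, timestamp, devpath) tuples
    whose timestamps can be checked against the attachment attempt."""
    events = []
    for line in monitor_output.splitlines():
        m = _EVENT_RE.match(line)
        if m and m.group(3) == "add" and "/dm-" in m.group(4):
            events.append((m.group(1), float(m.group(2)), m.group(4)))
    return events
```

If no dm add event (or only a very late one) appears while os-brick is retrying, that points at multipathd/udev latency rather than the storage side.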
sbr-storage could not find anything wrong related to the RHEL storage components.
Actions taken:
- Bumped volume_attach_retry_count and volume_attach_retry_interval without any effect
- Enabled SCSI debug logging and reproduced the issue
All supporting data is available in the case linked to this request.
Links to:
- RHBA-2025:152056: Release of components for RHOSO 18.0