Red Hat OpenStack Services on OpenShift
OSPRH-13983

[17.1.4][Hitachi FC storage] os_brick failed to attach volumes in available state.


    • Type: Bug
    • Resolution: Done-Errata
    • Priority: Major
    • Fix Version: rhos-17.1.5
    • Component: openstack-cinder
    • Severity: Important

      To Reproduce

      Attaching volumes [1] that are in the available state to an instance fails.

      [1]
      8eaee081-f606-471f-a231-ea33223a2043
      02e40e88-1710-4384-be2a-604b90df0f35
      
      Instance_uuid: 1c9c7d87-ad65-40bd-b68d-9a578ac205c3
      Sos file: 0160-sosreport-bbs-p-rhosc-compute19-2025-02-04-vrscpps.tar.xz/sosreport-bbs-p-rhosc-compute19-2025-02-04-vrscpps/var/log/containers/nova/nova-compute.log.5 
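
      For reference, a hedged sketch of the attach operation follows. The use of openstacksdk and the cloud name 'overcloud' are assumptions (the report does not say which client was used); the server and volume UUIDs are the ones listed above.

      # Hedged reproduction sketch; openstacksdk and the cloud name are assumptions.
      import openstack

      conn = openstack.connect(cloud='overcloud')  # placeholder clouds.yaml entry
      server_id = '1c9c7d87-ad65-40bd-b68d-9a578ac205c3'
      for volume_id in ('8eaee081-f606-471f-a231-ea33223a2043',
                        '02e40e88-1710-4384-be2a-604b90df0f35'):
          # Equivalent to `openstack server add volume <server> <volume>`; on the
          # affected compute node the attach then fails as shown in the logs below.
          conn.compute.create_volume_attachment(server_id, volume_id=volume_id)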
      
      2025-02-03 18:20:22.593 2 ERROR nova.virt.block_device [req-6be5bcb4-3f91-409b-8ba8-98fbdf2220fc fa99d7dc5aa747e196dcee1d8b6cd114 119ad3bc77f84d2ca237bd82d40b86e6 - default default] [instance: 1c9c7d87-ad65-40bd-b68d-9a578ac205c3] Driver failed to attach volume 8eaee081-f606-471f-a231-ea33223a2043 at /dev/vdb: oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
      Command: multipath -C None
      Exit code: 1
      Stdout: ''
      Stderr: '4081361.023737 | get_udev_device: failed to look up None with type 3\n'
       
      ----------
      
      2025-02-03 19:33:25.283 2 ERROR nova.virt.block_device [req-9d14a11f-fa49-484c-a1f5-a6dc909f9e5c fa99d7dc5aa747e196dcee1d8b6cd114 119ad3bc77f84d2ca237bd82d40b86e6 - default default] [instance: 1c9c7d87-ad65-40bd-b68d-9a578ac205c3] Driver failed to attach volume 02e40e88-1710-4384-be2a-604b90df0f35 at /dev/vdb: oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
      Command: multipath -C None
      Exit code: 1
      Stdout: ''
      Stderr: '4085743.714703 | get_udev_device: failed to look up None with type 3\n'
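
      The failing command shows that the device path handed to `multipath -C` was None. A minimal illustration with oslo.concurrency (not the actual os_brick code path) of how a None path produces exactly this stderr:

      # Illustration only: a None device path is stringified to "None" on the
      # multipath command line, matching the error in the nova-compute log.
      # Requires multipath-tools installed; run as root for a comparable error.
      from oslo_concurrency import processutils

      device_path = None  # what the connector is left with when no multipath map appears
      try:
          processutils.execute('multipath', '-C', device_path)  # runs: multipath -C None
      except processutils.ProcessExecutionError as exc:
          print(exc.exit_code, exc.stderr)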
      
      

      Expected behavior

      • Volumes in the available state should attach to the instance seamlessly.

      Device Info:

      RHOSP 17.1.4
      Hitachi Fibre Channel storage

      Known workaround

      • Because the instance had booted from its boot volume, we were able to migrate it to a different compute node. After that migration, attaching the additional volumes succeeded.
      • We came across OSPRH-12848, downgraded nova-compute to the version corresponding to 17.1.3, and observed no further issues (see the version-check sketch after this list).
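
      A small sketch (an assumption: run inside the nova_compute container's Python environment) to confirm which nova and os-brick builds are installed, which helps when comparing against the 17.1.3 packages mentioned above:

      # Hedged sketch: print installed versions. The distribution names below are
      # the upstream Python names; the RPM names (openstack-nova-compute,
      # python3-os-brick) differ.
      from importlib.metadata import PackageNotFoundError, version

      for dist in ('nova', 'os-brick', 'oslo.concurrency'):
          try:
              print(dist, version(dist))
          except PackageNotFoundError:
              print(dist, 'not installed in this environment')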

      Additional context

      DB information (captured after the issue occurred):

      MariaDB [nova]> select * from block_device_mapping  where attachment_id='e8b567df-7f49-4f40-8197-4842b52b4fbd' \G
      *************************** 1. row ***************************
                 created_at: 2024-11-10 13:55:01
                 updated_at: 2025-02-01 12:44:42
                 deleted_at: NULL
                         id: 30800
                device_name: /dev/vdc
      delete_on_termination: 0
                snapshot_id: NULL
                  volume_id: 02e40e88-1710-4384-be2a-604b90df0f35
                volume_size: 70
                  no_device: NULL
            connection_info: {"driver_volume_type": "fibre_channel", "data": {"target_wwn": ["50060e80089d3c03", "50060e80089d3c13"], "target_discovered": false, "target_lun": 26, "qos_specs": null, "access_mode": "rw", "encrypted": false, "cacheable": false, "multipath_id": "360060e80089d3c0000509d3c0000034f"}, "status": "reserved", "instance": "1c9c7d87-ad65-40bd-b68d-9a578ac205c3", "attached_at": "", "detached_at": "", "volume_id": "02e40e88-1710-4384-be2a-604b90df0f35", "serial": "02e40e88-1710-4384-be2a-604b90df0f35"}
              instance_uuid: 1c9c7d87-ad65-40bd-b68d-9a578ac205c3
                    deleted: 0
                source_type: volume
           destination_type: volume
               guest_format: NULL
                device_type: disk
                   disk_bus: virtio
                 boot_index: NULL
                   image_id: NULL
                        tag: NULL
              attachment_id: e8b567df-7f49-4f40-8197-4842b52b4fbd
                       uuid: 939359ec-d229-4ddd-a4fb-ef7d04b65dd8
                volume_type: NULL
      1 row in set (0.000 sec)
      MariaDB [nova]> select * from block_device_mapping  where attachment_id='eefcecef-3f18-4f6a-949a-3bc2f2c3b747' \G
      *************************** 1. row ***************************
                 created_at: 2024-10-14 10:07:53
                 updated_at: 2025-02-01 12:42:16
                 deleted_at: NULL
                         id: 30435
                device_name: /dev/vdb
      delete_on_termination: 0
                snapshot_id: NULL
                  volume_id: 8eaee081-f606-471f-a231-ea33223a2043
                volume_size: 500
                  no_device: NULL
            connection_info: {"driver_volume_type": "fibre_channel", "data": {"target_wwn": ["50060e80089d3c03", "50060e80089d3c13"], "target_discovered": false, "target_lun": 24, "qos_specs": null, "access_mode": "rw", "encrypted": false, "cacheable": false, "multipath_id": "360060e80089d3c0000509d3c00000319"}, "status": "reserved", "instance": "1c9c7d87-ad65-40bd-b68d-9a578ac205c3", "attached_at": "", "detached_at": "", "volume_id": "8eaee081-f606-471f-a231-ea33223a2043", "serial": "8eaee081-f606-471f-a231-ea33223a2043"}
              instance_uuid: 1c9c7d87-ad65-40bd-b68d-9a578ac205c3
                    deleted: 0
                source_type: volume
           destination_type: volume
               guest_format: NULL
                device_type: disk
                   disk_bus: virtio
                 boot_index: NULL
                   image_id: NULL
                        tag: NULL
              attachment_id: eefcecef-3f18-4f6a-949a-3bc2f2c3b747
                       uuid: 20f734df-fce9-4af2-bf7b-3a2907ed6756
                volume_type: NULL
      1 row in set (0.002 sec)

      Vol_id: 02e40e88-1710-4384-be2a-604b90df0f35
      While attaching this volume, the command executed successfully, but `nova volume-attachments` did not show the attachment (see the attachment-listing sketch after the log excerpt below).

      2025-02-03 18:12:40.595 2 INFO nova.compute.manager [req-3a653602-3bd2-4a82-91ba-32f7f9cd8462 fa99d7dc5aa747e196dcee1d8b6cd114 119ad3bc77f84d2ca237bd82d40b86e6 - default default] [instance: 1c9c7d87-ad65-40bd-b68d-9a578ac205c3] Attaching volume 02e40e88-1710-4384-be2a-604b90df0f35 to /dev/vdb
      2025-02-03 18:12:40.601 2 DEBUG nova.scheduler.client.report [req-91387c75-45c0-4b0f-aa32-7126cdc4177d fa99d7dc5aa747e196dcee1d8b6cd114 119ad3bc77f84d2ca237bd82d40b86e6 - default default] Inventory has not changed for provider ee392e34-58d2-44df-a1e5-1b192ff61713 based on inventory data: {'VCPU': {'total': 128, 'reserved': 0, 'min_unit': 1, 'max_unit': 128, 'step_size': 1, 'allocation_ratio': 3.0}, 'MEMORY_MB': {'total': 1547604, 'reserved': 4096, 'min_unit': 1, 'max_unit': 1547604, 'step_size': 1, 'allocation_ratio': 1.0}, 'DISK_GB': {'total': 1488, 'reserved': 0, 'min_unit': 1, 'max_unit': 1488, 'step_size': 1, 'allocation_ratio': 1.0}} set_inventory_for_provider /usr/lib/python3.9/site-packages/nova/scheduler/client/report.py:940
      2025-02-03 18:12:40.601 2 DEBUG oslo_concurrency.lockutils [req-91387c75-45c0-4b0f-aa32-7126cdc4177d fa99d7dc5aa747e196dcee1d8b6cd114 119ad3bc77f84d2ca237bd82d40b86e6 - default default] Lock "compute_resources" released by "nova.compute.resource_tracker.ResourceTracker.update_usage" :: held 0.029s inner /usr/lib/python3.9/site-packages/oslo_concurrency/lockutils.py:367
      2025-02-03 18:12:40.602 2 INFO nova.compute.manager [req-91387c75-45c0-4b0f-aa32-7126cdc4177d fa99d7dc5aa747e196dcee1d8b6cd114 119ad3bc77f84d2ca237bd82d40b86e6 - default default] [instance: 1c9c7d87-ad65-40bd-b68d-9a578ac205c3] Successfully reverted task state from None on failure for instance.
      2025-02-03 18:12:40.606 2 ERROR oslo_messaging.rpc.server [req-91387c75-45c0-4b0f-aa32-7126cdc4177d fa99d7dc5aa747e196dcee1d8b6cd114 119ad3bc77f84d2ca237bd82d40b86e6 - default default] Exception during message handling: nova.exception.InvalidVolume: Invalid volume: volume 02e40e88-1710-4384-be2a-604b90df0f35 already attached
      2025-02-03 18:12:40.606 2 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
      2025-02-03 18:12:40.606 2 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.9/site-packages/oslo_messaging/rpc/server.py", line 165, in _process_incoming
       
      ....
      ....
      ....
      2025-02-03 19:34:40.566 2 DEBUG oslo_service.periodic_task [req-3975ca65-014c-45ce-a572-60c8b31bd083 - - - - -] Running periodic task ComputeManager._poll_rescued_instances run_periodic_tasks /usr/lib/python3.9/site-packages/oslo_service/periodic_task.py:210
      2025-02-03 19:34:40.999 2 ERROR nova.compute.manager [req-9d14a11f-fa49-484c-a1f5-a6dc909f9e5c fa99d7dc5aa747e196dcee1d8b6cd114 119ad3bc77f84d2ca237bd82d40b86e6 - default default] [instance: 1c9c7d87-ad65-40bd-b68d-9a578ac205c3] Failed to attach 02e40e88-1710-4384-be2a-604b90df0f35 at /dev/vdb: oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
      Command: multipath -C None
      Exit code: 1
      Stdout: ''
      Stderr: '4085743.714703 | get_udev_device: failed to look up None with type 3\n'
      2025-02-03 19:34:40.999 2 ERROR nova.compute.manager [instance: 1c9c7d87-ad65-40bd-b68d-9a578ac205c3] Traceback (most recent call last):
      2025-02-03 19:34:40.999 2 ERROR nova.compute.manager [instance: 1c9c7d87-ad65-40bd-b68d-9a578ac205c3]   File "/usr/lib/python3.9/site-packages/nova/compute/manager.py", line 7141, in _attach_volume
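
      Because `nova volume-attachments` showed nothing while Cinder reported the volume as already attached, listing Cinder's attachment records directly can help compare the two sides. A hedged sketch with python-cinderclient follows; the keystoneauth session built from OS_* environment variables is an assumption, and the attachments API needs block-storage microversion 3.27 or later.

      # Hedged sketch: list Cinder attachment records for the volume so they can
      # be compared with nova's block_device_mapping rows above. Auth details are
      # assumptions, not taken from the report.
      import os
      from keystoneauth1 import loading, session
      from cinderclient import client as cinder_client

      loader = loading.get_plugin_loader('password')
      auth = loader.load_from_options(
          auth_url=os.environ['OS_AUTH_URL'],
          username=os.environ['OS_USERNAME'],
          password=os.environ['OS_PASSWORD'],
          project_name=os.environ['OS_PROJECT_NAME'],
          user_domain_name=os.environ.get('OS_USER_DOMAIN_NAME', 'Default'),
          project_domain_name=os.environ.get('OS_PROJECT_DOMAIN_NAME', 'Default'))
      cinder = cinder_client.Client('3.27', session=session.Session(auth=auth))

      for att in cinder.attachments.list(
              search_opts={'volume_id': '02e40e88-1710-4384-be2a-604b90df0f35'}):
          print(att.to_dict())  # attachment id, status, instance uuid, etc.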

      Vol_id: 8eaee081-f606-471f-a231-ea33223a2043
      The volume device does not seem to be present on the host, but the attach fails with the same 'multipath -C None' error.

      2025-02-03 19:30:28.287 2 DEBUG os_brick.initiator.linuxscsi [req-bafc0c7d-9413-4b8e-8e48-8c6c781096fd fa99d7dc5aa747e196dcee1d8b6cd114 119ad3bc77f84d2ca237bd82d40b86e6 - default default] Checking to see if /dev/mapper/360060e80089d3c0000509d3c00000319 exists yet. wait_for_path /usr/lib/python3.9/site-packages/os_brick/initiator/linuxscsi.py:381
      2025-02-03 19:30:28.288 2 DEBUG os_brick.initiator.linuxscsi [req-bafc0c7d-9413-4b8e-8e48-8c6c781096fd fa99d7dc5aa747e196dcee1d8b6cd114 119ad3bc77f84d2ca237bd82d40b86e6 - default default] /dev/mapper/360060e80089d3c0000509d3c00000319 doesn't exists yet. wait_for_path /usr/lib/python3.9/site-packages/os_brick/initiator/linuxscsi.py:384
      2025-02-03 19:30:28.288 2 DEBUG os_brick.utils [req-bafc0c7d-9413-4b8e-8e48-8c6c781096fd fa99d7dc5aa747e196dcee1d8b6cd114 119ad3bc77f84d2ca237bd82d40b86e6 - default default] Retrying os_brick.initiator.linuxscsi.LinuxSCSI.wait_for_path in 2.0 seconds as it raised VolumeDeviceNotFound: Volume device not found at /dev/mapper/360060e80089d3c0000509d3c00000319.. log_it /usr/lib/python3.9/site-packages/tenacity/before_sleep.py:40 
      ...
      ...
      2025-02-03 19:43:09.299 2 DEBUG os_brick.initiator.linuxscsi [req-fd8f2240-03fa-4f74-83cf-cb14ecc3a358 fa99d7dc5aa747e196dcee1d8b6cd114 119ad3bc77f84d2ca237bd82d40b86e6 - default default] Checking to see if /dev/mapper/360060e80089d3c0000509d3c00000319 exists yet. wait_for_path /usr/lib/python3.9/site-packages/os_brick/initiator/linuxscsi.py:381
      2025-02-03 19:43:09.300 2 DEBUG os_brick.initiator.linuxscsi [req-fd8f2240-03fa-4f74-83cf-cb14ecc3a358 fa99d7dc5aa747e196dcee1d8b6cd114 119ad3bc77f84d2ca237bd82d40b86e6 - default default] /dev/mapper/360060e80089d3c0000509d3c00000319 doesn't exists yet. wait_for_path /usr/lib/python3.9/site-packages/os_brick/initiator/linuxscsi.py:384
      2025-02-03 19:43:09.300 2 WARNING os_brick.initiator.linuxscsi [req-fd8f2240-03fa-4f74-83cf-cb14ecc3a358 fa99d7dc5aa747e196dcee1d8b6cd114 119ad3bc77f84d2ca237bd82d40b86e6 - default default] couldn't find a valid multipath device path for 360060e80089d3c0000509d3c00000319
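
      For context on the retry loop in the DEBUG/WARNING lines above: os_brick keeps polling for the /dev/mapper/<WWID> node and eventually gives up without a device path. Below is a simplified, hypothetical sketch of that polling pattern using tenacity (the retry library referenced in the log); it is not the actual os_brick implementation, which lives in os_brick/initiator/linuxscsi.py.

      # Simplified, hypothetical sketch of waiting for a multipath device node;
      # the retry interval mirrors the "Retrying ... in 2.0 seconds" log line,
      # the attempt count is an assumption.
      import os
      import tenacity

      class VolumeDeviceNotFound(Exception):
          pass

      @tenacity.retry(
          retry=tenacity.retry_if_exception_type(VolumeDeviceNotFound),
          wait=tenacity.wait_fixed(2),
          stop=tenacity.stop_after_attempt(5),
          reraise=True)
      def wait_for_path(path):
          if not os.path.exists(path):
              raise VolumeDeviceNotFound(f'Volume device not found at {path}')
          return path

      # wait_for_path('/dev/mapper/360060e80089d3c0000509d3c00000319') would keep
      # raising here because the multipath map was never created on this host.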

      Background of the issue:

      • Instance HA evacuated instances after an unexpected reboot of compute-12, and this instance landed on compute-19. The disk was on compute-19, but the node/host/network records still pointed to compute-12; this was corrected with a database modification.
      • The additional disks were unmapped before that correction, using `openstack server remove volume ...`.
      • The instance booted into emergency mode because the additional disks were not attached. Subsequent attempts to attach the additional disks hit this issue, and the workaround steps above were followed to bring the instance back online.

      Additional info:

      The sosreport 0160-sosreport-bbs-p-rhosc-compute19-2025-02-04-vrscpps.tar.xz contains the errors noted above.
      File: 0080-Compute12-Instance1c9c7d87-ad65-40bd-b68d-9a578ac205c3.txt has information captured as soon as

       

              rhn-engineering-lhh Lon Hohberger
              rhn-support-pgodwin Paul Godwin
              rhos-storage-cinder