Bug
Resolution: Unresolved
Normal
Moderate
After an upgrade from OSP 16 to OSP 17.1.4, the customer cannot attach persistent volumes to any of their instances.
nova/nova-compute.log:2025-02-03 10:01:06.538 2 ERROR nova.virt.block_device [instance: 18b1af24-39f2-4055-af9a-ef6f2b5b9faa] raise libvirtError('virDomainAttachDeviceFlags() failed')
nova/nova-compute.log:2025-02-03 10:01:06.538 2 ERROR nova.virt.block_device [instance: 18b1af24-39f2-4055-af9a-ef6f2b5b9faa] libvirt.libvirtError: Requested operation is not valid: target sdf already exists
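To check how many instances hit this failure across a full `nova/nova-compute.log`, a small filter over the log is enough. The sketch below is a hypothetical helper, not part of Nova. It assumes that only the "target ... already exists" line matters, and that Nova prefixes it with `[instance: <uuid>]` exactly as in the excerpt above.

```python
import re
from typing import Iterable, List, Tuple

# Nova prefixes instance-scoped log records with "[instance: <uuid>]".
INSTANCE_RE = re.compile(r"\[instance: ([0-9a-f-]{36})\]")
# libvirt's rejection message when the requested disk target is already in use.
CONFLICT_RE = re.compile(r"target (\w+) already exists")


def find_target_conflicts(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (instance_uuid, target_dev) for every 'already exists' error line."""
    conflicts = []
    for line in lines:
        conflict = CONFLICT_RE.search(line)
        instance = INSTANCE_RE.search(line)
        if conflict and instance:
            conflicts.append((instance.group(1), conflict.group(1)))
    return conflicts


if __name__ == "__main__":
    # Sample line copied from the log excerpt in this report.
    sample = [
        "nova/nova-compute.log:2025-02-03 10:01:06.538 2 ERROR nova.virt.block_device "
        "[instance: 18b1af24-39f2-4055-af9a-ef6f2b5b9faa] libvirt.libvirtError: "
        "Requested operation is not valid: target sdf already exists",
    ]
    for uuid, target in find_target_conflicts(sample):
        print(uuid, target)
```

To scan the real log, pass the open file object (for example `open("nova/nova-compute.log")`) instead of the sample list, since the function accepts any iterable of lines.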
The volume in question is in the `available` state:
(oscar14) [stack@oscar14dir001 ~]$ openstack volume show 691f1a27-def8-4323-bcc9-e13b96038067 --fit
+--------------------------------+----------------------------------------------------------+
| Field                          | Value                                                    |
+--------------------------------+----------------------------------------------------------+
| attachments                    | []                                                       |
| availability_zone              | nova                                                     |
| bootable                       | false                                                    |
| consistencygroup_id            | None                                                     |
| created_at                     | 2021-11-30T12:08:11.000000                               |
| description                    | Created by OpenStack Cinder CSI driver                   |
| encrypted                      | False                                                    |
| id                             | 691f1a27-def8-4323-bcc9-e13b96038067                     |
| migration_status               | None                                                     |
| multiattach                    | False                                                    |
| name                           | pvc-07fb9a16-b6a5-4fe4-93d9-38d0d18636a7                 |
| os-vol-host-attr:host          | hostgroup@tripleo_ceph#tripleo_ceph                      |
| os-vol-mig-status-attr:migstat | None                                                     |
| os-vol-mig-status-attr:name_id | None                                                     |
| os-vol-tenant-attr:tenant_id   | 37ec3de21f554480bfbca365bb9b849a                         |
| properties                     | cinder.csi.openstack.org/cluster='prod-ocpcarc-01-qxcfk' |
| replication_status             | None                                                     |
| size                           | 26                                                       |
| snapshot_id                    | None                                                     |
| source_volid                   | None                                                     |
| status                         | available                                                |
| type                           | tripleo                                                  |
| updated_at                     | 2025-02-03T12:16:50.000000                               |
| user_id                        | 6acc6269aaf04b278df8e568b717eaa6                         |
+--------------------------------+----------------------------------------------------------+
However, the output of `openstack server show` for the instance still lists this volume under `volumes_attached`:
| volumes_attached                    | delete_on_termination='False', id='6421414f-caf9-4b31-bdeb-a92d98b0ec93' |
|                                     | delete_on_termination='False', id='c333227a-beb4-4f85-826a-c1ce6d1a1aab' |
|                                     | delete_on_termination='False', id='6321eb81-0e29-4551-84e4-c6f6bad8ff7e' |
|                                     | delete_on_termination='False', id='a41a61ba-cde4-4dfa-90e1-4f815f26ac6f' |
|                                     | delete_on_termination='False', id='691f1a27-def8-4323-bcc9-e13b96038067' |
+-------------------------------------+------------------------------------------------------------------------------------------------------------------
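Cinder reports volume `691f1a27-def8-4323-bcc9-e13b96038067` as `available` with no attachments, while its ID appears in the instance's `volumes_attached` list. A minimal sketch for flagging such mismatches across all of an instance's volumes is shown below. It assumes the `volumes_attached` text and the Cinder status and attachments have been copied by hand from the CLI output. The function names are hypothetical, and the code does not call any OpenStack API.

```python
import re
from typing import Dict, List, Set, Tuple

# Volume IDs as they appear in the volumes_attached cell of `openstack server show`.
VOLUME_ID_RE = re.compile(r"id='([0-9a-f-]{36})'")


def nova_volume_ids(volumes_attached: str) -> Set[str]:
    """Extract the volume IDs that Nova reports for the instance."""
    return set(VOLUME_ID_RE.findall(volumes_attached))


def stale_attachments(
    volumes_attached: str,
    cinder_view: Dict[str, Tuple[str, list]],
) -> List[str]:
    """Volumes Nova still lists but Cinder reports as 'available' with no attachments.

    cinder_view maps volume ID -> (status, attachments), copied by hand
    from `openstack volume show`. Volumes missing from it are skipped.
    """
    stale = []
    for vol_id in sorted(nova_volume_ids(volumes_attached)):
        if vol_id not in cinder_view:
            continue
        status, attachments = cinder_view[vol_id]
        if status == "available" and not attachments:
            stale.append(vol_id)
    return stale


if __name__ == "__main__":
    nova_cell = (
        "delete_on_termination='False', id='6421414f-caf9-4b31-bdeb-a92d98b0ec93'\n"
        "delete_on_termination='False', id='691f1a27-def8-4323-bcc9-e13b96038067'\n"
    )
    cinder = {"691f1a27-def8-4323-bcc9-e13b96038067": ("available", [])}
    print(stale_attachments(nova_cell, cinder))
```

Keeping the parsing separate from the comparison makes it easy to swap in JSON output from the CLI later without touching the mismatch logic.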
The bus of the `sdf` device changed from `scsi` to `sata` between 16.x and 17.1.4, and there is a discrepancy in the `virsh dumpxml` output. The `OS-EXT-SRV-ATTR:root_device_name` field in the output of `openstack server show ..` shows only `/dev/sda`, not `/dev/sdf`, so Nova is not aware of `/dev/sdf`. Despite that, KVM tries to use the last device on the list (`/dev/sdf`), because it is still part of the instance definition:
`sda` is an ephemeral Ceph disk:
~~~
<disk type='network' device='disk'>
  <driver name='qemu' type='raw' cache='writeback' discard='unmap'/>
  <auth username='openstack'>
    <secret type='ceph' uuid='e3f27f3b-b76c-4505-9c35-fccd4a0e99a0'/>
  </auth>
  <source protocol='rbd' name='vms/bde2657d-e117-49dc-8f2f-42f4410132de_disk'>
    <host name='192.168.9.188' port='6789'/>
    <host name='192.168.10.207' port='6789'/>
    <host name='192.168.11.227' port='6789'/>
  </source>
  <target dev='sda' bus='scsi'/>
  <address type='drive' controller='0' bus='0' target='0' unit='0'/>
</disk>
~~~
`sdf` is a CD-ROM backed by the instance's config drive image (`vms/bde2657d-e117-49dc-8f2f-42f4410132de_disk.config`), which sits next to the ephemeral `sda` disk image in the `vms` pool:
~~~
<disk type='network' device='cdrom'>
  <driver name='qemu' type='raw' cache='writeback' discard='unmap'/>
  <auth username='openstack'>
    <secret type='ceph' uuid='e3f27f3b-b76c-4505-9c35-fccd4a0e99a0'/>
  </auth>
  <source protocol='rbd' name='vms/bde2657d-e117-49dc-8f2f-42f4410132de_disk.config'>
    <host name='192.168.9.188' port='6789'/>
    <host name='192.168.10.207' port='6789'/>
    <host name='192.168.11.227' port='6789'/>
  </source>
  <target dev='sdf' bus='sata'/>
  <readonly/>
  <address type='drive' controller='0' bus='0' target='0' unit='5'/>
</disk>
~~~
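The collision can be modelled in a few lines. A target name chosen from a view that knows only the root disk and the volumes lands on `sdf`. libvirt then rejects it because the config-drive CD-ROM already uses that target. This is not Nova's device-name selection code. `used_disk_targets` and `next_free_sd_name` are hypothetical helpers, the XML is trimmed to the `<target>` elements of the two disks above, and the assumption that the other volumes occupy `sdb` to `sde` is for illustration only.

```python
import string
import xml.etree.ElementTree as ET
from typing import Set

# Trimmed version of the two <disk> elements above (driver, auth, source and
# address omitted); only the <target> elements matter here.
DOMAIN_XML = """
<domain type='kvm'>
  <devices>
    <disk type='network' device='disk'>
      <target dev='sda' bus='scsi'/>
    </disk>
    <disk type='network' device='cdrom'>
      <target dev='sdf' bus='sata'/>
      <readonly/>
    </disk>
  </devices>
</domain>
"""


def used_disk_targets(domain_xml: str) -> Set[str]:
    """Return every <target dev=...> name used by a <disk> in the domain XML."""
    root = ET.fromstring(domain_xml.strip())
    return {t.get("dev") for t in root.findall(".//disk/target") if t.get("dev")}


def next_free_sd_name(taken: Set[str]) -> str:
    """Hypothetical helper: first sdX name (sda..sdz) not in `taken`."""
    for letter in string.ascii_lowercase:
        name = "sd" + letter
        if name not in taken:
            return name
    raise RuntimeError("no free sdX target name left")


if __name__ == "__main__":
    in_domain = used_disk_targets(DOMAIN_XML)
    # Assumption for illustration: other volumes already occupy sdb..sde.
    other_volumes = {"sdb", "sdc", "sdd", "sde"}

    # A view that only knows the root disk and the volumes picks sdf ...
    print(next_free_sd_name({"sda"} | other_volumes))    # sdf
    # ... which is already taken by the config-drive CD-ROM in the domain.
    print("sdf" in in_domain)                             # True
    # Using the domain's actual disk targets avoids the collision.
    print(next_free_sd_name(in_domain | other_volumes))  # sdg
```

On a compute node, the same `used_disk_targets` check can be run against the full `virsh dumpxml` output of an affected instance to list every target name libvirt considers taken.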
Sosreports with Nova debug logging enabled are attached to the case, along with the output of `openstack server event list`, `openstack server show`, `openstack volume show`, etc.
Please note that this is a production environment.
Inside the instance (`node-004`):
[root@node-004 ~]# ls /dev/
autofs fb0 mcelog rtc0 sg0 tty tty20 tty33 tty46 tty59 uhid vcsa1 vga_arbiter
block fd mem sda sg1 tty0 tty21 tty34 tty47 tty6 uinput vcsa2 vhci
bsg full mqueue sda1 sg2 tty1 tty22 tty35 tty48 tty60 urandom vcsa3 vhost-net
bus fuse net sdb sg3 tty10 tty23 tty36 tty49 tty61 usbmon0 vcsa4 vhost-vsock
cdrom hidraw0 null sdb1 sg4 tty11 tty24 tty37 tty5 tty62 usbmon1 vcsa5 watchdog
char hpet nvme-fabrics sdb2 sg5 tty12 tty25 tty38 tty50 tty63 userfaultfd vcsa6 watchdog0
console hugepages nvram sdb3 shm tty13 tty26 tty39 tty51 tty7 vcs vcsu zero
core hwrng port sdb4 snapshot tty14 tty27 tty4 tty52 tty8 vcs1 vcsu1
cpu initctl ppp sdc snd tty15 tty28 tty40 tty53 tty9 vcs2 vcsu2
cpu_dma_latency input ptmx sdc1 sr0 tty16 tty29 tty41 tty54 ttyS0 vcs3 vcsu3
cuse kmsg pts sdd stderr tty17 tty3 tty42 tty55 ttyS1 vcs4 vcsu4
disk log random sdd1 stdin tty18 tty30 tty43 tty56 ttyS2 vcs5 vcsu5
dma_heap loop-control rfkill sde stdout tty19 tty31 tty44 tty57 ttyS3 vcs6 vcsu6
dri mapper rtc sde1 termination-log tty2 tty32 tty45 tty58 udmabuf vcsa vfio
[root@node-004 ~]# lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
sda      8:0    0  450G  0 disk
└─sda1   8:1    0  450G  0 part /var/lib/containers
sdb      8:16   0   60G  0 disk
├─sdb1   8:17   0    1M  0 part
├─sdb2   8:18   0  127M  0 part
├─sdb3   8:19   0  384M  0 part /boot
└─sdb4   8:20   0 59.5G  0 part /var
                                /sysroot/ostree/deploy/rhcos/var
                                /usr
                                /etc
                                /
                                /sysroot
sdc      8:32   0   30G  0 disk
└─sdc1   8:33   0   30G  0 part /var/lib/fluentd
sdd      8:48   0   30G  0 disk
└─sdd1   8:49   0   30G  0 part /var/log
sde      8:64   0  100G  0 disk
└─sde1   8:65   0  100G  0 part /var/lib/kubelet/pods/0c389be0-4b18-4d37-8a71-babe250ac5d8/volume-subpaths/entrypoint/collector/15
                                /var/lib/kubelet/pods/b36876e4-acc3-4eb6-b0b9-50166908907b/volume-subpaths/pullcerts/twistlock-defender/8
                                /var/lib/kubelet
sr0     11:0    1  492K  0 rom
[root@node-004 ~]#
~~~
Warning FailedAttachVolume pod/splunk-forwarder-app-0 AttachVolume.Attach failed for volume "pvc-07fb9a16-b6a5-4fe4-93d9-38d0d18636a7" : rpc error: code = Internal desc = [ControllerPublishVolume] failed to attach volume: Volume "691f1a27-def8-4323-bcc9-e13b96038067" failed to be attached within the alloted time
~~~