[Tracker metadata: Bug, Resolution: Unresolved, rhos-17.1.11, Bug Tracking, rhos-ops-day1day2-edpm, Moderate]
To Reproduce
Steps to reproduce the behavior:
- Deploy 17.1.11 from scratch
- Create an instance:
$ openstack server create --flavor 4c4g --image rhel-9.2 --network testnet --key-name stack --security-group all-open --host overcloud-novacompute-2.enothen-kellerlab.tamlab.brq2.redhat.com --wait -c status -c OS-EXT-SRV-ATTR:host testvm-volume1
+----------------------+------------------------------------------------------------------+
| Field                | Value                                                            |
+----------------------+------------------------------------------------------------------+
| OS-EXT-SRV-ATTR:host | overcloud-novacompute-2.enothen-kellerlab.tamlab.brq2.redhat.com |
| status               | ACTIVE                                                           |
+----------------------+------------------------------------------------------------------+
- Create a volume
$ openstack volume create --size 1 testvol -c id -c status
+--------+--------------------------------------+
| Field  | Value                                |
+--------+--------------------------------------+
| id     | e1574718-cf70-474a-bf19-4dc7e52a7d23 |
| status | creating                             |
+--------+--------------------------------------+
$ openstack volume show testvol -c status
+--------+-----------+
| Field  | Value     |
+--------+-----------+
| status | available |
+--------+-----------+
- Attach the volume to the instance:
$ date ; openstack server add volume testvm-volume1 testvol ; date
Wed Oct 29 10:24:08 CET 2025
Wed Oct 29 10:24:13 CET 2025
- Detach the volume from the instance:
$ date ; openstack server remove volume testvm-volume1 testvol ; date
Wed Oct 29 10:24:27 CET 2025
Wed Oct 29 10:24:31 CET 2025
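- Optionally, the detach can be timed end-to-end in one shot with a small polling loop (a sketch, assuming the volume and instance names above):
$ start=$(date +%s) ; openstack server remove volume testvm-volume1 testvol ; until [ "$(openstack volume show testvol -c status -f value)" = "available" ] ; do sleep 1 ; done ; echo "detach took $(( $(date +%s) - start ))s"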
- While running the add/remove commands above, monitor the volume status in a separate window:
$ while [ True ] ;do echo "$(date): $(openstack volume show testvol -c status -f value)" ;done
Wed Oct 29 10:24:09 CET 2025: available
Wed Oct 29 10:24:11 CET 2025: reserved
Wed Oct 29 10:24:14 CET 2025: attaching
Wed Oct 29 10:24:17 CET 2025: in-use
Wed Oct 29 10:24:19 CET 2025: in-use
Wed Oct 29 10:24:22 CET 2025: in-use
Wed Oct 29 10:24:25 CET 2025: in-use
Wed Oct 29 10:24:27 CET 2025: in-use
Wed Oct 29 10:24:30 CET 2025: detaching
Wed Oct 29 10:24:32 CET 2025: detaching
Wed Oct 29 10:24:35 CET 2025: detaching
Wed Oct 29 10:24:37 CET 2025: detaching
Wed Oct 29 10:24:40 CET 2025: detaching
Wed Oct 29 10:24:42 CET 2025: detaching
Wed Oct 29 10:24:45 CET 2025: detaching
Wed Oct 29 10:24:47 CET 2025: detaching
Wed Oct 29 10:24:50 CET 2025: detaching
Wed Oct 29 10:24:52 CET 2025: available
Wed Oct 29 10:24:54 CET 2025: available
Wed Oct 29 10:24:57 CET 2025: available
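- The loop above polls as fast as the client allows; a gentler variant (a sketch) adds a short sleep between polls:
$ while true ; do echo "$(date): $(openstack volume show testvol -c status -f value)" ; sleep 2 ; done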
- Looking into /var/log/containers/nova/nova-compute.log on the compute node where the VM is hosted, I can see the 20-second gap, followed by a timeout and an error:
2025-10-29 10:24:32.496 2 DEBUG nova.virt.libvirt.driver [req-18478c06-71ac-4934-a0de-d53413bffafd 31a05ca62faf4aed9e7d5e38035efa19 4ef31de303ab4ea2b611301583994bc0 - default default] (1/8): Attempting to detach device vdb with device alias virtio-disk1 from instance c562b73e-239c-4192-a981-f32f4be5efd7 from the live domain config. _detach_from_live_with_retry /usr/lib/python3.9/site-packages/nova/virt/libvirt/driver.py:2465
2025-10-29 10:24:32.496 2 DEBUG nova.virt.libvirt.guest [req-18478c06-71ac-4934-a0de-d53413bffafd 31a05ca62faf4aed9e7d5e38035efa19 4ef31de303ab4ea2b611301583994bc0 - default default] detach device xml: <disk type="network" device="disk">
2025-10-29 10:24:32.630 2 DEBUG nova.virt.libvirt.driver [req-18478c06-71ac-4934-a0de-d53413bffafd 31a05ca62faf4aed9e7d5e38035efa19 4ef31de303ab4ea2b611301583994bc0 - default default] Start waiting for the detach event from libvirt for device vdb with device alias virtio-disk1 for instance c562b73e-239c-4192-a981-f32f4be5efd7 _detach_from_live_and_wait_for_event /usr/lib/python3.9/site-packages/nova/virt/libvirt/driver.py:2541
...
<--- 20 sec gap here
...
2025-10-29 10:24:52.632 2 ERROR nova.virt.libvirt.driver [req-18478c06-71ac-4934-a0de-d53413bffafd 31a05ca62faf4aed9e7d5e38035efa19 4ef31de303ab4ea2b611301583994bc0 - default default] Waiting for libvirt event about the detach of device vdb with device alias virtio-disk1 from instance c562b73e-239c-4192-a981-f32f4be5efd7 is timed out.
2025-10-29 10:24:52.634 2 INFO nova.virt.libvirt.driver [req-18478c06-71ac-4934-a0de-d53413bffafd 31a05ca62faf4aed9e7d5e38035efa19 4ef31de303ab4ea2b611301583994bc0 - default default] Successfully detached device vdb from instance c562b73e-239c-4192-a981-f32f4be5efd7 from the live domain config.
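- To isolate the detach-related lines for a given request on the compute node, filtering by the request ID works (a sketch; the req-... value is taken from the log above):
# grep 'req-18478c06-71ac-4934-a0de-d53413bffafd' /var/log/containers/nova/nova-compute.log | grep -i detach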
- Restarting the nova_virtqemud container on the same compute node:
# date ; systemctl restart tripleo_nova_virtqemud ; date
Wed Oct 29 10:48:26 AM CET 2025
Wed Oct 29 10:49:53 AM CET 2025
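- To verify the restart took effect (a sketch; service and container names as deployed in my environment):
# systemctl is-active tripleo_nova_virtqemud
# podman ps --filter name=nova_virtqemud --format '{{.Names}} {{.Status}}'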
- Attempting to reproduce again:
$ date ; openstack server add volume testvm-volume1 testvol ; date
Wed Oct 29 10:51:30 CET 2025
Wed Oct 29 10:51:35 CET 2025
$ date ; openstack server remove volume testvm-volume1 testvol ; date
Wed Oct 29 10:51:42 CET 2025
Wed Oct 29 10:51:45 CET 2025
$ date ; openstack server add volume testvm-volume1 testvol ; date
Wed Oct 29 10:51:52 CET 2025
Wed Oct 29 10:51:58 CET 2025
$ date ; openstack server remove volume testvm-volume1 testvol ; date
Wed Oct 29 10:52:00 CET 2025
Wed Oct 29 10:52:03 CET 2025
- The issue is no longer reproduced:
$ while [ True ] ;do echo "$(date): $(openstack volume show testvol -c status -f value)" ;done
Wed Oct 29 10:51:24 CET 2025: available
Wed Oct 29 10:51:27 CET 2025: available
Wed Oct 29 10:51:29 CET 2025: available
Wed Oct 29 10:51:32 CET 2025: reserved
Wed Oct 29 10:51:34 CET 2025: in-use
Wed Oct 29 10:51:37 CET 2025: in-use
Wed Oct 29 10:51:39 CET 2025: in-use
Wed Oct 29 10:51:42 CET 2025: in-use
Wed Oct 29 10:51:44 CET 2025: available
Wed Oct 29 10:51:46 CET 2025: available
Wed Oct 29 10:51:49 CET 2025: available
Wed Oct 29 10:51:51 CET 2025: available
Wed Oct 29 10:51:54 CET 2025: reserved
Wed Oct 29 10:51:56 CET 2025: reserved
Wed Oct 29 10:51:59 CET 2025: in-use
Wed Oct 29 10:52:01 CET 2025: detaching
Wed Oct 29 10:52:04 CET 2025: available
Wed Oct 29 10:52:06 CET 2025: available
Wed Oct 29 10:52:09 CET 2025: available
Wed Oct 29 10:52:11 CET 2025: available
Expected behavior
- The volume status should transition to "available" quickly; it should not remain in "detaching" for ~20 seconds
- A manual container restart after a new deployment (or scale-out) should not be needed
- The error in Nova should not occur, and apparently did not occur on 17.1.9
Environment:
- RHOSP 17.1.11 (reproduced)
- The customer reports that 17.1.9 does not present the same issue (not yet attempted to reproduce)
Bug impact
- Apparently low impact: the volume is removed from the instance anyway after the timeout and error
Known workaround
- After deployment or scale-out, run `systemctl restart tripleo_nova_virtqemud` on the compute nodes, or more generally, as the stack user on the undercloud:
$ ansible -i ~/overcloud-deploy/overcloud/tripleo-ansible-inventory.yaml -m service -a 'name=tripleo_nova_virtqemud state=restarted' -b Compute
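- The same restart can also be limited to specific nodes, e.g. only a newly scaled-out compute, by replacing the Compute group with a host pattern from the inventory (a sketch; the host name is from my environment):
$ ansible -i ~/overcloud-deploy/overcloud/tripleo-ansible-inventory.yaml -m service -a 'name=tripleo_nova_virtqemud state=restarted' -b 'overcloud-novacompute-2*'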
Additional context
- The issue does not reproduce in environments that are updated from 17.1.9 to 17.1.11
- In environments where the workaround has been applied, scaling out to a new compute node allows reproducing the issue again on the newly added node
- The logs above are from my own environment, with Nova and libvirt in debug mode. Let me know if any other details are needed.