- Bug
- Resolution: Won't Do
- Normal
- None
- rhos-17.1.z
Description of problem:
For boot-from-volume instances, 'openstack server rescue <vm> --image <image>' fails with the following issues:
1. It attempts to attach two disks: <instance_uuid>_disk and <instance_uuid>_disk.rescue. Only <instance_uuid>_disk.rescue is actually created, so it fails with the following error:
2024-01-23 16:32:14.338 2 ERROR oslo_messaging.rpc.server nova.exception.InstanceNotRescuable: Instance dc0812ba-b4ca-4ffa-a7e5-2157e52f35d2 cannot be rescued: Driver Error: internal error: process exited while connecting to monitor: 2024-01-23T16:32:13.017966Z qemu-kvm: -blockdev {"driver":"rbd","pool":"vms","image":"dc0812ba-b4ca-4ffa-a7e5-2157e52f35d2_disk","server":[
{"host":"172.16.1.100","port":"6789"}],"user":"openstack","auth-client-required":["cephx","none"],"key-secret":"libvirt-1-storage-auth-secret0","node-name":"libvirt-1-storage","cache":
{"direct":false,"no-flush":false},"auto-read-only":true,"discard":"unmap"}: error reading header from dc0812ba-b4ca-4ffa-a7e5-2157e52f35d2_disk: No such file or directory
If you look in ceph, only the .rescue image exists.
[root@overcloud-novacompute-0 ~]# podman exec -ti nova_virtqemud rbd --id openstack -p vms ls -l
NAME SIZE PARENT FMT PROT LOCK
dc0812ba-b4ca-4ffa-a7e5-2157e52f35d2_disk.rescue 10 GiB 2 excl
However, we see the instance configured with both disks.
[root@overcloud-novacompute-0 ~]# podman exec -ti nova_virtqemud virsh domblklist instance-00000003
Target Source
----------------------------------------------------------------
vda vms/dc0812ba-b4ca-4ffa-a7e5-2157e52f35d2_disk.rescue
vdb vms/dc0812ba-b4ca-4ffa-a7e5-2157e52f35d2_disk
If I manually copy UUID_disk.rescue to UUID_disk, the instance will boot into RESCUE mode (see the example command after item 2 below). It seems the UUID_disk volume is not needed and should not be configured in this rescue situation.
2. The rescued instance does not attach the cinder root volume. The cinder root volume also does not re-attach after unrescuing the instance.
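Regarding the workaround mentioned under item 1, a manual copy along the following lines is enough to get the instance into RESCUE; this mirrors the rbd cp command used later in the reproducer, here shown with the instance UUID from the error above:
[root@overcloud-novacompute-0 ~]# podman exec -ti nova_virtqemud rbd --id openstack cp vms/dc0812ba-b4ca-4ffa-a7e5-2157e52f35d2_disk.rescue vms/dc0812ba-b4ca-4ffa-a7e5-2157e52f35d2_disk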
Reproducer:
$ openstack volume create --size 10 --image rhel8 rootvol1
$ openstack volume list
+--------------------------------------+----------+-----------+------+-------------+
| ID                                   | Name     | Status    | Size | Attached to |
+--------------------------------------+----------+-----------+------+-------------+
| f855dfe6-ad5a-4497-87ff-16ac5856f596 | rootvol1 | available |   10 |             |
+--------------------------------------+----------+-----------+------+-------------+
$ openstack server create --key-name default --flavor rhel --volume rootvol1 --network external test1
$ openstack server show test1 -c status -c image -c volumes_attached
+------------------+---------------------------------------------------------------------------+
| Field            | Value                                                                     |
+------------------+---------------------------------------------------------------------------+
| image            | N/A (booted from volume)                                                  |
| status           | ACTIVE                                                                    |
| volumes_attached | delete_on_termination='False', id='f855dfe6-ad5a-4497-87ff-16ac5856f596'  |
+------------------+---------------------------------------------------------------------------+
$ openstack server rescue test1 --image rhel8
$ openstack server show test1 -c status -c image -c volumes_attached -c fault --fit
+------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| Field            | Value                                                                                                                                 |
+------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| fault            | {'code': 400, 'created': '2024-01-23T20:12:17Z', 'message': 'Instance ac3d46c0-c8d5-45df-bd17-d467baaa5a98 cannot be rescued: Driver  |
|                  | Error: internal error: process exited while connecting to monitor: 2024-01-23T20:12:17.612453Z qemu-kvm: -blockdev                   |
|                  | {"driver":"rbd","pool":"vms","image":"ac3d46c0-c8d5-45df-bd17-d467ba'}                                                                |
| image            | N/A (booted from volume)                                                                                                              |
| status           | ERROR                                                                                                                                 |
| volumes_attached | delete_on_termination='False', id='f855dfe6-ad5a-4497-87ff-16ac5856f596'                                                              |
+------------------+---------------------------------------------------------------------------------------------------------------------------------------+
[root@overcloud-novacompute-0 ~]# podman exec -ti nova_virtqemud virsh domblklist instance-00000004
Target Source
----------------------------------------------------------------
vda vms/ac3d46c0-c8d5-45df-bd17-d467baaa5a98_disk.rescue
vdb vms/ac3d46c0-c8d5-45df-bd17-d467baaa5a98_disk
[root@overcloud-novacompute-0 ~]# podman exec -ti nova_virtqemud rbd --id openstack -p vms ls -l
NAME SIZE PARENT FMT PROT LOCK
ac3d46c0-c8d5-45df-bd17-d467baaa5a98_disk.rescue 10 GiB 2
NOTE: here, if we manually create the _disk volume, the instance will boot into rescue mode; however, the cinder root volume is still not attached.
[root@overcloud-novacompute-0 ~]# podman exec -ti nova_virtqemud rbd --id openstack cp vms/ac3d46c0-c8d5-45df-bd17-d467baaa5a98_disk.rescue vms/ac3d46c0-c8d5-45df-bd17-d467baaa5a98_disk
Image copy: 100% complete...done.
RESCUE now completes and the instance is accessible (without the cinder root volume attached).
$ openstack server show test1 -c status -c image -c volumes_attached -c fault --fit
+------------------+---------------------------------------------------------------------------+
| Field            | Value                                                                     |
+------------------+---------------------------------------------------------------------------+
| image            | N/A (booted from volume)                                                  |
| status           | RESCUE                                                                    |
| volumes_attached | delete_on_termination='False', id='f855dfe6-ad5a-4497-87ff-16ac5856f596'  |
+------------------+---------------------------------------------------------------------------+
The volume still shows as in-use:
$ openstack volume list
+--------------------------------------+----------+--------+------+-------------------------------+
| ID                                   | Name     | Status | Size | Attached to                   |
+--------------------------------------+----------+--------+------+-------------------------------+
| f855dfe6-ad5a-4497-87ff-16ac5856f596 | rootvol1 | in-use |   10 | Attached to test1 on /dev/vda |
+--------------------------------------+----------+--------+------+-------------------------------+
But it is not actually attached to the instance:
[root@overcloud-novacompute-0 ~]# podman exec -ti nova_virtqemud virsh domblklist instance-00000004
Target Source
----------------------------------------------------------------
vda vms/ac3d46c0-c8d5-45df-bd17-d467baaa5a98_disk.rescue
vdb vms/ac3d46c0-c8d5-45df-bd17-d467baaa5a98_disk
The other issue is that unrescue does not revert the instance back to its original disk configuration.
$ openstack server unrescue test1
$ openstack server show test1 -c status -c image -c volumes_attached -c fault --fit
+------------------+---------------------------------------------------------------------------+
| Field            | Value                                                                     |
+------------------+---------------------------------------------------------------------------+
| image            | N/A (booted from volume)                                                  |
| status           | ACTIVE                                                                    |
| volumes_attached | delete_on_termination='False', id='f855dfe6-ad5a-4497-87ff-16ac5856f596'  |
+------------------+---------------------------------------------------------------------------+
The above looks good, but the instance is still booted from the rescue disks.
[root@overcloud-novacompute-0 ~]# podman exec -ti nova_virtqemud virsh domblklist instance-00000004
Target Source
----------------------------------------------------------------
vda vms/ac3d46c0-c8d5-45df-bd17-d467baaa5a98_disk.rescue
vdb vms/ac3d46c0-c8d5-45df-bd17-d467baaa5a98_disk
A hard reboot will fix it:
$ openstack server reboot --hard test1
Now the instance is back to booting from the volume:
[root@overcloud-novacompute-0 ~]# podman exec -ti nova_virtqemud virsh domblklist instance-00000004
Target Source
---------------------------------------------------------------
vda volumes/volume-f855dfe6-ad5a-4497-87ff-16ac5856f596
Version-Release number of selected component (if applicable):
OSP 17.1
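If exact package NVRs are needed, something along these lines should capture them on the compute node (the nova_compute container name is an assumption based on a default OSP 17.1 deployment):
[root@overcloud-novacompute-0 ~]# podman exec -ti nova_compute rpm -qa | grep openstack-nova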
How reproducible:
100%
Steps to Reproduce:
1. See the reproducer above.