Red Hat OpenStack Services on OpenShift / OSPRH-13142

'openstack server rescue' failures on boot-from-volume instances


    • Type: Bug
    • Resolution: Won't Do
    • Priority: Normal
    • Version: rhos-17.1.z
    • Component: openstack-nova
    • Severity: Moderate

      Description of problem:

      For boot-from-volume instances, 'openstack server rescue <vm> --image <image>' fails with the following issues:

      1. It attempts to attach two disks, <instance_uuid>_disk and <instance_uuid>_disk.rescue, but only <instance_uuid>_disk.rescue is actually created, so the rescue fails with the following error:

      2024-01-23 16:32:14.338 2 ERROR oslo_messaging.rpc.server nova.exception.InstanceNotRescuable: Instance dc0812ba-b4ca-4ffa-a7e5-2157e52f35d2 cannot be rescued: Driver Error: internal error: process exited while connecting to monitor: 2024-01-23T16:32:13.017966Z qemu-kvm: -blockdev {"driver":"rbd","pool":"vms","image":"dc0812ba-b4ca-4ffa-a7e5-2157e52f35d2_disk","server":[{"host":"172.16.1.100","port":"6789"}],"user":"openstack","auth-client-required":["cephx","none"],"key-secret":"libvirt-1-storage-auth-secret0","node-name":"libvirt-1-storage","cache":{"direct":false,"no-flush":false},"auto-read-only":true,"discard":"unmap"}: error reading header from dc0812ba-b4ca-4ffa-a7e5-2157e52f35d2_disk: No such file or directory

      If you look in Ceph, only the .rescue image exists:

      [root@overcloud-novacompute-0 ~]# podman exec -ti nova_virtqemud rbd --id openstack -p vms ls -l
      NAME                                              SIZE    PARENT  FMT  PROT  LOCK
      dc0812ba-b4ca-4ffa-a7e5-2157e52f35d2_disk.rescue  10 GiB          2          excl

      However, the instance is configured with both disks:

      [root@overcloud-novacompute-0 ~]# podman exec -ti nova_virtqemud virsh domblklist instance-00000003
      Target Source
      ----------------------------------------------------------------
      vda vms/dc0812ba-b4ca-4ffa-a7e5-2157e52f35d2_disk.rescue
      vdb vms/dc0812ba-b4ca-4ffa-a7e5-2157e52f35d2_disk

      If I manually copy UUID_disk.rescue to UUID_disk, the instance boots into RESCUE mode. It seems the UUID_disk image is not needed and should not be configured in this rescue situation (see the sketch just below).
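
      A minimal sketch of that manual copy (assuming the Nova RBD pool is "vms" and the Ceph client user is "openstack", as in the error above; the reproducer below shows the same step with a concrete UUID):

      [root@overcloud-novacompute-0 ~]# podman exec -ti nova_virtqemud rbd --id openstack cp vms/UUID_disk.rescue vms/UUID_disk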

      2. The rescued instance does not attach the Cinder root volume. The Cinder root volume also does not re-attach after "unrescuing" the instance (a quick check is sketched below).
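
      A quick way to see the mismatch, sketched with placeholder names (it assumes the Cinder RBD pool is "volumes", as in the post-reboot output at the end of the reproducer): Cinder still reports the volume attached, while libvirt exposes no volume-backed disk to the guest.

      $ openstack volume show <volume> -c status -c attachments
      [root@overcloud-novacompute-0 ~]# podman exec -ti nova_virtqemud virsh domblklist <domain> | grep volume-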

      Reproducer:

      $ openstack volume create --size 10 --image rhel8 rootvol1

      $ openstack volume list
      +--------------------------------------+----------+-----------+------+-------------+
      | ID                                   | Name     | Status    | Size | Attached to |
      +--------------------------------------+----------+-----------+------+-------------+
      | f855dfe6-ad5a-4497-87ff-16ac5856f596 | rootvol1 | available |   10 |             |
      +--------------------------------------+----------+-----------+------+-------------+

      $ openstack server create --key-name default --flavor rhel --volume rootvol1 --network external test1

      $ openstack server show test1 -c status -c image -c volumes_attached
      +------------------+--------------------------------------------------------------------------+
      | Field            | Value                                                                    |
      +------------------+--------------------------------------------------------------------------+
      | image            | N/A (booted from volume)                                                 |
      | status           | ACTIVE                                                                   |
      | volumes_attached | delete_on_termination='False', id='f855dfe6-ad5a-4497-87ff-16ac5856f596' |
      +------------------+--------------------------------------------------------------------------+

      $ openstack server rescue test1 --image rhel8

      $ openstack server show test1 -c status -c image -c volumes_attached -c fault --fit
      +------------------+---------------------------------------------------------------------------------------------------------------------------------------+
      | Field            | Value                                                                                                                                 |
      +------------------+---------------------------------------------------------------------------------------------------------------------------------------+
      | fault            | {'code': 400, 'created': '2024-01-23T20:12:17Z', 'message': 'Instance ac3d46c0-c8d5-45df-bd17-d467baaa5a98 cannot be rescued: Driver |
      |                  | Error: internal error: process exited while connecting to monitor: 2024-01-23T20:12:17.612453Z qemu-kvm: -blockdev                   |
      |                  | {"driver":"rbd","pool":"vms","image":"ac3d46c0-c8d5-45df-bd17-d467ba'}                                                                |
      | image            | N/A (booted from volume)                                                                                                              |
      | status           | ERROR                                                                                                                                 |
      | volumes_attached | delete_on_termination='False', id='f855dfe6-ad5a-4497-87ff-16ac5856f596'                                                              |
      +------------------+---------------------------------------------------------------------------------------------------------------------------------------+

      [root@overcloud-novacompute-0 ~]# podman exec -ti nova_virtqemud virsh domblklist instance-00000004
      Target Source
      ----------------------------------------------------------------
      vda vms/ac3d46c0-c8d5-45df-bd17-d467baaa5a98_disk.rescue
      vdb vms/ac3d46c0-c8d5-45df-bd17-d467baaa5a98_disk

      [root@overcloud-novacompute-0 ~]# podman exec -ti nova_virtqemud rbd --id openstack -p vms ls -l
      NAME                                              SIZE    PARENT  FMT  PROT  LOCK
      ac3d46c0-c8d5-45df-bd17-d467baaa5a98_disk.rescue  10 GiB          2

      NOTE: at this point, if you manually create the _disk image, the instance will boot into rescue mode; however, the Cinder volume is not attached.

      [root@overcloud-novacompute-0 ~]# podman exec -ti nova_virtqemud rbd --id openstack cp vms/ac3d46c0-c8d5-45df-bd17-d467baaa5a98_disk.rescue vms/ac3d46c0-c8d5-45df-bd17-d467baaa5a98_disk
      Image copy: 100% complete...done.

      RESCUE now completes and the instance is accessible (without the Cinder root volume attached).

      $ openstack server show test1 -c status -c image -c volumes_attached -c fault --fit
      +------------------+--------------------------------------------------------------------------+
      | Field            | Value                                                                    |
      +------------------+--------------------------------------------------------------------------+
      | image            | N/A (booted from volume)                                                 |
      | status           | RESCUE                                                                   |
      | volumes_attached | delete_on_termination='False', id='f855dfe6-ad5a-4497-87ff-16ac5856f596' |
      +------------------+--------------------------------------------------------------------------+

      The volume still shows as in-use:

      $ openstack volume list
      +--------------------------------------+----------+--------+------+-------------------------------+
      | ID                                   | Name     | Status | Size | Attached to                   |
      +--------------------------------------+----------+--------+------+-------------------------------+
      | f855dfe6-ad5a-4497-87ff-16ac5856f596 | rootvol1 | in-use |   10 | Attached to test1 on /dev/vda |
      +--------------------------------------+----------+--------+------+-------------------------------+

      But it is not actually attached to the domain:

      [root@overcloud-novacompute-0 ~]# podman exec -ti nova_virtqemud virsh domblklist instance-00000004
      Target Source
      ----------------------------------------------------------------
      vda vms/ac3d46c0-c8d5-45df-bd17-d467baaa5a98_disk.rescue
      vdb vms/ac3d46c0-c8d5-45df-bd17-d467baaa5a98_disk

      The other ugly thing: unrescue does not revert the instance back to its original disk configuration.

      $ openstack server unrescue test1
      $ openstack server show test1 -c status -c image -c volumes_attached -c fault --fit
      +------------------+--------------------------------------------------------------------------+
      | Field            | Value                                                                    |
      +------------------+--------------------------------------------------------------------------+
      | image            | N/A (booted from volume)                                                 |
      | status           | ACTIVE                                                                   |
      | volumes_attached | delete_on_termination='False', id='f855dfe6-ad5a-4497-87ff-16ac5856f596' |
      +------------------+--------------------------------------------------------------------------+

      The above looks good, but the instance is still booted from the rescue disks:

      [root@overcloud-novacompute-0 ~]# podman exec -ti nova_virtqemud virsh domblklist instance-00000004
      Target Source
      ----------------------------------------------------------------
      vda vms/ac3d46c0-c8d5-45df-bd17-d467baaa5a98_disk.rescue
      vdb vms/ac3d46c0-c8d5-45df-bd17-d467baaa5a98_disk

      A hard reboot will fix it:

      $ openstack server reboot --hard test1

      Now the instance is back to booting from the volume:

      [root@overcloud-novacompute-0 ~]# podman exec -ti nova_virtqemud virsh domblklist instance-00000004
      Target Source
      ---------------------------------------------------------------
      vda volumes/volume-f855dfe6-ad5a-4497-87ff-16ac5856f596

      Version-Release number of selected component (if applicable):
      OSP 17.1

      How reproducible:
      100%

      Steps to Reproduce:
      1. See the Reproducer section in the description above.
