Red Hat OpenStack Services on OpenShift / OSPRH-13142

'openstack server rescue' failures on boot-from-volume instances


    • Type: Bug
    • Resolution: Won't Do
    • Priority: Normal
    • Version: rhos-17.1.z
    • Component: openstack-nova
    • Severity: Moderate

      Description of problem:

      For boot-from-volume instances, 'openstack server rescue <vm> --image <image>' fails with the following issues:

      1. It attempts to attach two disks, <instance_uuid>_disk and <instance_uuid>_disk.rescue, but only <instance_uuid>_disk.rescue is actually created, so the rescue fails with the following error:

      2024-01-23 16:32:14.338 2 ERROR oslo_messaging.rpc.server nova.exception.InstanceNotRescuable: Instance dc0812ba-b4ca-4ffa-a7e5-2157e52f35d2 cannot be rescued: Driver Error: internal error: process exited while connecting to monitor: 2024-01-23T16:32:13.017966Z qemu-kvm: -blockdev {"driver":"rbd","pool":"vms","image":"dc0812ba-b4ca-4ffa-a7e5-2157e52f35d2_disk","server":[{"host":"172.16.1.100","port":"6789"}],"user":"openstack","auth-client-required":["cephx","none"],"key-secret":"libvirt-1-storage-auth-secret0","node-name":"libvirt-1-storage","cache":{"direct":false,"no-flush":false},"auto-read-only":true,"discard":"unmap"}: error reading header from dc0812ba-b4ca-4ffa-a7e5-2157e52f35d2_disk: No such file or directory

      If you look in Ceph, only the .rescue image exists:

      [root@overcloud-novacompute-0 ~]# podman exec -ti nova_virtqemud rbd --id openstack -p vms ls -l
      NAME                                              SIZE    PARENT  FMT  PROT  LOCK
      dc0812ba-b4ca-4ffa-a7e5-2157e52f35d2_disk.rescue  10 GiB          2          excl

      However, the instance is configured with both disks:

      [root@overcloud-novacompute-0 ~]# podman exec -ti nova_virtqemud virsh domblklist instance-00000003
      Target Source
      ----------------------------------------------------------------
      vda vms/dc0812ba-b4ca-4ffa-a7e5-2157e52f35d2_disk.rescue
      vdb vms/dc0812ba-b4ca-4ffa-a7e5-2157e52f35d2_disk

      If I manually copy UUID_disk.rescue to UUID_disk, the instance boots into RESCUE mode. It seems the UUID_disk image is not needed and should not be configured in this rescue situation (see the sketch just below).
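
      A minimal sketch of that manual copy (assuming the Nova RBD pool is "vms" and the Ceph client user is "openstack", as in the error above; the reproducer below shows the same step with a concrete UUID):

      [root@overcloud-novacompute-0 ~]# podman exec -ti nova_virtqemud rbd --id openstack cp vms/UUID_disk.rescue vms/UUID_disk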

      2. The rescued instance does not attach the Cinder root volume. The Cinder root volume also does not re-attach after "unrescuing" the instance (a quick check is sketched below).
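
      A quick way to see the mismatch, sketched with placeholder names (it assumes the Cinder RBD pool is "volumes", as in the post-reboot output at the end of the reproducer): Cinder still reports the volume attached, while libvirt exposes no volume-backed disk to the guest.

      $ openstack volume show <volume> -c status -c attachments
      [root@overcloud-novacompute-0 ~]# podman exec -ti nova_virtqemud virsh domblklist <domain> | grep volume-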

      Reproducer:

      $ openstack volume create --size 10 --image rhel8 rootvol1

      $ openstack volume list
      +--------------------------------------+----------+-----------+------+-------------+
      | ID                                   | Name     | Status    | Size | Attached to |
      +--------------------------------------+----------+-----------+------+-------------+
      | f855dfe6-ad5a-4497-87ff-16ac5856f596 | rootvol1 | available |   10 |             |
      +--------------------------------------+----------+-----------+------+-------------+

      $ openstack server create --key-name default --flavor rhel --volume rootvol1 --network external test1

      $ openstack server show test1 -c status -c image -c volumes_attached
      +------------------+--------------------------------------------------------------------------+
      | Field            | Value                                                                    |
      +------------------+--------------------------------------------------------------------------+
      | image            | N/A (booted from volume)                                                 |
      | status           | ACTIVE                                                                   |
      | volumes_attached | delete_on_termination='False', id='f855dfe6-ad5a-4497-87ff-16ac5856f596' |
      +------------------+--------------------------------------------------------------------------+

      $ openstack server rescue test1 --image rhel8

      $ openstack server show test1 -c status -c image -c volumes_attached -c fault --fit
      +------------------+---------------------------------------------------------------------------------------------------------------------------------------+
      | Field            | Value                                                                                                                                 |
      +------------------+---------------------------------------------------------------------------------------------------------------------------------------+
      | fault            | {'code': 400, 'created': '2024-01-23T20:12:17Z', 'message': 'Instance ac3d46c0-c8d5-45df-bd17-d467baaa5a98 cannot be rescued: Driver |
      |                  | Error: internal error: process exited while connecting to monitor: 2024-01-23T20:12:17.612453Z qemu-kvm: -blockdev                   |
      |                  | {"driver":"rbd","pool":"vms","image":"ac3d46c0-c8d5-45df-bd17-d467ba'}                                                                |
      | image            | N/A (booted from volume)                                                                                                              |
      | status           | ERROR                                                                                                                                 |
      | volumes_attached | delete_on_termination='False', id='f855dfe6-ad5a-4497-87ff-16ac5856f596'                                                              |
      +------------------+---------------------------------------------------------------------------------------------------------------------------------------+

      [root@overcloud-novacompute-0 ~]# podman exec -ti nova_virtqemud virsh domblklist instance-00000004
      Target Source
      ----------------------------------------------------------------
      vda vms/ac3d46c0-c8d5-45df-bd17-d467baaa5a98_disk.rescue
      vdb vms/ac3d46c0-c8d5-45df-bd17-d467baaa5a98_disk

      [root@overcloud-novacompute-0 ~]# podman exec -ti nova_virtqemud rbd --id openstack -p vms ls -l
      NAME                                              SIZE    PARENT  FMT  PROT  LOCK
      ac3d46c0-c8d5-45df-bd17-d467baaa5a98_disk.rescue  10 GiB          2

      NOTE: at this point, if you manually create the _disk image, the instance will boot into rescue mode; however, the Cinder volume is not attached.

      [root@overcloud-novacompute-0 ~]# podman exec -ti nova_virtqemud rbd --id openstack cp vms/ac3d46c0-c8d5-45df-bd17-d467baaa5a98_disk.rescue vms/ac3d46c0-c8d5-45df-bd17-d467baaa5a98_disk
      Image copy: 100% complete...done.

      RESCUE now completes and the instance is accessible (without the Cinder root volume attached).

      $ openstack server show test1 -c status -c image -c volumes_attached -c fault --fit
      +------------------+--------------------------------------------------------------------------+
      | Field            | Value                                                                    |
      +------------------+--------------------------------------------------------------------------+
      | image            | N/A (booted from volume)                                                 |
      | status           | RESCUE                                                                   |
      | volumes_attached | delete_on_termination='False', id='f855dfe6-ad5a-4497-87ff-16ac5856f596' |
      +------------------+--------------------------------------------------------------------------+

      The volume still shows as in-use:

      $ openstack volume list
      +--------------------------------------+----------+--------+------+-------------------------------+
      | ID                                   | Name     | Status | Size | Attached to                   |
      +--------------------------------------+----------+--------+------+-------------------------------+
      | f855dfe6-ad5a-4497-87ff-16ac5856f596 | rootvol1 | in-use |   10 | Attached to test1 on /dev/vda |
      +--------------------------------------+----------+--------+------+-------------------------------+

      But it is not actually attached to the domain:

      [root@overcloud-novacompute-0 ~]# podman exec -ti nova_virtqemud virsh domblklist instance-00000004
      Target Source
      ----------------------------------------------------------------
      vda vms/ac3d46c0-c8d5-45df-bd17-d467baaa5a98_disk.rescue
      vdb vms/ac3d46c0-c8d5-45df-bd17-d467baaa5a98_disk

      The other ugly thing: unrescue does not revert the instance back to its original disk configuration.

      $ openstack server unrescue test1
      $ openstack server show test1 -c status -c image -c volumes_attached -c fault --fit
      +------------------+--------------------------------------------------------------------------+
      | Field            | Value                                                                    |
      +------------------+--------------------------------------------------------------------------+
      | image            | N/A (booted from volume)                                                 |
      | status           | ACTIVE                                                                   |
      | volumes_attached | delete_on_termination='False', id='f855dfe6-ad5a-4497-87ff-16ac5856f596' |
      +------------------+--------------------------------------------------------------------------+

      The above looks good, but the instance is still booted from the rescue disks:

      [root@overcloud-novacompute-0 ~]# podman exec -ti nova_virtqemud virsh domblklist instance-00000004
      Target Source
      ----------------------------------------------------------------
      vda vms/ac3d46c0-c8d5-45df-bd17-d467baaa5a98_disk.rescue
      vdb vms/ac3d46c0-c8d5-45df-bd17-d467baaa5a98_disk

      A hard reboot will fix it:

      $ openstack server reboot --hard test1

      Now the instance is back to booting from the volume:

      [root@overcloud-novacompute-0 ~]# podman exec -ti nova_virtqemud virsh domblklist instance-00000004
      Target Source
      ---------------------------------------------------------------
      vda volumes/volume-f855dfe6-ad5a-4497-87ff-16ac5856f596

      Version-Release number of selected component (if applicable):
      OSP 17.1

      How reproducible:
      100%

      Steps to Reproduce:
      1. See the Reproducer section in the description above.
