Red Hat OpenStack Services on OpenShift
OSPRH-3139

Live-migration failure during post-migration because Keystone was unavailable



      Description of problem:

      The client attempted a live migration and it failed during the post-migration phase with the error shown below.

      The source of the problem seems to have been related to keystone:
      1064:2022-11-17 15:49:15.296 7 INFO nova.compute.resource_tracker [req-1c837c0b-3ccd-4226-8340-af1b12c6fddb - - - - -] [instance: 67857f95-d07d-4568-b92f-b37c760c94ef] Updating resource usage from migration 1976219d-a89d-48f2-893f-39d3e194a4f3
      1067:2022-11-17 15:49:34.143 7 ERROR nova.compute.manager [req-ac4e513d-a39c-452c-8df7-319d2e764095 - - - - -] [instance: 67857f95-d07d-4568-b92f-b37c760c94ef] Post live migration at destination icmlw-p1-r740-070.itpc.uk.pri.o2.com failed: oslo_messaging.rpc.client.RemoteError: Remote error: ServiceUnavailable The server is currently unavailable. Please try again at a later time.
      1073:2022-11-17 15:49:34.143 7 ERROR nova.compute.manager [instance: 67857f95-d07d-4568-b92f-b37c760c94ef] Traceback (most recent call last):
      1074:2022-11-17 15:49:34.143 7 ERROR nova.compute.manager [instance: 67857f95-d07d-4568-b92f-b37c760c94ef] File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 7579, in _post_live_migration
      1075:2022-11-17 15:49:34.143 7 ERROR nova.compute.manager [instance: 67857f95-d07d-4568-b92f-b37c760c94ef] instance, block_migration, dest)
      1076:2022-11-17 15:49:34.143 7 ERROR nova.compute.manager [instance: 67857f95-d07d-4568-b92f-b37c760c94ef] File "/usr/lib/python3.6/site-packages/nova/compute/rpcapi.py", line 796, in post_live_migration_at_destination
      1077:2022-11-17 15:49:34.143 7 ERROR nova.compute.manager [instance: 67857f95-d07d-4568-b92f-b37c760c94ef] instance=instance, block_migration=block_migration)
      1078:2022-11-17 15:49:34.143 7 ERROR nova.compute.manager [instance: 67857f95-d07d-4568-b92f-b37c760c94ef] File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/client.py", line 181, in call
      1079:2022-11-17 15:49:34.143 7 ERROR nova.compute.manager [instance: 67857f95-d07d-4568-b92f-b37c760c94ef] transport_options=self.transport_options)
      1080:2022-11-17 15:49:34.143 7 ERROR nova.compute.manager [instance: 67857f95-d07d-4568-b92f-b37c760c94ef] File "/usr/lib/python3.6/site-packages/oslo_messaging/transport.py", line 129, in _send
      1081:2022-11-17 15:49:34.143 7 ERROR nova.compute.manager [instance: 67857f95-d07d-4568-b92f-b37c760c94ef] transport_options=transport_options)
      1082:2022-11-17 15:49:34.143 7 ERROR nova.compute.manager [instance: 67857f95-d07d-4568-b92f-b37c760c94ef] File "/usr/lib/python3.6/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 674, in send
      1083:2022-11-17 15:49:34.143 7 ERROR nova.compute.manager [instance: 67857f95-d07d-4568-b92f-b37c760c94ef] transport_options=transport_options)
      1084:2022-11-17 15:49:34.143 7 ERROR nova.compute.manager [instance: 67857f95-d07d-4568-b92f-b37c760c94ef] File "/usr/lib/python3.6/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 664, in _send
      1085:2022-11-17 15:49:34.143 7 ERROR nova.compute.manager [instance: 67857f95-d07d-4568-b92f-b37c760c94ef] raise result
      1086:2022-11-17 15:49:34.143 7 ERROR nova.compute.manager [instance: 67857f95-d07d-4568-b92f-b37c760c94ef] oslo_messaging.rpc.client.RemoteError: Remote error: ServiceUnavailable The server is currently unavailable. Please try again at a later time.
      1087:2022-11-17 15:49:34.143 7 ERROR nova.compute.manager [instance: 67857f95-d07d-4568-b92f-b37c760c94ef] The Keystone service is temporarily unavailable.
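
      The "ServiceUnavailable ... The Keystone service is temporarily unavailable." text looks like the body of an HTTP 503 returned to a keystoneauth1-based client call on the destination host during the post-migration step. Purely as an illustration of the retry knobs such clients expose, here is a minimal Python sketch; the endpoint, credentials and retry values are placeholders, this is not the customer's nova configuration, and retriable_status_codes assumes a reasonably recent keystoneauth1:

      from keystoneauth1 import adapter, session
      from keystoneauth1.identity import v3

      # Placeholder credentials and endpoint, for illustration only.
      auth = v3.Password(
          auth_url="https://keystone.example.com:5000/v3",
          username="admin", password="secret", project_name="admin",
          user_domain_name="Default", project_domain_name="Default",
      )
      sess = session.Session(auth=auth)

      # An Adapter wraps the session for one service (here: neutron) and can
      # retry transient failures instead of surfacing them immediately.
      neutron = adapter.Adapter(
          session=sess,
          service_type="network",
          connect_retries=3,             # retry failed TCP connects
          status_code_retries=3,         # retry retriable HTTP status codes
          retriable_status_codes=[503],  # treat 503 as retriable
      )
      resp = neutron.get("/v2.0/ports")  # a transient 503 would be retried before this raises

      Whether the service clients involved in this environment expose equivalent retry options in nova.conf is something that would still need to be confirmed.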

      I looked at keystone logs around that time on all 3 controller nodes and couldn't find anything substantial.

      The instance was stuck in the "MIGRATING" state and nova.instances was still pointing to the source compute node.
      We had to modify the database to point the instance to the destination compute node.
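
      For reference, a minimal sketch of the kind of direct database change this involves, written with SQLAlchemy. The connection URL is a placeholder, the host and UUID values are taken from the log above, and this is not necessarily the exact change that was applied; any such edit needs backups and a careful check of the related migration records first.

      from sqlalchemy import create_engine, text

      # Placeholder connection URL to the nova database.
      engine = create_engine("mysql+pymysql://nova:secret@controller/nova")

      with engine.begin() as conn:
          # Point the instance at the destination compute and clear the stuck
          # task_state so it no longer shows as MIGRATING.
          conn.execute(
              text(
                  "UPDATE instances "
                  "SET host = :dest, node = :dest, task_state = NULL "
                  "WHERE uuid = :uuid"
              ),
              {"dest": "icmlw-p1-r740-070.itpc.uk.pri.o2.com",
               "uuid": "67857f95-d07d-4568-b92f-b37c760c94ef"},
          )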

      I don't know if you can find something with the data we have, or if we need to enable debug mode in Keystone and wait for the next occurrence of this situation.

      If you need anything please let me know.

      Thank you.

      Version-Release number of selected component (if applicable):
      OSP 16.1.7
      The environment is integrated with Contrail.

      How reproducible:
      Happened once

      Steps to Reproduce:
      1. Perform a live migration

      Actual results:
      Live-migration failed.

      Expected results:
      Live-migration succeeds.

      Additional info:
      We have sosreports from the compute nodes and controller nodes.
