  1. OpenShift API for Data Protection
  2. OADP-6711

Restoring does not include all objects - Investigation.

    • Bug
    • Resolution: Unresolved
    • Undefined
    • None
    • None
    • velero
    • None
    • Incidents & Support
    • 3
    • False
    • False
    • ToDo
    • Very Likely
    • 0
    • None
    • Unset
    • Unknown
    • None

      The customer is experiencing intermittent failures during cutover migration from OpenShift to AWS, resulting in missed deadlines, business impact, increased infrastructure costs, and potential reputational damage. The issue affects the entire migration program and is not limited to a specific application. During migrations, the MigMigration sometimes does not migrate all project objects; in most cases, the Route objects are missed. The customer then rolls back in MTC and runs the cutover again, after which all objects are copied. Sometimes several rollbacks are required before MTC picks up all objects.

      Initially, during cutover migration, MTC migrated all resources except Deployments and Routes. According to the customer, the failures during cutover migration are intermittent.

       The customer tested multiple scenarios, migrating either the Deployment or the Route CR, but in both scenarios the CRs could not be located in the backend storage.

       

      Example of issue:
      Backup created for the first attempt, in which the route restore failed:
      "kind": "Backup",
      "apiVersion": "velero.io/v1",
      "metadata": {
      "name": "mm-locman-i6lm-az2-osp00-initial-r77tl",
      "namespace": "openshift-migration",

      "spec": {
      "metadata": {},
      "includedNamespaces": [
      "locman-i6lm"
      Checking the backup logs for "mm-locman-i6lm-az2-osp00-initial-r77tl", we could see that the route was backed up successfully:
      0140-mp-locman-i6lm-az2-osp00.zip/mp-locman-i6lm-az2-osp00/s3/cutover_no_route/Backup-mm-locman-i6lm-az2-osp00-initial-r77tl/mm-locman-i6lm-az2-osp00-initial-r77tl-logs
      {{time="2025-01-22T05:15:48Z" level=info msg="Getting items for group" backup=openshift-migration/mm-locman-i6lm-az2-osp00-initial-r77tl group=route.openshift.io/v1 logSource="/remote-source/velero/app/pkg/backup/item_collector.go:105"
      time="2025-01-22T05:15:48Z" level=info msg="Getting items for resource" backup=openshift-migration/mm-locman-i6lm-az2-osp00-initial-r77tl group=route.openshift.io/v1 logSource="/remote-source/velero/app/pkg/backup/item_collector.go:196" resource=routes
      time="2025-01-22T05:15:48Z" level=info msg="Listing items" backup=openshift-migration/mm-locman-i6lm-az2-osp00-initial-r77tl group=route.openshift.io/v1 logSource="/remote-source/velero/app/pkg/backup/item_collector.go:322" namespace=locman-i6lm resource=routes
      time="2025-01-22T05:15:48Z" level=info msg="list for groupResource routes.route.openshift.io was not paginated" backup=openshift-migration/mm-locman-i6lm-az2-osp00-initial-r77tl logSource="/remote-source/velero/app/pkg/backup/item_collector.go:495"
      time="2025-01-22T05:15:48Z" level=info msg="Retrieved 1 items" backup=openshift-migration/mm-locman-i6lm-az2-osp00-initial-r77tl group=route.openshift.io/v1 logSource="/remote-source/velero/app/pkg/backup/item_collector.go:353" namespace=locman-i6lm resource=routes
      [..]
      time="2025-01-22T05:15:56Z" level=info msg="Processing item" backup=openshift-migration/mm-locman-i6lm-az2-osp00-initial-r77tl logSource="/remote-source/velero/app/pkg/backup/backup.go:380" name=locman namespace=locman-i6lm progress= resource=routes.route.openshift.io
      time="2025-01-22T05:15:56Z" level=info msg="Backing up item" backup=openshift-migration/mm-locman-i6lm-az2-osp00-initial-r77tl logSource="/remote-source/velero/app/pkg/backup/item_backupper.go:177" name=locman namespace=locman-i6lm resource=routes.route.openshift.io
      time="2025-01-22T05:15:56Z" level=info msg="Backed up 48 items out of an estimated total of 48 (estimate will change throughout the backup)" backup=openshift-migration/mm-locman-i6lm-az2-osp00-initial-r77tl logSource="/remote-source/velero/app/pkg/backup/backup.go:420" name=locman namespace=locman-i6lm progress= resource=routes.route.openshift.io}}
      However, the restore logs contain no entries or errors for the route object, and the restore completed with 46 items:
      0140-mp-locman-i6lm-az2-osp00.zip/mp-locman-i6lm-az2-osp00/s3/cutover_no_route/Restore-mm-locman-i6lm-az2-osp00-final-9rd6q/restore-mm-locman-i6lm-az2-osp00-final-9rd6q-log
      time="2025-01-22T05:19:39Z" level=info msg="Restored 46 items out of an estimated total of 46 (estimate will change throughout the restore)" logSource="/remote-source/velero
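      The comparison described above (route present in the backup log, absent from the restore log) can be done mechanically. The following is a minimal sketch, not part of the original investigation. It assumes that restore log lines carry the same `name`, `namespace` and `resource` fields as the backup lines quoted above; the truncated restore excerpt does not confirm this. The helper names `parse_fields`, `items_in_log` and `missing_from_restore` are hypothetical.

```python
import re
from typing import Dict, Set, Tuple

# Matches logfmt pairs such as key=value or key="quoted value".
# An empty value (for example `progress=`) is accepted.
_PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S*)')

# (resource, namespace, name)
Item = Tuple[str, str, str]


def parse_fields(line: str) -> Dict[str, str]:
    """Parse one logrus text-format line into a dict of its fields."""
    fields: Dict[str, str] = {}
    for key, raw in _PAIR.findall(line):
        if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
            raw = raw[1:-1].replace('\\"', '"')
        fields[key] = raw
    return fields


def items_in_log(text: str) -> Set[Item]:
    """Collect every (resource, namespace, name) triple a log mentions.

    Lines without both a `resource` and a `name` field are ignored, which
    skips the "Getting items for group" style lines that name no object.
    """
    items: Set[Item] = set()
    for line in text.splitlines():
        fields = parse_fields(line)
        if fields.get("resource") and fields.get("name"):
            items.add(
                (fields["resource"], fields.get("namespace", ""), fields["name"])
            )
    return items


def missing_from_restore(backup_log: str, restore_log: str) -> Set[Item]:
    """Return items that appear in the backup log but not in the restore log."""
    return items_in_log(backup_log) - items_in_log(restore_log)
```

      Calling `missing_from_restore` with the contents of the backup log and the restore log returns the objects that were backed up but never mentioned during restore. A non-empty result is only a hint: some resources can be skipped at restore time by design, so each entry still needs to be checked against the restore spec.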

      [ MigPlan "mp-price-plc-dk1-s1pm-az2-osp00" ]

      {{{
      "kind": "Backup",
      "metadata": {
      "name": "mm-price-plc-dk1-s1pm-az2-osp00-initial-8q9dz",
      "namespace": "openshift-migration",

      "spec": {
      "metadata": {},
      "includedNamespaces": [
      "price-plc-dk1-s1pm" }}
      Checking the backup logs for "mm-price-plc-dk1-s1pm-az2-osp00-initial-8q9dz", we could see that the route was backed up successfully:
      {{time="2025-01-22T05:23:05Z" level=info msg="Processing item" backup=openshift-migration/mm-price-plc-dk1-s1pm-az2-osp00-initial-8q9dz logSource="/remote-source/velero/app/pkg/backup/backup.go:380" name=price-plc namespace=price-plc-dk1-s1pm progress= resource=routes.route.openshift.io
      time="2025-01-22T05:23:05Z" level=info msg="Backing up item" backup=openshift-migration/mm-price-plc-dk1-s1pm-az2-osp00-initial-8q9dz logSource="/remote-source/velero/app/pkg/backup/item_backupper.go:177" name=price-plc namespace=price-plc-dk1-s1pm resource=routes.route.openshift.io
      time="2025-01-22T05:23:05Z" level=info msg="Backed up 51 items out of an estimated total of 51 (estimate will change throughout the backup)" backup=openshift-migration/mm-price-plc-dk1-s1pm-az2-osp00-initial-8q9dz logSource="/remote-source/velero/app/pkg/backup/backup.go:420" name=price-plc namespace=price-plc-dk1-s1pm progress= resource=routes.route.openshift.io}}

      However, the corresponding restore log contains no entries or errors for the route object.

      Actions done:

      • Checked the backup and restore logs to verify whether the resources were backed up. The logs indicated that the CRs were backed up and restored, so I asked the customer to check the backend storage manually, but the resources were not present.
      • After the OADP operator was upgraded to 1.4, engineering did not find anything suspicious in the shared logs, even at debug level.
      • Perform a migration with Velero debug logging and observe that some objects are not migrated to the target cluster.
      • Collect the following logs right after that:
        • Destination cluster must-gather with API audit logs: oc adm must-gather -- '/usr/bin/gather && /usr/bin/gather_audit_logs'
        • Destination application namespace inspect: oc adm inspect ns/<namespace>
        • Migration Toolkit and Velero logs: oc adm must-gather --image=registry.redhat.io/rhmtc/openshift-migration-must-gather-rhel8:<version>
        • Velero logs, captured either with the 'oc' command or from the log collector.
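
      So that the collection steps above run identically after every failed cutover, they could be scripted. The sketch below is an illustration only: it builds the three `oc` invocations from the list as argument vectors and does not run them. The function names `collection_commands` and `as_shell` are hypothetical, and the namespace and image tag are placeholders the caller must supply.

```python
import shlex
from typing import List


def collection_commands(namespace: str, mtc_version: str) -> List[List[str]]:
    """Build the argv lists for the three `oc` collection steps."""
    image = (
        "registry.redhat.io/rhmtc/"
        f"openshift-migration-must-gather-rhel8:{mtc_version}"
    )
    return [
        # Destination cluster must-gather, including API audit logs.
        ["oc", "adm", "must-gather", "--",
         "/usr/bin/gather && /usr/bin/gather_audit_logs"],
        # Inspect the destination application namespace.
        ["oc", "adm", "inspect", f"ns/{namespace}"],
        # Migration Toolkit and Velero logs through the MTC must-gather image.
        ["oc", "adm", "must-gather", f"--image={image}"],
    ]


def as_shell(commands: List[List[str]]) -> List[str]:
    """Render each argv list as a shell-quoted command line for review."""
    return [shlex.join(argv) for argv in commands]
```

      Printing `as_shell(collection_commands("<namespace>", "<version>"))` shows the exact command lines before anything is executed; each argv list can then be passed to `subprocess.run(argv, check=True)` on a host that is logged in to the destination cluster.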

      All logs are available in supportshell: 04012096

      Slack with engineering:  #danske-bank-cap-481-04012096-unable-to-migrate-all-objects-at-once

      Escalation: https://issues.redhat.com/browse/CAP-481

       

              wnstb Wes Hayutin
              rhn-support-dahernan David Hernandez Fernandez