  OpenShift API for Data Protection / OADP-4254

Restores of VSL backups are partially failing for the AWS provider


    • oadp-operator-bundle-container-1.4.0-4

      Description of problem:

      VSL (VolumeSnapshotLocation) restores are failing in OADP 1.4.0 for the AWS provider. We haven't seen any VSL-related failures for other providers such as GCP and Azure. This is a regression: the same test passed in OADP 1.3.1.

      The error log is attached below.

      $ oc logs velero-659fb4cd5c-x7fwt | grep error
      Defaulted container "velero" out of: velero, openshift-velero-plugin (init), velero-plugin-for-aws (init)
      time="2024-06-10T10:18:24Z" level=warning msg="Failed to set default backup storage location at server start" backupStorageLocation=default error="backupstoragelocations.velero.io \"default\" not found" logSource="/remote-source/velero/app/pkg/cmd/server/server.go:492"
      time="2024-06-10T10:18:24Z" level=error msg="Current BackupStorageLocations available/unavailable/unknown: 0/0/1)" controller=backup-storage-location logSource="/remote-source/velero/app/pkg/controller/backup_storage_location_controller.go:180"
      time="2024-06-10T10:29:21Z" level=error msg="Cluster resource restore error: error executing PVAction for persistentvolumes/pvc-271d8ad0-ccb8-4037-ac85-de297e26cdeb: rpc error: code = Unknown desc = Snapshot snap-065bfc14a17157d7e is not available, err: Snapshot has empty state" logSource="/remote-source/velero/app/pkg/controller/restore_controller.go:587" restore=openshift-adp/test-restore
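      The grep above also matches the warning line, because that entry carries an error= field. A minimal Python sketch that keeps only level=error entries, assuming the logrus text format shown above (key=value pairs with double-quoted values); parse_logfmt, error_messages, and SAMPLE are hypothetical names, and SAMPLE reuses lines from this log:

```python
import shlex
from typing import Dict, Iterable, Iterator


def parse_logfmt(line: str) -> Dict[str, str]:
    """Split one logrus text-format line into key/value pairs.

    shlex (POSIX mode) strips the double quotes around values and unescapes
    backslash-escaped quotes inside them. Tokens without '=' are ignored, so
    non-log lines such as the oc "Defaulted container" notice yield {}.
    """
    fields: Dict[str, str] = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def error_messages(lines: Iterable[str]) -> Iterator[str]:
    """Yield the msg field of every entry whose level is exactly "error"."""
    for line in lines:
        fields = parse_logfmt(line)
        if fields.get("level") == "error":
            yield fields.get("msg", "")


# Lines copied from the velero pod log in this issue.
SAMPLE = [
    'Defaulted container "velero" out of: velero, openshift-velero-plugin (init), velero-plugin-for-aws (init)',
    r'time="2024-06-10T10:18:24Z" level=warning msg="Failed to set default backup storage location at server start" backupStorageLocation=default error="backupstoragelocations.velero.io \"default\" not found" logSource="/remote-source/velero/app/pkg/cmd/server/server.go:492"',
    'time="2024-06-10T10:18:24Z" level=error msg="Current BackupStorageLocations available/unavailable/unknown: 0/0/1)" controller=backup-storage-location logSource="/remote-source/velero/app/pkg/controller/backup_storage_location_controller.go:180"',
    'time="2024-06-10T10:29:21Z" level=error msg="Cluster resource restore error: error executing PVAction for persistentvolumes/pvc-271d8ad0-ccb8-4037-ac85-de297e26cdeb: rpc error: code = Unknown desc = Snapshot snap-065bfc14a17157d7e is not available, err: Snapshot has empty state" logSource="/remote-source/velero/app/pkg/controller/restore_controller.go:587" restore=openshift-adp/test-restore',
]
```

      A message containing an unbalanced single quote outside a quoted value would make shlex raise ValueError, so this is only suited to well-formed logrus output like the lines above.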

       

      Version-Release number of selected component (if applicable):

      oadp-operator-bundle-container-1.4.0-1
      OCP 4.16

       

      How reproducible:
      Always

       

      Steps to Reproduce:
      1. Create a DataProtectionApplication (DPA) with a snapshotLocations spec:

      oc get dpa ts-dpa -o yaml
      apiVersion: oadp.openshift.io/v1alpha1
      kind: DataProtectionApplication
      metadata:
        creationTimestamp: "2024-06-10T10:18:08Z"
        generation: 1
        name: ts-dpa
        namespace: openshift-adp
        resourceVersion: "45413"
        uid: 0b509046-e52e-4821-8c51-c007831ca863
      spec:
        backupLocations:
        - velero:
            config:
              profile: default
              region: us-east-2
            credential:
              key: cloud
              name: cloud-credentials
            default: true
            objectStorage:
              bucket: oadp83371lkrqz
              prefix: upgrade
            provider: aws
        configuration:
          velero:
            defaultPlugins:
            - openshift
            - aws
        snapshotLocations:
        - velero:
            config:
              profile: default
              region: us-east-2
            provider: aws
      status:
        conditions:
        - lastTransitionTime: "2024-06-10T10:18:09Z"
          message: Reconcile complete
          reason: Complete
          status: "True"
          type: Reconciled

      2. Deploy a stateful application (Django with PostgreSQL) in the ocp-django namespace:

       

       

      $ oc get pod -n ocp-django
      NAME                              READY   STATUS      RESTARTS   AGE
      django-psql-persistent-1-build    0/1     Completed   0          7m34s
      django-psql-persistent-1-deploy   0/1     Completed   0          6m47s
      django-psql-persistent-1-h2dgp    1/1     Running     0          6m46s
      postgresql-1-deploy               0/1     Completed   0          7m32s
      postgresql-1-nkmgt                1/1     Running     0          7m30s

      3. Trigger a VSL backup of the ocp-django namespace:

       

       

      oc get backup test-backup -o yaml
      apiVersion: velero.io/v1
      kind: Backup
      metadata:
        annotations:
          velero.io/resource-timeout: 10m0s
          velero.io/source-cluster-k8s-gitversion: v1.29.5+f6419fb
          velero.io/source-cluster-k8s-major-version: "1"
          velero.io/source-cluster-k8s-minor-version: "29"
        creationTimestamp: "2024-06-10T10:21:00Z"
        generation: 7
        labels:
          velero.io/storage-location: ts-dpa-1
        name: test-backup
        namespace: openshift-adp
        resourceVersion: "47418"
        uid: f5cd2016-f5dd-45bd-a172-10aedf0a95d8
      spec:
        csiSnapshotTimeout: 10m0s
        defaultVolumesToFsBackup: false
        includedNamespaces:
        - ocp-django
        itemOperationTimeout: 4h0m0s
        snapshotMoveData: false
        storageLocation: ts-dpa-1
        ttl: 720h0m0s
        volumeSnapshotLocations:
        - ts-dpa-1
      status:
        completionTimestamp: "2024-06-10T10:21:11Z"
        expiration: "2024-07-10T10:21:00Z"
        formatVersion: 1.1.0
        hookStatus: {}
        phase: Completed
        progress:
          itemsBackedUp: 91
          totalItems: 91
        startTimestamp: "2024-06-10T10:21:00Z"
        version: 1
        volumeSnapshotsAttempted: 1
        volumeSnapshotsCompleted: 1

      4. Delete the application namespace and trigger a restore:

      oc delete ns ocp-django
      namespace "ocp-django" deleted 
      apiVersion: velero.io/v1
      kind: Restore
      metadata:
        name: test-restore
        namespace: openshift-adp
      spec:
        backupName: test-backup
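      The backup in step 3 reports phase Completed, 91 of 91 items backed up, and 1 of 1 volume snapshots completed, so by Velero's own counters the snapshot was taken before the restore failed. A minimal Python sketch of that sanity check, assuming the Backup's status has been loaded as a dict (for example with json.loads on the output of oc get backup test-backup -n openshift-adp -o json); backup_status_problems and TEST_BACKUP_STATUS are hypothetical names:

```python
from typing import Any, Dict, List


def backup_status_problems(status: Dict[str, Any]) -> List[str]:
    """Return human-readable problems found in a Velero Backup's status (empty list = none)."""
    problems: List[str] = []
    if status.get("phase") != "Completed":
        problems.append(f"phase is {status.get('phase')!r}, expected 'Completed'")
    progress = status.get("progress", {})
    if progress.get("itemsBackedUp") != progress.get("totalItems"):
        problems.append(
            f"backed up {progress.get('itemsBackedUp')} of {progress.get('totalItems')} items"
        )
    attempted = status.get("volumeSnapshotsAttempted", 0)
    completed = status.get("volumeSnapshotsCompleted", 0)
    if completed != attempted:
        problems.append(f"{completed} of {attempted} volume snapshots completed")
    return problems


# Subset of the test-backup status from step 3 (other fields omitted).
TEST_BACKUP_STATUS = {
    "phase": "Completed",
    "progress": {"itemsBackedUp": 91, "totalItems": 91},
    "volumeSnapshotsAttempted": 1,
    "volumeSnapshotsCompleted": 1,
}
```

      These counters only record what Velero observed at backup time; they do not re-check the snapshot in EC2.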

       

      Actual results:

      The restore finishes as PartiallyFailed with the error "Snapshot has empty state":

      time="2024-06-10T10:29:21Z" level=error msg="Cluster resource restore error: error executing PVAction for persistentvolumes/pvc-271d8ad0-ccb8-4037-ac85-de297e26cdeb: rpc error: code = Unknown desc = Snapshot snap-065bfc14a17157d7e is not available, err: Snapshot has empty state" logSource="/remote-source/velero/app/pkg/controller/restore_controller.go:587" restore=openshift-adp/test-restore
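      The failing PersistentVolume and the EBS snapshot ID are embedded in the error text. A minimal Python sketch that extracts them so they can be matched against the Velero-Native Snapshots section of velero describe backup --details; the regular expression is an assumption based only on the wording in this issue, and PV_ACTION_ERROR, parse_pv_action_error, and MSG are hypothetical names:

```python
import re
from typing import Dict, Optional

# Assumed pattern: matches the PVAction failure as worded in this issue's restore
# log and describe output; other Velero/plugin versions may word it differently.
PV_ACTION_ERROR = re.compile(
    r"error executing PVAction for persistentvolumes/(?P<pv>[^:]+): "
    r".*?Snapshot (?P<snapshot>snap-[0-9a-f]+) is not available, err: (?P<reason>.+)$"
)


def parse_pv_action_error(text: str) -> Optional[Dict[str, str]]:
    """Return the PV name, EBS snapshot ID and reason, or None if the text does not match."""
    match = PV_ACTION_ERROR.search(text.strip())
    return match.groupdict() if match else None


# The msg field of the restore error above.
MSG = (
    "Cluster resource restore error: error executing PVAction for "
    "persistentvolumes/pvc-271d8ad0-ccb8-4037-ac85-de297e26cdeb: rpc error: "
    "code = Unknown desc = Snapshot snap-065bfc14a17157d7e is not available, "
    "err: Snapshot has empty state"
)
```

      Here the extracted snapshot ID matches the one the backup recorded for pvc-271d8ad0-ccb8-4037-ac85-de297e26cdeb.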

       

       

      Expected results:

      The restore should complete successfully.

       

      Additional info:

      The backup and restore details are attached below:

       

      ./velero describe backup test-backup --details
      Name:         test-backup
      Namespace:    openshift-adp
      Labels:       velero.io/storage-location=ts-dpa-1
      Annotations:  velero.io/resource-timeout=10m0s
                    velero.io/source-cluster-k8s-gitversion=v1.29.5+f6419fb
                    velero.io/source-cluster-k8s-major-version=1
                    velero.io/source-cluster-k8s-minor-version=29
      Phase:  Completed
      
      Namespaces:
        Included:  ocp-django
        Excluded:  <none>
      Resources:
        Included:        *
        Excluded:        <none>
        Cluster-scoped:  auto
      Label selector:  <none>
      Or label selector:  <none>
      Storage Location:  ts-dpa-1
      Velero-Native Snapshot PVs:  auto
      Snapshot Move Data:          false
      Data Mover:                  velero
      TTL:  720h0m0s
      CSISnapshotTimeout:    10m0s
      ItemOperationTimeout:  4h0m0s
      Hooks:  <none>
      Backup Format Version:  1.1.0
      Started:    2024-06-10 10:21:00 +0000 UTC
      Completed:  2024-06-10 10:21:11 +0000 UTC
      Expiration:  2024-07-10 10:21:00 +0000 UTC
      Total items to be backed up:  91
      Items backed up:              91
      Resource List:
        apps.openshift.io/v1/DeploymentConfig:
          - ocp-django/django-psql-persistent
          - ocp-django/postgresql
        authorization.openshift.io/v1/RoleBinding:
          - ocp-django/admin
          - ocp-django/system:deployers
          - ocp-django/system:image-builders
          - ocp-django/system:image-pullers
        build.openshift.io/v1/Build:
          - ocp-django/django-psql-persistent-1
        build.openshift.io/v1/BuildConfig:
          - ocp-django/django-psql-persistent
        discovery.k8s.io/v1/EndpointSlice:
          - ocp-django/django-psql-persistent-wf7vf
          - ocp-django/postgresql-5gzzf
        image.openshift.io/v1/ImageStream:
          - ocp-django/django-psql-persistent
        image.openshift.io/v1/ImageStreamTag:
          - ocp-django/django-psql-persistent:latest
        image.openshift.io/v1/ImageTag:
          - ocp-django/django-psql-persistent:latest
        rbac.authorization.k8s.io/v1/RoleBinding:
          - ocp-django/admin
          - ocp-django/system:deployers
          - ocp-django/system:image-builders
          - ocp-django/system:image-pullers
        route.openshift.io/v1/Route:
          - ocp-django/django-psql-persistent
        template.openshift.io/v1/Template:
          - ocp-django/mtc-test-django-psql-persistent
        v1/ConfigMap:
          - ocp-django/django-psql-persistent-1-ca
          - ocp-django/django-psql-persistent-1-global-ca
          - ocp-django/django-psql-persistent-1-sys-config
          - ocp-django/kube-root-ca.crt
          - ocp-django/openshift-service-ca.crt
        v1/Endpoints:
          - ocp-django/django-psql-persistent
          - ocp-django/postgresql
        v1/Event:
          - ocp-django/django-psql-persistent-1-build.17d79d4c488b2f35
          - ocp-django/django-psql-persistent-1-build.17d79d4c62069aad
          - ocp-django/django-psql-persistent-1-build.17d79d4c63ea357b
          - ocp-django/django-psql-persistent-1-build.17d79d4da8034aa2
          - ocp-django/django-psql-persistent-1-build.17d79d4db314a7f2
          - ocp-django/django-psql-persistent-1-build.17d79d4db41a197b
          - ocp-django/django-psql-persistent-1-build.17d79d4e0cbb2304
          - ocp-django/django-psql-persistent-1-build.17d79d4e178838d4
          - ocp-django/django-psql-persistent-1-build.17d79d4e1895861e
          - ocp-django/django-psql-persistent-1-build.17d79d4e48a9a82d
          - ocp-django/django-psql-persistent-1-build.17d79d4e5fadee82
          - ocp-django/django-psql-persistent-1-build.17d79d4e611ae034
          - ocp-django/django-psql-persistent-1-deploy.17d79d572ec5f247
          - ocp-django/django-psql-persistent-1-deploy.17d79d57474d1f4e
          - ocp-django/django-psql-persistent-1-deploy.17d79d5748b2f3ac
          - ocp-django/django-psql-persistent-1-deploy.17d79d57507361e5
          - ocp-django/django-psql-persistent-1-deploy.17d79d5751f038a3
          - ocp-django/django-psql-persistent-1-h2dgp.17d79d575936847c
          - ocp-django/django-psql-persistent-1-h2dgp.17d79d5771fd149d
          - ocp-django/django-psql-persistent-1-h2dgp.17d79d5773935ef5
          - ocp-django/django-psql-persistent-1-h2dgp.17d79d5a9d646921
          - ocp-django/django-psql-persistent-1-h2dgp.17d79d5aa4281f03
          - ocp-django/django-psql-persistent-1-h2dgp.17d79d5aa52dd90f
          - ocp-django/django-psql-persistent-1.17d79d4dd3076bfe
          - ocp-django/django-psql-persistent-1.17d79d57589d0888
          - ocp-django/django-psql-persistent-1.17d79d57ce1c22ec
          - ocp-django/django-psql-persistent.17d79d572ccf0be7
          - ocp-django/postgresql-1-deploy.17d79d4c9433b77e
          - ocp-django/postgresql-1-deploy.17d79d4cad02107f
          - ocp-django/postgresql-1-deploy.17d79d4cae47f2aa
          - ocp-django/postgresql-1-deploy.17d79d4d103a151e
          - ocp-django/postgresql-1-deploy.17d79d4d184a1bd5
          - ocp-django/postgresql-1-deploy.17d79d4d1996895d
          - ocp-django/postgresql-1-nkmgt.17d79d4e4d221ad1
          - ocp-django/postgresql-1-nkmgt.17d79d4ed8e3e5bd
          - ocp-django/postgresql-1-nkmgt.17d79d4f071f8217
          - ocp-django/postgresql-1-nkmgt.17d79d4f08ad2307
          - ocp-django/postgresql-1-nkmgt.17d79d50e8bee524
          - ocp-django/postgresql-1-nkmgt.17d79d50f06da988
          - ocp-django/postgresql-1-nkmgt.17d79d50f17067d9
          - ocp-django/postgresql-1.17d79d4d22580378
          - ocp-django/postgresql.17d79d4c67b1f99a
          - ocp-django/postgresql.17d79d4c91445437
          - ocp-django/postgresql.17d79d4d2307077a
          - ocp-django/postgresql.17d79d4d230ee92e
          - ocp-django/postgresql.17d79d4e2efccc26
        v1/Namespace:
          - ocp-django
        v1/PersistentVolume:
          - pvc-271d8ad0-ccb8-4037-ac85-de297e26cdeb
        v1/PersistentVolumeClaim:
          - ocp-django/postgresql
        v1/Pod:
          - ocp-django/django-psql-persistent-1-build
          - ocp-django/django-psql-persistent-1-deploy
          - ocp-django/django-psql-persistent-1-h2dgp
          - ocp-django/postgresql-1-deploy
          - ocp-django/postgresql-1-nkmgt
        v1/ReplicationController:
          - ocp-django/django-psql-persistent-1
          - ocp-django/postgresql-1
        v1/Secret:
          - ocp-django/builder-dockercfg-gk972
          - ocp-django/default-dockercfg-78f5h
          - ocp-django/deployer-dockercfg-kp946
          - ocp-django/django-psql-persistent
        v1/Service:
          - ocp-django/django-psql-persistent
          - ocp-django/postgresql
        v1/ServiceAccount:
          - ocp-django/builder
          - ocp-django/default
          - ocp-django/deployer
      Backup Volumes:
        Velero-Native Snapshots:
          pvc-271d8ad0-ccb8-4037-ac85-de297e26cdeb:
            Snapshot ID:        snap-065bfc14a17157d7e
            Type:               gp3
            Availability Zone:  us-east-2a
            IOPS:               0
        CSI Snapshots: <none included>
        Pod Volume Backups: <none included>
      HooksAttempted:  0
      HooksFailed:     0
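      The backup records snap-065bfc14a17157d7e (gp3, us-east-2a) for the failing PV. To compare the "empty state" error with what EC2 itself reports for that snapshot, the snapshot can be queried directly. A minimal Python sketch, assuming a boto3 EC2 client (describe_snapshots and its SnapshotIds parameter are part of the EC2 API); ec2_snapshot_state is a hypothetical helper:

```python
from typing import Any


def ec2_snapshot_state(ec2_client: Any, snapshot_id: str) -> str:
    """Return the State that EC2 reports for one snapshot, or "" if the field is absent.

    ec2_client is assumed to behave like a boto3 EC2 client: describe_snapshots
    takes SnapshotIds and returns a dict with a "Snapshots" list.
    """
    response = ec2_client.describe_snapshots(SnapshotIds=[snapshot_id])
    snapshots = response.get("Snapshots", [])
    if len(snapshots) != 1:
        raise LookupError(
            f"expected exactly one snapshot for {snapshot_id}, got {len(snapshots)}"
        )
    return snapshots[0].get("State", "")
```

      For example, ec2_snapshot_state(boto3.client("ec2", region_name="us-east-2"), "snap-065bfc14a17157d7e"), run with credentials allowed to call ec2:DescribeSnapshots; the AWS CLI equivalent is aws ec2 describe-snapshots --snapshot-ids snap-065bfc14a17157d7e --region us-east-2. If EC2 reports the snapshot as completed, that would suggest the problem lies in how the restore path reads the snapshot state rather than in the snapshot itself.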
      

       

       

      ./velero describe restore test-restore --details
      Name:         test-restore
      Namespace:    openshift-adp
      Labels:       <none>
      Annotations:  <none>
      Phase:                       PartiallyFailed (run 'velero restore logs test-restore' for more information)
      Total items to be restored:  42
      Items restored:              42
      Started:    2024-06-10 10:29:19 +0000 UTC
      Completed:  2024-06-10 10:29:22 +0000 UTC
      Warnings:
        Velero:     <none>
        Cluster:    <none>
        Namespaces:
          ocp-django:  could not restore, ConfigMap "kube-root-ca.crt" already exists. Warning: the in-cluster version is different than the backed-up version
                       could not restore, ConfigMap "openshift-service-ca.crt" already exists. Warning: the in-cluster version is different than the backed-up version
                       could not restore, RoleBinding "admin" already exists. Warning: the in-cluster version is different than the backed-up version
                       could not restore, RoleBinding "system:deployers" already exists. Warning: the in-cluster version is different than the backed-up version
                       could not restore, RoleBinding "system:image-builders" already exists. Warning: the in-cluster version is different than the backed-up version
                       could not restore, RoleBinding "system:image-pullers" already exists. Warning: the in-cluster version is different than the backed-up version
      Errors:
        Velero:     <none>
        Cluster:  error executing PVAction for persistentvolumes/pvc-271d8ad0-ccb8-4037-ac85-de297e26cdeb: rpc error: code = Unknown desc = Snapshot snap-065bfc14a17157d7e is not available, err: Snapshot has empty state
        Namespaces: <none>
      Backup:  test-backup
      Namespaces:
        Included:  all namespaces found in the backup
        Excluded:  <none>
      Resources:
        Included:        *
        Excluded:        nodes, events, events.events.k8s.io, backups.velero.io, restores.velero.io, resticrepositories.velero.io, csinodes.storage.k8s.io, volumeattachments.storage.k8s.io, backuprepositories.velero.io
        Cluster-scoped:  auto
      Namespace mappings:  <none>
      Label selector:  <none>
      Or label selector:  <none>
      Restore PVs:  auto
      CSI Snapshot Restores: <none included>
      Existing Resource Policy:   <none>
      ItemOperationTimeout:       4h0m0s
      Preserve Service NodePorts:  auto
      
      HooksAttempted:   0
      HooksFailed:      0
      Resource List:
        apps.openshift.io/v1/DeploymentConfig:
          - ocp-django/django-psql-persistent(created)
          - ocp-django/postgresql(created)
        authorization.openshift.io/v1/RoleBinding:
          - ocp-django/admin(failed)
          - ocp-django/system:deployers(failed)
          - ocp-django/system:image-builders(failed)
          - ocp-django/system:image-pullers(failed)
        build.openshift.io/v1/Build:
          - ocp-django/django-psql-persistent-1(skipped)
        build.openshift.io/v1/BuildConfig:
          - ocp-django/django-psql-persistent(created)
        discovery.k8s.io/v1/EndpointSlice:
          - ocp-django/django-psql-persistent-wf7vf(created)
          - ocp-django/postgresql-5gzzf(created)
        image.openshift.io/v1/ImageStream:
          - ocp-django/django-psql-persistent(skipped)
        image.openshift.io/v1/ImageStreamTag:
          - ocp-django/django-psql-persistent:latest(skipped)
        image.openshift.io/v1/ImageTag:
          - ocp-django/django-psql-persistent:latest(skipped)
        rbac.authorization.k8s.io/v1/RoleBinding:
          - ocp-django/admin(created)
          - ocp-django/system:deployers(created)
          - ocp-django/system:image-builders(created)
          - ocp-django/system:image-pullers(created)
        route.openshift.io/v1/Route:
          - ocp-django/django-psql-persistent(created)
        template.openshift.io/v1/Template:
          - ocp-django/mtc-test-django-psql-persistent(created)
        v1/ConfigMap:
          - ocp-django/django-psql-persistent-1-ca(created)
          - ocp-django/django-psql-persistent-1-global-ca(created)
          - ocp-django/django-psql-persistent-1-sys-config(created)
          - ocp-django/kube-root-ca.crt(failed)
          - ocp-django/openshift-service-ca.crt(failed)
        v1/Endpoints:
          - ocp-django/django-psql-persistent(created)
          - ocp-django/postgresql(created)
        v1/Namespace:
          - ocp-django(created)
        v1/PersistentVolume:
          - pvc-271d8ad0-ccb8-4037-ac85-de297e26cdeb(failed)
        v1/PersistentVolumeClaim:
          - ocp-django/postgresql(created)
        v1/Pod:
          - ocp-django/django-psql-persistent-1-h2dgp(skipped)
          - ocp-django/postgresql-1-nkmgt(created)
        v1/ReplicationController:
          - ocp-django/django-psql-persistent-1(skipped)
          - ocp-django/postgresql-1(skipped)
        v1/Secret:
          - ocp-django/builder-dockercfg-gk972(created)
          - ocp-django/default-dockercfg-78f5h(created)
          - ocp-django/deployer-dockercfg-kp946(created)
          - ocp-django/django-psql-persistent(created)
        v1/Service:
          - ocp-django/django-psql-persistent(created)
          - ocp-django/postgresql(created)
        v1/ServiceAccount:
          - ocp-django/builder(skipped)
          - ocp-django/default(skipped)
          - ocp-django/deployer(skipped)
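      In the restore output above, seven items are marked (failed): the four authorization.openshift.io RoleBindings and two ConfigMaps that already existed in the namespace (listed under Warnings), plus the PersistentVolume behind the single Cluster error. A minimal Python sketch that groups the Resource List by status and kind so the PV failure stands out; it assumes the name(status) layout shown above, and KIND_LINE, ITEM_LINE, restore_items_by_status, and EXCERPT are hypothetical names:

```python
import re
from collections import defaultdict
from typing import Dict, List

KIND_LINE = re.compile(r"^\s+(?P<kind>[^\s:]+):\s*$")  # e.g. "  v1/ConfigMap:"
ITEM_LINE = re.compile(r"^\s*- (?P<name>\S+)\((?P<status>[a-z]+)\)\s*$")  # e.g. "    - ocp-django/postgresql(created)"


def restore_items_by_status(describe_output: str) -> Dict[str, List[str]]:
    """Group the Resource List of velero describe restore --details by item status.

    Each entry is "<kind> <name>"; lines before "Resource List:" are ignored.
    """
    grouped: Dict[str, List[str]] = defaultdict(list)
    in_resource_list = False
    kind = ""
    for line in describe_output.splitlines():
        if line.strip() == "Resource List:":
            in_resource_list = True
            continue
        if not in_resource_list:
            continue
        item = ITEM_LINE.match(line)
        header = KIND_LINE.match(line)
        if item:
            grouped[item["status"]].append(f"{kind} {item['name']}")
        elif header:
            kind = header["kind"]
    return dict(grouped)


# Abridged excerpt of the test-restore Resource List above.
EXCERPT = """\
Resource List:
  authorization.openshift.io/v1/RoleBinding:
    - ocp-django/admin(failed)
  rbac.authorization.k8s.io/v1/RoleBinding:
    - ocp-django/admin(created)
  v1/ConfigMap:
    - ocp-django/kube-root-ca.crt(failed)
  v1/PersistentVolume:
    - pvc-271d8ad0-ccb8-4037-ac85-de297e26cdeb(failed)
  v1/PersistentVolumeClaim:
    - ocp-django/postgresql(created)
  v1/ServiceAccount:
    - ocp-django/builder(skipped)
"""
```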
      

              tkaovila@redhat.com Tiger Kaovilai
              rhn-support-prajoshi Prasad Joshi
              Prasad Joshi Prasad Joshi