  1. OpenShift API for Data Protection
  2. OADP-2860

Restore gets stuck in InProgress status for DeploymentConfig resource


    • Bug
    • Resolution: Won't Do
    • Normal
    • OADP 1.2.6
    • OADP 1.2.2, OADP 1.1.5
    • restic

      Description of problem:

       

      Restore gets stuck in InProgress status for a DeploymentConfig application when PVs are backed up using the volume opt-in approach. For other Kubernetes resources, such as Deployment and StatefulSet, it works fine. The error is shown below:

       

      $ oc logs velero-798755899f-lhx84| grep error
      time="2023-10-09T10:53:35Z" level=error msg="Failed to check node-agent pod status, disengage" error="pods \"postgresql-1-8pd7b\" not found" logSource="/remote-source/velero/app/pkg/podvolume/restorer.go:206"
      

       

      Version-Release number of selected component (if applicable):
      OADP 1.2.2

      How reproducible:

      Always 

       

      Steps to Reproduce:

      1. Create DPA with restic enabled.

      $ oc get dpa -o yaml ts-dpa
      apiVersion: oadp.openshift.io/v1alpha1
      kind: DataProtectionApplication
      metadata:
        name: ts-dpa
        namespace: openshift-adp
        resourceVersion: "117481"
        uid: dbf952f8-d6e0-4269-8b2e-034b407e05a5
      spec:
        backupLocations:
        - velero:
            default: true
            objectStorage:
              bucket: oadpbucket238227
              prefix: velero
            provider: gcp
        configuration:
          restic:
            enable: true
          velero:
            defaultPlugins:
            - gcp
            - openshift
      status:
        conditions:
        - lastTransitionTime: "2023-10-09T09:41:37Z"
          message: Reconcile complete
          reason: Complete
          status: "True"
          type: Reconciled

      2. Deploy ocp-django application.

      $ appm deploy ocp-django
      
      $ oc get pod -n ocp-django
      NAME                              READY   STATUS      RESTARTS   AGE
      django-psql-persistent-1-build    0/1     Completed   0          2m21s
      django-psql-persistent-1-deploy   0/1     Completed   0          99s
      django-psql-persistent-1-q64l9    1/1     Running     0          97s
      postgresql-1-8pd7b                1/1     Running     0          2m18s
      postgresql-1-deploy               0/1     Completed   0          2m20s

      3. Add the backup-volume annotation to the PostgreSQL pod.

      $ oc get pod -n ocp-django postgresql-1-8pd7b -o yaml
        volumes:
        - name: postgresql-data
          persistentVolumeClaim:
            claimName: postgresql
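
The report does not show the annotation command itself. A minimal sketch of how the volume could be opted in, assuming the standard Velero opt-in annotation `backup.velero.io/backup-volumes`; the namespace, pod, and volume names come from the output above, and the function name `annotate_pg_volume` is hypothetical:

```shell
# Opt the PostgreSQL data volume in to file-system (restic) backup.
# Namespace, pod, and volume names are taken from this report;
# the function name is hypothetical.
annotate_pg_volume() {
  oc -n ocp-django annotate pod postgresql-1-8pd7b \
    backup.velero.io/backup-volumes=postgresql-data
}
```

Calling `annotate_pg_volume` before the backup in step 4 would mark only the `postgresql-data` volume for backup.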
        

      4. Execute a backup with defaultVolumesToFsBackup set to false.

      $ oc get backup test-backup3 -o yaml
      apiVersion: velero.io/v1
      kind: Backup
      metadata:
        annotations:
          velero.io/source-cluster-k8s-gitversion: v1.27.6+6936c15
          velero.io/source-cluster-k8s-major-version: "1"
          velero.io/source-cluster-k8s-minor-version: "27"
        name: test-backup3
        namespace: openshift-adp
      spec:
        csiSnapshotTimeout: 10m0s
        defaultVolumesToFsBackup: false
        includedNamespaces:
        - ocp-django
        itemOperationTimeout: 1h0m0s
        storageLocation: ts-dpa-1
        ttl: 720h0m0s
      status:
        completionTimestamp: "2023-10-09T10:49:30Z"
        expiration: "2023-11-08T10:49:12Z"
        formatVersion: 1.1.0
        phase: Completed
        progress:
          itemsBackedUp: 94
          totalItems: 94
        startTimestamp: "2023-10-09T10:49:12Z"
        version: 1

      5. Verify that a PodVolumeBackup CR was created for the volume.

      $ oc get podvolumebackup test-backup3-5rtsl
      NAME                 STATUS      CREATED   NAMESPACE    POD                  VOLUME            REPOSITORY ID                                   UPLOADER TYPE   STORAGE LOCATION   AGE
      test-backup3-5rtsl   Completed   2m31s     ocp-django   postgresql-1-8pd7b   postgresql-data   gs:oadpbucket238227:/velero/restic/ocp-django   restic          ts-dpa-1           2m31s

      6. Remove the application namespace.
      7. Execute the restore.
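
Steps 6 and 7 are not shown as commands in the report. They could be sketched as follows, under stated assumptions: the Restore name `test-restore3` is inferred from the PodVolumeRestore name that appears later in this report, the function names are hypothetical, and the Restore spec is reduced to the `backupName` field (the restore in the report also lists `excludedResources`).

```shell
# Step 6: delete the application namespace that was backed up.
remove_app_namespace() {
  oc delete namespace ocp-django
}

# Step 7: create a Restore CR that points at the backup from step 4.
# The name "test-restore3" is an assumption inferred from this report.
create_restore() {
  oc apply -f - <<'EOF'
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: test-restore3
  namespace: openshift-adp
spec:
  backupName: test-backup3
EOF
}
```

Running `remove_app_namespace` followed by `create_restore` would reproduce the sequence described in the steps.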

       

      Actual results:
      The restore gets stuck in InProgress status, waiting for all PodVolumeRestores to complete.

       

       spec:
          backupName: test-backup3
          excludedResources:
          - nodes
          - events
          - events.events.k8s.io
          - backups.velero.io
          - restores.velero.io
          - resticrepositories.velero.io
          - csinodes.storage.k8s.io
          - volumeattachments.storage.k8s.io
          - backuprepositories.velero.io
          itemOperationTimeout: 1h0m0s
        status:
          phase: InProgress
          progress:
            itemsRestored: 47
            totalItems: 47
          startTimestamp: "2023-10-09T10:53:18Z" 

       

      PodVolumeRestore doesn't have any status.

      $ oc get podvolumerestore test-restore3-wq72j
      NAME                  NAMESPACE    POD                  UPLOADER TYPE   VOLUME            STATUS   TOTALBYTES   BYTESDONE   AGE
      test-restore3-wq72j   ocp-django   postgresql-1-8pd7b   restic          postgresql-data                                     13m 
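
One way to check whether the `pods "postgresql-1-8pd7b" not found` error from the Velero log matches the cluster state is to look up the pod that the PodVolumeRestore targets and test whether it exists. This is a sketch, not a confirmed diagnosis: it assumes the PodVolumeRestore lives in `openshift-adp` and records its target pod under `.spec.pod`, and the function name `check_pvr_pod` is hypothetical.

```shell
# Report whether the pod targeted by a PodVolumeRestore currently exists.
# Assumes the CR is in openshift-adp and stores its target under .spec.pod.
check_pvr_pod() {
  pvr="$1"
  pod="$(oc -n openshift-adp get podvolumerestore "$pvr" -o jsonpath='{.spec.pod.name}')"
  ns="$(oc -n openshift-adp get podvolumerestore "$pvr" -o jsonpath='{.spec.pod.namespace}')"
  if oc -n "$ns" get pod "$pod" >/dev/null 2>&1; then
    echo "pod $ns/$pod exists"
  else
    echo "pod $ns/$pod not found"
  fi
}
```

For example, `check_pvr_pod test-restore3-wq72j` prints whether the pod referenced by that PodVolumeRestore is present in its namespace.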

       

      Expected results:
      The restore should complete successfully.

       

      Additional info:

              sseago Scott Seago
              rhn-support-prajoshi Prasad Joshi