OADP-657 (OpenShift API for Data Protection)

Restic restore on OCP 4.10 remains in InProgress state even though restore seems complete

      I ran a backup plan that covered a single namespace/pod holding 20 GB of data (5M files), using Restic on the rbd storage class. The backup, "case-2.1.1-rbd-restic-backup", completed successfully.
      I updated the timeout value for the Restic plugin in the DPA CR to 900 min (following Tiger's example).
      After almost 220 min, the restore is still in the "InProgress" state.
      The restore appears to have completed successfully: the new pod exists in the target namespace and contains the same number of files and amount of data as expected from the source backup. However, the target namespace also contains a second pod that appears to be stuck on restic-wait.
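
For reference, the timeout change described above could be written as a JSON merge patch. This is a minimal sketch, not the exact change made for this report: it assumes the DPA field is `spec.configuration.restic.timeout` and that it takes a duration string such as `900m` (the report only says "900min"). The DPA name `example-velero` in the comment is hypothetical.

```shell
#!/usr/bin/env bash
# Build a JSON merge patch that sets the Restic timeout on a DPA.
# Assumption: the field is spec.configuration.restic.timeout and it
# accepts a duration string such as "900m".
restic_timeout_patch() {
  local minutes="$1"
  printf '{"spec":{"configuration":{"restic":{"timeout":"%sm"}}}}\n' "$minutes"
}

# Applying it needs cluster access; the DPA name below is hypothetical:
#   oc -n openshift-adp patch dataprotectionapplications.oadp.openshift.io \
#     example-velero --type merge -p "$(restic_timeout_patch 900)"
restic_timeout_patch 900
```

The function only prints the patch, so it can be reviewed before anything is applied to the cluster.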

      [root@f01-h07-000-r640 oadp-helpers]# oc get pods  -nperf-datagen-case1-ocs-storagecluster-ceph-rbd
      NAME                                  READY   STATUS     RESTARTS   AGE
      perf-datagen-case1-5bd5dfd7fd-zxctb   0/1     Init:0/1   0          161m
      perf-datagen-case1-8495854c4d-62x4n   1/1     Running    0          161m
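
A small helper can pick the stuck pod out of a listing like the one above. This is a sketch that assumes the default `oc get pods` table layout, with STATUS in the third column; the function name is made up for illustration.

```shell
#!/usr/bin/env bash
# Print the names of pods whose STATUS column starts with "Init:",
# reading `oc get pods` table output on stdin (header line skipped).
stuck_init_pods() {
  awk 'NR > 1 && $3 ~ /^Init:/ { print $1 }'
}

# Sample input copied from the listing in this report. On a live
# cluster, pipe `oc get pods -n <namespace>` into the function instead.
stuck_init_pods <<'EOF'
NAME                                  READY   STATUS     RESTARTS   AGE
perf-datagen-case1-5bd5dfd7fd-zxctb   0/1     Init:0/1   0          161m
perf-datagen-case1-8495854c4d-62x4n   1/1     Running    0          161m
EOF
```

For each pod it prints, `oc describe pod <name>` and `oc get podvolumerestores.velero.io -n openshift-adp` may help show whether the init container is still waiting on a volume restore.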
      
      
      The newly restored pod, which is in the "Running" state, is "perf-datagen-case1-8495854c4d-62x4n".
      
      
      From a shell in that pod:
       Pod: perf-datagen-case1-ocs-storagecluster-ceph-rbd/perf-datagen-case1-8495854c4d-62x4n | Container: data-generator
      
      (app-root) bash-4.2$ du -sh /opt/mounts/mnt1/new/
      20G    /opt/mounts/mnt1/new/
      (app-root) bash-4.2$ find  /opt/mounts/mnt1/new/ -type f |wc -l
      5000000

      These errors appear in the Velero log:

      time="2022-07-31T09:47:51Z" level=error msg="Error updating download request" controller=download-request downloadRequest=openshift-adp/case-2.1.1-rbd-restic-backup-f810fe87-44c0-4e90-b2cd-d5939fa335bc error="downloadrequests.velero.io \"case-2.1.1-rbd-restic-backup-f810fe87-44c0-4e90-b2cd-d5939fa335bc\" not found" logSource="pkg/controller/download_request_controller.go:74"
      time="2022-07-31T09:47:55Z" level=error msg="Error updating download request" controller=download-request downloadRequest=openshift-adp/case-2.1.1-rbd-restic-backup-3d78a93c-b16f-41e8-9837-a6a99c07d000 error="downloadrequests.velero.io \"case-2.1.1-rbd-restic-backup-3d78a93c-b16f-41e8-9837-a6a99c07d000\" not found" logSource="pkg/controller/download_request_controller.go:74"
      time="2022-07-31T09:48:26Z" level=error msg="Error updating download request" controller=download-request downloadRequest=openshift-adp/case-2.1.1-rbd-restic-backup-365fd926-e3ce-4618-a297-4e3f2ab2ef12 error="downloadrequests.velero.io \"case-2.1.1-rbd-restic-backup-365fd926-e3ce-4618-a297-4e3f2ab2ef12\" not found" logSource="pkg/controller/download_request_controller.go:74" 
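
To see which objects the log lines name, the `downloadRequest=` field can be pulled out of each line. This is a sketch that assumes the logfmt layout shown above; the function name is made up for illustration.

```shell
#!/usr/bin/env bash
# List the distinct DownloadRequest names found in Velero
# "Error updating download request" log lines read from stdin.
missing_download_requests() {
  grep 'Error updating download request' \
    | sed -n 's/.* downloadRequest=\([^ ]*\) .*/\1/p' \
    | sort -u
}

# Example using one line from the log in this report. On a live
# cluster, pipe the Velero pod log into the function instead.
missing_download_requests <<'EOF'
time="2022-07-31T09:47:51Z" level=error msg="Error updating download request" controller=download-request downloadRequest=openshift-adp/case-2.1.1-rbd-restic-backup-f810fe87-44c0-4e90-b2cd-d5939fa335bc error="downloadrequests.velero.io \"case-2.1.1-rbd-restic-backup-f810fe87-44c0-4e90-b2cd-d5939fa335bc\" not found" logSource="pkg/controller/download_request_controller.go:74"
EOF
```

Each extracted name is the backup name followed by a UUID, and the quoted resource in the error is `downloadrequests.velero.io`. The "not found" message therefore seems to concern DownloadRequest objects rather than the Backup object itself.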

       

      These errors in the Velero log appear to report the backup as missing, but the backup is present on the cluster and is in the Completed state, as its Backup CR shows:

      apiVersion: velero.io/v1
      kind: Backup
      metadata:
        annotations:
          velero.io/source-cluster-k8s-gitversion: v1.23.5+012e945
          velero.io/source-cluster-k8s-major-version: "1"
          velero.io/source-cluster-k8s-minor-version: "23"
        creationTimestamp: "2022-07-31T09:00:02Z"
        generation: 5
        labels:
          velero.io/storage-location: example-velero-1
        name: case-2.1.1-rbd-restic-backup
        namespace: openshift-adp
        resourceVersion: "691460796"
        uid: 92b08e9c-2cae-4979-9206-fae010ebf3fb
      spec:
        defaultVolumesToRestic: true
        hooks: {}
        includedNamespaces:
        - perf-datagen-case1-ocs-storagecluster-ceph-rbd
        metadata: {}
        snapshotVolumes: false
        storageLocation: example-velero-1
        ttl: 720h0m0s
      status:
        completionTimestamp: "2022-07-31T09:17:50Z"
        expiration: "2022-08-30T09:00:02Z"
        formatVersion: 1.1.0
        phase: Completed
        progress:
          itemsBackedUp: 29
          totalItems: 29
        startTimestamp: "2022-07-31T09:00:02Z"
        version: 1
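
For comparison with the restore time reported above, the backup duration can be derived from `startTimestamp` and `completionTimestamp`. This is a sketch that assumes a `date` implementation supporting `-d` (GNU coreutils or BusyBox); the function name is made up for illustration.

```shell
#!/usr/bin/env bash
# Seconds elapsed between two UTC timestamps in the
# "YYYY-MM-DDTHH:MM:SSZ" form used in the Backup status.
elapsed_seconds() {
  local start end
  # Turn "2022-07-31T09:00:02Z" into "2022-07-31 09:00:02" for date -d.
  start="$(date -u -d "$(printf '%s' "$1" | tr 'T' ' ' | tr -d 'Z')" +%s)"
  end="$(date -u -d "$(printf '%s' "$2" | tr 'T' ' ' | tr -d 'Z')" +%s)"
  echo $(( end - start ))
}

# startTimestamp and completionTimestamp from the Backup CR above.
elapsed_seconds "2022-07-31T09:00:02Z" "2022-07-31T09:17:50Z"
```

With the values from the CR this prints 1068, so the backup took 17 min 48 s.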
       

      OADP: 1.1.0 (iib:282305)
      OCP: 4.10.23

            sseago Scott Seago
            tzahia Tzahi Ashkenazi
            Tzahi Ashkenazi Tzahi Ashkenazi