Type: Bug
Resolution: Done
Priority: Normal
Fix Version/s: OADP 1.1.0
Affects Version/s: OADP 1.1.0
Component/s: Scale&Perf-QE
Labels:
- qe-impact
- triaged

Blocked: False
Blocked Reason: None
Ready: False
QEStatus: Passed
Cost of Delay: 0
WSJF: 0
Risk Probability: Very Likely
Risk Score: 0
Workstream: None
Root Cause: Unset
Failure Category: Unknown
Regression: No
SFDC Cases Links:
SFDC Cases Counter:
SFDC Cases Open:

I ran a backup plan covering a single ns/pod with 20GB of data in 5M files, using restic on the rbd sc. The backup completed successfully > "case-2.1.1-rbd-restic-backup"
I updated the timeout value for the restic plugin on the DPA CR to 900min (following Tiger's example).
After almost 220 min, the restore state is still "InProgress".
The restore appears to have completed successfully: the new pod in the target ns exists and contains the same number of files and amount of data as expected from the source backup. However, the target ns also contains a second pod that appears to be stuck on restic-wait.

[root@f01-h07-000-r640 oadp-helpers]# oc get pods  -nperf-datagen-case1-ocs-storagecluster-ceph-rbd
NAME                                  READY   STATUS     RESTARTS   AGE
perf-datagen-case1-5bd5dfd7fd-zxctb   0/1     Init:0/1   0          161m
perf-datagen-case1-8495854c4d-62x4n   1/1     Running    0          161m
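
For triage, a listing like the one above can be scanned for pods that have not left their init phase. This is a minimal Python sketch, assuming the default `oc get pods` column layout shown above; the function name `stuck_init_pods` is hypothetical and not part of any OADP or Velero tooling.

```python
def stuck_init_pods(listing: str) -> list[str]:
    """Return names of pods whose STATUS column starts with 'Init:'."""
    lines = [ln for ln in listing.splitlines() if ln.strip()]
    header = lines[0].split()
    name_idx = header.index("NAME")
    status_idx = header.index("STATUS")
    stuck = []
    for ln in lines[1:]:
        cols = ln.split()
        # STATUS precedes RESTARTS, so whitespace splitting is safe here.
        if cols[status_idx].startswith("Init:"):
            stuck.append(cols[name_idx])
    return stuck


# Listing copied from the report above.
LISTING = """\
NAME                                  READY   STATUS     RESTARTS   AGE
perf-datagen-case1-5bd5dfd7fd-zxctb   0/1     Init:0/1   0          161m
perf-datagen-case1-8495854c4d-62x4n   1/1     Running    0          161m
"""

print(stuck_init_pods(LISTING))
```

With the listing from this report, only the pod shown as Init:0/1 is returned.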


The newly restored pod, in the "Running" state > "perf-datagen-case1-8495854c4d-62x4n"


From a shell in the pod:
 Pod: perf-datagen-case1-ocs-storagecluster-ceph-rbd/perf-datagen-case1-8495854c4d-62x4n | Container: data-generator

(app-root) bash-4.2$ du -sh /opt/mounts/mnt1/new/
20G    /opt/mounts/mnt1/new/
(app-root) bash-4.2$ find  /opt/mounts/mnt1/new/ -type f |wc -l
5000000

These errors appear in the velero log:

time="2022-07-31T09:47:51Z" level=error msg="Error updating download request" controller=download-request downloadRequest=openshift-adp/case-2.1.1-rbd-restic-backup-f810fe87-44c0-4e90-b2cd-d5939fa335bc error="downloadrequests.velero.io \"case-2.1.1-rbd-restic-backup-f810fe87-44c0-4e90-b2cd-d5939fa335bc\" not found" logSource="pkg/controller/download_request_controller.go:74"
time="2022-07-31T09:47:55Z" level=error msg="Error updating download request" controller=download-request downloadRequest=openshift-adp/case-2.1.1-rbd-restic-backup-3d78a93c-b16f-41e8-9837-a6a99c07d000 error="downloadrequests.velero.io \"case-2.1.1-rbd-restic-backup-3d78a93c-b16f-41e8-9837-a6a99c07d000\" not found" logSource="pkg/controller/download_request_controller.go:74"
time="2022-07-31T09:48:26Z" level=error msg="Error updating download request" controller=download-request downloadRequest=openshift-adp/case-2.1.1-rbd-restic-backup-365fd926-e3ce-4618-a297-4e3f2ab2ef12 error="downloadrequests.velero.io \"case-2.1.1-rbd-restic-backup-365fd926-e3ce-4618-a297-4e3f2ab2ef12\" not found" logSource="pkg/controller/download_request_controller.go:74"
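
The log lines above are in key=value (logfmt-style) form, so they can be split into fields to list which download requests were reported as not found. This is a minimal sketch that relies only on Python's standard `shlex` module; the helper name `parse_logfmt` is hypothetical, and the sample line is the first error copied from this report.

```python
import shlex


def parse_logfmt(line: str) -> dict[str, str]:
    """Split one key=value log line into a dict, honoring quoted values."""
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:  # skip any token that is not key=value
            fields[key] = value
    return fields


# First error line from the velero log in this report (raw string keeps the \" escapes).
SAMPLE = (
    r'time="2022-07-31T09:47:51Z" level=error msg="Error updating download request" '
    r'controller=download-request '
    r'downloadRequest=openshift-adp/case-2.1.1-rbd-restic-backup-f810fe87-44c0-4e90-b2cd-d5939fa335bc '
    r'error="downloadrequests.velero.io \"case-2.1.1-rbd-restic-backup-f810fe87-44c0-4e90-b2cd-d5939fa335bc\" not found" '
    r'logSource="pkg/controller/download_request_controller.go:74"'
)

fields = parse_logfmt(SAMPLE)
print(fields["controller"], fields["downloadRequest"])
print(fields["error"])
```

Note that the object reported as not found is a `downloadrequests.velero.io` resource named after the backup, not the Backup CR itself.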

These velero log errors suggest the backup is missing (the log reports the downloadrequests.velero.io objects as not found), but the backup is present on the cluster and its Backup CR is in the Completed phase:

apiVersion: velero.io/v1
kind: Backup
metadata:
  annotations:
    velero.io/source-cluster-k8s-gitversion: v1.23.5+012e945
    velero.io/source-cluster-k8s-major-version: "1"
    velero.io/source-cluster-k8s-minor-version: "23"
  creationTimestamp: "2022-07-31T09:00:02Z"
  generation: 5
  labels:
    velero.io/storage-location: example-velero-1
  name: case-2.1.1-rbd-restic-backup
  namespace: openshift-adp
  resourceVersion: "691460796"
  uid: 92b08e9c-2cae-4979-9206-fae010ebf3fb
spec:
  defaultVolumesToRestic: true
  hooks: {}
  includedNamespaces:
  - perf-datagen-case1-ocs-storagecluster-ceph-rbd
  metadata: {}
  snapshotVolumes: false
  storageLocation: example-velero-1
  ttl: 720h0m0s
status:
  completionTimestamp: "2022-07-31T09:17:50Z"
  expiration: "2022-08-30T09:00:02Z"
  formatVersion: 1.1.0
  phase: Completed
  progress:
    itemsBackedUp: 29
    totalItems: 29
  startTimestamp: "2022-07-31T09:00:02Z"
  version: 1
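
For scale, the backup duration can be worked out from the `startTimestamp` and `completionTimestamp` values in the CR above and compared with the restore time mentioned in the description. This is a small sketch using only the standard library; the 220-minute figure is the approximate value from the report ("almost 220 min"), not a measured timestamp.

```python
from datetime import datetime, timedelta, timezone


def parse_ts(ts: str) -> datetime:
    """Parse a Kubernetes-style UTC timestamp such as 2022-07-31T09:00:02Z."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


# Values copied from the Backup CR status above.
start = parse_ts("2022-07-31T09:00:02Z")
done = parse_ts("2022-07-31T09:17:50Z")
backup_duration = done - start
print(backup_duration)  # 0:17:48

# Approximate elapsed restore time from the description; still InProgress.
restore_elapsed = timedelta(minutes=220)
print(restore_elapsed > backup_duration)  # True
```

The backup finished in under 18 minutes, while the restore had been running for more than ten times as long without leaving "InProgress".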

OADP: 1.1.0 iib:282305
OCP: 4.10.23


Attachments:
- case-2.1.1-rbd-restic-backup.yml, 0.9 kB, 2022/07/31 12:33 PM
- case-2.1.1-rbd-restic-restore_describe.yml, 1 kB, 2022/07/31 12:33 PM
- case-2.1.1-rbd-restic-restore_velero_describe.yml, 0.9 kB, 2022/07/31 12:33 PM
- dpa_example_velero.yml, 1 kB, 2022/07/31 12:33 PM
- image-2022-08-10-11-20-17-227.png, 50 kB, 2022/08/10 8:20 AM
- openshift-adp-controller-manager-cbfdd5f48-w92cc.log, 13.18 MB, 2022/07/31 12:33 PM
- perf-datagen-case1-5bd5dfd7fd-zxctb_restic_stuckpod.yml, 6 kB, 2022/07/31 12:33 PM
- perf-datagen-case1-8495854c4d-62x4n_running_pod.yml, 4 kB, 2022/07/31 12:33 PM
- pods_names.png, 85 kB, 2022/07/31 12:58 PM
- restic_containers.png, 66 kB, 2022/07/31 12:58 PM
- restic-hxhvd_worker02.log, 3 kB, 2022/07/31 12:33 PM
- velero-f667d9497-jglxq.log, 1.16 MB, 2022/07/31 12:33 PM

Assignee: Scott Seago
Reporter: Tzahi Ashkenazi
QA Contact: Tzahi Ashkenazi
Votes: 0
Watchers: 5
Created: 2022/07/31 12:38 PM
Updated: 2024/02/09 7:01 PM
Resolved: 2024/02/09 7:01 PM
