Sub-task
Resolution: Done
Description of problem:
The Velero backup stays in InProgress status after the restic pod is restarted because it was OOM killed. The test passed as usual with build oadp-operator-bundle-container-1.1.2-14 but has now started failing. Report portal link attached below.
Upstream PR: https://github.com/vmware-tanzu/velero/pull/4893
Version-Release number of selected component (if applicable):
OADP 1.1.2
Build: oadp-operator-bundle-container-1.1.2-16
How reproducible:
Always
Failing consistently.
Steps to Reproduce:
Polarion case: https://polarion.engineering.redhat.com/polarion/redirect/project/OADP/workitem?id=OADP-231
1. Create a DPA CR with low restic resource limits
    apiVersion: oadp.openshift.io/v1alpha1
    kind: DataProtectionApplication
    metadata:
      name: ts-dpa
      namespace: openshift-adp
    spec:
      backupLocations:
      - velero:
          credential:
            key: cloud
            name: cloud-credentials-gcp
          default: true
          objectStorage:
            bucket: oadpbucket163761
            prefix: velero-e2e-50e5ea53-7a22-11ed-b0bf-845cf3eff33a
          provider: gcp
      configuration:
        restic:
          enable: true
          podConfig:
            resourceAllocations:
              limits:
                cpu: 100m
                memory: 50Mi
              requests:
                cpu: 50m
                memory: 10Mi
        velero:
          defaultPlugins:
          - openshift
          - gcp
          - kubevirt
2. Create a restic backup
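The report does not include the Backup manifest used in step 2. A minimal sketch of what such a restic backup might look like is given here, assuming the Velero 1.9 API shipped with OADP 1.1. The backup name `backup1` is hypothetical; the namespace `test-oadp-591` and storage location `ts-dpa-1` are taken from the PodVolumeBackup output in this report.

```yaml
# Illustrative only: the actual Backup CR used by the test is not shown in this report.
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: backup1                  # hypothetical name
  namespace: openshift-adp
spec:
  includedNamespaces:
  - test-oadp-591                # application namespace from the PodVolumeBackup output
  defaultVolumesToRestic: true   # back up pod volumes with restic by default
  storageLocation: ts-dpa-1      # storage location from the PodVolumeBackup output
```

With the 50Mi memory limit from step 1, the restic pod is expected to be OOM killed while this backup's PodVolumeBackup is being processed.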
Actual results:
The backup remains stuck in InProgress status.
$ oc get podvolumebackup
NAME                                                 STATUS       CREATED   NAMESPACE       POD                  VOLUME            REPOSITORY ID                                                                               UPLOADER TYPE   STORAGE LOCATION   AGE
backup1-53b48381-7a22-11ed-b0bf-845cf3eff33a-bndxk   InProgress   11m       test-oadp-591   postgresql-1-hf7js   postgresql-data   gs:oadpbucket163761:/velero-e2e-ebeca73d-79f2-11ed-941e-0a58ac1e09e0/restic/test-oadp-591   restic          ts-dpa-1           11m
Expected results:
The PodVolumeBackup should be marked as Failed when the restic pod restarts, and the backup should be marked as PartiallyFailed.
Additional info: