OpenShift API for Data Protection / OADP-1256

Backup stays in InProgress status after the restic pod is OOM-killed and restarted


Details

    • False
    • None
    • False
    • oadp-velero-container-1.1.2-12
    • ToDo
    • Yes
    • 0
    • 0
    • Very Likely
    • 0
    • None
    • Unset
    • Unknown
    • Approved

    Description

      Description of problem:

      The Velero backup stays in InProgress status after the restic pod is OOM-killed and restarted. Before this build, with oadp-operator-bundle-container-1.1.2-14, the test passed as usual, but it now fails. The ReportPortal link is below.

      https://reportportal-migration-qe.apps.ocp-c1.prod.psi.redhat.com/ui/#oadp/launches/all/2689/95281/log

      Upstream PR: https://github.com/vmware-tanzu/velero/pull/4893

       

      Version-Release number of selected component (if applicable):

      OADP 1.1.2

      Build: oadp-operator-bundle-container-1.1.2-16

       

      How reproducible:

      Always (the test fails consistently).

       

      Steps to Reproduce:

      Polarion case: https://polarion.engineering.redhat.com/polarion/redirect/project/OADP/workitem?id=OADP-231

       

      1. Create a DataProtectionApplication (DPA) CR with low resource limits for restic:

      apiVersion: oadp.openshift.io/v1alpha1
      kind: DataProtectionApplication
      metadata:
        name: ts-dpa
        namespace: openshift-adp
      spec:
        backupLocations:
        - velero:
            credential:
              key: cloud
              name: cloud-credentials-gcp
            default: true
            objectStorage:
              bucket: oadpbucket163761
              prefix: velero-e2e-50e5ea53-7a22-11ed-b0bf-845cf3eff33a
            provider: gcp
        configuration:
          restic:
            enable: true
            podConfig:
              resourceAllocations:
                limits:
                  cpu: 100m
                  memory: 50Mi
                requests:
                  cpu: 50m
                  memory: 10Mi
          velero:
            defaultPlugins:
            - openshift
            - gcp
            - kubevirt

      2. Create a backup that uses restic for the pod volumes.
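      To show how tight the restic limits in step 1 are, this sketch converts the Kubernetes quantities into plain numbers: 50Mi is 50 x 2^20 = 52,428,800 bytes and 100m is 0.1 CPU core. The helper names (memory_bytes, cpu_cores) are hypothetical, and only the suffixes used in this DPA are handled; it is not a full implementation of Kubernetes quantity parsing.

```python
# Hypothetical helpers: convert the resource quantities from the DPA's restic
# podConfig into plain numbers. Only the suffixes that appear in this report
# are handled (binary Ki/Mi/Gi for memory, "m" millicores for CPU).

BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def memory_bytes(quantity: str) -> int:
    """Return the number of bytes for a memory quantity such as '50Mi'."""
    for suffix, factor in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: already a byte count, e.g. '1024'


def cpu_cores(quantity: str) -> float:
    """Return the number of CPU cores for a quantity such as '100m'."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000  # millicores -> cores
    return float(quantity)


# Values copied from resourceAllocations in step 1.
limits = {"cpu": "100m", "memory": "50Mi"}
requests = {"cpu": "50m", "memory": "10Mi"}

print("memory limit:", memory_bytes(limits["memory"]), "bytes")      # 52428800
print("memory request:", memory_bytes(requests["memory"]), "bytes")  # 10485760
print("cpu limit:", cpu_cores(limits["cpu"]), "cores")               # 0.1
```

      With roughly 50 MiB available, the restic container can be OOM-killed while it backs up a volume, which is the condition this test case relies on.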

      Actual results:

      The backup gets stuck in InProgress status.

      $ oc get podvolumebackup
      NAME                                                 STATUS       CREATED   NAMESPACE       POD                  VOLUME            REPOSITORY ID                                                                               UPLOADER TYPE   STORAGE LOCATION   AGE
      backup1-53b48381-7a22-11ed-b0bf-845cf3eff33a-bndxk   InProgress   11m       test-oadp-591   postgresql-1-hf7js   postgresql-data   gs:oadpbucket163761:/velero-e2e-ebeca73d-79f2-11ed-941e-0a58ac1e09e0/restic/test-oadp-591   restic          ts-dpa-1           11m
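      When triaging runs like this one, it can help to list PodVolumeBackups that are still InProgress. This is a minimal sketch, assuming the table printed by `oc get podvolumebackup` (as above) has been captured as text. The helper names are hypothetical, and the parser relies only on the first six columns (NAME, STATUS, CREATED, NAMESPACE, POD, VOLUME), because several later headers contain spaces and the column set may vary between Velero versions.

```python
from typing import NamedTuple

# Output copied from the report above.
SAMPLE_OUTPUT = """\
NAME                                                 STATUS       CREATED   NAMESPACE       POD                  VOLUME            REPOSITORY ID                                                                               UPLOADER TYPE   STORAGE LOCATION   AGE
backup1-53b48381-7a22-11ed-b0bf-845cf3eff33a-bndxk   InProgress   11m       test-oadp-591   postgresql-1-hf7js   postgresql-data   gs:oadpbucket163761:/velero-e2e-ebeca73d-79f2-11ed-941e-0a58ac1e09e0/restic/test-oadp-591   restic          ts-dpa-1           11m
"""


class PVBRow(NamedTuple):
    name: str
    status: str
    namespace: str
    pod: str
    volume: str


def parse_pvb_table(text: str) -> list[PVBRow]:
    """Parse the data rows of `oc get podvolumebackup` table output.

    Assumes every row has its first six columns populated; a row with an
    empty STATUS would shift the columns and is not handled here.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    rows = []
    for line in lines[1:]:  # skip the header line
        name, status, _created, namespace, pod, volume = line.split()[:6]
        rows.append(PVBRow(name, status, namespace, pod, volume))
    return rows


def stuck_in_progress(rows: list[PVBRow]) -> list[PVBRow]:
    """Return the rows whose STATUS column is InProgress."""
    return [row for row in rows if row.status == "InProgress"]


for row in stuck_in_progress(parse_pvb_table(SAMPLE_OUTPUT)):
    print(f"{row.namespace}/{row.pod} volume {row.volume}: {row.status} ({row.name})")
```

      InProgress by itself does not prove that a backup is stuck; in this report, the backup still showing InProgress after the restic pod restarted is the symptom.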
       

      Expected results: 
      The PodVolumeBackup should be marked as Failed when the restic pod restarts, and the backup should be marked as PartiallyFailed.
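      The following is a minimal sketch of the expected behavior, assuming a simplified model. When the restic pod on a node starts again, no PodVolumeBackup on that node that is still InProgress can finish, so it is marked Failed. A backup with a failed PodVolumeBackup then ends as PartiallyFailed instead of staying InProgress. The class, function names, and node name are hypothetical. This is not the Velero controller code or the change made in the upstream PR.

```python
from dataclasses import dataclass

# Illustrative model only: phase names follow Velero's PodVolumeBackup and
# Backup phases, but the structures and functions are hypothetical.


@dataclass
class PodVolumeBackup:
    name: str
    node: str
    phase: str  # "New", "InProgress", "Completed" or "Failed"
    message: str = ""


def on_restic_pod_start(pvbs: list[PodVolumeBackup], node: str) -> None:
    """When the restic pod on `node` (re)starts, any upload it was running is
    gone, so its InProgress PodVolumeBackups are marked Failed."""
    for pvb in pvbs:
        if pvb.node == node and pvb.phase == "InProgress":
            pvb.phase = "Failed"
            pvb.message = "restic pod restarted while the backup was in progress"


def backup_phase(pvbs: list[PodVolumeBackup]) -> str:
    """Derive a backup phase from its PodVolumeBackups (other errors ignored)."""
    if any(p.phase in ("New", "InProgress") for p in pvbs):
        return "InProgress"
    if any(p.phase == "Failed" for p in pvbs):
        return "PartiallyFailed"
    return "Completed"


pvbs = [PodVolumeBackup("backup1-53b48381-7a22-11ed-b0bf-845cf3eff33a-bndxk",
                        node="worker-0", phase="InProgress")]
print(backup_phase(pvbs))  # InProgress (the actual result in this report)
on_restic_pod_start(pvbs, node="worker-0")
print(backup_phase(pvbs))  # PartiallyFailed (the expected result)
```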

      Additional info:


            People

              tkaovila@redhat.com Tiger Kaovilai
              rhn-support-prajoshi Prasad Joshi
              Votes: 0
              Watchers: 9
