OpenShift API for Data Protection / OADP-5178

Backup should immediately fail when BSL is in unavailable state


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • OADP 1.5.0
    • OADP 1.3.3, OADP 1.4.1
    • velero

      Description of problem:

      I ran a backup while the BSL was in the Unavailable phase. Velero processed the entire backup request and only marked the backup as Failed at the end, when it tried to upload the results to object storage.

      Slack discussion:
      https://redhat-internal.slack.com/archives/C0144ECKUJ0/p1730725233888819

       

      Version-Release number of selected component (if applicable):

      OADP 1.4.1

       

      How reproducible:
      Always 

       

      Steps to Reproduce:
      1. Create a DPA whose backup location points to a nonexistent bucket.

      $ oc get dpa ts-dpa -o yaml
      apiVersion: oadp.openshift.io/v1alpha1
      kind: DataProtectionApplication
      metadata:
        creationTimestamp: "2024-11-06T11:02:12Z"
        generation: 2
        name: ts-dpa
        namespace: openshift-adp
        resourceVersion: "13822947"
        uid: d36fa2b9-20b6-457d-b0f0-1d4344c548f6
      spec:
        backupLocations:
        - name: default
          velero:
            config:
              profile: noobaa
              region: noobaa
              s3ForcePathStyle: "true"
              s3Url: http://s3-openshift-storage.apps.mig-oadp-2316d.rhos-psi.cnv-qe.rhood.us
            credential:
              key: cloud
              name: cloud-credentials
            default: true
            objectStorage:
              bucket: migoadp2316djgp7g1
              prefix: velero
            provider: aws
        configuration:
          nodeAgent:
            enable: true
            uploaderType: kopia
          velero:
            defaultPlugins:
            - openshift
            - aws
            - csi
      status:
        conditions:
        - lastTransitionTime: "2024-11-06T11:02:13Z"
          message: Reconcile complete
          reason: Complete
          status: "True"
          type: Reconciled

      2. Verify that the BSL is in the Unavailable phase.

      $ oc get bsl default -o yaml
      apiVersion: velero.io/v1
      kind: BackupStorageLocation
      metadata:
        creationTimestamp: "2024-11-06T11:02:12Z"
        generation: 18
        labels:
          app.kubernetes.io/component: bsl
          app.kubernetes.io/instance: default
          app.kubernetes.io/managed-by: oadp-operator
          app.kubernetes.io/name: oadp-operator-velero
          openshift.io/oadp: "True"
          openshift.io/oadp-registry: "True"
        name: default
        namespace: openshift-adp
        ownerReferences:
        - apiVersion: oadp.openshift.io/v1alpha1
          blockOwnerDeletion: true
          controller: true
          kind: DataProtectionApplication
          name: ts-dpa
          uid: d36fa2b9-20b6-457d-b0f0-1d4344c548f6
        resourceVersion: "13828551"
        uid: bfb24f78-f433-4122-8b06-add2683d0128
      spec:
        config:
          checksumAlgorithm: ""
          profile: noobaa
          region: noobaa
          s3ForcePathStyle: "true"
          s3Url: http://s3-openshift-storage.apps.mig-oadp-2316d.rhos-psi.cnv-qe.rhood.us
        credential:
          key: cloud
          name: cloud-credentials
        default: true
        objectStorage:
          bucket: migoadp2316djgp7g1
          prefix: velero
        provider: aws
      status:
        lastValidationTime: "2024-11-06T11:16:16Z"
        message: 'BackupStorageLocation "default" is unavailable: rpc error: code = Unknown
          desc = operation error S3: ListObjectsV2, https response error StatusCode: 404,
          RequestID: m35s9up6-eyg2so-w7j, HostID: m35s9up6-eyg2so-w7j, NoSuchBucket: '
        phase: Unavailable

      3. Deploy an application and trigger a backup of its namespace.
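
Step 3 does not show the backup request itself. The following is a minimal sketch of how it might be triggered, reusing the name, namespace, and spec fields of the Backup shown under Actual results. The `create_backup` function name is hypothetical, and deploying the application is not shown.

```shell
# Hypothetical helper: create a Velero Backup CR for one application namespace.
# Arguments: backup name, application namespace, BSL name.
# Assumes Velero runs in the openshift-adp namespace, as in this report.
create_backup() {
  backup_name="$1"
  app_ns="$2"
  bsl_name="$3"
  oc create -f - <<EOF
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: ${backup_name}
  namespace: openshift-adp
spec:
  includedNamespaces:
  - ${app_ns}
  storageLocation: ${bsl_name}
EOF
}

# Usage matching this report:
# create_backup test-backup1 ocp-django default
```

Fields left out of the manifest (ttl, timeouts, and so on) fall back to Velero's defaults.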

       

      Actual results:

      The Velero logs and the backup status show that Velero processed the whole backup request before failing: status.progress.itemsBackedUp equals status.progress.totalItems (62 of 62), yet the phase is Failed.

      $ oc get backup test-backup1 -o yaml
      apiVersion: velero.io/v1
      kind: Backup
      metadata:
        annotations:
          velero.io/resource-timeout: 10m0s
          velero.io/source-cluster-k8s-gitversion: v1.29.8+632b078
          velero.io/source-cluster-k8s-major-version: "1"
          velero.io/source-cluster-k8s-minor-version: "29"
        creationTimestamp: "2024-11-06T11:12:35Z"
        generation: 5
        labels:
          velero.io/storage-location: default
        name: test-backup1
        namespace: openshift-adp
        resourceVersion: "13826552"
        uid: f45e9d8c-1a59-48f3-b732-afbe756e9571
      spec:
        csiSnapshotTimeout: 10m0s
        defaultVolumesToFsBackup: false
        includedNamespaces:
        - ocp-django
        itemOperationTimeout: 4h0m0s
        snapshotMoveData: false
        storageLocation: default
        ttl: 720h0m0s
      status:
        errors: 2
        expiration: "2024-12-06T11:12:35Z"
        failureReason: 'rpc error: code = Unknown desc = error putting object velero/backups/test-backup1/velero-backup.json:
          operation error S3: PutObject, https response error StatusCode: 404, RequestID:
          m35s5565-38lqzg-vds, HostID: m35s5565-38lqzg-vds, api error NoSuchBucket: The
          specified bucket does not exist.'
        formatVersion: 1.1.0
        hookStatus: {}
        phase: Failed
        progress:
          itemsBackedUp: 62
          totalItems: 62
        startTimestamp: "2024-11-06T11:12:35Z"
        version: 1
        warnings: 1
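
To check the same fields without reading the full YAML, something like the following sketch could be used. The `backup_summary` function name is hypothetical, and the openshift-adp namespace is assumed from this report.

```shell
# Hypothetical helper: print "<phase> <itemsBackedUp>/<totalItems>" for a Backup.
# Assumes the Backup lives in the openshift-adp namespace.
backup_summary() {
  oc get backups.velero.io "$1" -n openshift-adp \
    -o jsonpath='{.status.phase} {.status.progress.itemsBackedUp}/{.status.progress.totalItems}{"\n"}'
}

# Usage:
# backup_summary test-backup1
```

A Failed phase together with a fully counted progress field is the symptom described in this report.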

      Velero logs:

      time="2024-11-06T11:12:37Z" level=info msg="Backed up a total of 62 items" backup=openshift-adp/test-backup1 logSource="/remote-source/velero/app/pkg/backup/backup.go:499" progress=
      time="2024-11-06T11:12:37Z" level=info msg="Setting up backup store to persist the backup" backup=openshift-adp/test-backup1 logSource="/remote-source/velero/app/pkg/controller/backup_controller.go:738"
      time="2024-11-06T11:12:37Z" level=error msg="Error uploading log file" backup=test-backup1 bucket=migoadp2316djgp7g1 error="rpc error: code = Unknown desc = error putting object velero/backups/test-backup1/test-backup1-logs.gz: operation error S3: PutObject, https response error StatusCode: 404, RequestID: m35s555m-2x69zr-18z6, HostID: m35s555m-2x69zr-18z6, api error NoSuchBucket: The specified bucket does not exist." error.file="/remote-source/app/velero-plugin-for-aws/object_store.go:279" error.function="main.(*ObjectStore).PutObject" logSource="/remote-source/velero/app/pkg/persistence/object_store.go:256" prefix=velero
      time="2024-11-06T11:12:37Z" level=info msg="Initial backup processing complete, moving to FinalizingPartiallyFailed" backup=openshift-adp/test-backup1 logSource="/remote-source/velero/app/pkg/controller/backup_controller.go:752"
      time="2024-11-06T11:12:37Z" level=info msg="plugin process exited" backup=openshift-adp/test-backup1 cmd=/velero id=174 logSource="/remote-source/velero/app/pkg/plugin/clientmgmt/process/logrus_adapter.go:80" plugin=/velero
      time="2024-11-06T11:12:37Z" level=info msg="plugin process exited" backup=openshift-adp/test-backup1 cmd=/plugins/velero-plugins id=185 logSource="/remote-source/velero/app/pkg/plugin/clientmgmt/process/logrus_adapter.go:80" plugin=/plugins/velero-plugins
      time="2024-11-06T11:12:37Z" level=info msg="plugin process exited" backup=openshift-adp/test-backup1 cmd=/plugins/velero-plugin-for-aws id=196 logSource="/remote-source/velero/app/pkg/plugin/clientmgmt/process/logrus_adapter.go:80" plugin=/plugins/velero-plugin-for-aws
      time="2024-11-06T11:12:37Z" level=error msg="backup failed" backuprequest=openshift-adp/test-backup1 controller=backup error="rpc error: code = Unknown desc = error putting object velero/backups/test-backup1/velero-backup.json: operation error S3: PutObject, https response error StatusCode: 404, RequestID: m35s5565-38lqzg-vds, HostID: m35s5565-38lqzg-vds, api error NoSuchBucket: The specified bucket does not exist." logSource="/remote-source/velero/app/pkg/controller/backup_controller.go:306"
      time="2024-11-06T11:12:37Z" level=info msg="Updating backup's final status" backuprequest=openshift-adp/test-backup1 controller=backup logSource="/remote-source/velero/app/pkg/controller/backup_controller.go:325"
       

      Expected results:

      Velero should exit early and fail the backup immediately if the BSL is in the Unavailable phase, rather than processing all items first.
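
Until Velero behaves this way, a client-side guard can approximate the expected result. This is only a workaround sketch, not the proposed fix and not part of Velero or OADP; the `require_bsl_available` function name and the `backup.yaml` file are hypothetical.

```shell
# Hypothetical client-side guard; not part of Velero or OADP.
# Returns 0 only when the named BSL reports phase "Available".
# Arguments: BSL name, namespace (defaults to openshift-adp).
require_bsl_available() {
  bsl_name="$1"
  ns="${2:-openshift-adp}"
  phase="$(oc get backupstoragelocations.velero.io "$bsl_name" -n "$ns" \
    -o jsonpath='{.status.phase}')"
  if [ "$phase" != "Available" ]; then
    echo "BSL ${bsl_name} is in phase '${phase}'; refusing to start backup" >&2
    return 1
  fi
}

# Usage (backup.yaml is a hypothetical Backup manifest):
# require_bsl_available default && oc create -f backup.yaml
```

The guard reads the phase recorded at the BSL's last validation, so it can be stale by up to one validation interval.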

       

       

      Additional info:

              wnstb Wes Hayutin
              rhn-support-prajoshi Prasad Joshi
              Prasad Joshi Prasad Joshi