  1. OpenShift API for Data Protection
  2. OADP-2941

Backup is marked as Failed when the Velero pod is restarted during the WaitingForPluginOperations phase


    • Quality / Stability / Reliability
    • oadp-operator-bundle-container-1.5.0-82

      Description of problem:

      Triggered a Data Mover backup with OADP 1.3.0 and restarted (bounced) the Velero pod while the backup was in the WaitingForPluginOperations phase. The backup was marked as Failed with the failureReason: found a backup with status "InProgress" during the server starting, mark it as "Failed"

      Version-Release number of selected component (if applicable):

      OADP 1.3.0-117

       

      How reproducible:
      Always

       

      Steps to Reproduce:
      1. Create a DPA with nodeAgent enabled and the csi plugin added to the defaultPlugins list.

      apiVersion: oadp.openshift.io/v1alpha1
      kind: DataProtectionApplication
      metadata:
        name: ts-dpa
      spec:
        backupLocations:
        - velero:
            default: true
            objectStorage:
              bucket: oadpbucket241214
              prefix: velero
            credential:
              key: cloud
              name: cloud-credentials-gcp
            provider: gcp
        configuration:
          nodeAgent:
            uploaderType: restic
            enable: true
          velero:
            defaultPlugins:
            - gcp
            - openshift
            - csi

      2. Deploy a stateful application

       

      3. Create a datamover backup

      $ cat backup.yml 
      apiVersion: velero.io/v1
      kind: Backup
      metadata:
        name: test-backup
        labels:
          velero.io/storage-location: default
        namespace: openshift-adp
      spec:
        includedNamespaces:
        - ocp-mysql
        snapshotMoveData: true
      

      4. Wait until the backup moves to the WaitingForPluginOperations phase.

      $ oc get backup -o jsonpath={.status.phase} -w
      InProgressInProgressWaitingForPluginOperations
      

      5. Verify the dataupload CRs are created.

      $ oc get dataupload
      NAME                STATUS     STARTED   BYTES DONE   TOTAL BYTES   STORAGE LOCATION   AGE   NODE
      test-backup-hs6g2   Accepted   26s                                  ts-dpa-1           26s   
      test-backup-m4fhc   Accepted   16s                                  ts-dpa-1           16s   

      6. Bounce the velero pod

      $ oc delete pod velero-f459d954-84xkl 
      pod "velero-f459d954-84xkl" deleted
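
The wait in step 4 is timing-sensitive, so it may help to script it before bouncing the pod in step 6. The following is a minimal sketch, not part of the original reproduction: `wait_for_phase` and `oc_backup_phase` are hypothetical helper names, and the backup name and namespace defaults are taken from the steps above. It polls the same `oc get backup ... -o jsonpath={.status.phase}` query and stops early if the backup reaches a terminal phase.

```python
import subprocess
import time
from typing import Callable

# Backup phases after which the phase is not expected to change again.
TERMINAL_PHASES = {"Completed", "PartiallyFailed", "Failed", "FailedValidation"}


def oc_backup_phase(name: str = "test-backup",
                    namespace: str = "openshift-adp") -> str:
    """Read .status.phase with the same `oc` query used in step 4."""
    result = subprocess.run(
        ["oc", "get", "backup", name, "-n", namespace,
         "-o", "jsonpath={.status.phase}"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def wait_for_phase(target: str,
                   get_phase: Callable[[], str] = oc_backup_phase,
                   timeout_s: float = 600.0,
                   interval_s: float = 2.0) -> str:
    """Poll get_phase() until it returns `target`.

    Raises RuntimeError if a terminal phase is seen first, and
    TimeoutError if `timeout_s` elapses without reaching `target`.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        phase = get_phase()
        if phase == target:
            return phase
        if phase in TERMINAL_PHASES:
            raise RuntimeError(f"backup reached {phase!r} before {target!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"backup still in {phase!r} after {timeout_s}s")
        time.sleep(interval_s)
```

Calling `wait_for_phase("WaitingForPluginOperations")` and then deleting the Velero pod would approximate steps 4 and 6. The `get_phase` parameter is injectable so the loop can be exercised without a cluster.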

      Actual results: 

      Backup is marked as Failed. 

      $ oc get backup  -o yaml
      apiVersion: v1
      items:
      - apiVersion: velero.io/v1
        kind: Backup
        metadata:
          annotations:
            velero.io/resource-timeout: 10m0s
            velero.io/source-cluster-k8s-gitversion: v1.27.6+98158f9
            velero.io/source-cluster-k8s-major-version: "1"
            velero.io/source-cluster-k8s-minor-version: "27"
          creationTimestamp: "2023-10-19T11:04:02Z"
          generation: 7
          labels:
            velero.io/storage-location: ts-dpa-1
          name: test-backup
          namespace: openshift-adp
          resourceVersion: "93615"
          uid: 183e3c95-a8bf-4900-adc1-63a273f5500f
        spec:
          csiSnapshotTimeout: 10m0s
          defaultVolumesToFsBackup: false
          includedNamespaces:
          - ocp-mysql
          itemOperationTimeout: 4h0m0s
          snapshotMoveData: true
          storageLocation: ts-dpa-1
          ttl: 720h0m0s
        status:
          backupItemOperationsAttempted: 2
          completionTimestamp: "2023-10-19T11:05:13Z"
          expiration: "2023-11-18T11:04:02Z"
          failureReason: found a backup with status "InProgress" during the server starting,
            mark it as "Failed"
          formatVersion: 1.1.0
          phase: Failed
          progress:
            itemsBackedUp: 57
            totalItems: 57
          startTimestamp: "2023-10-19T11:04:02Z"
          version: 1

       

      The DataUpload CR is marked as Canceled.

      $ oc get dataupload  test-backup-hs6g2 -o yaml
      apiVersion: velero.io/v2alpha1
      kind: DataUpload
      metadata:
        creationTimestamp: "2023-10-19T11:04:16Z"
        generateName: test-backup-
        generation: 4
        labels:
          velero.io/accepted-by: oadp-59000-2q862-worker-b-pgtlg.c.openshift-qe.internal
          velero.io/async-operation-id: du-183e3c95-a8bf-4900-adc1-63a273f5500f.c27f1c9c-0b00-4d5465e59
          velero.io/backup-name: test-backup
          velero.io/backup-uid: 183e3c95-a8bf-4900-adc1-63a273f5500f
          velero.io/pvc-uid: c27f1c9c-0b00-4d5d-a09b-6737f31f7f60
        name: test-backup-hs6g2
        namespace: openshift-adp
        ownerReferences:
        - apiVersion: velero.io/v1
          controller: true
          kind: Backup
          name: test-backup
          uid: 183e3c95-a8bf-4900-adc1-63a273f5500f
        resourceVersion: "93638"
        uid: 81716dfd-ce38-4f5e-a6e9-f83934b038fa
      spec:
        backupStorageLocation: ts-dpa-1
        cancel: true
        csiSnapshot:
          snapshotClass: ""
          storageClass: standard-csi
          volumeSnapshot: velero-mysql-wj58k
        operationTimeout: 10m0s
        snapshotType: CSI
        sourceNamespace: ocp-mysql
        sourcePVC: mysql
      status:
        completionTimestamp: "2023-10-19T11:05:15Z"
        message: found a dataupload with status "Accepted" during the velero server starting,
          mark it as cancel
        phase: Canceled
        progress: {}
        startTimestamp: "2023-10-19T11:04:16Z"

       

       

      Expected results:

      The backup should complete even if the Velero pod is restarted while the backup is in the WaitingForPluginOperations phase.
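
The expectation can be written down as a small decision table. This is a hypothetical model of the behaviour the report asks for, not Velero's actual startup code: `phase_after_restart` is an illustrative name, and the only rule taken from the report is the one implied by the failureReason message (a backup found in "InProgress" at server start is marked "Failed").

```python
# Hypothetical model of the expected outcome; not Velero's implementation.
IN_PROGRESS = "InProgress"
WAITING_FOR_PLUGIN_OPERATIONS = "WaitingForPluginOperations"
FAILED = "Failed"


def phase_after_restart(phase: str) -> str:
    """Return the phase a backup is expected to have after a server restart.

    Only a backup that was still "InProgress" is failed, matching the
    failureReason text in this report. Every other phase, including
    "WaitingForPluginOperations", is left unchanged so that the pending
    item operations can still be tracked to completion.
    """
    if phase == IN_PROGRESS:
        return FAILED
    return phase
```

Under this model a backup in `WaitingForPluginOperations` keeps its phase across the restart, whereas the actual result above shows it ending up as `Failed`.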

       

      Additional info:

       

      ./velero describe backup test-backup -n openshift-adp --details
      Name:         test-backup
      Namespace:    openshift-adp
      Labels:       velero.io/storage-location=ts-dpa-1
      Annotations:  velero.io/resource-timeout=10m0s
                    velero.io/source-cluster-k8s-gitversion=v1.27.6+98158f9
                    velero.io/source-cluster-k8s-major-version=1
                    velero.io/source-cluster-k8s-minor-version=27
      Phase:  Failed (run `velero backup logs test-backup` for more information)
      
      Namespaces:
        Included:  ocp-mysql
        Excluded:  <none>
      Resources:
        Included:        *
        Excluded:        <none>
        Cluster-scoped:  auto
      Label selector:  <none>
      Storage Location:  ts-dpa-1
      Velero-Native Snapshot PVs:  auto
      Snapshot Move Data:          true
      Data Mover:                  velero
      TTL:  720h0m0s
      CSISnapshotTimeout:    10m0s
      ItemOperationTimeout:  4h0m0s
      Hooks:  <none>
      Backup Format Version:  1.1.0
      Started:    2023-10-19 16:34:02 +0530 IST
      Completed:  2023-10-19 16:35:13 +0530 IST
      Expiration:  2023-11-18 16:34:02 +0530 IST
      Total items to be backed up:  57
      Items backed up:              57
      Backup Item Operations:
        Operation for persistentvolumeclaims ocp-mysql/mysql:
          Backup Item Action Plugin:  velero.io/csi-pvc-backupper
          Operation ID:               du-183e3c95-a8bf-4900-adc1-63a273f5500f.c27f1c9c-0b00-4d5465e59
          Items to Update:
                                 datauploads.velero.io openshift-adp/test-backup-hs6g2
          Phase:                 InProgress
          Progress description:  Accepted
          Created:               2023-10-19 16:34:16 +0530 IST
          Started:               2023-10-19 16:34:16 +0530 IST
        Operation for persistentvolumeclaims ocp-mysql/mysql-1:
          Backup Item Action Plugin:  velero.io/csi-pvc-backupper
          Operation ID:               du-183e3c95-a8bf-4900-adc1-63a273f5500f.f92d7d3a-3b0f-4840f4164
          Items to Update:
                                 datauploads.velero.io openshift-adp/test-backup-m4fhc
          Phase:                 InProgress
          Progress description:  Accepted
          Created:               2023-10-19 16:34:26 +0530 IST
          Started:               2023-10-19 16:34:26 +0530 IST
      Resource List:
        apps/v1/Deployment:
          - ocp-mysql/mysql
        apps/v1/ReplicaSet:
          - ocp-mysql/mysql-68d84d7c89
        authorization.openshift.io/v1/RoleBinding:
          - ocp-mysql/admin
          - ocp-mysql/system:deployers
          - ocp-mysql/system:image-builders
          - ocp-mysql/system:image-pullers
        discovery.k8s.io/v1/EndpointSlice:
          - ocp-mysql/mysql-pv7qr
        rbac.authorization.k8s.io/v1/RoleBinding:
          - ocp-mysql/admin
          - ocp-mysql/system:deployers
          - ocp-mysql/system:image-builders
          - ocp-mysql/system:image-pullers
        v1/ConfigMap:
          - ocp-mysql/kube-root-ca.crt
          - ocp-mysql/openshift-service-ca.crt
        v1/Endpoints:
          - ocp-mysql/mysql
        v1/Event:
          - ocp-mysql/mysql-1.178f75e50b55cc4b
          - ocp-mysql/mysql-1.178f75e53375ff04
          - ocp-mysql/mysql-1.178f75e5337c2030
          - ocp-mysql/mysql-1.178f75e61e26391e
          - ocp-mysql/mysql-68d84d7c89-4ktgb.178f75e65db93f1b
          - ocp-mysql/mysql-68d84d7c89-4ktgb.178f75e76216efc0
          - ocp-mysql/mysql-68d84d7c89-4ktgb.178f75e81a91d7c7
          - ocp-mysql/mysql-68d84d7c89-4ktgb.178f75e8cc884dee
          - ocp-mysql/mysql-68d84d7c89-4ktgb.178f75e8cf298757
          - ocp-mysql/mysql-68d84d7c89-4ktgb.178f75eaca74c238
          - ocp-mysql/mysql-68d84d7c89-4ktgb.178f75ead129f618
          - ocp-mysql/mysql-68d84d7c89-4ktgb.178f75ead25a91ca
          - ocp-mysql/mysql-68d84d7c89-4ktgb.178f7c59783f6362
          - ocp-mysql/mysql-68d84d7c89-4ktgb.178f7ca094f7a81e
          - ocp-mysql/mysql-68d84d7c89-kvd7g.178f7ca0b9cfb971
          - ocp-mysql/mysql-68d84d7c89-kvd7g.178f7ce9be9120fc
          - ocp-mysql/mysql-68d84d7c89.178f75e5319092b8
          - ocp-mysql/mysql-68d84d7c89.178f7ca0b9e4aa43
          - ocp-mysql/mysql.178f75e4e708be2f
          - ocp-mysql/mysql.178f75e530c01f6d
          - ocp-mysql/mysql.178f75e532e9f7d1
          - ocp-mysql/mysql.178f75e532ed1d0f
          - ocp-mysql/mysql.178f75e621334c8b
          - ocp-mysql/velero-mysql-1-sqndc.178f76040bdecef4
          - ocp-mysql/velero-mysql-1-sqndc.178f76043b7c9589
          - ocp-mysql/velero-mysql-98kpv.178f7603ba266b32
        v1/Namespace:
          - ocp-mysql
        v1/PersistentVolume:
          - pvc-c27f1c9c-0b00-4d5d-a09b-6737f31f7f60
          - pvc-f92d7d3a-3b0f-484c-a697-c7a1f6a42b78
        v1/PersistentVolumeClaim:
          - ocp-mysql/mysql
          - ocp-mysql/mysql-1
        v1/Pod:
          - ocp-mysql/mysql-68d84d7c89-kvd7g
        v1/Secret:
          - ocp-mysql/builder-dockercfg-45vpg
          - ocp-mysql/builder-token-gcjwd
          - ocp-mysql/default-dockercfg-fskf8
          - ocp-mysql/default-token-dbfq9
          - ocp-mysql/deployer-dockercfg-xvnfk
          - ocp-mysql/deployer-token-4d7mc
          - ocp-mysql/mysql
        v1/Service:
          - ocp-mysql/mysql
        v1/ServiceAccount:
          - ocp-mysql/builder
          - ocp-mysql/default
          - ocp-mysql/deployer
      Velero-Native Snapshots: <none included>
      

       

              sseago Scott Seago
              rhn-support-prajoshi Prasad Joshi
              Prasad Joshi Prasad Joshi