OpenShift API for Data Protection › OADP-3140 › OADP-4755

[RedHat QE] Verify Bug OADP-3140 - Post Restore Hooks might start running before the DataDownload op has released the related PV

    • Type: Sub-task
    • Resolution: Done
    • Priority: Undefined
    • Fix Version: OADP 1.4.1
    • Label: QE-Task

      Description of problem:

      Due to the asynchronous nature of the data mover operation, a post-restore hook might be attempted before the related pod's PVs are released by the data mover PVC.
      The pod remains in Pending during that time and cannot run the hook.
      The hook attempt might time out before the pod is released, leading to a PartiallyFailed restore.

      Link to the Slack thread is in the comments.
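      The ordering constraint the fix must enforce can be sketched in Python. This is a minimal, hypothetical model, not OADP or Velero code: `get_phases` stands in for listing the DataDownload phases via the Kubernetes API, and hooks run only once every download reports Completed.

```python
import time

def wait_for_datadownloads(get_phases, timeout_s=600.0, poll_s=1.0):
    """Poll until every DataDownload phase is 'Completed', or time out.

    get_phases: callable returning the current list of phase strings.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        phases = get_phases()
        if phases and all(p == "Completed" for p in phases):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)

def run_post_restore_hooks(get_phases, hooks, timeout_s=600.0):
    """Run post-restore hooks only after the data mover releases the PVs.

    Attempting them earlier leaves the pod Pending and the hook times out,
    which is exactly the race described above.
    """
    if not wait_for_datadownloads(get_phases, timeout_s=timeout_s, poll_s=0.05):
        return "PartiallyFailed"
    for hook in hooks:
        hook()
    return "Completed"
```

      The same idea applies on the cluster: the pod only leaves Pending once the data mover PVC releases the PV, so gating hooks on DataDownload completion removes the race.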

       

       

      Version-Release number of selected component (if applicable):

      How reproducible:

       

      Steps to Reproduce:
      1.
      2.
      3.

      Actual results:

       

      Expected results:

       

      Additional info:


            Verified with OADP 1.4.1-28 build. 

            1. Created DPA with nodeAgent and CSI enabled.

            oc get dpa ts-dpa -o yaml
            apiVersion: oadp.openshift.io/v1alpha1
            kind: DataProtectionApplication
            metadata:
              creationTimestamp: "2024-08-30T10:40:46Z"
              generation: 1
              name: ts-dpa
              namespace: openshift-adp
              resourceVersion: "114035"
              uid: e51555ef-0d80-4884-984c-2f05b650c076
            spec:
              backupLocations:
              - velero:
                  credential:
                    key: cloud
                    name: cloud-credentials-gcp
                  default: true
                  objectStorage:
                    bucket: oadp95301m9q8h
                    prefix: velero-e2e-4eef4c9e-66bc-11ef-8620-845cf3eff33a
                  provider: gcp
              configuration:
                nodeAgent:
                  enable: true
                  podConfig:
                    resourceAllocations: {}
                  uploaderType: kopia
                velero:
                  defaultPlugins:
                  - openshift
                  - gcp
                  - kubevirt
                  - csi
              podDnsConfig: {}
              snapshotLocations: []
            status:
              conditions:
              - lastTransitionTime: "2024-08-30T10:40:47Z"
                message: Reconcile complete
                reason: Complete
                status: "True"
                type: Reconciled
            

            2. Deployed a stateful application and triggered a data mover backup.

            $ oc get backup  mysql-hooks-e2e-5036f2c7-66bc-11ef-8620-845cf3eff33a -o yaml
            apiVersion: velero.io/v1
            kind: Backup
            metadata:
              annotations:
                velero.io/resource-timeout: 10m0s
                velero.io/source-cluster-k8s-gitversion: v1.30.3
                velero.io/source-cluster-k8s-major-version: "1"
                velero.io/source-cluster-k8s-minor-version: "30"
              creationTimestamp: "2024-08-30T10:43:20Z"
              generation: 8
              labels:
                velero.io/storage-location: ts-dpa-1
              name: mysql-hooks-e2e-5036f2c7-66bc-11ef-8620-845cf3eff33a
              namespace: openshift-adp
              resourceVersion: "115759"
              uid: 16e9642a-b6ec-4bfa-8bdd-b79c16f20fcc
            spec:
              csiSnapshotTimeout: 10m0s
              defaultVolumesToFsBackup: false
              hooks: {}
              includedNamespaces:
              - test-oadp-196
              itemOperationTimeout: 4h0m0s
              metadata: {}
              snapshotMoveData: true
              storageLocation: ts-dpa-1
              ttl: 720h0m0s
            status:
              backupItemOperationsAttempted: 2
              backupItemOperationsCompleted: 2
              completionTimestamp: "2024-08-30T10:45:11Z"
              expiration: "2024-09-29T10:43:20Z"
              formatVersion: 1.1.0
              hookStatus: {}
              phase: Completed
              progress:
                itemsBackedUp: 46
                totalItems: 46
              startTimestamp: "2024-08-30T10:43:21Z"
              version: 1
            

            3. Removed the app namespace and triggered a restore.

            $  oc get restore mysql-hooks-e2e-5036f2c7-66bc-11ef-8620-845cf3eff33a -o yaml
            apiVersion: velero.io/v1
            kind: Restore
            metadata:
              creationTimestamp: "2024-08-30T10:45:50Z"
              finalizers:
              - restores.velero.io/external-resources-finalizer
              generation: 9
              name: mysql-hooks-e2e-5036f2c7-66bc-11ef-8620-845cf3eff33a
              namespace: openshift-adp
              resourceVersion: "116631"
              uid: e91b6097-776d-4d97-a413-145ff92ea40c
            spec:
              backupName: mysql-hooks-e2e-5036f2c7-66bc-11ef-8620-845cf3eff33a
              excludedResources:
              - nodes
              - events
              - events.events.k8s.io
              - backups.velero.io
              - restores.velero.io
              - resticrepositories.velero.io
              - csinodes.storage.k8s.io
              - volumeattachments.storage.k8s.io
              - backuprepositories.velero.io
              hooks:
                resources:
                - includedNamespaces:
                  - test-oadp-196
                  name: restore-hook-1
                  postHooks:
                  - exec:
                      command:
                      - sh
                      - -c
                      - while ! mysqladmin ping -h localhost --silent; do sleep 1; done
                      execTimeout: 4m0s
                      onError: Fail
                      waitTimeout: 2m0s
                  - exec:
                      command:
                      - mysql
                      - -u
                      - root
                      - -e
                      - source /test-data/world.sql
                      execTimeout: 4m0s
                      onError: Fail
                      waitTimeout: 2m0s
              itemOperationTimeout: 4h0m0s
            status:
              completionTimestamp: "2024-08-30T10:47:01Z"
              hookStatus:
                hooksAttempted: 1
              phase: Completed
              progress:
                itemsRestored: 30
                totalItems: 30
              restoreItemOperationsAttempted: 2
              restoreItemOperationsCompleted: 2
              startTimestamp: "2024-08-30T10:45:50Z"
              warnings: 7
            
            $  oc get datadownload
            NAME                                                         STATUS      STARTED   BYTES DONE   TOTAL BYTES   STORAGE LOCATION   AGE   NODE
            mysql-hooks-e2e-5036f2c7-66bc-11ef-8620-845cf3eff33a-5ws8t   Completed   17m       105256269    105256269     ts-dpa-1           17m   oadp-95301-m9q8h-worker-b-lpth2
            mysql-hooks-e2e-5036f2c7-66bc-11ef-8620-845cf3eff33a-bklrc   Completed   16m       107854713    107854713     ts-dpa-1           17m   oadp-95301-m9q8h-worker-b-lpth2
            

            Tested it multiple times; no issues found. Moving this to Verified status.
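            As an extra scripted sanity check (a sketch, not part of the test suite), the RFC 3339 timestamps from the outputs above can be compared to confirm the restore finished only at or after every DataDownload completed:

```python
from datetime import datetime, timezone

def parse_ts(ts):
    """Parse a Kubernetes RFC 3339 timestamp such as '2024-08-30T10:47:01Z'."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)

def restore_after_downloads(download_completions, restore_completion):
    """True when the restore completed at or after every DataDownload,
    i.e. the post hooks could not have raced the data mover."""
    end = parse_ts(restore_completion)
    return all(parse_ts(d) <= end for d in download_completions)
```

            The timestamps would come from `.status.completionTimestamp` of the DataDownload and Restore objects shown above.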

            Prasad Joshi added a comment - Verified with OADP 1.4.1-28 build.

              rhn-support-prajoshi Prasad Joshi
              akarol@redhat.com Aziza Karol
              Votes: 0
              Watchers: 2