Bug
Resolution: Won't Do
Normal
OADP 1.2.2, OADP 1.1.5
Description of problem:
The restore gets stuck in InProgress status for a DeploymentConfig application when its PVs are backed up via the PV opt-in approach. For other Kubernetes resources such as Deployment and StatefulSet, it works fine. The error is shown below:
$ oc logs velero-798755899f-lhx84 | grep error
time="2023-10-09T10:53:35Z" level=error msg="Failed to check node-agent pod status, disengage" error="pods \"postgresql-1-8pd7b\" not found" logSource="/remote-source/velero/app/pkg/podvolume/restorer.go:206"
Version-Release number of selected component (if applicable):
OADP 1.2.2
How reproducible:
Always
Steps to Reproduce:
1. Create DPA with restic enabled.
$ oc get dpa -o yaml ts-dpa
apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
metadata:
  name: ts-dpa
  namespace: openshift-adp
  resourceVersion: "117481"
  uid: dbf952f8-d6e0-4269-8b2e-034b407e05a5
spec:
  backupLocations:
  - velero:
      default: true
      objectStorage:
        bucket: oadpbucket238227
        prefix: velero
      provider: gcp
  configuration:
    restic:
      enable: true
    velero:
      defaultPlugins:
      - gcp
      - openshift
status:
  conditions:
  - lastTransitionTime: "2023-10-09T09:41:37Z"
    message: Reconcile complete
    reason: Complete
    status: "True"
    type: Reconciled
2. Deploy ocp-django application.
$ appm deploy ocp-django
$ oc get pod -n ocp-django
NAME                              READY   STATUS      RESTARTS   AGE
django-psql-persistent-1-build    0/1     Completed   0          2m21s
django-psql-persistent-1-deploy   0/1     Completed   0          99s
django-psql-persistent-1-q64l9    1/1     Running     0          97s
postgresql-1-8pd7b                1/1     Running     0          2m18s
postgresql-1-deploy               0/1     Completed   0          2m20s
3. Add the backup-volume annotation to the PostgreSQL pod.
$ oc get pod -n ocp-django postgresql-1-8pd7b -o yaml
volumes:
- name: postgresql-data
  persistentVolumeClaim:
    claimName: postgresql
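Step 3 does not show the command that applied the annotation. The following is a minimal sketch, assuming the standard Velero opt-in annotation key backup.velero.io/backup-volumes and the namespace, pod, and volume names from this report. The helper name annotate_cmd is hypothetical, and it only prints the oc command so it can be reviewed before being run.

```shell
# Hypothetical helper: prints (does not run) the command that opts a
# pod volume in to file-system backup via the Velero annotation.
# Arguments: namespace, pod name, comma-separated volume names.
annotate_cmd() {
  printf 'oc annotate pod -n %s %s backup.velero.io/backup-volumes=%s\n' "$1" "$2" "$3"
}

# Names below are taken from this report.
annotate_cmd ocp-django postgresql-1-8pd7b postgresql-data
```

Piping the printed line to sh would apply it to the cluster. The report does not state whether the annotation was set on the running pod or on the DeploymentConfig's pod template, so this sketch may differ from what was actually done.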
4. Execute a backup with defaultVolumesToFsBackup set to false.
$ oc get backup -o yaml test-backup3 -o yaml
apiVersion: velero.io/v1
kind: Backup
metadata:
  annotations:
    velero.io/source-cluster-k8s-gitversion: v1.27.6+6936c15
    velero.io/source-cluster-k8s-major-version: "1"
    velero.io/source-cluster-k8s-minor-version: "27"
  name: test-backup3
  namespace: openshift-adp
spec:
  csiSnapshotTimeout: 10m0s
  defaultVolumesToFsBackup: false
  includedNamespaces:
  - ocp-django
  itemOperationTimeout: 1h0m0s
  storageLocation: ts-dpa-1
  ttl: 720h0m0s
status:
  completionTimestamp: "2023-10-09T10:49:30Z"
  expiration: "2023-11-08T10:49:12Z"
  formatVersion: 1.1.0
  phase: Completed
  progress:
    itemsBackedUp: 94
    totalItems: 94
  startTimestamp: "2023-10-09T10:49:12Z"
  version: 1
5. Verify that a PodVolumeBackup CR was created for the volume.
$ oc get podvolumebackup test-backup3-5rtsl
NAME                 STATUS      CREATED   NAMESPACE    POD                  VOLUME            REPOSITORY ID                                    UPLOADER TYPE   STORAGE LOCATION   AGE
test-backup3-5rtsl   Completed   2m31s     ocp-django   postgresql-1-8pd7b   postgresql-data   gs:oadpbucket238227:/velero/restic/ocp-django    restic          ts-dpa-1           2m31s
6. Remove the application namespace.
7. Execute the restore.
Actual results:
The restore gets stuck in InProgress status, waiting for all PodVolumeRestores to complete.
spec:
  backupName: test-backup3
  excludedResources:
  - nodes
  - events
  - events.events.k8s.io
  - backups.velero.io
  - restores.velero.io
  - resticrepositories.velero.io
  - csinodes.storage.k8s.io
  - volumeattachments.storage.k8s.io
  - backuprepositories.velero.io
  itemOperationTimeout: 1h0m0s
status:
  phase: InProgress
  progress:
    itemsRestored: 47
    totalItems: 47
  startTimestamp: "2023-10-09T10:53:18Z"
The PodVolumeRestore does not have any status.
$ oc get podvolumerestore test-restore3-wq72j
NAME                  NAMESPACE    POD                  UPLOADER TYPE   VOLUME            STATUS   TOTALBYTES   BYTESDONE   AGE
test-restore3-wq72j   ocp-django   postgresql-1-8pd7b   restic          postgresql-data                                     13m
Expected results:
The restore should complete successfully.
Additional info: