- Bug
- Resolution: Done
- Major
- OADP 1.0.0, OADP 1.1.0, OADP 1.0.1, OADP 1.0.2, OADP 1.0.3
- False
- False
- oadp-velero-plugin-container-1.1.0-17, oadp-operator-container-1.1.0-40
- Passed
- OADP Sprint 218
- 1
- 0
- 0
- 0
- Untriaged
- None
Problem Description:
Not 100% sure this is the root cause, but it seems like restic gets stuck on restore when the app is deployed by a DeploymentConfig (a quick check is sketched after this list).
- Doesn't happen with 2 stateful apps that do not have a DeploymentConfig.
- Tried 3 other apps with a DeploymentConfig; all show the same behavior.
- The issue doesn't occur when using volume snapshots.
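Since restic restores wait on an init container that Velero injects into the restored pods, one way to see where things stall is to check whether the restored pods in the app namespace are stuck on it. This is a diagnostic sketch, not a confirmed root cause; the redis-ns namespace and the restic-wait init container name are assumptions based on how upstream Velero performs restic restores:

  # Show each pod's init containers and phase; a pod stuck on restic-wait
  # would mean its PodVolumeRestore never completed.
  oc get pods -n redis-ns \
    -o custom-columns='NAME:.metadata.name,INIT:.spec.initContainers[*].name,PHASE:.status.phase'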
Observed Results:
(mtc-e2e-venv) [mperetz@mperetz mtc-e2e-qev2]$ oc get restore -n openshift-adp -o yaml
apiVersion: v1
items:
- apiVersion: velero.io/v1
  kind: Restore
  metadata:
    creationTimestamp: "2021-11-30T16:08:02Z"
    generation: 11
    name: mongodb123
    namespace: openshift-adp
    resourceVersion: "915253"
    uid: ddbf6e75-ec7f-442b-93c5-778af79d52f5
  spec:
    backupName: mongodb123
    excludedResources:
    - nodes
    - events
    - events.events.k8s.io
    - backups.velero.io
    - restores.velero.io
    - resticrepositories.velero.io
    restorePVs: true
  status:
    phase: InProgress
    progress:
      itemsRestored: 43
      totalItems: 43
    startTimestamp: "2021-11-30T16:08:02Z"
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
(mtc-e2e-venv) [mperetz@mperetz mtc-e2e-qev2]$ oc get restore -n openshift-adp
NAME         AGE
mongodb123   7m15s
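Since the restore shows itemsRestored equal to totalItems but stays InProgress, the per-volume restic work is the likely place it hangs. One option is to inspect the PodVolumeRestore objects Velero creates for each restic volume restore (a diagnostic sketch; the velero.io/restore-name label selector follows upstream Velero's labeling and is an assumption here):

  # List restic per-volume restores for this restore and their phases;
  # a stuck restic restore typically shows these in New or InProgress.
  oc get podvolumerestores -n openshift-adp \
    -l velero.io/restore-name=mongodb123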
Getting these errors from velero:
time="2021-11-30T16:24:48Z" level=info msg="Backup storage location is invalid, marking as unavailable" backup-storage-location=default controller=backup-storage-location logSource="pkg/controller/backup_storage_location_controller.go:117" time="2021-11-30T16:24:48Z" level=error msg="Current backup storage locations available/unavailable/unknown: 0/1/0, Backup storage location \"default\" is unavailable: rpc error: code = Unknown desc = AccessDenied: Access Denied\n\tstatus code: 403, request id: NZFNS1E4YBSA2C2R, host id: bFDBsSSnwrMsck8QvzR3QJ5enMszisF8RTdVcD+l+ui5FqPrnAyoHKpqqkMMQTTBmDyq1iKntZs=)" controller=backup-storage-location logSource="pkg/controller/backup_storage_location_controller.go:164" time="2021-11-30T16:24:48Z" level=error msg="Current backup storage locations available/unavailable/unknown: 0/1/0)" controller=backup-storage-location logSource="pkg/controller/backup_storage_location_controller.go:166"
Restic logs don't say much:
(mtc-e2e-venv) [mperetz@mperetz mtc-e2e-qev2]$ oc logs daemonset.apps/restic -n openshift-adp
Found 6 pods, using pod/restic-nwgjp
time="2021-11-30T16:05:14Z" level=info msg="Setting log-level to INFO"
time="2021-11-30T16:05:14Z" level=info msg="Starting Velero restic server konveyor-dev (-)" logSource="pkg/cmd/cli/restic/server.go:87"
2021-11-30T16:05:14.496Z INFO controller-runtime.metrics metrics server is starting to listen {"addr": ":8080"}
time="2021-11-30T16:05:14Z" level=info msg="Starting controllers" logSource="pkg/cmd/cli/restic/server.go:198"
time="2021-11-30T16:05:14Z" level=info msg="Starting metric server for restic at address [:8085]" logSource="pkg/cmd/cli/restic/server.go:189"
time="2021-11-30T16:05:14Z" level=info msg="Controllers starting..." logSource="pkg/cmd/cli/restic/server.go:249"
2021-11-30T16:05:14.552Z INFO controller-runtime.manager starting metrics server {"path": "/metrics"}
time="2021-11-30T16:05:14Z" level=info msg="Starting controller" controller=pod-volume-backup logSource="pkg/controller/generic_controller.go:76"
time="2021-11-30T16:05:14Z" level=info msg="Waiting for caches to sync" controller=pod-volume-backup logSource="pkg/controller/generic_controller.go:81"
time="2021-11-30T16:05:14Z" level=info msg="Starting controller" controller=pod-volume-restore logSource="pkg/controller/generic_controller.go:76"
time="2021-11-30T16:05:14Z" level=info msg="Waiting for caches to sync" controller=pod-volume-restore logSource="pkg/controller/generic_controller.go:81"
time="2021-11-30T16:05:14Z" level=info msg="Caches are synced" controller=pod-volume-restore logSource="pkg/controller/generic_controller.go:85"
time="2021-11-30T16:05:14Z" level=info msg="Caches are synced" controller=pod-volume-backup logSource="pkg/controller/generic_controller.go:85"
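Note that oc logs against a DaemonSet only prints one pod ("Found 6 pods, using pod/restic-nwgjp"), so the pod actually handling the volume may not be the one shown above. A sketch for collecting logs from every restic pod, assuming the pods carry the upstream name=restic label:

  # Dump logs from all restic daemonset pods, not just the first one.
  for p in $(oc get pods -n openshift-adp -l name=restic -o name); do
    echo "=== $p ==="
    oc logs -n openshift-adp "$p"
  done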
Restic pods are always Ready and Running:
NAME                                                  READY   STATUS    RESTARTS   AGE
oadp-example-velero-1-aws-registry-554545f7d6-99spc   1/1     Running   0          69m
openshift-adp-controller-manager-d79f5fcd6-8lhz9      2/2     Running   0          132m
restic-27v2p                                          1/1     Running   0          69m
restic-56nqh                                          1/1     Running   0          69m
restic-bkx25                                          1/1     Running   0          69m
restic-h2pfk                                          1/1     Running   0          69m
restic-vchhp                                          1/1     Running   0          69m
restic-ws54h                                          1/1     Running   0          69m
velero-769494ddb9-86lgq                               1/1     Running   0          2m57s
Version: 0.5.0
Steps to reproduce:
- Clone this repo: https://gitlab.cee.redhat.com/app-mig/cam-e2e-qe
- Run a playbook to deploy an app with a DeploymentConfig, for example:
  ansible-playbook cam-e2e-qe/deploy-app.yml -e use_role=roles/ocp-redis/ -e namespace=redis-ns
- Create a backup:
  cat <<EOF | oc create -f -
  apiVersion: velero.io/v1
  kind: Backup
  metadata:
    name: redis-ns
    labels:
      velero.io/storage-location: example-velero-1
    namespace: openshift-adp
  spec:
    hooks: {}
    includedNamespaces:
    - redis-ns
    storageLocation: example-velero-1
    defaultVolumesToRestic: true
    snapshotVolumes: false
    ttl: 720h0m0s
  EOF
- Delete the project:
  oc delete project redis-ns
- Create a restore:
  cat <<EOF | oc create -f -
  apiVersion: velero.io/v1
  kind: Restore
  metadata:
    name: redis-ns
    namespace: openshift-adp
  spec:
    backupName: redis-ns
    excludedResources:
    - nodes
    - events
    - events.events.k8s.io
    - backups.velero.io
    - restores.velero.io
    - resticrepositories.velero.io
    restorePVs: true
  EOF
- Note that the restore gets stuck in InProgress after all items appear to be restored (one way to watch this is sketched below).
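A minimal sketch for watching the restore hang, assuming the resource names from the steps above:

  # Poll the restore phase; with this bug it stays InProgress indefinitely
  # even after itemsRestored reaches totalItems.
  while true; do
    oc get restore redis-ns -n openshift-adp -o jsonpath='{.status.phase}{"\n"}'
    sleep 10
  done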