- Bug
- Resolution: Done
Problem description: When creating a DPA instance with an S3-compatible provider (MinIO, NooBaa, etc.) and then replacing it with AWS S3, the oadp-example-velero-1-aws-registry pod crashes with CrashLoopBackOff status. Any attempt to re-create the secret and the DPA fails in the same manner unless the OADP operator is re-installed.
Reproduction frequency: 100%
Observed Results:
Creating a DPA instance with AWS S3 initially - WORKS AS EXPECTED:
(mtc-e2e-venv) [mperetz@mperetz oadp-operator]$ oc get pods -n openshift-adp
NAME                                                READY   STATUS    RESTARTS   AGE
openshift-adp-controller-manager-59f48998d7-q2r9v   2/2     Running   0          11s
(mtc-e2e-venv) [mperetz@mperetz oadp-operator]$
(mtc-e2e-venv) [mperetz@mperetz oadp-operator]$ BUCKET=mayapvelerooadp
(mtc-e2e-venv) [mperetz@mperetz oadp-operator]$ REGION=us-east-2
(mtc-e2e-venv) [mperetz@mperetz oadp-operator]$ oc create secret generic cloud-credentials -n openshift-adp --from-file cloud=~/git/velero-examples/aws-creds
secret/cloud-credentials created
(mtc-e2e-venv) [mperetz@mperetz oadp-operator]$
(mtc-e2e-venv) [mperetz@mperetz oadp-operator]$ cat <<EOF | oc create -f -
> apiVersion: oadp.openshift.io/v1alpha1
> kind: DataProtectionApplication
> metadata:
>   namespace: openshift-adp
>   name: example-velero
> spec:
>   configuration:
>     velero:
>       defaultPlugins:
>         - openshift
>         - csi
>         - aws
>     restic:
>       enable: true
>   backupLocations:
>     - name: default
>       velero:
>         provider: aws
>         default: true
>         objectStorage:
>           bucket: $BUCKET
>           prefix: velero
>         config:
>           profile: default
>           region: $REGION
>         credential:
>           name: cloud-credentials
>           key: cloud
>   snapshotLocations:
>     - name: default
>       velero:
>         default: true
>         provider: aws
>         config:
>           region: $REGION
>           profile: default
> #  olmManaged: true
> EOF
dataprotectionapplication.oadp.openshift.io/example-velero created
(mtc-e2e-venv) [mperetz@mperetz oadp-operator]$
(mtc-e2e-venv) [mperetz@mperetz oadp-operator]$ oc get pods -n openshift-adp
NAME                                                READY   STATUS    RESTARTS   AGE
oadp-example-velero-1-aws-registry-888ff7cf-wk4kp   0/1     Running   0          5s
openshift-adp-controller-manager-59f48998d7-q2r9v   2/2     Running   0          33s
restic-7snmv                                        1/1     Running   0          5s
restic-9xrhc                                        1/1     Running   0          5s
restic-b9dgd                                        1/1     Running   0          5s
restic-gk66f                                        1/1     Running   0          5s
restic-j4ldp                                        1/1     Running   0          5s
restic-rpqpd                                        1/1     Running   0          5s
velero-bc86b77d9-hrmpj                              1/1     Running   0          5s
(mtc-e2e-venv) [mperetz@mperetz oadp-operator]$ oc get pods -n openshift-adp
NAME                                                READY   STATUS    RESTARTS   AGE
oadp-example-velero-1-aws-registry-888ff7cf-wk4kp   0/1     Running   0          7s
openshift-adp-controller-manager-59f48998d7-q2r9v   2/2     Running   0          35s
restic-7snmv                                        1/1     Running   0          7s
restic-9xrhc                                        1/1     Running   0          7s
restic-b9dgd                                        1/1     Running   0          7s
restic-gk66f                                        1/1     Running   0          7s
restic-j4ldp                                        1/1     Running   0          7s
restic-rpqpd                                        1/1     Running   0          7s
velero-bc86b77d9-hrmpj                              1/1     Running   0          7s
NAME                                                READY   STATUS    RESTARTS   AGE
oadp-example-velero-1-aws-registry-888ff7cf-wk4kp   0/1     Running   0          17s
openshift-adp-controller-manager-59f48998d7-q2r9v   2/2     Running   0          45s
restic-7snmv                                        1/1     Running   0          17s
restic-9xrhc                                        1/1     Running   0          17s
restic-b9dgd                                        1/1     Running   0          17s
restic-gk66f                                        1/1     Running   0          17s
restic-j4ldp                                        1/1     Running   0          17s
restic-rpqpd                                        1/1     Running   0          17s
velero-bc86b77d9-hrmpj                              1/1     Running   0          17s
(mtc-e2e-venv) [mperetz@mperetz oadp-operator]$
(mtc-e2e-venv) [mperetz@mperetz oadp-operator]$
(mtc-e2e-venv) [mperetz@mperetz oadp-operator]$ oc get pods -n openshift-adp
NAME                                                READY   STATUS    RESTARTS   AGE
oadp-example-velero-1-aws-registry-888ff7cf-wk4kp   1/1     Running   0          21s
openshift-adp-controller-manager-59f48998d7-q2r9v   2/2     Running   0          49s
restic-7snmv                                        1/1     Running   0          21s
restic-9xrhc                                        1/1     Running   0          21s
restic-b9dgd                                        1/1     Running   0          21s
restic-gk66f                                        1/1     Running   0          21s
restic-j4ldp                                        1/1     Running   0          21s
restic-rpqpd                                        1/1     Running   0          21s
velero-bc86b77d9-hrmpj                              1/1     Running   0          21s
(mtc-e2e-venv) [mperetz@mperetz oadp-operator]$
After deleting the DPA instance & secret and creating a secret & DPA with an S3-compatible provider (MinIO in this example) - WORKS AS EXPECTED:
(mtc-e2e-venv) [mperetz@mperetz oadp-operator]$ oc delete dpa -n openshift-adp --all
dataprotectionapplication.oadp.openshift.io "example-velero" deleted
(mtc-e2e-venv) [mperetz@mperetz oadp-operator]$ oc delete secret cloud-credentials --namespace openshift-adp
secret "cloud-credentials" deleted
(mtc-e2e-venv) [mperetz@mperetz oadp-operator]$
(mtc-e2e-venv) [mperetz@mperetz oadp-operator]$ cat > minio-creds <<EOF
> [minio]
> aws_access_key_id=minio
> aws_secret_access_key=minio123
> EOF
(mtc-e2e-venv) [mperetz@mperetz oadp-operator]$ oc create secret generic cloud-credentials -n openshift-adp --from-file cloud=minio-creds
secret/cloud-credentials created
(mtc-e2e-venv) [mperetz@mperetz oadp-operator]$
(mtc-e2e-venv) [mperetz@mperetz oadp-operator]$ cat <<EOF | oc create -f -
> apiVersion: oadp.openshift.io/v1alpha1
> kind: DataProtectionApplication
> metadata:
>   namespace: openshift-adp
>   name: example-velero
> spec:
>   configuration:
>     velero:
>       defaultPlugins:
>         - openshift
>         - csi
>         - aws
>       featureFlags:
>         - EnableCSI
>     restic:
>       enable: true
>   backupLocations:
>     - name: default
>       velero:
>         provider: aws
>         default: true
>         objectStorage:
>           bucket: $BUCKET
>           prefix: velero
>         config:
>           insecureSkipTLSVerify: 'true'
>           profile: minio
>           region: minio
>           s3ForcePathStyle: 'true'
>           s3Url: http://$(oc get route minio -n minio -o jsonpath='{.spec.host}')
>         credential:
>           name: cloud-credentials
>           key: cloud
> #  olmManaged: true
> EOF
dataprotectionapplication.oadp.openshift.io/example-velero created
(mtc-e2e-venv) [mperetz@mperetz oadp-operator]$
(mtc-e2e-venv) [mperetz@mperetz oadp-operator]$ oc get pods -n openshift-adp
NAME                                                  READY   STATUS    RESTARTS   AGE
oadp-example-velero-1-aws-registry-7d58d6ccc5-2msh5   0/1     Running   0          15s
openshift-adp-controller-manager-59f48998d7-q2r9v     2/2     Running   0          2m58s
restic-5nttn                                          1/1     Running   0          15s
restic-6l5tr                                          1/1     Running   0          15s
restic-c7ldq                                          1/1     Running   0          15s
restic-m6lqg                                          1/1     Running   0          15s
restic-m97hc                                          1/1     Running   0          15s
restic-mx77s                                          1/1     Running   0          15s
velero-bc86b77d9-22j9g                                1/1     Running   0          15s
(mtc-e2e-venv) [mperetz@mperetz oadp-operator]$ oc get pods -n openshift-adp
NAME                                                  READY   STATUS    RESTARTS   AGE
oadp-example-velero-1-aws-registry-7d58d6ccc5-2msh5   0/1     Running   0          17s
openshift-adp-controller-manager-59f48998d7-q2r9v     2/2     Running   0          3m
restic-5nttn                                          1/1     Running   0          17s
restic-6l5tr                                          1/1     Running   0          17s
restic-c7ldq                                          1/1     Running   0          17s
restic-m6lqg                                          1/1     Running   0          17s
restic-m97hc                                          1/1     Running   0          17s
restic-mx77s                                          1/1     Running   0          17s
velero-bc86b77d9-22j9g                                1/1     Running   0          17s
(mtc-e2e-venv) [mperetz@mperetz oadp-operator]$
Now deleting the secret and DPA again, and creating the secret & DPA with AWS S3 - FAILS:
(mtc-e2e-venv) [mperetz@mperetz oadp-operator]$ oc delete secret cloud-credentials --namespace openshift-adp
secret "cloud-credentials" deleted
(mtc-e2e-venv) [mperetz@mperetz oadp-operator]$ oc delete dpa -n openshift-adp --all
dataprotectionapplication.oadp.openshift.io "example-velero" deleted
(mtc-e2e-venv) [mperetz@mperetz oadp-operator]$
(mtc-e2e-venv) [mperetz@mperetz oadp-operator]$
(mtc-e2e-venv) [mperetz@mperetz oadp-operator]$ oc create secret generic cloud-credentials -n openshift-adp --from-file cloud=~/git/velero-examples/aws-creds
secret/cloud-credentials created
(mtc-e2e-venv) [mperetz@mperetz oadp-operator]$
(mtc-e2e-venv) [mperetz@mperetz oadp-operator]$ cat <<EOF | oc create -f -
> apiVersion: oadp.openshift.io/v1alpha1
> kind: DataProtectionApplication
> metadata:
>   namespace: openshift-adp
>   name: example-velero
> spec:
>   configuration:
>     velero:
>       defaultPlugins:
>         - openshift
>         - csi
>         - aws
>     restic:
>       enable: true
>   backupLocations:
>     - name: default
>       velero:
>         provider: aws
>         default: true
>         objectStorage:
>           bucket: $BUCKET
>           prefix: velero
>         config:
>           profile: default
>           region: $REGION
>         credential:
>           name: cloud-credentials
>           key: cloud
>   snapshotLocations:
>     - name: default
>       velero:
>         default: true
>         provider: aws
>         config:
>           region: $REGION
>           profile: default
> #  olmManaged: true
> EOF
dataprotectionapplication.oadp.openshift.io/example-velero created
(mtc-e2e-venv) [mperetz@mperetz oadp-operator]$
(mtc-e2e-venv) [mperetz@mperetz oadp-operator]$ oc get pods -n openshift-adp
NAME                                                  READY   STATUS    RESTARTS   AGE
oadp-example-velero-1-aws-registry-748b8c56cc-7wqqx   0/1     Running   0          9s
openshift-adp-controller-manager-59f48998d7-q2r9v     2/2     Running   0          3m47s
restic-9kvh7                                          1/1     Running   0          8s
restic-clkl4                                          1/1     Running   0          8s
restic-hvm5p                                          1/1     Running   0          8s
restic-hwsfj                                          1/1     Running   0          8s
restic-kwgw6                                          1/1     Running   0          8s
restic-zwfnf                                          1/1     Running   0          8s
velero-bc86b77d9-dvns5                                1/1     Running   0          8s
(mtc-e2e-venv) [mperetz@mperetz oadp-operator]$ oc get pods -n openshift-adp
NAME                                                  READY   STATUS    RESTARTS   AGE
oadp-example-velero-1-aws-registry-748b8c56cc-7wqqx   0/1     Running   0          26s
openshift-adp-controller-manager-59f48998d7-q2r9v     2/2     Running   0          4m4s
restic-9kvh7                                          1/1     Running   0          25s
restic-clkl4                                          1/1     Running   0          25s
restic-hvm5p                                          1/1     Running   0          25s
restic-hwsfj                                          1/1     Running   0          25s
restic-kwgw6                                          1/1     Running   0          25s
restic-zwfnf                                          1/1     Running   0          25s
velero-bc86b77d9-dvns5                                1/1     Running   0          25s
(mtc-e2e-venv) [mperetz@mperetz oadp-operator]$ oc get pods -n openshift-adp
NAME                                                  READY   STATUS    RESTARTS      AGE
oadp-example-velero-1-aws-registry-748b8c56cc-7wqqx   0/1     Running   2 (7s ago)    67s
openshift-adp-controller-manager-59f48998d7-q2r9v     2/2     Running   0             4m45s
restic-9kvh7                                          1/1     Running   0             66s
restic-clkl4                                          1/1     Running   0             66s
restic-hvm5p                                          1/1     Running   0             66s
restic-hwsfj                                          1/1     Running   0             66s
restic-kwgw6                                          1/1     Running   0             66s
restic-zwfnf                                          1/1     Running   0             66s
velero-bc86b77d9-dvns5                                1/1     Running   0             66s
(mtc-e2e-venv) [mperetz@mperetz oadp-operator]$
(mtc-e2e-venv) [mperetz@mperetz oadp-operator]$ oc get pods -n openshift-adp
NAME                                                  READY   STATUS             RESTARTS       AGE
oadp-example-velero-1-aws-registry-748b8c56cc-7wqqx   0/1     CrashLoopBackOff   5 (53s ago)    4m38s
openshift-adp-controller-manager-59f48998d7-q2r9v     2/2     Running            0              8m16s
restic-9kvh7                                          1/1     Running            0              4m37s
restic-clkl4                                          1/1     Running            0              4m37s
restic-hvm5p                                          1/1     Running            0              4m37s
restic-hwsfj                                          1/1     Running            0              4m37s
restic-kwgw6                                          1/1     Running            0              4m37s
restic-zwfnf                                          1/1     Running            0              4m37s
velero-bc86b77d9-dvns5                                1/1     Running            0              4m37s
(mtc-e2e-venv) [mperetz@mperetz oadp-operator]$
The pod's logs also show these errors:
time="2021-11-30T08:27:43.98247765Z" level=debug msg="s3aws.Walk(\"/docker/registry/v2/repositories\")" go.version=go1.17.2 http.request.host="10.129.2.46:5000" http.request.id=66ba51b2-9839-494c-a469-86f8f1e33f38 http.request.method=GET http.request.remoteaddr="10.129.2.1:49480" http.request.uri="/v2/_catalog?n=5" http.request.useragent=kube-probe/1.22 trace.duration=3.572922ms trace.file=/go/src/github.com/docker/distribution/registry/storage/driver/base/base.go trace.func="github.com/docker/distribution/registry/storage/driver/base.(*Base).Walk" trace.id=73dc041d-73fe-4102-bf09-bce7e71e18d2 trace.line=232
time="2021-11-30T08:27:43.982551427Z" level=error msg="response completed with error" err.code=unknown err.detail="s3aws: InvalidAccessKeyId: The Access Key Id you provided does not exist in our records.\n\tstatus code: 403, request id: 16BC45936EFFBFF4, host id: " err.message="unknown error" go.version=go1.17.2 http.request.host="10.129.2.46:5000" http.request.id=66ba51b2-9839-494c-a469-86f8f1e33f38 http.request.method=GET http.request.remoteaddr="10.129.2.1:49480" http.request.uri="/v2/_catalog?n=5" http.request.useragent=kube-probe/1.22 http.response.contenttype=application/json http.response.duration=6.495369ms http.response.status=500 http.response.written=123
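The InvalidAccessKeyId error above suggests the registry is still authenticating with the stale MinIO credentials rather than the newly created AWS ones. One way to narrow this down on a live cluster is to dump the secret with `oc get secret cloud-credentials -n openshift-adp -o jsonpath='{.data.cloud}' | base64 -d` and check which profile it contains. A minimal, self-contained sketch of that check (the `/tmp/cloud-creds` file and its contents are hypothetical stand-ins for the decoded secret):

```shell
# Hypothetical stand-in for the decoded cloud-credentials secret; on a real
# cluster, write the output of the `oc get secret ... | base64 -d` command here.
cat > /tmp/cloud-creds <<'EOF'
[default]
aws_access_key_id=AKIAEXAMPLE
aws_secret_access_key=examplesecret
EOF

# Flag a leftover [minio] profile, which would explain InvalidAccessKeyId
# against AWS S3.
if grep -q '^\[minio\]' /tmp/cloud-creds; then
  echo "stale minio profile present"
else
  echo "no minio profile found"
fi
```

If the decoded secret shows the expected AWS profile while the registry still fails, the stale credentials are likely cached elsewhere (e.g. in a resource the operator does not refresh), which matches the fact that only deleting the project clears the problem.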
Steps to reproduce:
1. Install OADP.
2. Create an S3 bucket & credentials for Velero: http://pastebin.test.redhat.com/1011899
3. Install minio.
4. Create MinIO credentials and bucket, and set the DPA instance: http://pastebin.test.redhat.com/1011901
- Verify the DPA is successfully deployed and all the pods within openshift-adp are running and ready.
5. Delete the DPA instance and the secret:
oc delete secret cloud-credentials --namespace openshift-adp && oc delete dpa -n openshift-adp --all
6. Create a secret for AWS S3 with the credentials file created in step 2 & create the DPA instance:
oc create secret generic cloud-credentials -n openshift-adp --from-file cloud=aws-creds

cat <<EOF | oc create -f -
apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
metadata:
  namespace: openshift-adp
  name: example-velero
spec:
  configuration:
    velero:
      defaultPlugins:
        - openshift
        - csi
        - aws
    restic:
      enable: true
  backupLocations:
    - name: default
      velero:
        provider: aws
        default: true
        objectStorage:
          bucket: $BUCKET
          prefix: velero
        config:
          profile: default
          region: $REGION
        credential:
          name: cloud-credentials
          key: cloud
  snapshotLocations:
    - name: default
      velero:
        default: true
        provider: aws
        config:
          region: $REGION
          profile: default
  olmManaged: true
EOF
- Note that after a few restarts, the oadp-example-velero-1-aws-registry pod goes into CrashLoopBackOff.
7. Try to delete the secret and DPA instance, then repeat step 6 after a while.
- Note that you get the same results.
8. Delete the project and repeat step 6.
- Note that after a few seconds all the pods in openshift-adp become ready and running.