OpenShift API for Data Protection / OADP-147

oadp-example-velero-1-aws-registry pod crashes with CrashLoopBackOff after replacing S3-compatible storage with AWS S3


    • Type: Bug
    • Resolution: Done
    • OADP 1.0.0
    • Passed
    • Untriaged

      Problem description: When a DPA instance is created with an S3-compatible provider (minio, noobaa, etc.) and then replaced with one that uses AWS S3, the oadp-example-velero-1-aws-registry pod crashes with CrashLoopBackOff status. Any attempt to re-create the secret and the DPA fails in the same manner unless the OADP operator is re-installed.

      Reproduction frequency: 100%

      Observed Results:
      Creating a DPA instance with AWS S3 for the first time - WORKS AS EXPECTED:

      (mtc-e2e-venv) [mperetz@mperetz oadp-operator]$ oc get pods -n openshift-adp
      NAME                                                READY   STATUS    RESTARTS   AGE
      openshift-adp-controller-manager-59f48998d7-q2r9v   2/2     Running   0          11s
      (mtc-e2e-venv) [mperetz@mperetz oadp-operator]$ 
      (mtc-e2e-venv) [mperetz@mperetz oadp-operator]$ BUCKET=mayapvelerooadp 
      (mtc-e2e-venv) [mperetz@mperetz oadp-operator]$ REGION=us-east-2
      (mtc-e2e-venv) [mperetz@mperetz oadp-operator]$ oc create secret generic cloud-credentials -n openshift-adp --from-file cloud=~/git/velero-examples/aws-creds
      secret/cloud-credentials created
      (mtc-e2e-venv) [mperetz@mperetz oadp-operator]$ 
      (mtc-e2e-venv) [mperetz@mperetz oadp-operator]$ cat <<EOF | oc create -f -
      > apiVersion: oadp.openshift.io/v1alpha1
      > kind: DataProtectionApplication
      > metadata:
      >   namespace: openshift-adp
      >   name: example-velero
      > spec:
      >   configuration:
      >     velero:
      >       defaultPlugins:
      >       - openshift
      >       - csi
      >       - aws
      >     restic:
      >       enable: true
      >   backupLocations:
      >     - name: default
      >       velero:
      >         provider: aws
      >         default: true
      >         objectStorage:
      >           bucket: $BUCKET
      >           prefix: velero
      >         config:
      >           profile: default
      >           region: $REGION
      >         credential:
      >           name: cloud-credentials
      >           key: cloud
      >   snapshotLocations:
      >     - name: default
      >       velero:
      >         default: true
      >         provider: aws
      >         config:
      >           region: $REGION
      >           profile: default
      > #  olmManaged: true
      > EOF
      dataprotectionapplication.oadp.openshift.io/example-velero created
      (mtc-e2e-venv) [mperetz@mperetz oadp-operator]$ 
      (mtc-e2e-venv) [mperetz@mperetz oadp-operator]$ oc get pods -n openshift-adp
      NAME                                                READY   STATUS    RESTARTS   AGE
      oadp-example-velero-1-aws-registry-888ff7cf-wk4kp   0/1     Running   0          5s
      openshift-adp-controller-manager-59f48998d7-q2r9v   2/2     Running   0          33s
      restic-7snmv                                        1/1     Running   0          5s
      restic-9xrhc                                        1/1     Running   0          5s
      restic-b9dgd                                        1/1     Running   0          5s
      restic-gk66f                                        1/1     Running   0          5s
      restic-j4ldp                                        1/1     Running   0          5s
      restic-rpqpd                                        1/1     Running   0          5s
      velero-bc86b77d9-hrmpj                              1/1     Running   0          5s
      (mtc-e2e-venv) [mperetz@mperetz oadp-operator]$ oc get pods -n openshift-adp
      NAME                                                READY   STATUS    RESTARTS   AGE
      oadp-example-velero-1-aws-registry-888ff7cf-wk4kp   0/1     Running   0          7s
      openshift-adp-controller-manager-59f48998d7-q2r9v   2/2     Running   0          35s
      restic-7snmv                                        1/1     Running   0          7s
      restic-9xrhc                                        1/1     Running   0          7s
      restic-b9dgd                                        1/1     Running   0          7s
      restic-gk66f                                        1/1     Running   0          7s
      restic-j4ldp                                        1/1     Running   0          7s
      restic-rpqpd                                        1/1     Running   0          7s
      velero-bc86b77d9-hrmpj                              1/1     Running   0          7s
      
      NAME                                                READY   STATUS    RESTARTS   AGE
      oadp-example-velero-1-aws-registry-888ff7cf-wk4kp   0/1     Running   0          17s
      openshift-adp-controller-manager-59f48998d7-q2r9v   2/2     Running   0          45s
      restic-7snmv                                        1/1     Running   0          17s
      restic-9xrhc                                        1/1     Running   0          17s
      restic-b9dgd                                        1/1     Running   0          17s
      restic-gk66f                                        1/1     Running   0          17s
      restic-j4ldp                                        1/1     Running   0          17s
      restic-rpqpd                                        1/1     Running   0          17s
      velero-bc86b77d9-hrmpj                              1/1     Running   0          17s
      (mtc-e2e-venv) [mperetz@mperetz oadp-operator]$ 
      (mtc-e2e-venv) [mperetz@mperetz oadp-operator]$ 
      (mtc-e2e-venv) [mperetz@mperetz oadp-operator]$ oc get pods -n openshift-adp
      NAME                                                READY   STATUS    RESTARTS   AGE
      oadp-example-velero-1-aws-registry-888ff7cf-wk4kp   1/1     Running   0          21s
      openshift-adp-controller-manager-59f48998d7-q2r9v   2/2     Running   0          49s
      restic-7snmv                                        1/1     Running   0          21s
      restic-9xrhc                                        1/1     Running   0          21s
      restic-b9dgd                                        1/1     Running   0          21s
      restic-gk66f                                        1/1     Running   0          21s
      restic-j4ldp                                        1/1     Running   0          21s
      restic-rpqpd                                        1/1     Running   0          21s
      velero-bc86b77d9-hrmpj                              1/1     Running   0          21s
      (mtc-e2e-venv) [mperetz@mperetz oadp-operator]$ 
      

      After deleting the DPA instance and secret, then creating a secret and DPA with an S3-compatible provider (minio in this example) - WORKS AS EXPECTED:

      (mtc-e2e-venv) [mperetz@mperetz oadp-operator]$ oc delete dpa -n openshift-adp --all
      dataprotectionapplication.oadp.openshift.io "example-velero" deleted
      (mtc-e2e-venv) [mperetz@mperetz oadp-operator]$ oc delete secret cloud-credentials --namespace openshift-adp
      secret "cloud-credentials" deleted
      (mtc-e2e-venv) [mperetz@mperetz oadp-operator]$ 
      (mtc-e2e-venv) [mperetz@mperetz oadp-operator]$ cat > minio-creds <<EOF
      > [minio]
      > aws_access_key_id=minio
      > aws_secret_access_key=minio123
      > EOF
      (mtc-e2e-venv) [mperetz@mperetz oadp-operator]$ oc create secret generic cloud-credentials -n openshift-adp --from-file cloud=minio-creds
      secret/cloud-credentials created
      (mtc-e2e-venv) [mperetz@mperetz oadp-operator]$ 
      (mtc-e2e-venv) [mperetz@mperetz oadp-operator]$ cat <<EOF | oc create -f -
      > apiVersion: oadp.openshift.io/v1alpha1
      > kind: DataProtectionApplication
      > metadata:
      >   namespace: openshift-adp
      >   name: example-velero
      > spec:
      >   configuration:
      >     velero:
      >       defaultPlugins:
      >       - openshift
      >       - csi
      >       - aws
      >       featureFlags:
      >       - EnableCSI
      >     restic:
      >       enable: true
      >   backupLocations:
      >     - name: default
      >       velero:
      >         provider: aws
      >         default: true
      >         objectStorage:
      >           bucket: $BUCKET
      >           prefix: velero
      >         config:
      >           insecureSkipTLSVerify: 'true'
      >           profile: minio
      >           region: minio
      >           s3ForcePathStyle: 'true'
      >           s3Url: http://$(oc get route minio -n minio -o jsonpath='{.spec.host}')
      >         credential:
      >           name: cloud-credentials
      >           key: cloud
      > #  olmManaged: true
      > EOF
      
      dataprotectionapplication.oadp.openshift.io/example-velero created
      (mtc-e2e-venv) [mperetz@mperetz oadp-operator]$ 
      (mtc-e2e-venv) [mperetz@mperetz oadp-operator]$ oc get pods -n openshift-adp                             
      NAME                                                  READY   STATUS    RESTARTS   AGE
      oadp-example-velero-1-aws-registry-7d58d6ccc5-2msh5   0/1     Running   0          15s
      openshift-adp-controller-manager-59f48998d7-q2r9v     2/2     Running   0          2m58s
      restic-5nttn                                          1/1     Running   0          15s
      restic-6l5tr                                          1/1     Running   0          15s
      restic-c7ldq                                          1/1     Running   0          15s
      restic-m6lqg                                          1/1     Running   0          15s
      restic-m97hc                                          1/1     Running   0          15s
      restic-mx77s                                          1/1     Running   0          15s
      velero-bc86b77d9-22j9g                                1/1     Running   0          15s
      (mtc-e2e-venv) [mperetz@mperetz oadp-operator]$ oc get pods -n openshift-adp
      NAME                                                  READY   STATUS    RESTARTS   AGE
      oadp-example-velero-1-aws-registry-7d58d6ccc5-2msh5   0/1     Running   0          17s
      openshift-adp-controller-manager-59f48998d7-q2r9v     2/2     Running   0          3m
      restic-5nttn                                          1/1     Running   0          17s
      restic-6l5tr                                          1/1     Running   0          17s
      restic-c7ldq                                          1/1     Running   0          17s
      restic-m6lqg                                          1/1     Running   0          17s
      restic-m97hc                                          1/1     Running   0          17s
      restic-mx77s                                          1/1     Running   0          17s
      velero-bc86b77d9-22j9g                                1/1     Running   0          17s
      (mtc-e2e-venv) [mperetz@mperetz oadp-operator]$ 
      

      Deleting the secret and DPA again, then creating a secret and DPA with AWS S3 - FAILS:

      (mtc-e2e-venv) [mperetz@mperetz oadp-operator]$ oc delete secret cloud-credentials --namespace openshift-adp                             
      secret "cloud-credentials" deleted
      (mtc-e2e-venv) [mperetz@mperetz oadp-operator]$ oc delete dpa -n openshift-adp --all                             
      dataprotectionapplication.oadp.openshift.io "example-velero" deleted
      (mtc-e2e-venv) [mperetz@mperetz oadp-operator]$ 
      (mtc-e2e-venv) [mperetz@mperetz oadp-operator]$ 
      (mtc-e2e-venv) [mperetz@mperetz oadp-operator]$ oc create secret generic cloud-credentials -n openshift-adp --from-file cloud=~/git/velero-examples/aws-creds
      secret/cloud-credentials created
      (mtc-e2e-venv) [mperetz@mperetz oadp-operator]$ 
      (mtc-e2e-venv) [mperetz@mperetz oadp-operator]$ cat <<EOF | oc create -f -
      > apiVersion: oadp.openshift.io/v1alpha1
      > kind: DataProtectionApplication
      > metadata:
      >   namespace: openshift-adp
      >   name: example-velero
      > spec:
      >   configuration:
      >     velero:
      >       defaultPlugins:
      >       - openshift
      >       - csi
      >       - aws
      >     restic:
      >       enable: true
      >   backupLocations:
      >     - name: default
      >       velero:
      >         provider: aws
      >         default: true
      >         objectStorage:
      >           bucket: $BUCKET
      >           prefix: velero
      >         config:
      >           profile: default
      >           region: $REGION
      >         credential:
      >           name: cloud-credentials
      >           key: cloud
      >   snapshotLocations:
      >     - name: default
      >       velero:
      >         default: true
      >         provider: aws
      >         config:
      >           region: $REGION
      >           profile: default
      > #  olmManaged: true
      > EOF
      dataprotectionapplication.oadp.openshift.io/example-velero created
      (mtc-e2e-venv) [mperetz@mperetz oadp-operator]$ 
      (mtc-e2e-venv) [mperetz@mperetz oadp-operator]$ oc get pods -n openshift-adp
      NAME                                                  READY   STATUS    RESTARTS   AGE
      oadp-example-velero-1-aws-registry-748b8c56cc-7wqqx   0/1     Running   0          9s
      openshift-adp-controller-manager-59f48998d7-q2r9v     2/2     Running   0          3m47s
      restic-9kvh7                                          1/1     Running   0          8s
      restic-clkl4                                          1/1     Running   0          8s
      restic-hvm5p                                          1/1     Running   0          8s
      restic-hwsfj                                          1/1     Running   0          8s
      restic-kwgw6                                          1/1     Running   0          8s
      restic-zwfnf                                          1/1     Running   0          8s
      velero-bc86b77d9-dvns5                                1/1     Running   0          8s
      (mtc-e2e-venv) [mperetz@mperetz oadp-operator]$ oc get pods -n openshift-adp
      NAME                                                  READY   STATUS    RESTARTS   AGE
      oadp-example-velero-1-aws-registry-748b8c56cc-7wqqx   0/1     Running   0          26s
      openshift-adp-controller-manager-59f48998d7-q2r9v     2/2     Running   0          4m4s
      restic-9kvh7                                          1/1     Running   0          25s
      restic-clkl4                                          1/1     Running   0          25s
      restic-hvm5p                                          1/1     Running   0          25s
      restic-hwsfj                                          1/1     Running   0          25s
      restic-kwgw6                                          1/1     Running   0          25s
      restic-zwfnf                                          1/1     Running   0          25s
      velero-bc86b77d9-dvns5                                1/1     Running   0          25s
      (mtc-e2e-venv) [mperetz@mperetz oadp-operator]$ oc get pods -n openshift-adp
      NAME                                                  READY   STATUS    RESTARTS     AGE
      oadp-example-velero-1-aws-registry-748b8c56cc-7wqqx   0/1     Running   2 (7s ago)   67s
      openshift-adp-controller-manager-59f48998d7-q2r9v     2/2     Running   0            4m45s
      restic-9kvh7                                          1/1     Running   0            66s
      restic-clkl4                                          1/1     Running   0            66s
      restic-hvm5p                                          1/1     Running   0            66s
      restic-hwsfj                                          1/1     Running   0            66s
      restic-kwgw6                                          1/1     Running   0            66s
      restic-zwfnf                                          1/1     Running   0            66s
      velero-bc86b77d9-dvns5                                1/1     Running   0            66s
      (mtc-e2e-venv) [mperetz@mperetz oadp-operator]$ 
      (mtc-e2e-venv) [mperetz@mperetz oadp-operator]$ oc get pods -n openshift-adp
      NAME                                                  READY   STATUS             RESTARTS      AGE
      oadp-example-velero-1-aws-registry-748b8c56cc-7wqqx   0/1     CrashLoopBackOff   5 (53s ago)   4m38s
      openshift-adp-controller-manager-59f48998d7-q2r9v     2/2     Running            0             8m16s
      restic-9kvh7                                          1/1     Running            0             4m37s
      restic-clkl4                                          1/1     Running            0             4m37s
      restic-hvm5p                                          1/1     Running            0             4m37s
      restic-hwsfj                                          1/1     Running            0             4m37s
      restic-kwgw6                                          1/1     Running            0             4m37s
      restic-zwfnf                                          1/1     Running            0             4m37s
      velero-bc86b77d9-dvns5                                1/1     Running            0             4m37s
      (mtc-e2e-venv) [mperetz@mperetz oadp-operator]$ 
      

      The registry pod's logs also show these errors:

      time="2021-11-30T08:27:43.98247765Z" level=debug msg="s3aws.Walk(\"/docker/registry/v2/repositories\")" go.version=go1.17.2 http.request.host="10.129.2.46:5000" http.request.id=66ba51b2-9839-494c-a469-86f8f1e33f38 http.request.method=GET http.request.remoteaddr="10.129.2.1:49480" http.request.uri="/v2/_catalog?n=5" http.request.useragent=kube-probe/1.22 trace.duration=3.572922ms trace.file=/go/src/github.com/docker/distribution/registry/storage/driver/base/base.go trace.func="github.com/docker/distribution/registry/storage/driver/base.(*Base).Walk" trace.id=73dc041d-73fe-4102-bf09-bce7e71e18d2 trace.line=232
      time="2021-11-30T08:27:43.982551427Z" level=error msg="response completed with error" err.code=unknown err.detail="s3aws: InvalidAccessKeyId: The Access Key Id you provided does not exist in our records.\n\tstatus code: 403, request id: 16BC45936EFFBFF4, host id: " err.message="unknown error" go.version=go1.17.2 http.request.host="10.129.2.46:5000" http.request.id=66ba51b2-9839-494c-a469-86f8f1e33f38 http.request.method=GET http.request.remoteaddr="10.129.2.1:49480" http.request.uri="/v2/_catalog?n=5" http.request.useragent=kube-probe/1.22 http.response.contenttype=application/json http.response.duration=6.495369ms http.response.status=500 http.response.written=123
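The InvalidAccessKeyId error suggests the registry is presenting a key that AWS does not recognise. One way to check is to compare the key the current secret holds against the key the registry is actually using. The helper below is a minimal sketch: the function name `profile_access_key` is hypothetical, and it assumes an AWS-style credentials file with `[profile]` headers like the ones shown in this report.

```shell
# Print the aws_access_key_id stored under a given profile of an
# AWS-style credentials file.
# Usage: profile_access_key FILE PROFILE
profile_access_key() {
  awk -v profile="$2" '
    # A section header switches the "inside wanted profile" flag on or off.
    /^\[/ { in_profile = ($0 == "[" profile "]"); next }
    # Inside the wanted profile, strip everything up to "=" and print the value.
    in_profile && /^aws_access_key_id[ \t]*=/ {
      sub(/^[^=]*=[ \t]*/, "")
      print
      exit
    }
  ' "$1"
}

# Example against the live secret (requires cluster access):
#   oc get secret cloud-credentials -n openshift-adp \
#     -o jsonpath='{.data.cloud}' | base64 -d > /tmp/cloud
#   profile_access_key /tmp/cloud default
```

If the value printed for the `default` profile is the AWS key but the registry still fails, the registry deployment may be holding stale credentials from the earlier minio configuration. Assuming the operator passes them as environment variables, `oc set env deployment/oadp-example-velero-1-aws-registry --list -n openshift-adp` would show what it was given.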
      

      Steps to reproduce:

      1. Install OADP.

      2. Create S3 bucket & credentials for velero: http://pastebin.test.redhat.com/1011899

      3. Install minio.

      4. Create minio credentials and a bucket, then create the DPA instance: http://pastebin.test.redhat.com/1011901

      • Verify DPA is successfully deployed and all the pods within openshift-adp are running and ready

      5. Delete the DPA instance and the secret:
      oc delete secret cloud-credentials --namespace openshift-adp && oc delete dpa -n openshift-adp --all

      6. Create the secret for AWS S3 from the credentials file created in step 2, then create the DPA instance:

      oc create secret generic cloud-credentials -n openshift-adp --from-file cloud=aws-creds
      cat <<EOF | oc create -f -
      apiVersion: oadp.openshift.io/v1alpha1
      kind: DataProtectionApplication
      metadata:
        namespace: openshift-adp
        name: example-velero
      spec:
        configuration:
          velero:
            defaultPlugins:
            - openshift
            - csi
            - aws
          restic:
            enable: true
        backupLocations:
          - name: default
            velero:
              provider: aws
              default: true
              objectStorage:
                bucket: $BUCKET
                prefix: velero
              config:
                profile: default
                region: $REGION
              credential:
                name: cloud-credentials
                key: cloud   
        snapshotLocations:
          - name: default
            velero:
              default: true
              provider: aws
              config:
                region: $REGION
                profile: default
        olmManaged: true
      EOF
      
      • Note that after a few restarts, the oadp-example-velero-1-aws-registry pod goes into CrashLoopBackOff.

      7. Delete the secret and the DPA instance, wait a while, and try step 6 again.

      • Note that the result is the same.

      8. Delete the project and repeat step 6.

      • Note that after a few seconds all the pods in openshift-adp are running and ready.
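
The "running and ready" checks in the steps above were done by eye on `oc get pods` output. As a rough aid, the filter below prints only the pods that are not fully ready. The function name `not_ready_pods` is hypothetical, and the sketch assumes the default column layout shown in the transcripts (NAME, READY, STATUS, ...).

```shell
# Read `oc get pods` output on stdin and print "NAME STATUS" for each pod
# whose READY count is incomplete or whose STATUS is not Running.
# Note: pods in Completed state are reported as well.
not_ready_pods() {
  awk 'NR > 1 && NF >= 3 {
    # READY looks like "0/1": ready containers / total containers.
    split($2, ready, "/")
    if (ready[1] != ready[2] || $3 != "Running") print $1, $3
  }'
}

# Example (requires cluster access):
#   oc get pods -n openshift-adp | not_ready_pods
```

Empty output means every pod in the namespace is Running with all containers ready. In the failing case above, it would list only the oadp-example-velero-1-aws-registry pod.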

              emcmulla@redhat.com Emily McMullan
              mperetz@redhat.com Maya Peretz