      Description of problem:

      CSI restore fails on a new cluster when Data Mover is enabled on both clusters.

      # oc logs csi-snapshot-controller-7b4dbd9b55-9xx4v -n openshift-cluster-storage-operator
      I0923 15:59:07.974528       1 main.go:125] Version: v4.12.0-202208231618.p0.g5a93140.assembly.stream-0-gb48c7b0-dirty
      I0923 15:59:07.984341       1 main.go:174] Start NewCSISnapshotController with kubeconfig [] resyncPeriod [15m0s]
      E0923 16:00:08.139182       1 main.go:86] Failed to list v1 volumesnapshots with error=Get "https://172.30.0.1:443/apis/snapshot.storage.k8s.io/v1/volumesnapshots": dial tcp 172.30.0.1:443: i/o timeout
      I0923 16:00:14.047264       1 leaderelection.go:248] attempting to acquire leader lease openshift-cluster-storage-operator/snapshot-controller-leader...
      I0923 16:00:14.326347       1 leaderelection.go:258] successfully acquired lease openshift-cluster-storage-operator/snapshot-controller-leader
      I0923 16:00:14.327196       1 leader_election.go:178] became leader, starting
      I0923 16:00:14.340326       1 snapshot_controller_base.go:152] Starting snapshot controller
      I0929 10:59:47.167089       1 event.go:285] Event(v1.ObjectReference{Kind:"VolumeSnapshot", Namespace:"ocp-mysql", Name:"velero-mysql-w9mr7", UID:"53c47235-2a11-4394-888f-8167e4dee785", APIVersion:"snapshot.storage.k8s.io/v1", ResourceVersion:"7815207", FieldPath:""}): type: 'Normal' reason: 'SnapshotCreated' Snapshot ocp-mysql/velero-mysql-w9mr7 was successfully created by the CSI driver.
      I0929 10:59:47.167167       1 event.go:285] Event(v1.ObjectReference{Kind:"VolumeSnapshot", Namespace:"ocp-mysql", Name:"velero-mysql-w9mr7", UID:"53c47235-2a11-4394-888f-8167e4dee785", APIVersion:"snapshot.storage.k8s.io/v1", ResourceVersion:"7815207", FieldPath:""}): type: 'Normal' reason: 'SnapshotReady' Snapshot ocp-mysql/velero-mysql-w9mr7 is ready to use.
      I1006 16:05:07.772888       1 event.go:285] Event(v1.ObjectReference{Kind:"VolumeSnapshot", Namespace:"ocp-mysql", Name:"velero-mysql-c7ckk", UID:"6f5ca640-0dfa-4a9c-8696-7a52fc93bae3", APIVersion:"snapshot.storage.k8s.io/v1", ResourceVersion:"20359659", FieldPath:""}): type: 'Normal' reason: 'SnapshotCreated' Snapshot ocp-mysql/velero-mysql-c7ckk was successfully created by the CSI driver.
      I1006 16:05:07.772958       1 event.go:285] Event(v1.ObjectReference{Kind:"VolumeSnapshot", Namespace:"ocp-mysql", Name:"velero-mysql-c7ckk", UID:"6f5ca640-0dfa-4a9c-8696-7a52fc93bae3", APIVersion:"snapshot.storage.k8s.io/v1", ResourceVersion:"20359659", FieldPath:""}): type: 'Normal' reason: 'SnapshotReady' Snapshot ocp-mysql/velero-mysql-c7ckk is ready to use.
      I1008 10:01:56.228759       1 snapshot_controller_base.go:269] deletion of snapshot "ns-3abd639f-018b-489d-b3b4-ba9a02fa1782/name-3abd639f-018b-489d-b3b4-ba9a02fa1782" was already processed
      I1008 10:45:51.057537       1 snapshot_controller_base.go:269] deletion of snapshot "ns-b76a37bb-d8d5-4141-a494-2299231f5d9f/name-b76a37bb-d8d5-4141-a494-2299231f5d9f" was already processed
      E1010 09:27:06.437440       1 snapshot_controller_base.go:403] could not sync snapshot "ocp-mysql/velero-mysql-c7ckk": snapshot controller failed to update velero-mysql-c7ckk on API server: Operation cannot be fulfilled on volumesnapshots.snapshot.storage.k8s.io "velero-mysql-c7ckk": StorageError: invalid object, Code: 4, Key: /kubernetes.io/snapshot.storage.k8s.io/volumesnapshots/ocp-mysql/velero-mysql-c7ckk, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: 6f5ca640-0dfa-4a9c-8696-7a52fc93bae3, UID in object meta:
      I1010 09:27:07.440434       1 snapshot_controller_base.go:269] deletion of snapshot "ocp-mysql/velero-mysql-c7ckk" was already processed
      I1010 09:27:36.803123       1 snapshot_controller_base.go:269] deletion of snapshot "ocp-mysql/velero-mysql-c7ckk" was already processed
      I1010 09:27:47.630757       1 snapshot_controller_base.go:269] deletion of snapshot "ocp-mysql/velero-mysql-w9mr7" was already processed
      I1010 09:29:52.134217       1 event.go:285] Event(v1.ObjectReference{Kind:"VolumeSnapshot", Namespace:"ocp-mysql", Name:"velero-mysql-c7ckk", UID:"7d3aae75-3012-4486-bfe4-2c3cda70ffc7", APIVersion:"snapshot.storage.k8s.io/v1", ResourceVersion:"27129164", FieldPath:""}): type: 'Normal' reason: 'SnapshotCreated' Snapshot ocp-mysql/velero-mysql-c7ckk was successfully created by the CSI driver.
      I1010 09:29:52.134265       1 event.go:285] Event(v1.ObjectReference{Kind:"VolumeSnapshot", Namespace:"ocp-mysql", Name:"velero-mysql-c7ckk", UID:"7d3aae75-3012-4486-bfe4-2c3cda70ffc7", APIVersion:"snapshot.storage.k8s.io/v1", ResourceVersion:"27129164", FieldPath:""}): type: 'Normal' reason: 'SnapshotReady' Snapshot ocp-mysql/velero-mysql-c7ckk is ready to use.

      Version-Release number of selected component (if applicable):

       

      How reproducible:

       

      Steps to Reproduce:
      1. Install VolSync from the latest stable channel in OperatorHub on both clusters.
      2. Create a secret:
      $ oc create secret generic <secret-name> -n openshift-adp --from-file cloud=<credentials file path>
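
      For example, a concrete invocation (assuming the AWS credentials file is saved locally as ./aws-credentials, and using the cloud-credentials secret name that the DPA below references):

      $ oc create secret generic cloud-credentials -n openshift-adp --from-file cloud=./aws-credentials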
      3. Create a DPA instance with dataMover: { enable: true } and the csi plugin included in the defaultPlugins list on both clusters.

      For example:

      apiVersion: oadp.openshift.io/v1alpha1
      kind: DataProtectionApplication
      metadata:
        name: example-velero
        namespace: openshift-adp
      spec:
        backupLocations:
        - velero:
            config:
              profile: default
              region: eu-central-1
            credential:
              key: cloud
              name: cloud-credentials
            default: true
            objectStorage:
              bucket: myoadptestbucket
              prefix: velero
            provider: aws
        configuration:
          velero:
            defaultPlugins:
            - openshift
            - aws
            - kubevirt
            - csi
        features:
          dataMover:
            enable: true
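
      A minimal way to apply and verify the DPA (assuming the manifest above is saved locally as dpa.yaml; the file name is illustrative):

      $ oc apply -f dpa.yaml
      $ oc get dataprotectionapplication -n openshift-adp
      $ oc get backupstoragelocation -n openshift-adp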

       

      4. Edit the volumesnapshotclass to set deletionPolicy: Retain and add the velero.io/csi-volumesnapshot-class: "true" label, so that it looks as follows:

      # oc get vsclass ocs-storagecluster-cephfsplugin-snapclass -oyaml
        apiVersion: snapshot.storage.k8s.io/v1
        deletionPolicy: Retain
        driver: openshift-storage.cephfs.csi.ceph.com
        kind: VolumeSnapshotClass
        metadata:
          creationTimestamp: "2022-09-26T08:33:11Z"
          generation: 2
          labels:
            velero.io/csi-volumesnapshot-class: "true"
          name: ocs-storagecluster-cephfsplugin-snapclass
          resourceVersion: "1710327"
          uid: 9185ea19-f13f-479f-8d7a-68da472d7d2a
        parameters:
          clusterID: openshift-storage
          csi.storage.k8s.io/snapshotter-secret-name: rook-csi-cephfs-provisioner
          csi.storage.k8s.io/snapshotter-secret-namespace: openshift-storage
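
      The same state can also be reached non-interactively; a sketch, assuming the default ODF VolumeSnapshotClass name shown above (the reporter may equally have used oc edit):

      # oc patch volumesnapshotclass ocs-storagecluster-cephfsplugin-snapclass --type merge -p '{"deletionPolicy":"Retain"}'
      # oc label volumesnapshotclass ocs-storagecluster-cephfsplugin-snapclass velero.io/csi-volumesnapshot-class="true"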

      5. Deploy a sample application 

       

      # oc get all,pvc -n ocp-mysql
      NAME                        READY   STATUS    RESTARTS   AGE
      pod/mysql-6d64ccd47-b4vwr   1/1     Running   0          3d19h
      NAME            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
      service/mysql   ClusterIP   172.30.175.178   <none>        3306/TCP   3d19h
      NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
      deployment.apps/mysql   1/1     1            1           3d19h
      NAME                              DESIRED   CURRENT   READY   AGE
      replicaset.apps/mysql-6d64ccd47   1         1         1       3d19h
      NAME                          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                AGE
      persistentvolumeclaim/mysql   Bound    pvc-fd0159b3-f7ba-4fb0-9a76-e7f8fe988457   2Gi        RWO            ocs-storagecluster-cephfs   3d19h
      

       

      6. Create a backup on cluster 1

      # cat backup.yaml
      apiVersion: velero.io/v1
      kind: Backup
      metadata:
        name: velero-sample1-withcsi-ocpmysql
        labels:
          velero.io/storage-location: default
        namespace: openshift-adp
      spec:
        hooks: {}
        includedNamespaces:
        - ocp-mysql
        includeClusterResources: false
        storageLocation: velero-sample-1
        ttl: 720h0m0s
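
      The backup is created from this manifest (saved here as backup.yaml) and its phase polled until it reports Completed, as shown below:

      # oc create -f backup.yaml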

       

      # oc get backup -n openshift-adp
      NAME                                          AGE
      velero-sample-withcsi-ocpmysql                3d19h
      velero-sample-withcsi-sep19                   3d19h
      velero-sample-withcsi-sep20acmeair            3d19h
      velero-sample-withcsi-sep21fileuploader       3d19h
      velero-sample-withcsisep26-sepfileuploader    3d19h
      velero-sample-witherestic-sep22fileuploader   3d19h
      velero-sample1-withcsi-ocpmysql               3d19h
      # oc get backup -n openshift-adp velero-sample1-withcsi-ocpmysql  -o jsonpath={.status.phase}
      Completed

       

      7. Check the volumesnapshotcontent on the source cluster

      # oc get volumesnapshotcontents | grep ocp-mysql
      snapcontent-a7737939-fe72-45f7-b41f-83575698681e   true         2147483648    Retain           openshift-storage.cephfs.csi.ceph.com   ocs-storagecluster-cephfsplugin-snapclass   velero-mysql-c7ckk                          ocp-mysql  3d19h
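
      To confirm the source-side content is healthy before restoring, its readiness and snapshot handle can be inspected; a sketch using standard snapshot.storage.k8s.io/v1 status fields:

      # oc get volumesnapshotcontent snapcontent-a7737939-fe72-45f7-b41f-83575698681e -o jsonpath='{.status.readyToUse}{"\n"}{.status.snapshotHandle}{"\n"}'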

      8. Restore the backup on the new cluster 

      # cat restore.yaml
      apiVersion: velero.io/v1
      kind: Restore
      metadata:
        name: velero-restore1-withcsi-ocpmysql
        namespace: openshift-adp
      spec:
        backupName: velero-sample1-withcsi-ocpmysql
        excludedResources:
        - nodes
        - events
        - events.events.k8s.io
        - backups.velero.io
        - restores.velero.io
        - resticrepositories.velero.io
        restorePVs: true
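
      The restore is created from this manifest and its phase polled in the same way as the backup, for example:

      # oc create -f restore.yaml
      # oc get restore -n openshift-adp velero-restore1-withcsi-ocpmysql -o jsonpath={.status.phase}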

      9. Check that the volumesnapshotcontent is copied to the new cluster

      [root@m4202001 ~]# oc get volumesnapshotcontents | grep ocp-mysql
      velero-velero-mysql-c7ckk-nd98j                    true         0             Retain           openshift-storage.cephfs.csi.ceph.com   ocs-storagecluster-cephfsplugin-snapclass   velero-mysql-c7ckk                          ocp-mysql                                 93m
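
      Note that the copied content reports a restore size of 0 here, versus 2147483648 on the source cluster. Whether it points at a usable backend snapshot can be checked via its handle; a sketch:

      # oc get volumesnapshotcontent velero-velero-mysql-c7ckk-nd98j -o jsonpath='{.status.readyToUse}{"\n"}{.status.snapshotHandle}{"\n"}'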

      10. Verify that the application is up and running

       

      Actual results:

      The PVC and the pod do not come up on the new cluster.

       

      # oc get all,pvc -n ocp-mysql
      NAME                        READY   STATUS    RESTARTS   AGE
      pod/mysql-6d64ccd47-hlv4f   0/1     Pending   0          96m
      NAME            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
      service/mysql   ClusterIP   172.30.170.174   <none>        3306/TCP   96m
      NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
      deployment.apps/mysql   0/1     1            0           96m
      NAME                              DESIRED   CURRENT   READY   AGE
      replicaset.apps/mysql-6d64ccd47   1         1         0       96m
      NAME                          STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS                AGE
      persistentvolumeclaim/mysql   Pending                                      ocs-storagecluster-cephfs   96m
      
       

       

      # oc describe persistentvolumeclaim/mysql -nocp-mysql
      Name:          mysql
      Namespace:     ocp-mysql
      StorageClass:  ocs-storagecluster-cephfs
      Status:        Pending
      Volume:
      Labels:        app=mysql
                     testlabel=selectors
                     testlabel2=foo
                     velero.io/backup-name=velero-sample1-withcsi-ocpmysql
                     velero.io/restore-name=velero-restore1-withcsi-ocpmysql
                     velero.io/volume-snapshot-name=velero-mysql-c7ckk
      Annotations:   velero.io/backup-name: velero-sample1-withcsi-ocpmysql
                     velero.io/volume-snapshot-name: velero-mysql-c7ckk
                     volume.kubernetes.io/storage-provisioner: openshift-storage.cephfs.csi.ceph.com
      Finalizers:    [kubernetes.io/pvc-protection]
      Capacity:
      Access Modes:
      VolumeMode:    Filesystem
      DataSource:
        APIGroup:  snapshot.storage.k8s.io
        Kind:      VolumeSnapshot
        Name:      velero-mysql-c7ckk
      Used By:     mysql-6d64ccd47-hlv4f
      Events:
        Type     Reason                Age                    From                                                                                                                      Message
        ----     ------                ----                   ----                                                                                                                      -------
        Warning  ProvisioningFailed    75m (x14 over 94m)     openshift-storage.cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-7bdf9d685c-msbb7_c3e0ea1e-588f-4b96-8c1e-910f3f2534e5  failed to provision volume with StorageClass "ocs-storagecluster-cephfs": rpc error: code = Internal desc = key not found: no snap source in omap for "csi.snap.82f2f23f-d274-4a09-bc68-f89a66e8c1c0"
        Normal   ExternalProvisioning  4m31s (x371 over 94m)  persistentvolume-controller                                                                                               waiting for a volume to be created, either by external provisioner "openshift-storage.cephfs.csi.ceph.com" or manually created by system administrator
        Normal   Provisioning          35s (x34 over 94m)     openshift-storage.cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-7bdf9d685c-msbb7_c3e0ea1e-588f-4b96-8c1e-910f3f2534e5  External provisioner is provisioning volume for claim "ocp-mysql/mysql"
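
      The provisioning error ("no snap source in omap") suggests the restored snapshot objects reference a Ceph snapshot that does not exist on the new cluster's backend. One way to cross-check the binding from the PVC side (a diagnostic sketch; for a pre-provisioned content the backend snapshot is named in spec.source.snapshotHandle):

      # oc get volumesnapshot velero-mysql-c7ckk -n ocp-mysql -o jsonpath='{.status.boundVolumeSnapshotContentName}{"\n"}'
      # oc get volumesnapshotcontent velero-velero-mysql-c7ckk-nd98j -o jsonpath='{.spec.source.snapshotHandle}{"\n"}'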

      Expected results:

      The application should be restored successfully on the new cluster

      Additional info:

      Slack threads: https://coreos.slack.com/archives/C0144ECKUJ0/p1666104590173779
