OpenShift API for Data Protection: OADP-844

Add to known issues: a generated Route can have an incorrect hostname on restore if the hostname has been modified.



      Documentation Task summary:

      The TL;DR is that a generated Route can have an incorrect hostname on restore if the hostname has been modified.

      This is a suggested known issue draft from tkaovila@redhat.com; please modify it as you see fit.

      A generated Route can have an incorrect hostname on restore if the hostname has been modified

      A generated Route, that is, one carrying the annotation "openshift.io/host.generated: 'true'", is assumed to have its .spec.host value populated by the cluster and left unmodified by the user. If the user has changed the host value from the generated value, that modified value can be lost on restore.

      There is currently no mechanism in OADP to dynamically set a Route's host value based on the cluster base domain name for a non-generated Route.

      Likewise, for a generated Route, the host value is stripped by the oadp-operator so that it is regenerated on the restore cluster. Any modifications to .spec.host of a generated Route are therefore lost on restore. See the example Route below.
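      For illustration, a minimal sketch of an affected Route (the name, namespace, and host values are hypothetical). Because the openshift.io/host.generated annotation marks the host as cluster-generated, the manually edited .spec.host shown here would be stripped during restore and regenerated from the restore cluster's base domain:

      apiVersion: route.openshift.io/v1
      kind: Route
      metadata:
        name: example                              # hypothetical name
        namespace: my-app                          # hypothetical namespace
        annotations:
          openshift.io/host.generated: "true"      # host is treated as cluster-generated
      spec:
        host: custom-host.apps.oldcluster.example.com   # manual edit; lost on restore
        to:
          kind: Service
          name: example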

       

      == Original post below ==

      Description of problem:

      CSI restore fails on a new cluster with Data Mover enabled on both clusters.

      # oc logs csi-snapshot-controller-7b4dbd9b55-9xx4v -n openshift-cluster-storage-operator
      I0923 15:59:07.974528       1 main.go:125] Version: v4.12.0-202208231618.p0.g5a93140.assembly.stream-0-gb48c7b0-dirty
      I0923 15:59:07.984341       1 main.go:174] Start NewCSISnapshotController with kubeconfig [] resyncPeriod [15m0s]
      E0923 16:00:08.139182       1 main.go:86] Failed to list v1 volumesnapshots with error=Get "https://172.30.0.1:443/apis/snapshot.storage.k8s.io/v1/volumesnapshots": dial tcp 172.30.0.1:443: i/o timeout
      I0923 16:00:14.047264       1 leaderelection.go:248] attempting to acquire leader lease openshift-cluster-storage-operator/snapshot-controller-leader...
      I0923 16:00:14.326347       1 leaderelection.go:258] successfully acquired lease openshift-cluster-storage-operator/snapshot-controller-leader
      I0923 16:00:14.327196       1 leader_election.go:178] became leader, starting
      I0923 16:00:14.340326       1 snapshot_controller_base.go:152] Starting snapshot controller
      I0929 10:59:47.167089       1 event.go:285] Event(v1.ObjectReference{Kind:"VolumeSnapshot", Namespace:"ocp-mysql", Name:"velero-mysql-w9mr7", UID:"53c47235-2a11-4394-888f-8167e4dee785", APIVersion:"snapshot.storage.k8s.io/v1", ResourceVersion:"7815207", FieldPath:""}): type: 'Normal' reason: 'SnapshotCreated' Snapshot ocp-mysql/velero-mysql-w9mr7 was successfully created by the CSI driver.
      I0929 10:59:47.167167       1 event.go:285] Event(v1.ObjectReference{Kind:"VolumeSnapshot", Namespace:"ocp-mysql", Name:"velero-mysql-w9mr7", UID:"53c47235-2a11-4394-888f-8167e4dee785", APIVersion:"snapshot.storage.k8s.io/v1", ResourceVersion:"7815207", FieldPath:""}): type: 'Normal' reason: 'SnapshotReady' Snapshot ocp-mysql/velero-mysql-w9mr7 is ready to use.
      I1006 16:05:07.772888       1 event.go:285] Event(v1.ObjectReference{Kind:"VolumeSnapshot", Namespace:"ocp-mysql", Name:"velero-mysql-c7ckk", UID:"6f5ca640-0dfa-4a9c-8696-7a52fc93bae3", APIVersion:"snapshot.storage.k8s.io/v1", ResourceVersion:"20359659", FieldPath:""}): type: 'Normal' reason: 'SnapshotCreated' Snapshot ocp-mysql/velero-mysql-c7ckk was successfully created by the CSI driver.
      I1006 16:05:07.772958       1 event.go:285] Event(v1.ObjectReference{Kind:"VolumeSnapshot", Namespace:"ocp-mysql", Name:"velero-mysql-c7ckk", UID:"6f5ca640-0dfa-4a9c-8696-7a52fc93bae3", APIVersion:"snapshot.storage.k8s.io/v1", ResourceVersion:"20359659", FieldPath:""}): type: 'Normal' reason: 'SnapshotReady' Snapshot ocp-mysql/velero-mysql-c7ckk is ready to use.
      I1008 10:01:56.228759       1 snapshot_controller_base.go:269] deletion of snapshot "ns-3abd639f-018b-489d-b3b4-ba9a02fa1782/name-3abd639f-018b-489d-b3b4-ba9a02fa1782" was already processed
      I1008 10:45:51.057537       1 snapshot_controller_base.go:269] deletion of snapshot "ns-b76a37bb-d8d5-4141-a494-2299231f5d9f/name-b76a37bb-d8d5-4141-a494-2299231f5d9f" was already processed
      E1010 09:27:06.437440       1 snapshot_controller_base.go:403] could not sync snapshot "ocp-mysql/velero-mysql-c7ckk": snapshot controller failed to update velero-mysql-c7ckk on API server: Operation cannot be fulfilled on volumesnapshots.snapshot.storage.k8s.io "velero-mysql-c7ckk": StorageError: invalid object, Code: 4, Key: /kubernetes.io/snapshot.storage.k8s.io/volumesnapshots/ocp-mysql/velero-mysql-c7ckk, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: 6f5ca640-0dfa-4a9c-8696-7a52fc93bae3, UID in object meta:
      I1010 09:27:07.440434       1 snapshot_controller_base.go:269] deletion of snapshot "ocp-mysql/velero-mysql-c7ckk" was already processed
      I1010 09:27:36.803123       1 snapshot_controller_base.go:269] deletion of snapshot "ocp-mysql/velero-mysql-c7ckk" was already processed
      I1010 09:27:47.630757       1 snapshot_controller_base.go:269] deletion of snapshot "ocp-mysql/velero-mysql-w9mr7" was already processed
      I1010 09:29:52.134217       1 event.go:285] Event(v1.ObjectReference{Kind:"VolumeSnapshot", Namespace:"ocp-mysql", Name:"velero-mysql-c7ckk", UID:"7d3aae75-3012-4486-bfe4-2c3cda70ffc7", APIVersion:"snapshot.storage.k8s.io/v1", ResourceVersion:"27129164", FieldPath:""}): type: 'Normal' reason: 'SnapshotCreated' Snapshot ocp-mysql/velero-mysql-c7ckk was successfully created by the CSI driver.
      I1010 09:29:52.134265       1 event.go:285] Event(v1.ObjectReference{Kind:"VolumeSnapshot", Namespace:"ocp-mysql", Name:"velero-mysql-c7ckk", UID:"7d3aae75-3012-4486-bfe4-2c3cda70ffc7", APIVersion:"snapshot.storage.k8s.io/v1", ResourceVersion:"27129164", FieldPath:""}): type: 'Normal' reason: 'SnapshotReady' Snapshot ocp-mysql/velero-mysql-c7ckk is ready to use.

      Version-Release number of selected component (if applicable):

       

      How reproducible:

       

      Steps to Reproduce:
      1. Install VolSync from the latest stable channel in OperatorHub on both clusters.
      2. Create a secret:
      $ oc create secret generic <secret-name> -n openshift-adp --from-file cloud=<credentials file path>
      3. Create a DPA instance with dataMover: {enable: true} and the csi plugin listed in the defaultPlugins list on both clusters.

      For example:

      apiVersion: oadp.openshift.io/v1alpha1
      kind: DataProtectionApplication
      metadata:
        name: example-velero
        namespace: openshift-adp
      spec:
        backupLocations:
        - velero:
            config:
              profile: default
              region: eu-central-1
            credential:
              key: cloud
              name: cloud-credentials
            default: true
            objectStorage:
              bucket: myoadptestbucket
              prefix: velero
            provider: aws
        configuration:
          velero:
            defaultPlugins:
            - openshift
            - aws
            - kubevirt
            - csi
        features:
          dataMover:
            enable: true
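      After applying the DPA (for example with oc apply -f), the operator should report a Reconciled condition and start the velero and Data Mover workloads. A quick check, assuming the DPA name used above:

      # oc get dpa example-velero -n openshift-adp -o jsonpath='{.status.conditions[0].type}'
      # oc get pods -n openshift-adp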

       

      4. Edit the VolumeSnapshotClass so that deletionPolicy is Retain and the velero.io/csi-volumesnapshot-class: "true" label is set, as in the following example:

      # oc get vsclass ocs-storagecluster-cephfsplugin-snapclass -oyaml
        apiVersion: snapshot.storage.k8s.io/v1
        deletionPolicy: Retain
        driver: openshift-storage.cephfs.csi.ceph.com
        kind: VolumeSnapshotClass
        metadata:
          creationTimestamp: "2022-09-26T08:33:11Z"
          generation: 2
          labels:
            velero.io/csi-volumesnapshot-class: "true"
          name: ocs-storagecluster-cephfsplugin-snapclass
          resourceVersion: "1710327"
          uid: 9185ea19-f13f-479f-8d7a-68da472d7d2a
        parameters:
          clusterID: openshift-storage
          csi.storage.k8s.io/snapshotter-secret-name: rook-csi-cephfs-provisioner
          csi.storage.k8s.io/snapshotter-secret-namespace: openshift-storage
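
      A sketch of how the same edit can be applied non-interactively, assuming the default ODF snapshot class name shown above:

      # oc label volumesnapshotclass ocs-storagecluster-cephfsplugin-snapclass velero.io/csi-volumesnapshot-class="true"
      # oc patch volumesnapshotclass ocs-storagecluster-cephfsplugin-snapclass --type merge -p '{"deletionPolicy": "Retain"}'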

      5. Deploy a sample application 
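
      A minimal sketch of the kind of application deployed here (image, password, and mount path are hypothetical; the PVC matches the 2Gi RWO ocs-storagecluster-cephfs claim shown in the output below, and the Service is omitted for brevity):

      apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        name: mysql
        namespace: ocp-mysql
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: ocs-storagecluster-cephfs
        resources:
          requests:
            storage: 2Gi
      ---
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: mysql
        namespace: ocp-mysql
      spec:
        replicas: 1
        selector:
          matchLabels:
            app: mysql
        template:
          metadata:
            labels:
              app: mysql
          spec:
            containers:
            - name: mysql
              image: registry.redhat.io/rhel8/mysql-80        # hypothetical image
              env:
              - name: MYSQL_ROOT_PASSWORD
                value: changeme                               # hypothetical; use a Secret in practice
              ports:
              - containerPort: 3306
              volumeMounts:
              - name: data
                mountPath: /var/lib/mysql/data                # hypothetical data path
            volumes:
            - name: data
              persistentVolumeClaim:
                claimName: mysql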

       

      # oc get all,pvc -n ocp-mysql
      NAME                        READY   STATUS    RESTARTS   AGE
      pod/mysql-6d64ccd47-b4vwr   1/1     Running   0          3d19h
      NAME            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
      service/mysql   ClusterIP   172.30.175.178   <none>        3306/TCP   3d19h
      NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
      deployment.apps/mysql   1/1     1            1           3d19h
      NAME                              DESIRED   CURRENT   READY   AGE
      replicaset.apps/mysql-6d64ccd47   1         1         1       3d19h
      NAME                          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                AGE
      persistentvolumeclaim/mysql   Bound    pvc-fd0159b3-f7ba-4fb0-9a76-e7f8fe988457   2Gi        RWO            ocs-storagecluster-cephfs   3d19h
      

       

      6. Create backup on cluster1

      # cat backup.yaml
      apiVersion: velero.io/v1
      kind: Backup
      metadata:
        name: velero-sample1-withcsi-ocpmysql
        labels:
          velero.io/storage-location: default
        namespace: openshift-adp
      spec:
        hooks: {}
        includedNamespaces:
        - ocp-mysql
        includeClusterResources: false
        storageLocation: velero-sample-1
        ttl: 720h0m0s

       

      # oc get backup -n openshift-adp
      NAME                                          AGE
      velero-sample-withcsi-ocpmysql                3d19h
      velero-sample-withcsi-sep19                   3d19h
      velero-sample-withcsi-sep20acmeair            3d19h
      velero-sample-withcsi-sep21fileuploader       3d19h
      velero-sample-withcsisep26-sepfileuploader    3d19h
      velero-sample-witherestic-sep22fileuploader   3d19h
      velero-sample1-withcsi-ocpmysql               3d19h
      # oc get backup -n openshift-adp velero-sample1-withcsi-ocpmysql  -o jsonpath={.status.phase}
      Completed
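
      Because Data Mover is enabled, the CSI snapshot data is also moved to the object store through VolumeSnapshotBackup CRs; a sketch of how to confirm they completed (these resource names follow the OADP 1.1 Data Mover CRDs and may differ by version):

      # oc get volumesnapshotbackups -n ocp-mysql
      # oc get volumesnapshotbackups -n ocp-mysql -o jsonpath='{.items[*].status.phase}'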

       

      7. Check for the volumesnapshotcontent on the cluster

      # oc get volumesnapshotcontents | grep ocp-mysql
      snapcontent-a7737939-fe72-45f7-b41f-83575698681e   true         2147483648    Retain           openshift-storage.cephfs.csi.ceph.com   ocs-storagecluster-cephfsplugin-snapclass   velero-mysql-c7ckk                          ocp-mysql  3d19h

      8. Restore the backup on the new cluster 

      # cat restore.yaml
      apiVersion: velero.io/v1
      kind: Restore
      metadata:
        name: velero-restore1-withcsi-ocpmysql
        namespace: openshift-adp
      spec:
        backupName: velero-sample1-withcsi-ocpmysql
        excludedResources:
        - nodes
        - events
        - events.events.k8s.io
        - backups.velero.io
        - restores.velero.io
        - resticrepositories.velero.io
        restorePVs: true

      9. Check that the volumesnapshotcontent is copied to the new cluster

      [root@m4202001 ~]# oc get volumesnapshotcontents | grep ocp-mysql
      velero-velero-mysql-c7ckk-nd98j                    true         0             Retain           openshift-storage.cephfs.csi.ceph.com   ocs-storagecluster-cephfsplugin-snapclass   velero-mysql-c7ckk                          ocp-mysql                                 93m
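
      Note that the restored VolumeSnapshotContent reports a restore size of 0. One way to inspect which snapshot handle it points to (content name taken from the output above):

      # oc get volumesnapshotcontent velero-velero-mysql-c7ckk-nd98j -o jsonpath='{.spec.source}{"\n"}{.status.snapshotHandle}{"\n"}'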

      10. Verify that the application is up and running

       

      Actual results:

      The PVC and the pod do not come up on the new cluster.

       

      # oc get all,pvc -n ocp-mysql
      NAME                        READY   STATUS    RESTARTS   AGE
      pod/mysql-6d64ccd47-hlv4f   0/1     Pending   0          96m
      NAME            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
      service/mysql   ClusterIP   172.30.170.174   <none>        3306/TCP   96m
      NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
      deployment.apps/mysql   0/1     1            0           96m
      NAME                              DESIRED   CURRENT   READY   AGE
      replicaset.apps/mysql-6d64ccd47   1         1         0       96m
      NAME                          STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS                AGE
      persistentvolumeclaim/mysql   Pending                                      ocs-storagecluster-cephfs   96m
      
       

       

      # oc describe persistentvolumeclaim/mysql -nocp-mysql
      Name:          mysql
      Namespace:     ocp-mysql
      StorageClass:  ocs-storagecluster-cephfs
      Status:        Pending
      Volume:
      Labels:        app=mysql
                     testlabel=selectors
                     testlabel2=foo
                     velero.io/backup-name=velero-sample1-withcsi-ocpmysql
                     velero.io/restore-name=velero-restore1-withcsi-ocpmysql
                     velero.io/volume-snapshot-name=velero-mysql-c7ckk
      Annotations:   velero.io/backup-name: velero-sample1-withcsi-ocpmysql
                     velero.io/volume-snapshot-name: velero-mysql-c7ckk
                     volume.kubernetes.io/storage-provisioner: openshift-storage.cephfs.csi.ceph.com
      Finalizers:    [kubernetes.io/pvc-protection]
      Capacity:
      Access Modes:
      VolumeMode:    Filesystem
      DataSource:
        APIGroup:  snapshot.storage.k8s.io
        Kind:      VolumeSnapshot
        Name:      velero-mysql-c7ckk
      Used By:     mysql-6d64ccd47-hlv4f
      Events:
        Type     Reason                Age                    From                                                                                                                      Message
        ----     ------                ----                   ----                                                                                                                      -------
        Warning  ProvisioningFailed    75m (x14 over 94m)     openshift-storage.cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-7bdf9d685c-msbb7_c3e0ea1e-588f-4b96-8c1e-910f3f2534e5  failed to provision volume with StorageClass "ocs-storagecluster-cephfs": rpc error: code = Internal desc = key not found: no snap source in omap for "csi.snap.82f2f23f-d274-4a09-bc68-f89a66e8c1c0"
        Normal   ExternalProvisioning  4m31s (x371 over 94m)  persistentvolume-controller                                                                                               waiting for a volume to be created, either by external provisioner "openshift-storage.cephfs.csi.ceph.com" or manually created by system administrator
        Normal   Provisioning          35s (x34 over 94m)     openshift-storage.cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-7bdf9d685c-msbb7_c3e0ea1e-588f-4b96-8c1e-910f3f2534e5  External provisioner is provisioning volume for claim "ocp-mysql/mysql"
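
      The ProvisioningFailed error ("no snap source in omap") indicates that the CSI driver on the new cluster cannot find the snapshot data backing the restored VolumeSnapshot. Whether the restored snapshot is actually ready can be checked with, e.g.:

      # oc get volumesnapshot velero-mysql-c7ckk -n ocp-mysql -o jsonpath='{.status.readyToUse}{"\n"}'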

      Expected results:

      The application should be restored successfully on the new cluster

      Additional info:

      Slack threads: https://coreos.slack.com/archives/C0144ECKUJ0/p1666104590173779

      Attachments:
        1. Screenshot 2022-11-03 at 12.27.15.png (156 kB)
        2. Screenshot 2022-11-03 at 12.21.54-1.png (91 kB)
        3. noname (3 kB)
        4. noname (3 kB)
        5. acmeair-master_pod_logs.zip (8 kB)
        6. acmeair_pods.yaml (44 kB)
        7. acmeair_podlogs.log (96 kB)
        8. acmeair_pod_logs.zip (9 kB)
        9. acmeair_pod_desc.log (29 kB)
