Bug
Resolution: Done
Documentation Task summary:
The TL;DR is that a generated Route can have an incorrect hostname on restore if the hostname had been modified.
This is a suggested known-issue draft from tkaovila@redhat.com; please modify it as you see fit.
A generated Route can have an incorrect hostname on restore if the hostname had been modified
A generated Route that has the annotation "openshift.io/host.generated: 'true'" is assumed to have its .spec.host value populated by the cluster and left unmodified by the user. If the user has changed the host value from its generated value, that change can be lost on restore.
At this time, OADP has no mechanism to dynamically set a Route's host value based on the cluster base domain name for a non-generated Route.
Likewise, for a generated Route, the host value is stripped by the oadp-operator so that it can be regenerated on the restore cluster. Any modifications to .spec.host of a generated Route are therefore lost on restore.
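For illustration, a minimal sketch of a generated Route (the app name, namespace, and hostname are hypothetical). Because the annotation marks the host as cluster-generated, .spec.host is stripped and regenerated on the restore cluster, so a manually edited value like the one below is not preserved:

apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: example-app            # hypothetical name
  namespace: example-ns        # hypothetical namespace
  annotations:
    # Marks the host as cluster-generated; OADP strips .spec.host on restore.
    openshift.io/host.generated: 'true'
spec:
  host: example-app-example-ns.apps.cluster1.example.com   # manually edited value, lost on restore
  to:
    kind: Service
    name: example-app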
== Original post below ==
Description of problem:
CSI restore fails on a new cluster with Data Mover enabled on both clusters.
# oc logs csi-snapshot-controller-7b4dbd9b55-9xx4v -n openshift-cluster-storage-operator
I0923 15:59:07.974528 1 main.go:125] Version: v4.12.0-202208231618.p0.g5a93140.assembly.stream-0-gb48c7b0-dirty
I0923 15:59:07.984341 1 main.go:174] Start NewCSISnapshotController with kubeconfig [] resyncPeriod [15m0s]
E0923 16:00:08.139182 1 main.go:86] Failed to list v1 volumesnapshots with error=Get "https://172.30.0.1:443/apis/snapshot.storage.k8s.io/v1/volumesnapshots": dial tcp 172.30.0.1:443: i/o timeout
I0923 16:00:14.047264 1 leaderelection.go:248] attempting to acquire leader lease openshift-cluster-storage-operator/snapshot-controller-leader...
I0923 16:00:14.326347 1 leaderelection.go:258] successfully acquired lease openshift-cluster-storage-operator/snapshot-controller-leader
I0923 16:00:14.327196 1 leader_election.go:178] became leader, starting
I0923 16:00:14.340326 1 snapshot_controller_base.go:152] Starting snapshot controller
I0929 10:59:47.167089 1 event.go:285] Event(v1.ObjectReference{Kind:"VolumeSnapshot", Namespace:"ocp-mysql", Name:"velero-mysql-w9mr7", UID:"53c47235-2a11-4394-888f-8167e4dee785", APIVersion:"snapshot.storage.k8s.io/v1", ResourceVersion:"7815207", FieldPath:""}): type: 'Normal' reason: 'SnapshotCreated' Snapshot ocp-mysql/velero-mysql-w9mr7 was successfully created by the CSI driver.
I0929 10:59:47.167167 1 event.go:285] Event(v1.ObjectReference{Kind:"VolumeSnapshot", Namespace:"ocp-mysql", Name:"velero-mysql-w9mr7", UID:"53c47235-2a11-4394-888f-8167e4dee785", APIVersion:"snapshot.storage.k8s.io/v1", ResourceVersion:"7815207", FieldPath:""}): type: 'Normal' reason: 'SnapshotReady' Snapshot ocp-mysql/velero-mysql-w9mr7 is ready to use.
I1006 16:05:07.772888 1 event.go:285] Event(v1.ObjectReference{Kind:"VolumeSnapshot", Namespace:"ocp-mysql", Name:"velero-mysql-c7ckk", UID:"6f5ca640-0dfa-4a9c-8696-7a52fc93bae3", APIVersion:"snapshot.storage.k8s.io/v1", ResourceVersion:"20359659", FieldPath:""}): type: 'Normal' reason: 'SnapshotCreated' Snapshot ocp-mysql/velero-mysql-c7ckk was successfully created by the CSI driver.
I1006 16:05:07.772958 1 event.go:285] Event(v1.ObjectReference{Kind:"VolumeSnapshot", Namespace:"ocp-mysql", Name:"velero-mysql-c7ckk", UID:"6f5ca640-0dfa-4a9c-8696-7a52fc93bae3", APIVersion:"snapshot.storage.k8s.io/v1", ResourceVersion:"20359659", FieldPath:""}): type: 'Normal' reason: 'SnapshotReady' Snapshot ocp-mysql/velero-mysql-c7ckk is ready to use.
I1008 10:01:56.228759 1 snapshot_controller_base.go:269] deletion of snapshot "ns-3abd639f-018b-489d-b3b4-ba9a02fa1782/name-3abd639f-018b-489d-b3b4-ba9a02fa1782" was already processed
I1008 10:45:51.057537 1 snapshot_controller_base.go:269] deletion of snapshot "ns-b76a37bb-d8d5-4141-a494-2299231f5d9f/name-b76a37bb-d8d5-4141-a494-2299231f5d9f" was already processed
E1010 09:27:06.437440 1 snapshot_controller_base.go:403] could not sync snapshot "ocp-mysql/velero-mysql-c7ckk": snapshot controller failed to update velero-mysql-c7ckk on API server: Operation cannot be fulfilled on volumesnapshots.snapshot.storage.k8s.io "velero-mysql-c7ckk": StorageError: invalid object, Code: 4, Key: /kubernetes.io/snapshot.storage.k8s.io/volumesnapshots/ocp-mysql/velero-mysql-c7ckk, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: 6f5ca640-0dfa-4a9c-8696-7a52fc93bae3, UID in object meta:
I1010 09:27:07.440434 1 snapshot_controller_base.go:269] deletion of snapshot "ocp-mysql/velero-mysql-c7ckk" was already processed
I1010 09:27:36.803123 1 snapshot_controller_base.go:269] deletion of snapshot "ocp-mysql/velero-mysql-c7ckk" was already processed
I1010 09:27:47.630757 1 snapshot_controller_base.go:269] deletion of snapshot "ocp-mysql/velero-mysql-w9mr7" was already processed
I1010 09:29:52.134217 1 event.go:285] Event(v1.ObjectReference{Kind:"VolumeSnapshot", Namespace:"ocp-mysql", Name:"velero-mysql-c7ckk", UID:"7d3aae75-3012-4486-bfe4-2c3cda70ffc7", APIVersion:"snapshot.storage.k8s.io/v1", ResourceVersion:"27129164", FieldPath:""}): type: 'Normal' reason: 'SnapshotCreated' Snapshot ocp-mysql/velero-mysql-c7ckk was successfully created by the CSI driver.
I1010 09:29:52.134265 1 event.go:285] Event(v1.ObjectReference{Kind:"VolumeSnapshot", Namespace:"ocp-mysql", Name:"velero-mysql-c7ckk", UID:"7d3aae75-3012-4486-bfe4-2c3cda70ffc7", APIVersion:"snapshot.storage.k8s.io/v1", ResourceVersion:"27129164", FieldPath:""}): type: 'Normal' reason: 'SnapshotReady' Snapshot ocp-mysql/velero-mysql-c7ckk is ready to use.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. Install VolSync from the latest stable channel in OperatorHub on both clusters (for example, with a Subscription like the sketch below)
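A minimal OLM Subscription sketch for installing VolSync; the channel, package, and catalog source names are assumptions and should be verified against OperatorHub:

apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: volsync-product            # assumed package/subscription name
  namespace: openshift-operators
spec:
  channel: stable                  # assumed channel name
  name: volsync-product            # assumed package name
  source: redhat-operators         # assumed catalog source
  sourceNamespace: openshift-marketplace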
2. Create a secret
$ oc create secret generic <secret-name> -n openshift-adp --from-file cloud=<credentials file path>
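For example, a concrete invocation might look like the following; the secret name cloud-credentials matches the DPA example in step 3, and the credentials file path is hypothetical:

$ oc create secret generic cloud-credentials -n openshift-adp --from-file cloud=./aws-credentials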
3. Create a DPA instance with dataMover: {enable: true} and the csi plugin listed in the defaultPlugins list on both clusters.
For example:
apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
metadata:
  name: example-velero
  namespace: openshift-adp
spec:
  backupLocations:
  - velero:
      config:
        profile: default
        region: eu-central-1
      credential:
        key: cloud
        name: cloud-credentials
      default: true
      objectStorage:
        bucket: myoadptestbucket
        prefix: velero
      provider: aws
  configuration:
    velero:
      defaultPlugins:
      - openshift
      - aws
      - kubevirt
      - csi
  features:
    dataMover:
      enable: true
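After applying the DPA, the backup storage location can be checked before proceeding. This is a sketch; the file name dpa.yaml is hypothetical, and the PHASE of the BackupStorageLocation should report Available:

$ oc create -f dpa.yaml
$ oc get backupstoragelocations -n openshift-adp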
4. Edit the volumesnapshotclass as follows:
oc get vsclass ocs-storagecluster-cephfsplugin-snapclass -oyaml
apiVersion: snapshot.storage.k8s.io/v1
deletionPolicy: Retain
driver: openshift-storage.cephfs.csi.ceph.com
kind: VolumeSnapshotClass
metadata:
  creationTimestamp: "2022-09-26T08:33:11Z"
  generation: 2
  labels:
    velero.io/csi-volumesnapshot-class: "true"
  name: ocs-storagecluster-cephfsplugin-snapclass
  resourceVersion: "1710327"
  uid: 9185ea19-f13f-479f-8d7a-68da472d7d2a
parameters:
  clusterID: openshift-storage
  csi.storage.k8s.io/snapshotter-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/snapshotter-secret-namespace: openshift-storage
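One way to make these two edits without opening an editor is with oc label and oc patch; this is a sketch using the class name from the output above:

$ oc label volumesnapshotclass ocs-storagecluster-cephfsplugin-snapclass velero.io/csi-volumesnapshot-class="true"
$ oc patch volumesnapshotclass ocs-storagecluster-cephfsplugin-snapclass --type merge -p '{"deletionPolicy": "Retain"}'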
5. Deploy a sample application
# oc get all,pvc -n ocp-mysql
NAME                        READY   STATUS    RESTARTS   AGE
pod/mysql-6d64ccd47-b4vwr   1/1     Running   0          3d19h

NAME            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
service/mysql   ClusterIP   172.30.175.178   <none>        3306/TCP   3d19h

NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/mysql   1/1     1            1           3d19h

NAME                              DESIRED   CURRENT   READY   AGE
replicaset.apps/mysql-6d64ccd47   1         1         1       3d19h

NAME                          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                AGE
persistentvolumeclaim/mysql   Bound    pvc-fd0159b3-f7ba-4fb0-9a76-e7f8fe988457   2Gi        RWO            ocs-storagecluster-cephfs   3d19h
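The sample application manifests are not included in the report; a minimal PVC sketch consistent with the output above (name, namespace, size, access mode, and storage class are taken from it) would look like:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql
  namespace: ocp-mysql
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
  storageClassName: ocs-storagecluster-cephfs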
6. Create backup on cluster1
# cat backup.yaml
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: velero-sample1-withcsi-ocpmysql
  labels:
    velero.io/storage-location: default
  namespace: openshift-adp
spec:
  hooks: {}
  includedNamespaces:
  - ocp-mysql
  includeClusterResources: false
  storageLocation: velero-sample-1
  ttl: 720h0m0s
# oc get backup -n openshift-adp
NAME                                          AGE
velero-sample-withcsi-ocpmysql                3d19h
velero-sample-withcsi-sep19                   3d19h
velero-sample-withcsi-sep20acmeair            3d19h
velero-sample-withcsi-sep21fileuploader       3d19h
velero-sample-withcsisep26-sepfileuploader    3d19h
velero-sample-witherestic-sep22fileuploader   3d19h
velero-sample1-withcsi-ocpmysql               3d19h
# oc get backup -n openshift-adp velero-sample1-withcsi-ocpmysql -o jsonpath={.status.phase}
Completed
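Since the data mover is enabled, its intermediate custom resources can also be inspected at this point. This assumes the OADP 1.1 data mover CRDs (VolumeSnapshotBackup) are installed and that the resources are created in the namespace being backed up:

$ oc get volumesnapshotbackups -n ocp-mysql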
7. Check for the volumesnapshotcontent on the cluster
# oc get volumesnapshotcontents | grep ocp-mysql
snapcontent-a7737939-fe72-45f7-b41f-83575698681e   true   2147483648   Retain   openshift-storage.cephfs.csi.ceph.com   ocs-storagecluster-cephfsplugin-snapclass   velero-mysql-c7ckk   ocp-mysql   3d19h
8. Restore the backup on the new cluster
# cat restore.yaml
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: velero-restore1-withcsi-ocpmysql
  namespace: openshift-adp
spec:
  backupName: velero-sample1-withcsi-ocpmysql
  excludedResources:
  - nodes
  - events
  - events.events.k8s.io
  - backups.velero.io
  - restores.velero.io
  - resticrepositories.velero.io
  restorePVs: true
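Applying the restore and checking its phase mirrors the backup step; the commands below are a sketch following the same pattern as step 6:

$ oc create -f restore.yaml
$ oc get restore -n openshift-adp velero-restore1-withcsi-ocpmysql -o jsonpath={.status.phase}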
9. Check that the volumesnapshotcontent is copied to the new cluster
[root@m4202001 ~]# oc get volumesnapshotcontents | grep ocp-mysql
velero-velero-mysql-c7ckk-nd98j   true   0   Retain   openshift-storage.cephfs.csi.ceph.com   ocs-storagecluster-cephfsplugin-snapclass   velero-mysql-c7ckk   ocp-mysql   93m
10. Verify that the application is up and running
Actual results:
The PVC and the pod do not come up on the new cluster.
# oc get all,pvc -n ocp-mysql
NAME                        READY   STATUS    RESTARTS   AGE
pod/mysql-6d64ccd47-hlv4f   0/1     Pending   0          96m

NAME            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
service/mysql   ClusterIP   172.30.170.174   <none>        3306/TCP   96m

NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/mysql   0/1     1            0           96m

NAME                              DESIRED   CURRENT   READY   AGE
replicaset.apps/mysql-6d64ccd47   1         1         0       96m

NAME                          STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS                AGE
persistentvolumeclaim/mysql   Pending                                      ocs-storagecluster-cephfs   96m
# oc describe persistentvolumeclaim/mysql -n ocp-mysql
Name:          mysql
Namespace:     ocp-mysql
StorageClass:  ocs-storagecluster-cephfs
Status:        Pending
Volume:
Labels:        app=mysql
               testlabel=selectors
               testlabel2=foo
               velero.io/backup-name=velero-sample1-withcsi-ocpmysql
               velero.io/restore-name=velero-restore1-withcsi-ocpmysql
               velero.io/volume-snapshot-name=velero-mysql-c7ckk
Annotations:   velero.io/backup-name: velero-sample1-withcsi-ocpmysql
               velero.io/volume-snapshot-name: velero-mysql-c7ckk
               volume.kubernetes.io/storage-provisioner: openshift-storage.cephfs.csi.ceph.com
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode:    Filesystem
DataSource:
  APIGroup:    snapshot.storage.k8s.io
  Kind:        VolumeSnapshot
  Name:        velero-mysql-c7ckk
Used By:       mysql-6d64ccd47-hlv4f
Events:
  Type     Reason                Age                    From                                                                                                                      Message
  ----     ------                ----                   ----                                                                                                                      -------
  Warning  ProvisioningFailed    75m (x14 over 94m)     openshift-storage.cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-7bdf9d685c-msbb7_c3e0ea1e-588f-4b96-8c1e-910f3f2534e5  failed to provision volume with StorageClass "ocs-storagecluster-cephfs": rpc error: code = Internal desc = key not found: no snap source in omap for "csi.snap.82f2f23f-d274-4a09-bc68-f89a66e8c1c0"
  Normal   ExternalProvisioning  4m31s (x371 over 94m)  persistentvolume-controller                                                                                               waiting for a volume to be created, either by external provisioner "openshift-storage.cephfs.csi.ceph.com" or manually created by system administrator
  Normal   Provisioning          35s (x34 over 94m)     openshift-storage.cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-7bdf9d685c-msbb7_c3e0ea1e-588f-4b96-8c1e-910f3f2534e5  External provisioner is provisioning volume for claim "ocp-mysql/mysql"
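The ProvisioningFailed message suggests the CSI driver cannot find snapshot data behind the restored VolumeSnapshotContent. One way to inspect what the restored content actually points at is to check its snapshot handle; this is a diagnostic sketch using the object name from step 9:

$ oc get volumesnapshotcontent velero-velero-mysql-c7ckk-nd98j -o jsonpath='{.status.snapshotHandle}'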
Expected results:
The application should be restored successfully on the new cluster
Additional info:
Slack threads: https://coreos.slack.com/archives/C0144ECKUJ0/p1666104590173779
- is related to OADP-943 Data mover restore successful on separate cluster with VolumeSnapshotContent deletionPolicy not set to Retain. (Closed)
- is triggered by OADP-989 Data Mover could restore PVCs with mover pod in a different node to the workload pod causing issues for PVCs with ReadWriteOnce accessMode (Closed)
- relates to OADP-1181 Update Release Notes Page & Known Issues (Testing)
- links to