DFBUGS-602: [2322019] [RDR] [Flatten] Proper error messages aren't shown when a drpolicy without flattening is applied to cloned/snapshot restored PVC

    • Project: Data Foundation Bugs
    • Type: Bug
    • Resolution: Unresolved
    • Priority: Critical
    • Version/s: odf-4.18, odf-4.17
    • Component/s: odf-dr/ramen

      Description of problem (please be as detailed as possible and provide log snippets):

      Version of all relevant components (if applicable):
      OCP 4.17.0-0.nightly-2024-10-20-231827
      ODF 4.17.0-126
      ACM 2.12.0-DOWNSTREAM-2024-10-18-21-57-41
      OpenShift Virtualization 4.17.1-19
      Submariner 0.19 unreleased downstream image 846949
      ceph version 18.2.1-229.el9cp (ef652b206f2487adfc86613646a4cac946f6b4e0) reef (stable)
      OADP 1.4.1
      OpenShift GitOps 1.14.0
      VolSync 0.10.1

      Does this issue impact your ability to continue to work with the product
      (please explain in detail what the user impact is)?

      Is there any workaround available to the best of your knowledge?

      Rate from 1 - 5 the complexity of the scenario you performed that caused this
      bug (1 - very simple, 5 - very complex)?

      Is this issue reproducible?

      Can this issue be reproduced from the UI?

      If this is a regression, please provide more details to justify this:

      Steps to Reproduce:
      1. Deploy an RBD CNV workload on an RDR setup using discovered apps and create a clone of its PVC (a sketch of such a clone follows this list).
      2. Delete the workload.
      3. Deploy a new workload that consumes the cloned PVC.
      4. DR protect this workload with a drpolicy that does not have flattening enabled.
      5. The VR is promoted to primary, and sync and backup initially look fine for the workload, but the RBD image does not undergo flattening.
      6. After a while, sync stops progressing for this workload, and the root cause is hard to debug because proper error messages are missing in the VR/DRPC resources.
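
      For reference, a minimal sketch of the PVC clone from step 1. The source PVC name, storage class and size are assumptions for illustration; only the clone name, namespace and the appname=vm label are taken from the VR/DRPC outputs below.

      apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        name: root-disk                                 # clone consumed by the workload in step 3
        namespace: busybox-workloads-100
        labels:
          appname: vm                                   # matches the pvcSelector in the DRPC below
      spec:
        storageClassName: ocs-storagecluster-ceph-rbd   # assumed default RBD storage class
        dataSource:
          kind: PersistentVolumeClaim                   # clone of an existing RBD PVC
          name: source-root-disk                        # hypothetical source PVC from step 1
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 30Gi                               # assumed; must be at least the source PVC size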

      Actual results:
      VR-

      oc describe vr -n busybox-workloads-100
      Name:         root-disk
      Namespace:    busybox-workloads-100
      Labels:       ramendr.openshift.io/owner-name=busybox-100
                    ramendr.openshift.io/owner-namespace-name=openshift-dr-ops
      Annotations:  <none>
      API Version:  replication.storage.openshift.io/v1alpha1
      Kind:         VolumeReplication
      Metadata:
        Creation Timestamp:  2024-10-27T18:04:31Z
        Finalizers:
          replication.storage.openshift.io
        Generation:        1
        Resource Version:  9855180
        UID:               c4ae8511-9fa1-4a53-8374-8b87288255d1
      Spec:
        Auto Resync:  false
        Data Source:
          API Group:
          Kind:       PersistentVolumeClaim
          Name:       root-disk
        Replication Handle:
        Replication State:         primary
        Volume Replication Class:  rbd-volumereplicationclass-1625360775
      Status:
        Conditions:
          Last Transition Time:  2024-10-27T18:04:35Z
          Message:
          Observed Generation:   1
          Reason:                Promoted
          Status:                True
          Type:                  Completed
          Last Transition Time:  2024-10-27T18:04:35Z
          Message:
          Observed Generation:   1
          Reason:                Healthy
          Status:                False
          Type:                  Degraded
          Last Transition Time:  2024-10-27T18:04:35Z
          Message:
          Observed Generation:   1
          Reason:                NotResyncing
          Status:                False
          Type:                  Resyncing
        Last Completion Time:  2024-10-27T18:47:06Z
        Last Sync Duration:    0s
        Last Sync Time:        2024-10-27T18:45:00Z
        Message:               volume is marked primary
        Observed Generation:   1
        State:                 Primary
      Events:                  <none>

      DRPC-

      oc get drpc busybox-100 -oyaml -n openshift-dr-ops
      apiVersion: ramendr.openshift.io/v1alpha1
      kind: DRPlacementControl
      metadata:
        annotations:
          drplacementcontrol.ramendr.openshift.io/app-namespace: openshift-dr-ops
          drplacementcontrol.ramendr.openshift.io/last-app-deployment-cluster: amagrawa-21o-1
        creationTimestamp: "2024-10-27T18:04:31Z"
        finalizers:
        - drpc.ramendr.openshift.io/finalizer
        generation: 2
        labels:
          cluster.open-cluster-management.io/backup: ramen
        name: busybox-100
        namespace: openshift-dr-ops
        ownerReferences:
        - apiVersion: cluster.open-cluster-management.io/v1beta1
          blockOwnerDeletion: true
          controller: true
          kind: Placement
          name: busybox-100-placement-1
          uid: e36cc23e-b6ad-4e24-ab76-0b8f2332aa9e
        resourceVersion: "8573969"
        uid: 552aaddd-3376-4550-ba3d-b7150e27ac91
      spec:
        drPolicyRef:
          apiVersion: ramendr.openshift.io/v1alpha1
          kind: DRPolicy
          name: odr-policy-5m
        kubeObjectProtection:
          captureInterval: 5m0s
          kubeObjectSelector:
            matchExpressions:
            - key: appname
              operator: In
              values:
              - vm
        placementRef:
          apiVersion: cluster.open-cluster-management.io/v1beta1
          kind: Placement
          name: busybox-100-placement-1
          namespace: openshift-dr-ops
        preferredCluster: amagrawa-21o-1
        protectedNamespaces:
        - busybox-workloads-100
        pvcSelector:
          matchExpressions:
          - key: appname
            operator: In
            values:
            - vm
      status:
        actionDuration: 15.045573062s
        actionStartTime: "2024-10-27T18:04:46Z"
        conditions:
        - lastTransitionTime: "2024-10-27T18:04:31Z"
          message: Initial deployment completed
          observedGeneration: 2
          reason: Deployed
          status: "True"
          type: Available
        - lastTransitionTime: "2024-10-27T18:04:31Z"
          message: Ready
          observedGeneration: 2
          reason: Success
          status: "True"
          type: PeerReady
        - lastTransitionTime: "2024-10-27T18:07:31Z"
          message: VolumeReplicationGroup (openshift-dr-ops/busybox-100) on cluster amagrawa-21o-1
            is protecting required resources and data
          observedGeneration: 2
          reason: Protected
          status: "True"
          type: Protected
        lastGroupSyncDuration: 0s
        lastGroupSyncTime: "2024-10-27T18:10:00Z"
        lastKubeObjectProtectionTime: "2024-10-27T18:54:38Z"
        lastUpdateTime: "2024-10-27T18:59:33Z"
        observedGeneration: 2
        phase: Deployed
        preferredDecision:
          clusterName: amagrawa-21o-1
          clusterNamespace: amagrawa-21o-1
        progression: Completed
        resourceConditions:
          conditions:
          - lastTransitionTime: "2024-10-27T18:04:35Z"
            message: PVCs in the VolumeReplicationGroup are ready for use
            observedGeneration: 1
            reason: Ready
            status: "True"
            type: DataReady
          - lastTransitionTime: "2024-10-27T18:04:32Z"
            message: VolumeReplicationGroup is replicating
            observedGeneration: 1
            reason: Replicating
            status: "False"
            type: DataProtected
          - lastTransitionTime: "2024-10-27T18:04:31Z"
            message: Nothing to restore
            observedGeneration: 1
            reason: Restored
            status: "True"
            type: ClusterDataReady
          - lastTransitionTime: "2024-10-27T18:04:39Z"
            message: Cluster data of all PVs are protected
            observedGeneration: 1
            reason: Uploaded
            status: "True"
            type: ClusterDataProtected
          resourceMeta:
            generation: 1
            kind: VolumeReplicationGroup
            name: busybox-100
            namespace: openshift-dr-ops
            protectedpvcs:
            - root-disk
            resourceVersion: "9869528"

      Although the VolumeSynchronizationDelay alert fires on the hub cluster if cluster monitoring labelling is done, that labelling is optional, and the alert doesn't point to the root cause; there could be any number of reasons why sync isn't progressing.
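
      For reference, the monitoring labelling mentioned above is, to the best of my knowledge, the optional label on the openshift-operators namespace on the hub (an assumption based on the usual DR monitoring setup, not taken from this environment):

      oc label namespace openshift-operators openshift.io/cluster-monitoring='true'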

      Also, to check whether the image underwent flattening, one has to rsh into the toolbox pod and run the ceph progress command, which isn't recommended for customers.
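
      For context, the manual check looks roughly like the following, assuming the rook-ceph toolbox is deployed in openshift-storage, the default block pool name, and a placeholder image name; a cloned/snapshot-restored image keeps a parent reference until it is flattened:

      oc rsh -n openshift-storage deploy/rook-ceph-tools
      ceph progress                                              # lists in-flight flatten tasks, if any
      rbd info ocs-storagecluster-cephblockpool/csi-vol-<uuid>   # a remaining "parent:" line means the image was not flattened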

      Expected results:
      [RDR] [Flatten] Proper error messages should be shown in the VR and DRPC resources when a drpolicy without flattening is applied to a cloned/snapshot-restored PVC and sync doesn't resume / the RBD image doesn't undergo flattening.
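
      As an illustration only, a hypothetical Degraded condition of the kind requested here; the wording and reason are suggestions, not existing Ramen/csi-addons output:

      Last Transition Time:  2024-10-27T18:47:06Z
      Message:               rbd image has a parent and flattening is not enabled in the applied drpolicy; sync cannot resume until the image is flattened
      Observed Generation:   1
      Reason:                FlatteningRequired
      Status:                True
      Type:                  Degraded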

      Additional info:

              Nir Soffer (nsoffer@redhat.com)
              Aman Agrawal (amagrawa@redhat.com)
              Krishnaram Karthick Ramdoss