Data Foundation Bugs / DFBUGS-79

[2232673] [RFE][RDR] VR and VRG status conditions do not reflect cephrbd image health

      Description of problem (please be as detailed as possible and provide log
      snippets):
      The VR and VRG can report correct (healthy) status conditions even when the cephrbd image peer_sites state is NOT up+replaying (for example, up+starting_replay).

      Example for busybox VRG, VR and associated cephrbd image:

      $ oc get vrg busybox-placement-1-drpc -n busybox-sample -o jsonpath='{.status.conditions}' | jq
      [
        { "lastTransitionTime": "2023-08-17T16:06:31Z", "message": "PVCs in the VolumeReplicationGroup are ready for use", "observedGeneration": 1, "reason": "Ready", "status": "True", "type": "DataReady" },
        { "lastTransitionTime": "2023-08-17T16:06:08Z", "message": "VolumeReplicationGroup is replicating", "observedGeneration": 1, "reason": "Replicating", "status": "False", "type": "DataProtected" },
        { "lastTransitionTime": "2023-08-17T16:06:07Z", "message": "Restored cluster data", "observedGeneration": 1, "reason": "Restored", "status": "True", "type": "ClusterDataReady" },
        { "lastTransitionTime": "2023-08-17T16:06:08Z", "message": "Cluster data of all PVs are protected", "observedGeneration": 1, "reason": "Uploaded", "status": "True", "type": "ClusterDataProtected" }
      ]
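
      For spot checks, a single VRG condition can be pulled with a jsonpath filter expression. A minimal sketch, assuming the same VRG name and namespace as above:

      # Print only the DataReady condition's status and reason; with the
      # conditions shown above this prints "True Ready".
      $ oc get vrg busybox-placement-1-drpc -n busybox-sample \
          -o jsonpath='{.status.conditions[?(@.type=="DataReady")].status}{" "}{.status.conditions[?(@.type=="DataReady")].reason}{"\n"}'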

      $ oc get vr busybox-pvc -n busybox-sample -o jsonpath='{.status.conditions}' | jq
      [
        { "lastTransitionTime": "2023-08-17T16:06:31Z", "message": "", "observedGeneration": 1, "reason": "Promoted", "status": "True", "type": "Completed" },
        { "lastTransitionTime": "2023-08-17T16:06:31Z", "message": "", "observedGeneration": 1, "reason": "Healthy", "status": "False", "type": "Degraded" },
        { "lastTransitionTime": "2023-08-17T16:06:31Z", "message": "", "observedGeneration": 1, "reason": "NotResyncing", "status": "False", "type": "Resyncing" }
      ]

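      For reference, the RBD image backing a PVC can be looked up from its PV before running the rbd checks below. A minimal sketch, assuming the ceph-csi RBD driver publishes the image name in the PV's volumeAttributes (the imageName attribute is an assumption; otherwise the csi-vol-<uuid> suffix of spec.csi.volumeHandle can be used):

      # Map the busybox PVC to its backing RBD image and query its mirror status.
      # (imageName attribute is an assumption for this ceph-csi version.)
      $ PV=$(oc get pvc busybox-pvc -n busybox-sample -o jsonpath='{.spec.volumeName}')
      $ IMAGE=$(oc get pv "$PV" -o jsonpath='{.spec.csi.volumeAttributes.imageName}')
      $ rbd -p ocs-storagecluster-cephblockpool mirror image status "$IMAGE"

      The captured status for the image in question: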

      $ rbd -p ocs-storagecluster-cephblockpool mirror image status csi-vol-c8bc0681-76f1-4f7c-8866-2c6e47372276
      csi-vol-c8bc0681-76f1-4f7c-8866-2c6e47372276:
        global_id:   f7362123-b264-48ee-85a3-7fb30b8f0e08
        state:       up+stopped
        description: local image is primary
        service:     a on bos5-zwmb8-ocs-0-nmd2n
        last_update: 2023-08-17 20:37:56
        peer_sites:
          name: 981d7c1f-0ab5-4ed1-a91b-050586b08ab8
          state: up+starting_replay
          description: starting replay
          last_update: 2023-08-17 20:37:56
        snapshots:
          10 .mirror.primary.f7362123-b264-48ee-85a3-7fb30b8f0e08.b81c40e6-3b52-4db0-ae5b-3df80855a7a4 (peer_uuids:[2fca7624-ef09-4ea4-961d-629af99fd6c0])
          11 .mirror.primary.f7362123-b264-48ee-85a3-7fb30b8f0e08.061cbbae-a897-4ed6-a6c6-a4b6e70cc758 (peer_uuids:[2fca7624-ef09-4ea4-961d-629af99fd6c0])
          12 .mirror.primary.f7362123-b264-48ee-85a3-7fb30b8f0e08.a84ca191-fc01-4a20-a4ee-6c2c17a5cd67 (peer_uuids:[2fca7624-ef09-4ea4-961d-629af99fd6c0])
          13 .mirror.primary.f7362123-b264-48ee-85a3-7fb30b8f0e08.b57f9475-5a13-4d8e-a18c-056713bb1308 (peer_uuids:[2fca7624-ef09-4ea4-961d-629af99fd6c0])
          113 .mirror.primary.f7362123-b264-48ee-85a3-7fb30b8f0e08.b9499d87-2d7a-46f6-97ac-2decb58885a9 (peer_uuids:[2fca7624-ef09-4ea4-961d-629af99fd6c0])

      $ rbd -p ocs-storagecluster-cephblockpool mirror pool status --verbose
      health: WARNING
      daemon health: OK
      image health: WARNING
      images: 1 total
          1 starting_replay

      DAEMONS
      service 15646:
        instance_id: 15652
        client_id: a
        hostname: bos5-zwmb8-ocs-0-nmd2n
        version: 17.2.6-70.el9cp
        leader: true
        health: OK

      IMAGES
      csi-vol-c8bc0681-76f1-4f7c-8866-2c6e47372276:
        global_id:   f7362123-b264-48ee-85a3-7fb30b8f0e08
        state:       up+stopped
        description: local image is primary
        service:     a on bos5-zwmb8-ocs-0-nmd2n
        last_update: 2023-08-17 20:38:56
        peer_sites:
          name: 981d7c1f-0ab5-4ed1-a91b-050586b08ab8
          state: up+starting_replay
          description: starting replay
          last_update: 2023-08-17 20:38:56
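
      Rook also surfaces this mirroring summary on the CephBlockPool CR, which is one place an operator could pick it up. A minimal sketch, assuming the default ODF pool name and that this Rook version populates status.mirroringStatus.summary (the exact field path may differ between versions):

      # Show the pool-level mirroring health summary reported by Rook.
      # (status.mirroringStatus.summary path is an assumption for this version.)
      $ oc get cephblockpool ocs-storagecluster-cephblockpool -n openshift-storage \
          -o jsonpath='{.status.mirroringStatus.summary}' | jq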

      Version of all relevant components (if applicable):
      $ oc version
      Client Version: 4.14.0-ec.4
      Kustomize Version: v5.0.1
      Server Version: 4.13.6
      Kubernetes Version: v1.26.6+73ac561

      ManagedCluster:
      $ oc get csv -n openshift-storage
      NAME DISPLAY VERSION REPLACES PHASE
      mcg-operator.v4.13.1-rhodf NooBaa Operator 4.13.1-rhodf mcg-operator.v4.13.0-rhodf Succeeded
      ocs-operator.v4.13.1-rhodf OpenShift Container Storage 4.13.1-rhodf ocs-operator.v4.13.0-rhodf Succeeded
      odf-csi-addons-operator.v4.13.1-rhodf CSI Addons 4.13.1-rhodf odf-csi-addons-operator.v4.13.0-rhodf Succeeded
      odf-operator.v4.13.1-rhodf OpenShift Data Foundation 4.13.1-rhodf odf-operator.v4.13.0-rhodf Succeeded
      odr-cluster-operator.v4.13.1-rhodf Openshift DR Cluster Operator 4.13.1-rhodf odr-cluster-operator.v4.13.0-rhodf Succeeded
      volsync-product.v0.7.4 VolSync 0.7.4 volsync-product.v0.7.3 Succeeded

      Hub Cluster:
      $ oc get csv -n openshift-operators
      NAME DISPLAY VERSION REPLACES PHASE
      odf-multicluster-orchestrator.v4.13.1-rhodf ODF Multicluster Orchestrator 4.13.1-rhodf odf-multicluster-orchestrator.v4.13.0-rhodf Succeeded
      odr-hub-operator.v4.13.1-rhodf Openshift DR Hub Operator 4.13.1-rhodf odr-hub-operator.v4.13.0-rhodf Succeeded

      $ oc get csv -n open-cluster-management
      NAME DISPLAY VERSION REPLACES PHASE
      advanced-cluster-management.v2.8.0 Advanced Cluster Management for Kubernetes 2.8.0

      Does this issue impact your ability to continue to work with the product
      (please explain in detail what the user impact is)?

      Is there any workaround available to the best of your knowledge?
      No

      Rate from 1 - 5 the complexity of the scenario you performed that caused this
      bug (1 - very simple, 5 - very complex)?
      3

      Is this issue reproducible?
      Yes

      Steps to Reproduce:
      1. Configure RDR and create the busybox application on the managed cluster.
      2. Remove Submariner connectivity.
      3. Assign a DRPolicy to the busybox app.
      4. Install Submariner again (see the connectivity check sketch below).
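
      A quick way to confirm that cross-cluster connectivity is actually down after step 2 and restored after step 4. A small sketch, assuming the subctl CLI is available against the managed cluster's kubeconfig:

      # Inspect the Submariner gateway connections on the managed cluster.
      $ subctl show connections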

      Actual results:
      Image replication does not start, and the VR and VRG status conditions do not reflect any problem with replication.

      Expected results:
      Image replication does not start, but the VR and VRG status conditions reflect the image replication problem.
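
      One way to verify the expected behaviour is to compare the VR Degraded condition against the live mirror state. A rough sketch, using the same VR and image as above:

      # With the RFE in place, a peer_sites state other than up+replaying should
      # surface in the Degraded condition instead of reason "Healthy".
      $ oc get vr busybox-pvc -n busybox-sample \
          -o jsonpath='{.status.conditions[?(@.type=="Degraded")].status}{" "}{.status.conditions[?(@.type=="Degraded")].reason}{"\n"}'
      $ rbd -p ocs-storagecluster-cephblockpool mirror image status \
          csi-vol-c8bc0681-76f1-4f7c-8866-2c6e47372276 | grep 'state:'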
