Data Foundation Bugs / DFBUGS-468

[2295782] [RDR][MDR][Tracker ACM-12448] Post hub recovery, subscription app pods are not coming up after Failover from c1 to c2.

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Critical
    • Affects Versions: odf-4.17.2, odf-4.16
    • Component: odf-dr/ramen
    • 4.17.1
    • 4.17.0-105
    • Doc Text:

      .Post hub recovery, subscription app pods now come up after Failover

      Previously, post hub recovery, the subscription application pods did not come up after failover from the primary to the secondary managed clusters. This was caused by an RBAC error on the AppSub subscription resource on the managed cluster, due to a timing issue in the backup and restore scenario.

      This issue has been fixed, and subscription app pods now come up after failover from the primary to the secondary managed clusters.
    • Bug Fix

      Description of problem (please be as detailed as possible and provide log
      snippets):
      Observing an issue related to subscription apps post MDR co-situated hub recovery (c1 + active hub + ceph (zone b) were down). Appset-pull and discovered apps were failed over successfully using the new hub.
      But sub app pods are not showing up after failover from c1 to c2, although the PVCs and VRGs for these apps are failed over.

      The DRPC of the sub apps shows they have failed over successfully, but the respective app pods are missing on c2:
      busybox-sub-1 busybox-sub-1-placement-1-drpc 17h pbyregow-cl1 pbyregow-cl2 Failover FailedOver Completed 2024-07-03T16:04:38Z 2h0m45.152881171s True
      vm-pvc-acm-sub1 vm-pvc-acm-sub1-placement-1-drpc 17h pbyregow-cl1 pbyregow-cl2 Failover FailedOver Completed 2024-07-03T16:17:57Z 2h14m58.850396117s True
      vm-pvc-acm-sub2 vm-pvc-acm-sub2-placement-1-drpc 17h pbyregow-cl1 pbyregow-cl2 Failover FailedOver Completed 2024-07-03T16:18:03Z 2h14m52.041023629s True
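
      (For reference, a listing like the one above can be pulled from the new hub; a minimal sketch, assuming the hub kubeconfig is active:)

      # hypothetical check from the hub: wide output adds progression, start time, duration and peer-ready columns
      oc get drpc -A -o wide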

      for i in {busybox-sub-1,vm-pvc-acm-sub1,vm-pvc-acm-sub2}; do oc get pod,pvc,vrg -n $i; done
      NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS VOLUMEATTRIBUTESCLASS AGE
      persistentvolumeclaim/busybox-cephfs-pvc-1 Bound pvc-cba9f468-46ee-41de-a6a5-0650e9235b8b 100Gi RWO ocs-external-storagecluster-cephfs <unset> 19h
      persistentvolumeclaim/busybox-rbd-pvc-1 Bound pvc-4be77410-ef6b-454f-9835-2b8c111f88c6 100Gi RWO ocs-external-storagecluster-ceph-rbd <unset> 19h

      NAME DESIREDSTATE CURRENTSTATE
      volumereplicationgroup.ramendr.openshift.io/busybox-sub-1-placement-1-drpc primary Primary
      NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS VOLUMEATTRIBUTESCLASS AGE
      persistentvolumeclaim/vm-1-pvc Bound pvc-96184450-4ed0-4879-84a7-76fd3407af7a 512Mi RWX ocs-external-storagecluster-ceph-rbd <unset> 19h

      NAME DESIREDSTATE CURRENTSTATE
      volumereplicationgroup.ramendr.openshift.io/vm-pvc-acm-sub1-placement-1-drpc primary Primary
      NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS VOLUMEATTRIBUTESCLASS AGE
      persistentvolumeclaim/vm-1-pvc Bound pvc-584707a8-81af-4994-9f08-90556b4f26a7 512Mi RWX ocs-external-storagecluster-ceph-rbd <unset> 19h

      NAME DESIREDSTATE CURRENTSTATE
      volumereplicationgroup.ramendr.openshift.io/vm-pvc-acm-sub2-placement-1-drpc primary Primary
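
      (Since the PVCs and VRGs are Primary on c2 while the workload pods are missing, a next check is the propagated AppSub on the managed cluster; a sketch, assuming the c2 kubeconfig and the busybox-sub-1 namespace from above:)

      # hypothetical check on c2: list and describe the propagated AppSub to see its status and conditions
      oc get subscriptions.apps.open-cluster-management.io -n busybox-sub-1
      oc describe subscriptions.apps.open-cluster-management.io -n busybox-sub-1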

      Seeing this error on the subscription in the ACM console for the busybox-sub-1 app:

      {ggithubcom-red-hat-storage-ocs-workloads-ns/ggithubcom-red-hat-storage-ocs-workloads <nil> [] 0xc0025bd470 [] <nil> nil [] [] false}
      { 0001-01-01 00:00:00 +0000 UTC { [] []} map[]}}: channels.apps.open-cluster-management.io
      "ggithubcom-red-hat-storage-ocs-workloads" is forbidden: User
      "system:open-cluster-management:cluster:pbyregow-cl2:addon:application-manager:agent:application-manager"
      cannot get resource "channels" in API group
      "apps.open-cluster-management.io" in the namespace
      "ggithubcom-red-hat-storage-ocs-workloads-ns"

      Version of all relevant components (if applicable):
      OCP: 4.16.0-0.nightly-2024-06-27-091410
      ODF: 4.16.0-134
      ACM: 2.11.0-137
      OADP: 1.4 (latest) on hub and managed clusters

      Does this issue impact your ability to continue to work with the product
      (please explain in detail what is the user impact)?

      Is there any workaround available to the best of your knowledge?

      Rate from 1 - 5 the complexity of the scenario you performed that caused this
      bug (1 - very simple, 5 - very complex)?

      Is this issue reproducible?

      Can this issue be reproduced from the UI?

      If this is a regression, please provide more details to justify this:

      Steps to Reproduce:
      1. Configured MDR cluster as per the versions listed.
      2. Deployed sub, appset-pull, and discovered apps, applied policies, and had them in different states (Deployed/FailedOver/Relocated) on both clusters.
      3. Configured backup and waited ~2 hrs for the latest backup. Had the latest backup without any changes in between for any apps.
      4. Brought down c1 + active hub + 3 ceph nodes.
      5. Restored on the new hub; restore completed successfully. Followed the hub recovery doc to apply appliedManifestWorkEvictionGracePeriod: "24h" (see the sketch after this list).
      6. DRPolicy reached Validated state.
      7. Removed appliedManifestWorkEvictionGracePeriod after the DRPolicy and DRPCs recovered.
      8. Failed over apps from c1 to c2.
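
      (As referenced in step 5, the eviction grace period was applied per the hub recovery doc. A minimal sketch of one way to set it, assuming the global KlusterletConfig approach; the exact CR layout and API version should be confirmed against the ACM 2.11 hub recovery doc:)

      # hypothetical: extend the AppliedManifestWork eviction grace period on managed clusters
      oc apply -f - <<EOF
      apiVersion: config.open-cluster-management.io/v1alpha1
      kind: KlusterletConfig
      metadata:
        name: global
      spec:
        appliedManifestWorkEvictionGracePeriod: "24h"
      EOF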

      Actual results:
      Subscription app pods did not come up after failover post hub recovery.

      Expected results:
      Sub app pods should come up along with the rest of the resources.

      Additional info:
      The rest of the apps (appset-pull and discovered) failed over to c2 successfully.

              egershko Elena Gershkovich
              rhn-support-pbyregow Parikshith Byregowda
              Aman Agrawal
              Aman Agrawal Aman Agrawal