Type: Bug
Resolution: Unresolved
Severity: Critical
Target release: odf-4.16
Description of problem (please be as detailed as possible and provide log
snippets):
Observing an issue with subscription apps after MDR co-situated hub recovery (c1 + active hub + ceph (zone b) were down). ApplicationSet (pull) and discovered apps were failed over successfully using the new hub.
However, subscription app pods are not coming up after failover from c1 to c2, even though the PVCs and VRGs for these apps did fail over.
The DRPC of the subscription apps shows the failover completed successfully, but the corresponding app pods are missing on c2:
NAMESPACE         NAME                               AGE   PREFERREDCLUSTER   FAILOVERCLUSTER   DESIREDSTATE   CURRENTSTATE   PROGRESSION   START TIME             DURATION             PEER READY
busybox-sub-1 busybox-sub-1-placement-1-drpc 17h pbyregow-cl1 pbyregow-cl2 Failover FailedOver Completed 2024-07-03T16:04:38Z 2h0m45.152881171s True
vm-pvc-acm-sub1 vm-pvc-acm-sub1-placement-1-drpc 17h pbyregow-cl1 pbyregow-cl2 Failover FailedOver Completed 2024-07-03T16:17:57Z 2h14m58.850396117s True
vm-pvc-acm-sub2 vm-pvc-acm-sub2-placement-1-drpc 17h pbyregow-cl1 pbyregow-cl2 Failover FailedOver Completed 2024-07-03T16:18:03Z 2h14m52.041023629s True
for i in {busybox-sub-1,vm-pvc-acm-sub1,vm-pvc-acm-sub2}; do oc get pod,pvc,vrg -n $i; done
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS VOLUMEATTRIBUTESCLASS AGE
persistentvolumeclaim/busybox-cephfs-pvc-1 Bound pvc-cba9f468-46ee-41de-a6a5-0650e9235b8b 100Gi RWO ocs-external-storagecluster-cephfs <unset> 19h
persistentvolumeclaim/busybox-rbd-pvc-1 Bound pvc-4be77410-ef6b-454f-9835-2b8c111f88c6 100Gi RWO ocs-external-storagecluster-ceph-rbd <unset> 19h
NAME DESIREDSTATE CURRENTSTATE
volumereplicationgroup.ramendr.openshift.io/busybox-sub-1-placement-1-drpc primary Primary
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS VOLUMEATTRIBUTESCLASS AGE
persistentvolumeclaim/vm-1-pvc Bound pvc-96184450-4ed0-4879-84a7-76fd3407af7a 512Mi RWX ocs-external-storagecluster-ceph-rbd <unset> 19h
NAME DESIREDSTATE CURRENTSTATE
volumereplicationgroup.ramendr.openshift.io/vm-pvc-acm-sub1-placement-1-drpc primary Primary
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS VOLUMEATTRIBUTESCLASS AGE
persistentvolumeclaim/vm-1-pvc Bound pvc-584707a8-81af-4994-9f08-90556b4f26a7 512Mi RWX ocs-external-storagecluster-ceph-rbd <unset> 19h
NAME DESIREDSTATE CURRENTSTATE
volumereplicationgroup.ramendr.openshift.io/vm-pvc-acm-sub2-placement-1-drpc primary Primary
Note that no pods are listed in the output above, only PVCs and VRGs. Seeing this error in the subscription in the ACM console for the busybox-sub-1 app:
channels.apps.open-cluster-management.io
"ggithubcom-red-hat-storage-ocs-workloads" is forbidden: User
"system:open-cluster-management:cluster:pbyregow-cl2:addon:application-manager:agent:application-manager"
cannot get resource "channels" in API group
"apps.open-cluster-management.io" in the namespace
"ggithubcom-red-hat-storage-ocs-workloads-ns"
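The error above indicates the application-manager addon agent for pbyregow-cl2 lacks read access to the Channel resource in the channel namespace on the new hub (RBAC that ACM normally provisions automatically and that appears to be missing after hub recovery). As a diagnostic sketch only, with names copied from the error message and not verified as a supported workaround, a Role/RoleBinding of roughly this shape on the hub would grant the missing permission:

```yaml
# Hypothetical sketch, not a confirmed fix: grant the application-manager
# agent read access to Channels in the channel namespace on the new hub.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: channel-reader            # name chosen for illustration
  namespace: ggithubcom-red-hat-storage-ocs-workloads-ns
rules:
  - apiGroups: ["apps.open-cluster-management.io"]
    resources: ["channels"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: channel-reader-binding    # name chosen for illustration
  namespace: ggithubcom-red-hat-storage-ocs-workloads-ns
subjects:
  - kind: User
    apiGroup: rbac.authorization.k8s.io
    name: system:open-cluster-management:cluster:pbyregow-cl2:addon:application-manager:agent:application-manager
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: channel-reader
```

Whether applying this manually is appropriate (versus ACM re-creating its own RBAC) should be confirmed by the ACM team.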
Version of all relevant components (if applicable):
OCP: 4.16.0-0.nightly-2024-06-27-091410
ODF: 4.16.0-134
ACM: 2.11.0-137
OADP: 1.4 (latest) hub/managed cluster
Does this issue impact your ability to continue to work with the product
(please explain in detail what the user impact is)?
Is there any workaround available to the best of your knowledge?
Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
Is this issue reproducible?
Can this issue be reproduced from the UI?
If this is a regression, please provide more details to justify this:
Steps to Reproduce:
1. Configured an MDR cluster with the versions listed above.
2. Deployed subscription, ApplicationSet (pull), and discovered apps, applied policies, and had them in different states (Deployed/FailedOver/Relocated) on both clusters.
3. Configured backup and waited ~2 hrs for the latest backup; the latest backup completed with no changes in between for any apps.
4. Brought down c1 + active hub + 3 Ceph nodes.
5. Restored on the new hub; the restore completed successfully. Followed the hub recovery doc to apply appliedManifestWorkEvictionGracePeriod: "24h".
6. DRPolicy reached the Validated state.
7. Removed appliedManifestWorkEvictionGracePeriod after the DRPolicy and DRPCs recovered.
8. Failed over apps from c1 to c2.
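Step 5 above can be sketched as follows. This is an assumption based on the KlusterletConfig API used by the ACM hub recovery procedure (the `global` name and field placement should be verified against the doc for the ACM version in use):

```yaml
# Sketch of the hub-recovery grace period from step 5 (assumed shape,
# verify against the ACM 2.11 hub recovery documentation).
apiVersion: config.open-cluster-management.io/v1alpha1
kind: KlusterletConfig
metadata:
  name: global
spec:
  appliedManifestWorkEvictionGracePeriod: "24h"
```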
Actual results:
Subscription app pods did not come up after failover post hub recovery.
Expected results:
Subscription app pods should come up along with the rest of the resources.
Additional info:
The rest of the apps (ApplicationSet pull and discovered) failed over to c2 successfully.
Blocks: DFBUGS-98 [2281703] [Tracker] ODF 4.17.0 Release Notes (Closed)