- Bug
- Resolution: Unresolved
- Critical
- None
- odf-4.14
- None
Description of problem (please be as detailed as possible and provide log
snippets):
Following the docs to set drcluster annotations:
https://access.redhat.com/documentation/en-us/red_hat_openshift_data_foundation/4.14/html/configuring_openshift_data_foundation_disaster_recovery_for_openshift_workloads/metro-dr-solution#add-fencing-annotations-to-drclusters_mdr
The instructions use a generic secret name for all drclusters:
drcluster.ramendr.openshift.io/storage-secret-name: rook-csi-rbd-provisioner
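For reference, applying the documented annotation on the hub would look roughly like this (a sketch; dc1 and dc2 are my drcluster names, the docs use oc edit):
- oc annotate drcluster dc1 drcluster.ramendr.openshift.io/storage-secret-name=rook-csi-rbd-provisioner
- oc annotate drcluster dc2 drcluster.ramendr.openshift.io/storage-secret-name=rook-csi-rbd-provisioner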
However, in my case this caused Fencing of the primary cluster to fail immediately:
- oc describe NetworkFence -A --context dc2
Name: network-fence-dc1
[...]
Secret:
Name: rook-csi-rbd-provisioner
Namespace: openshift-storage
Status:
Message: rpc error: code = InvalidArgument desc = secrets "rook-csi-rbd-provisioner" not found
I could only get to a Fenced state by editing drcluster dc1 to use the dc2 secret name:
- oc --context dc1 get secret -A | grep rook-csi-rbd-provisioner
openshift-storage rook-csi-rbd-provisioner-cluster1-rbdpool Opaque [...]
- oc --context dc2 get secret -A | grep rook-csi-rbd-provisioner
openshift-storage rook-csi-rbd-provisioner-cluster2-rbdpool Opaque [...]
- oc edit drcluster dc1
- I configured the dc2 secret name:
drcluster.ramendr.openshift.io/storage-secret-name: rook-csi-rbd-provisioner-cluster2-rbdpool
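For clarity, this is roughly what the dc1 drcluster metadata looked like after the edit (a sketch; only the annotation discussed here is shown):
metadata:
  annotations:
    drcluster.ramendr.openshift.io/storage-secret-name: rook-csi-rbd-provisioner-cluster2-rbdpool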
Note: I think this is because, when I ran the ceph-external-cluster-details-exporter.py script to configure the connection to the external RHCS, I used "--cluster-name cluster[1|2] --restricted-auth-permission true" to differentiate between the two clusters, as mentioned in:
https://access.redhat.com/documentation/en-us/red_hat_openshift_data_foundation/4.14/html/deploying_openshift_data_foundation_in_external_mode/deploy-openshift-data-foundation-using-red-hat-ceph-storage#creating-an-openshift-data-foundation-cluster-service-for-external-storage_ceph-external
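The exporter invocation was roughly like this (a sketch; the pool name rbdpool is inferred from the secret names above, and the other flags from the deployment docs are omitted):
- python3 ceph-external-cluster-details-exporter.py --rbd-data-pool-name rbdpool --cluster-name cluster1 --restricted-auth-permission true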
I chose --cluster-name because, when I used "--run-as-user client.odf.cluster1" as some docs suggested, I got this error in the ceph admin journal:
ceph-mon[16028]: cephx server client.odf.cluster1: couldn't find entity name: client.odf.cluster1
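The missing cephx entity can be confirmed on the RHCS admin node with standard ceph commands (a sketch):
- ceph auth get client.odf.cluster1   # errors if the entity was never created
- ceph auth ls | grep odf             # lists any odf-related client entities that do exist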
So, if using --cluster-name is a valid option when configuring an external cluster, it seems the ramendr storage-secret-name should either be generic so that all clusters can reference the same annotation, or, if that is not possible, the docs should mention that the annotation on the primary drcluster should reference the secondary drcluster's secret name (and vice versa).
Version of all relevant components (if applicable):
OCP 4.14.6
ODF 4.14.3-rhodf
RHCS 6.1 on RHEL 9.2 nodes
Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
The procedure to debug the Fencing failure was not clear at first, but I have recovered.
Is there any workaround available to the best of your knowledge?
Yes, I worked around it by changing the annotation.
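The workaround as a single hub command, equivalent to the oc edit above (a sketch):
- oc annotate --overwrite drcluster dc1 drcluster.ramendr.openshift.io/storage-secret-name=rook-csi-rbd-provisioner-cluster2-rbdpool
(and presumably the reverse, the cluster1 secret name, on drcluster dc2)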
Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
4
Is this issue reproducible?
Yes
Can this issue be reproduced from the UI?
Yes
If this is a regression, please provide more details to justify this:
No
Steps to Reproduce:
1. Create the external Ceph connections using ceph-external-cluster-details-exporter.py with --cluster-name and --restricted-auth-permission true
2. Follow the DR fencing annotation docs (generic storage-secret-name)
3. Fencing errors occur; the check below confirms the annotation/secret name mismatch
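A quick check that confirms the mismatch before fencing (a sketch, using the dc1 names from this report):
- oc get drcluster dc1 -o yaml | grep storage-secret-name
- oc --context dc1 get secret -n openshift-storage | grep rook-csi-rbd-provisioner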
Actual results:
Fencing fails because the generic secret name in the annotation does not exist on the managed clusters
Expected results:
Documentation that either works with a generic secret name, or explains which cluster-specific secret name each drcluster annotation should reference when --cluster-name is used