- Bug
- Resolution: Unresolved
- Critical
- odf-4.17
- None
Description of problem (please be as detailed as possible and provide log snippets):
Version of all relevant components (if applicable):
OCP 4.17.0-0.nightly-2024-10-20-231827
ODF 4.17.0-126
ACM 2.12.0-DOWNSTREAM-2024-10-18-21-57-41
OpenShift Virtualization 4.17.1-19
Submariner 0.19 unreleased downstream image 846949
ceph version 18.2.1-229.el9cp (ef652b206f2487adfc86613646a4cac946f6b4e0) reef (stable)
OADP 1.4.1
OpenShift GitOps 1.14.0
VolSync 0.10.1
Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Is there any workaround available to the best of your knowledge?
After restarting all Noobaa pods on both the C1 and C2 ODF clusters, backups resumed.
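A minimal sketch of that workaround, assuming the default openshift-storage namespace and that the Noobaa pods carry the app=noobaa label (both are assumptions, not verified on this setup):
# Restart all Noobaa pods on a managed cluster (repeat on both C1 and C2).
oc -n openshift-storage delete pod -l app=noobaa
# Watch the replacement pods come back up before retrying S3.
oc -n openshift-storage get pods -l app=noobaa -w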
Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
Is this issue reproducible?
Can this issue be reproduced from the UI?
If this is a regression, please provide more details to justify this:
Steps to Reproduce:
1. On an ODF Regional DR setup, deploy a CNV workload as a discovered app using the data volume template from https://github.com/RamenDR/ocm-ramen-samples/tree/main/workloads/kubevirt/vm-dvt/odr-regional (a deployment sketch follows this list).
2. Create a snapshot of the PVC and restore it as a new PVC (see the snapshot/restore sketch after this list).
3. Delete the snapshot and the workload, except the data volume and the PVC.
4. Recreate the workload so that it consumes the existing snapshot-restored PVC; the VM should use this PVC and not create a new one.
5. Repeat the above steps in another namespace for another CNV workload, this time cloning the PVC instead of restoring it from a snapshot.
6. DR protect these workloads by adding a unique label to the required resources (VM, DataVolume, PVC, and Secret) so that they are backed up to the odrbucket created by Ramen on the primary managed cluster (see the labeling sketch after this list).
7. While DR protection is in place, ensure backups are taken every 5 minutes and Noobaa S3 is accessible (see the check after this list).
8. Run IOs for a few days (4-5 days in this case) and verify that regular backups are still being taken and Noobaa S3 remains accessible.
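For step 1, a possible way to deploy the workload, assuming the odr-regional directory is a kustomize base that oc can consume remotely (an assumption about the repo layout, not confirmed here):
# Deploy the CNV discovered-app workload from the sample repo on the primary managed cluster.
oc apply -k "github.com/RamenDR/ocm-ramen-samples/workloads/kubevirt/vm-dvt/odr-regional?ref=main"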
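For steps 2-5, a minimal sketch of the snapshot and restore-from-snapshot manifests. The names (vm-pvc, vm-pvc-snap, vm-pvc-restore), namespace (vm-dvt), snapshot/storage classes, and size are illustrative placeholders, not taken from the actual workload:
# Step 2: snapshot the workload PVC, then restore the snapshot as a new PVC.
cat <<EOF | oc apply -f -
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: vm-pvc-snap
  namespace: vm-dvt
spec:
  volumeSnapshotClassName: ocs-storagecluster-rbdplugin-snapclass
  source:
    persistentVolumeClaimName: vm-pvc
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: vm-pvc-restore
  namespace: vm-dvt
spec:
  storageClassName: ocs-storagecluster-ceph-rbd
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 30Gi
  dataSource:
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: vm-pvc-snap
EOF
# Step 5 uses the same PVC shape, but with a PVC clone as the source instead:
#   dataSource:
#     kind: PersistentVolumeClaim
#     name: vm-pvc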
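For step 6, a sketch of applying the unique protection label; the label key/value (protect=vm-dvt) and the resource names are placeholders for whatever label selector is actually configured for DR protection:
# Step 6: apply one unique label to every resource that should be backed up.
oc -n vm-dvt label vm/vm-dvt dv/vm-dvt pvc/vm-pvc-restore secret/vm-dvt-secret protect=vm-dvt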
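For step 7, a sketch of the periodic check that backups keep landing in the odrbucket over S3; it assumes s3cmd is already configured against the Noobaa S3 endpoint, and the odrbucket name carries a setup-specific suffix that must be filled in:
# Step 7: list buckets to confirm the Noobaa S3 endpoint answers, then
# confirm new backup objects keep appearing roughly every 5 minutes.
s3cmd ls
s3cmd ls s3://odrbucket-<suffix>/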
Actual results: [RDR] Noobaa S3 becomes unreachable after a few days, hence backups stop for discovered apps.
Backups stopped around 25 Oct 2024; S3 was accessible before that.
The cluster was idle during this time; no node-related or other operation was performed by me (or at least none that I am aware of).
Must-gather logs from the setup: http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/bz-aman/26oct24/
where C1 and C2 are the ODF managed clusters, and Hub/RHACM is the cluster where RHACM is installed but not ODF.
From C1-
s3cmd ls
ERROR: Error parsing xml: Malformed error XML returned from remote server.. ErrorXML: b"<html><body><h1>504 Gateway Time-out</h1>\nThe server didn't respond in time.\n</body></html>\n"
WARNING: Retrying failed request: / (504 (Gateway Time-out))
WARNING: Waiting 3 sec...
^CSee ya!
However, all the Noobaa pods were up and running on both managed clusters.
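A sketch of the check behind this observation, assuming the default openshift-storage namespace:
# Run on both C1 and C2: all noobaa-* pods were Running even while S3 returned 504s.
oc -n openshift-storage get pods | grep noobaa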
Expected results: Noobaa S3 should remain accessible even on long-running setups.
Additional info: