Data Foundation Bugs / DFBUGS-429

[2321994] [RDR] Noobaa S3 becomes unreachable after a few days, hence backup stops for discovered apps



      Description of problem (please be as detailed as possible and provide log
      snippets):

      Version of all relevant components (if applicable):
      OCP 4.17.0-0.nightly-2024-10-20-231827
      ODF 4.17.0-126
      ACM 2.12.0-DOWNSTREAM-2024-10-18-21-57-41
      OpenShift Virtualization 4.17.1-19
      Submariner 0.19 unreleased downstream image 846949
      ceph version 18.2.1-229.el9cp (ef652b206f2487adfc86613646a4cac946f6b4e0) reef (stable)
      OADP 1.4.1
      OpenShift GitOps 1.14.0
      VolSync 0.10.1

      Does this issue impact your ability to continue to work with the product
      (please explain in detail what is the user impact)?

      Is there any workaround available to the best of your knowledge?
      Yes. After restarting all Noobaa pods on both the C1 and C2 ODF clusters, backups resumed (a restart sketch follows).
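      A minimal restart sketch, assuming ODF is installed in the default
      openshift-storage namespace (pod names on a given cluster may differ):

        # Restart every Noobaa pod on a managed cluster (repeat on both C1 and C2).
        oc get pods -n openshift-storage --no-headers | grep noobaa | awk '{print $1}' \
          | xargs oc delete pod -n openshift-storage
        # After the pods are recreated and Running, confirm S3 responds again:
        oc get pods -n openshift-storage | grep noobaa
        s3cmd ls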

      Rate from 1 - 5 the complexity of the scenario you performed that caused this
      bug (1 - very simple, 5 - very complex)?

      Is this issue reproducible?

      Can this issue be reproduced from the UI?

      If this is a regression, please provide more details to justify this:

      Steps to Reproduce:
      1. On an ODF Regional DR setup, deploy a CNV workload as a discovered app using the data volume template workload from https://github.com/RamenDR/ocm-ramen-samples/tree/main/workloads/kubevirt/vm-dvt/odr-regional.
      2. Create a snapshot of the PVC and restore it as a new PVC (a YAML sketch follows these steps).
      3. Delete the snapshot and the workload, except the DataVolume and the PVC.
      4. Recreate the workload so that it consumes the existing snapshot-restored PVC; the VM should use this PVC rather than create a new one.
      5. Repeat the above steps in another namespace for another CNV workload, using a PVC clone instead of a snapshot.
      6. DR-protect these workloads by applying a unique label to the required resources (VM, DataVolume, PVC, and Secret) so they are backed up to the odrbucket created by Ramen on the primary managed cluster.
      7. During DR protection, verify that backups are taken every 5 minutes and that Noobaa S3 is accessible.
      8. Run I/Os for a few days (4-5 days in this case) and check whether regular backups are still being taken and Noobaa S3 remains accessible.
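      A minimal YAML sketch for the snapshot/restore in steps 2-4 is below. All object
      names, the namespace, the snapshot class, the storage class, and the size are
      illustrative assumptions and must be adjusted to the actual vm-dvt workload:

        # VolumeSnapshot: snapshot the workload PVC
        apiVersion: snapshot.storage.k8s.io/v1
        kind: VolumeSnapshot
        metadata:
          name: vm-dvt-snap
          namespace: vm-dvt
        spec:
          volumeSnapshotClassName: ocs-storagecluster-rbdplugin-snapclass
          source:
            persistentVolumeClaimName: vm-dvt-pvc
        ---
        # Restored PVC: the recreated VM consumes this PVC instead of provisioning a new one
        apiVersion: v1
        kind: PersistentVolumeClaim
        metadata:
          name: vm-dvt-pvc-restore
          namespace: vm-dvt
        spec:
          storageClassName: ocs-storagecluster-ceph-rbd
          accessModes:
            - ReadWriteMany
          resources:
            requests:
              storage: 30Gi
          dataSource:
            apiGroup: snapshot.storage.k8s.io
            kind: VolumeSnapshot
            name: vm-dvt-snap

      For step 6, the unique label can then be applied with something like
      "oc label pvc vm-dvt-pvc-restore appname=vm-dvt-app -n vm-dvt" (the key/value is
      hypothetical), repeating for the VM, DataVolume, and Secret.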

      Actual results: [RDR] Noobaa S3 becomes unreachable after a few days, hence backup stops for discovered apps

      Backups stopped around 25 Oct 2024; S3 was accessible before that.
      The cluster was idle during this time; no node-related or other operation was performed by me (or at least none that I am aware of).

      Must-gather logs from the setup: http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/bz-aman/26oct24/
      where C1 and C2 are the ODF clusters
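      For reference, an ODF must-gather is typically collected with a command along
      these lines; the exact image and tag are an assumption for ODF 4.17:

        oc adm must-gather --image=registry.redhat.io/odf4/odf-must-gather-rhel9:v4.17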

      Hub/RHACM: where RHACM is installed but ODF is not

      From C1-

      s3cmd ls
      ERROR: Error parsing xml: Malformed error XML returned from remote server.. ErrorXML: b"<html><body><h1>504 Gateway Time-out</h1>\nThe server didn't respond in time.\n</body></html>\n"
      WARNING: Retrying failed request: / (504 (Gateway Time-out))
      WARNING: Waiting 3 sec...
      ^CSee ya!

      However, all the Noobaa pods were up and running on both the managed clusters.
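      A sketch of the checks behind this observation (pod status vs. S3 reachability);
      the s3 route name and the openshift-storage namespace are the ODF defaults and
      are assumptions here:

        # Noobaa pods report Running on both managed clusters...
        oc get pods -n openshift-storage | grep noobaa
        # ...yet the S3 endpoint behind the route still times out (504 in the failed state):
        curl -sk -o /dev/null -w '%{http_code}\n' \
          https://$(oc get route s3 -n openshift-storage -o jsonpath='{.spec.host}')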

      Expected results: Noobaa S3 should remain accessible even on long-running setups.

      Additional info:
