- Bug
- Resolution: Unresolved
- Critical
- None
For small object sizes of 15 KiB, with 100 buckets of 667 objects each (~1 GB total dataset size), MCG is unable to saturate the underlying RGW performance in a bare-metal ODF environment. We ran tests directly against RGW and then against MCG configured with RGW as the backing store, and we see a significant difference in performance. Here are the summarized results:
Seqread:   1725 OPS on RGW vs 947 OPS on MCG (45% lower with MCG)
Seqwrite:  1138 OPS on RGW vs 240 OPS on MCG (79% lower with MCG)
Randread:  1709 OPS on RGW vs 1037 OPS on MCG (39% lower with MCG)
Randwrite: 1132 OPS on RGW vs 237 OPS on MCG (79% lower with MCG)
From the numbers above it is evident that RGW is not the bottleneck. Note that MCG was tuned according to the KCS article https://access.redhat.com/solutions/6719951, and we did not see any resource bottleneck on the NooBaa DB, core, or endpoint pods.
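For reference, both runs targeted the S3 endpoints exposed in the openshift-storage namespace. As a rough sketch of how the two endpoints and credentials can be looked up (the s3 route and noobaa-admin secret are created by MCG; the RGW route name below is the usual bare-metal ODF default and is an assumption here):
# MCG (NooBaa) S3 endpoint
oc get route s3 -n openshift-storage -o jsonpath='{.spec.host}'
# RGW S3 endpoint (route name assumed; adjust if the CephObjectStore route is named differently)
oc get route ocs-storagecluster-cephobjectstore -n openshift-storage -o jsonpath='{.spec.host}'
# MCG admin S3 credentials
oc get secret noobaa-admin -n openshift-storage -o jsonpath='{.data.AWS_ACCESS_KEY_ID}' | base64 -d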
We have determined that the problem is in the MCG stack. We have tried the following, with no improvement (a sketch of the corresponding commands follows the list):
- Increased the NooBaa DB memory to 8 and eventually to 16
- Increased the minimum endpoint count to 6
- Increased the load on the system by increasing the Cosbench workers (from 88 to 176 to 352) and drivers (from 4 to 8) to push the system harder
- Ran two concurrent loads on the system (this actually splits the write performance to ~115 OPS each, which points to a hard bottleneck in the NooBaa IO stack)
- Increased the PG count on the RGW data pool from the default of 32 to 128, with autoscaling off
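For completeness, the tuning above was applied roughly along these lines. The spec.endpoints and spec.dbResources fields are the documented NooBaa CR knobs; the 16Gi value, the rook-ceph-tools deployment name, and the RGW data pool name are the usual ODF defaults and are assumptions here, not copied from this cluster:
# Raise the minimum (and maximum) endpoint count on the NooBaa CR
oc patch noobaa noobaa -n openshift-storage --type merge -p '{"spec":{"endpoints":{"minCount":6,"maxCount":6}}}'
# Raise the NooBaa DB memory request/limit (unit assumed to be GiB)
oc patch noobaa noobaa -n openshift-storage --type merge -p '{"spec":{"dbResources":{"requests":{"memory":"16Gi"},"limits":{"memory":"16Gi"}}}}'
# Disable PG autoscaling and raise pg_num on the RGW data pool, run via the rook-ceph toolbox
oc rsh -n openshift-storage deploy/rook-ceph-tools ceph osd pool set ocs-storagecluster-cephobjectstore.rgw.buckets.data pg_autoscale_mode off
oc rsh -n openshift-storage deploy/rook-ceph-tools ceph osd pool set ocs-storagecluster-cephobjectstore.rgw.buckets.data pg_num 128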
oc version
Client Version: 4.10.18
Server Version: 4.10.15
Kubernetes Version: v1.23.5+3afdacb
oc get csv
NAME                              DISPLAY                       VERSION   REPLACES                          PHASE
mcg-operator.v4.10.4              NooBaa Operator               4.10.4    mcg-operator.v4.10.3              Succeeded
ocs-operator.v4.10.4              OpenShift Container Storage   4.10.4    ocs-operator.v4.10.3              Succeeded
odf-csi-addons-operator.v4.10.4   CSI Addons                    4.10.4    odf-csi-addons-operator.v4.10.3   Succeeded
odf-operator.v4.10.4              OpenShift Data Foundation     4.10.4    odf-operator.v4.10.3              Succeeded
How reproducible: consistently
Does this issue impact your ability to continue to work with the product
(please explain in detail what the user impact is)? No, just poor write performance observed
Is there any workaround available to the best of your knowledge? No
Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)? 1
Is this issue reproducible? Yes
Can this issue be reproduced from the UI?
If this is a regression, please provide more details to justify this:
Steps to Reproduce:
1. Capture MCG performance
2. Capture underlying RGW performance
3. Compare the two (a rough sketch of such a comparison is shown below)
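As a crude single-object illustration only (this is not the Cosbench workload used for the numbers above; the endpoint, bucket, and credential values are placeholders to be filled in from the routes and secrets listed earlier):
# Export the S3 credentials for the endpoint under test
export AWS_ACCESS_KEY_ID=<access-key> AWS_SECRET_ACCESS_KEY=<secret-key>
# Create a 15 KiB object and a test bucket, then time a write and a read through MCG;
# repeat the same commands with the RGW endpoint to compare per-operation latency
dd if=/dev/urandom of=obj15k bs=1K count=15
aws s3 mb s3://perf-test --endpoint-url https://<mcg-s3-route> --no-verify-ssl
time aws s3 cp obj15k s3://perf-test/obj15k --endpoint-url https://<mcg-s3-route> --no-verify-ssl
time aws s3 cp s3://perf-test/obj15k obj15k.out --endpoint-url https://<mcg-s3-route> --no-verify-ssl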
Additional info:
Output of oc cluster-info dump -n openshift-storage --output-directory="ocs-pod-logs" can be found here: http://perf1.perf.lab.eng.bos.redhat.com/shberry/MCG_rgw/