DFBUGS-837

[2106742] Poor performance of NooBaa for small object size (15KiB) on RGW backing store - unable to saturate underlying RGW performance in a bare-metal ODF environment



      For small object sizes of 15KiB, with 100 buckets of 667 objects each (~1 GB total dataset size), MCG is unable to saturate the underlying RGW performance in a bare-metal ODF environment. We ran tests directly against RGW and then against MCG using RGW as the backing store, and we see a significant difference in performance. Here are the summarized results:

      Seqread:   1725 OPS on RGW vs.  947 OPS on MCG (45% lower with MCG)
      Seqwrite:  1138 OPS on RGW vs.  240 OPS on MCG (79% lower with MCG)
      Randread:  1709 OPS on RGW vs. 1037 OPS on MCG (39% lower with MCG)
      Randwrite: 1132 OPS on RGW vs.  237 OPS on MCG (79% lower with MCG)
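
      For reference, here is a minimal sketch of how a dataset of this shape (100 buckets x 667 objects of 15 KiB, ~1 GB total) could be laid out with the AWS CLI. The endpoint URL, bucket prefix, and credentials below are placeholders; the actual runs were driven by COSBench, not this script.

      # Sketch only: S3_ENDPOINT and credentials are assumed to point at the system under test (RGW or MCG).
      S3_ENDPOINT=https://s3.example.com
      dd if=/dev/urandom of=obj-15k.bin bs=1024 count=15   # one 15 KiB object
      for b in $(seq 1 100); do
        aws --endpoint-url "$S3_ENDPOINT" s3api create-bucket --bucket "perf-bucket-$b"
        for o in $(seq 1 667); do
          aws --endpoint-url "$S3_ENDPOINT" s3 cp obj-15k.bin "s3://perf-bucket-$b/obj-$o"
        done
      done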

      From the numbers above it is evident that RGW is not the bottleneck. Note that MCG was tuned according to the KCS article https://access.redhat.com/solutions/6719951. We did not see any resource bottleneck on the NooBaa DB, core, or endpoint pods.

      We have determined that the problem is in the MCG stack. We have tried the following with no improvement (a rough sketch of the corresponding commands follows the list):

      • Increased the NooBaa DB memory to 8 GiB and eventually to 16 GiB
      • Increased the minimum endpoint count to 6
      • Increased the load on the system by increasing the COSBench workers (from 88 to 176 to 352) and drivers (from 4 to 8) to push the system harder
      • Ran two concurrent loads on the system (this split the write performance to ~115 OPS each, which points to a hard bottleneck in the NooBaa I/O stack)
      • Increased the PG count on the RGW data pool from the default of 32 to 128, with autoscaling off
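
      The sketch below shows the kind of commands behind these attempts, assuming the default ODF resource names (StorageCluster "ocs-storagecluster", NooBaa CR "noobaa", toolbox deployment "rook-ceph-tools", RGW data pool "ocs-storagecluster-cephobjectstore.rgw.buckets.data"); these names and values are assumptions, not a transcript of the exact commands run:

      # Raise the NooBaa DB memory request/limit (value is an example).
      oc -n openshift-storage patch storagecluster ocs-storagecluster --type merge \
        -p '{"spec":{"resources":{"noobaa-db":{"requests":{"memory":"16Gi"},"limits":{"memory":"16Gi"}}}}}'

      # Raise the minimum (and maximum) endpoint count on the NooBaa CR.
      oc -n openshift-storage patch noobaa noobaa --type merge \
        -p '{"spec":{"endpoints":{"minCount":6,"maxCount":6}}}'

      # Increase the PG count on the RGW data pool from the Ceph toolbox, with autoscaling off.
      oc -n openshift-storage rsh deploy/rook-ceph-tools ceph osd pool set \
        ocs-storagecluster-cephobjectstore.rgw.buckets.data pg_autoscale_mode off
      oc -n openshift-storage rsh deploy/rook-ceph-tools ceph osd pool set \
        ocs-storagecluster-cephobjectstore.rgw.buckets.data pg_num 128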

      oc version
      Client Version: 4.10.18
      Server Version: 4.10.15
      Kubernetes Version: v1.23.5+3afdacb

      oc get csv
      NAME                              DISPLAY                       VERSION   REPLACES                          PHASE
      mcg-operator.v4.10.4              NooBaa Operator               4.10.4    mcg-operator.v4.10.3              Succeeded
      ocs-operator.v4.10.4              OpenShift Container Storage   4.10.4    ocs-operator.v4.10.3              Succeeded
      odf-csi-addons-operator.v4.10.4   CSI Addons                    4.10.4    odf-csi-addons-operator.v4.10.3   Succeeded
      odf-operator.v4.10.4              OpenShift Data Foundation     4.10.4    odf-operator.v4.10.3              Succeeded

      How reproducible: consistently

      Does this issue impact your ability to continue to work with the product
      (please explain in detail what is the user impact)? no - just poor write performance observed

      Is there any workaround available to the best of your knowledge? no

      Rate from 1 - 5 the complexity of the scenario you performed that caused this
      bug (1 - very simple, 5 - very complex)? 1

      Is this issue reproducible? yes

      Can this issue be reproduced from the UI?

      If this is a regression, please provide more details to justify this:

      Steps to Reproduce:
      1. Capture MCG performance
      2. Capture underlying RGW performance
      3. Compare the two
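
      A minimal sketch of the comparison in step 3 using the AWS CLI (the actual measurements were taken with COSBench; the route names below are typical ODF defaults and are assumptions for this environment):

      # Assumed routes: the RGW S3 route and the MCG/NooBaa "s3" route in openshift-storage.
      RGW_EP=https://$(oc -n openshift-storage get route ocs-storagecluster-cephobjectstore -o jsonpath='{.spec.host}')
      MCG_EP=https://$(oc -n openshift-storage get route s3 -o jsonpath='{.spec.host}')
      dd if=/dev/urandom of=obj-15k.bin bs=1024 count=15
      # Time a burst of small-object PUTs against each endpoint and compare the elapsed time.
      # Assumes a pre-created bucket (e.g. perf-bucket-1 from the dataset sketch above).
      for EP in "$RGW_EP" "$MCG_EP"; do
        echo "== $EP =="
        time for i in $(seq 1 200); do
          aws --endpoint-url "$EP" s3 cp obj-15k.bin "s3://perf-bucket-1/probe-$i" >/dev/null
        done
      done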

      Additional info:

      Output of oc cluster-info dump -n openshift-storage --output-directory="ocs-pod-logs" can be found here: http://perf1.perf.lab.eng.bos.redhat.com/shberry/MCG_rgw/
