OpenShift API for Data Protection: OADP-4459

Performance Degradation using Kopia on Restore for Large Files Compared to OADP 1.3.x


    • Type: Bug
    • Resolution: Done-Errata
    • Priority: Major
    • Fix Version/s: OADP 1.5.0
    • Affects Version/s: OADP 1.4.0
    • Component/s: kopia
    • Fixed in Build: oadp-operator-bundle-container-1.4.1-20

      We have observed a significant performance degradation in the restore operation using Kopia for large files in our regression tests. Specifically, in Case 2.4.1.9, the restore of a single namespace containing 100 files of 10 GB each took 0:51:17 on the new Kopia version, compared to 0:13:53 on OADP 1.3.1-54.

      This degradation may be related to the size and count of the files being restored. While the new Kopia version shows performance improvements in other cases with smaller files, it appears to struggle with larger ones, resulting in a noticeable regression compared to previous OADP versions (1.3.0-1.3.1).
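
      For context, the restore path in this test runs through the OADP node agent with Kopia as the file-system uploader. The following is a minimal sketch of that configuration, not the exact CR from this run: the DPA name, plugins, bucket, and credential are assumptions for an AWS/S3 backup location.

      # Sketch only: minimal DPA with the node agent and Kopia uploader enabled.
      # All names below are placeholders, not the values used in this test.
      oc apply -f - <<'EOF'
      apiVersion: oadp.openshift.io/v1alpha1
      kind: DataProtectionApplication
      metadata:
        name: dpa-sample                # hypothetical name
        namespace: openshift-adp
      spec:
        configuration:
          velero:
            defaultPlugins:
              - openshift
              - aws                     # assumes an AWS/S3 object store
          nodeAgent:
            enable: true
            uploaderType: kopia         # route file-system backup/restore through Kopia
        backupLocations:
          - velero:
              provider: aws
              default: true
              objectStorage:
                bucket: my-bucket       # hypothetical bucket
                prefix: velero
              credential:
                name: cloud-credentials
                key: cloud
      EOF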

      Steps to Reproduce:

      1. Use Kopia to back up a single namespace with 100 files, each sized at 10 GB.
      2. Measure the time taken to restore this backup using the new Kopia version.
      3. Compare the restore time with the restore time on OADP 1.3.1-54 (a sketch of the commands follows).
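
      A sketch of steps 1-3 with the velero CLI, assuming file-system backup so the data path goes through Kopia; the backup and restore names are placeholders, and the namespace is the one visible in the node-agent logs below:

      # Sketch only: back up the namespace via file-system backup (Kopia uploader).
      velero backup create backup-2-4-1-9-100f-10gb \
        --include-namespaces perf-datagen-case3-cephrbd \
        --default-volumes-to-fs-backup \
        --wait

      # Time the restore end to end; repeat on OADP 1.3.1-54 for comparison.
      time velero restore create restore-2-4-1-9-100f-10gb \
        --from-backup backup-2-4-1-9-100f-10gb \
        --wait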

      Expected Result:

      The restore time for large files should be comparable to or better than the restore time on OADP 1.3.1-54.

      Actual Result:

      The restore time for large files on the new Kopia version is significantly longer than on OADP 1.3.1-54, indicating a performance degradation.

      Additional Information:

      • Case Reference: 2.4.1.9
      • Restore Time on New Kopia: 0:51:17
      • Restore Time on OADP 1.3.1-54: 0:13:53
      • Observed Degradation: approximately 269% (0:51:17 = 3077 s vs. 0:13:53 = 833 s; see the quick check below)
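
      A quick check of the degradation figure from the two wall-clock times (plain shell, no cluster needed):

      # 0:51:17 = 3077 s (new Kopia); 0:13:53 = 833 s (OADP 1.3.1-54)
      echo "scale=2; (3077 - 833) * 100 / 833" | bc
      # -> 269.38, i.e. roughly a 269% increase (about 3.7x slower)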

      Notes:

      • This cycle was executed with the same OADP version on both clouds (33 and 15), and the issue was reproduced twice.
      • I have checked all the relevant logs, CRs, and related objects, and nothing looks suspicious. The only potentially related finding was in the node-agent pod logs: repeated warnings emitted at error sublevel, shown below.
      time="2024-07-05T18:16:38Z" level=warning msg="active indexes [xn0_0b51efb539698aecc1c85a27c7ee2f5a-sc3f440f8b765623c12a-c1 xn0_3dc75c368016affc51aedc53f46591ae-s00a02b9b28c1e36912a-c1 xn0_54bbd1c54996c528f3a806fcf18866da-sd37c4c5641f83a6212a-c1 xn0_8f3500db186d5626e13909e844a71cf8-sb2d80086508d9ac212a-c1 xn0_e48aef60a63e17ec66e160852f000312-se910052600473a3e12a-c1] deletion watermark 0001-01-01 00:00:00 +0000 UTC" PodVolumeRestore=openshift-adp/restore-kopia-pvc-util-2-4-1-9-cephrbd-100f-10gb-1001g-vft6s controller=PodVolumeRestore logModule=kopia/kopia/format logSource="/remote-source/velero/app/pkg/kopia/kopia_log.go:101" logger name="[index-blob-manager]" pod=perf-datagen-case3-cephrbd/deploy-perf-datagen-2-4-1-9-1200gi-1-rbd-0-c7bc487ff-9csct restore=openshift-adp/restore-kopia-pvc-util-2-4-1-9-cephrbd-100f-10gb-1001g sublevel=error
      time="2024-07-05T18:31:38Z" level=warning msg="active indexes [xn0_0b51efb539698aecc1c85a27c7ee2f5a-sc3f440f8b765623c12a-c1 xn0_3dc75c368016affc51aedc53f46591ae-s00a02b9b28c1e36912a-c1 xn0_54bbd1c54996c528f3a806fcf18866da-sd37c4c5641f83a6212a-c1 xn0_8f3500db186d5626e13909e844a71cf8-sb2d80086508d9ac212a-c1 xn0_e48aef60a63e17ec66e160852f000312-se910052600473a3e12a-c1] deletion watermark 0001-01-01 00:00:00 +0000 UTC" PodVolumeRestore=openshift-adp/restore-kopia-pvc-util-2-4-1-9-cephrbd-100f-10gb-1001g-vft6s controller=PodVolumeRestore logModule=kopia/kopia/format logSource="/remote-source/velero/app/pkg/kopia/kopia_log.go:101" logger name="[index-blob-manager]" pod=perf-datagen-case3-cephrbd/deploy-perf-datagen-2-4-1-9-1200gi-1-rbd-0-c7bc487ff-9csct restore=openshift-adp/restore-kopia-pvc-util-2-4-1-9-cephrbd-100f-10gb-1001g sublevel=error
      time="2024-07-05T18:46:38Z" level=warning msg="active indexes [xn0_0b51efb539698aecc1c85a27c7ee2f5a-sc3f440f8b765623c12a-c1 xn0_3dc75c368016affc51aedc53f46591ae-s00a02b9b28c1e36912a-c1 xn0_54bbd1c54996c528f3a806fcf18866da-sd37c4c5641f83a6212a-c1 xn0_8f3500db186d5626e13909e844a71cf8-sb2d80086508d9ac212a-c1 xn0_e48aef60a63e17ec66e160852f000312-se910052600473a3e12a-c1] deletion watermark 0001-01-01 00:00:00 +0000 UTC" PodVolumeRestore=openshift-adp/restore-kopia-pvc-util-2-4-1-9-cephrbd-100f-10gb-1001g-vft6s controller=PodVolumeRestore logModule=kopia/kopia/format logSource="/remote-source/velero/app/pkg/kopia/kopia_log.go:101" logger name="[index-blob-manager]" pod=perf-datagen-case3-cephrbd/deploy-perf-datagen-2-4-1-9-1200gi-1-rbd-0-c7bc487ff-9csct restore=openshift-adp/restore-kopia-pvc-util-2-4-1-9-cephrbd-100f-10gb-1001g sublevel=error 
      • Further investigation is needed to determine the root cause of this performance issue and to identify potential optimizations for handling large-file restores in the new Kopia version.

      Environment:

      OCP: 4.16.0
      OADP: 1.4.0-13
      ODF: 4.15.4

      Full logs from both clouds can be found here:
      https://drive.google.com/drive/folders/1AXKQHLQ_2fYxwU_tR5UJAZeofL5yQIJ8?usp=sharing

              Assignee: Michal Pryc (rhn-engineering-mpryc)
              Reporter: Tzahi Ashkenazi (tzahia)
              Votes: 0
              Watchers: 6
