OpenShift API for Data Protection / OADP-6525

Dramatic performance drop-off over time: Velero backup performance degradation

      Issue Summary:
      This story tracks a critical performance regression reported in upstream Velero that affects OADP backup operations. Users report that backups which previously completed in about 30 minutes now take more than 6 hours for the same workload.

      Upstream Issue: https://github.com/vmware-tanzu/velero/issues/9169

      Problem Description:

      • Velero v1.11.1: ~300k objects backed up in ~30 minutes (CPU: 1 core, Memory: 3Gi)
      • Velero v1.16.2: the same ~300k objects take ~6 hours (CPU: 3.5 cores, Memory: 4.5Gi)
      • Throughput starts high (~5k objects within seconds), then drops to ~3 objects/sec
      • Resource increases and configuration tuning have not resolved the issue
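The reported figures can be converted into average throughput to size the regression. This sketch uses only the numbers stated in this issue (~300k objects, ~30 minutes vs ~6 hours, ~3 objects/sec tail rate) and treats them as exact for the arithmetic:

```python
# Figures reported in the issue, treated as exact for this arithmetic.
OBJECTS = 300_000
V1_11_SECONDS = 30 * 60       # ~30 minutes on Velero v1.11.1
V1_16_SECONDS = 6 * 60 * 60   # ~6 hours on Velero v1.16.2
TAIL_RATE = 3                 # ~3 objects/sec after the initial fast phase

def avg_rate(objects: int, seconds: float) -> float:
    """Average objects per second over a whole backup run."""
    return objects / seconds

v1_11 = avg_rate(OBJECTS, V1_11_SECONDS)
v1_16 = avg_rate(OBJECTS, V1_16_SECONDS)
slowdown = v1_11 / v1_16

# Hours needed if every object were processed at the tail rate.
tail_only_hours = OBJECTS / TAIL_RATE / 3600

print(f"v1.11.1: {v1_11:.1f} obj/s")                   # v1.11.1: 166.7 obj/s
print(f"v1.16.2: {v1_16:.1f} obj/s")                   # v1.16.2: 13.9 obj/s
print(f"slowdown: {slowdown:.0f}x")                     # slowdown: 12x
print(f"tail-rate-only run: {tail_only_hours:.1f} h")   # tail-rate-only run: 27.8 h
```

On these figures v1.16.2 averages roughly 12x lower throughput than v1.11.1. Because ~3 objects/sec sustained over all 300k objects would take about 28 hours, while the run finishes in ~6 hours, the ~3 objects/sec figure most likely describes the slowest stretch of the backup rather than a constant rate.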

      Configuration Details:

      • Snapshots and filesystem backup disabled
      • Backup schedule: Daily at 4 AM
      • Includes all namespaces and resources
      • Storage location: default
      • TTL: 888h0m0s
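For reproduction, the configuration above can be expressed as a Velero `Schedule` resource. This is a minimal sketch, assuming the `velero.io/v1` `Schedule` API with a backup `template`; the schedule name `daily-all-namespaces` and the `velero` namespace are hypothetical placeholders (the issue does not state them), and the manifest is emitted as JSON, which `kubectl apply -f` accepts:

```python
import json

schedule = {
    "apiVersion": "velero.io/v1",
    "kind": "Schedule",
    # Hypothetical name and namespace; adjust to the actual install.
    "metadata": {"name": "daily-all-namespaces", "namespace": "velero"},
    "spec": {
        "schedule": "0 4 * * *",  # cron expression: daily at 04:00
        "template": {
            "includedNamespaces": ["*"],        # all namespaces
            "includedResources": ["*"],         # all resources
            "snapshotVolumes": False,           # volume snapshots disabled
            "defaultVolumesToFsBackup": False,  # file system backup disabled
            "storageLocation": "default",
            "ttl": "888h0m0s",
        },
    },
}

print(json.dumps(schedule, indent=2))
```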

      Attempted Mitigations (unsuccessful):

      • Increased resource requests to 4 cores/6Gi
      • Increased clientPageSize to 700
      • Increased itemBlockWorkerCount to 5
      • Increased clientQPS to 100
      • Increased clientBurst to 100
      • Increased uploaderConfig.parallelFilesUpload to 30
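The attempted mitigations apply at different levels. This sketch assumes that `clientPageSize`, `itemBlockWorkerCount`, `clientQPS`, and `clientBurst` correspond to the `velero server` flags `--client-page-size`, `--item-block-worker-count`, `--client-qps`, and `--client-burst` (an assumption to verify against `velero server --help` for the deployed version), and that `uploaderConfig.parallelFilesUpload` is a per-backup field in the Backup or Schedule template spec:

```python
from typing import Dict, List

# Values from the attempted mitigations; flag names are assumed, not confirmed here.
mitigations: Dict[str, int] = {
    "client-page-size": 700,
    "item-block-worker-count": 5,
    "client-qps": 100,
    "client-burst": 100,
}

def server_args(settings: Dict[str, int]) -> List[str]:
    """Render settings as `--name=value` command-line flags for the Velero server."""
    return [f"--{name}={value}" for name, value in settings.items()]

# Container resource requests for the Velero pod (4 cores / 6Gi).
resources = {"requests": {"cpu": "4", "memory": "6Gi"}}

# Per-backup setting, placed in the Backup spec or the Schedule's template.
backup_spec_patch = {"uploaderConfig": {"parallelFilesUpload": 30}}

print(server_args(mitigations))
```

`parallelFilesUpload` tunes the file uploader used for file system backup and data movement, so with snapshots and file system backup disabled in this configuration it would likely have no effect, which is consistent with it not helping. In OADP, Velero server settings are normally managed through the `DataProtectionApplication` rather than by editing the Velero deployment directly; check the DPA API for the corresponding fields.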

      Impact on OADP:
      This regression can affect OADP users running similar backup workloads and needs investigation before the OADP 1.6.0 release.

      Environment:

      • Kubernetes: v1.32.4-gke.1767000
      • Cloud: Google Cloud GKE
      • OS: Container-Optimized OS from Google

      Acceptance Criteria:

      • Investigate the root cause of performance degradation in newer Velero versions
      • Identify if this affects OADP's Velero integration
      • Implement fixes or workarounds for OADP 1.6.0 if needed
      • Ensure backup performance meets acceptable standards for large workloads
      • Document any configuration recommendations for optimal performance
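To localize where the drop-off begins during the investigation, progress can be sampled while a backup runs, for example by polling `kubectl get backup <name> -n <namespace> -o jsonpath='{.status.progress.itemsBackedUp}'`. This sketch computes per-interval throughput from `(elapsed_seconds, items_backed_up)` samples and reports the first interval that falls below a fraction of the initial rate; the helper names, the 10% threshold, and the sample data are illustrative assumptions, not measurements from this issue:

```python
from typing import List, Optional, Sequence, Tuple

Sample = Tuple[float, int]  # (elapsed_seconds, items_backed_up)

def interval_rates(samples: Sequence[Sample]) -> List[float]:
    """Objects/sec between each pair of consecutive samples."""
    rates = []
    for (t0, n0), (t1, n1) in zip(samples, samples[1:]):
        if t1 <= t0:
            raise ValueError("sample times must be strictly increasing")
        rates.append((n1 - n0) / (t1 - t0))
    return rates

def first_dropoff(samples: Sequence[Sample], threshold: float = 0.1) -> Optional[int]:
    """Index i of the first interval (samples[i] -> samples[i+1]) whose rate is
    below `threshold` times the first interval's rate, or None if there is none."""
    rates = interval_rates(samples)
    if not rates:
        return None
    baseline = rates[0]
    for i, rate in enumerate(rates):
        if rate < threshold * baseline:
            return i
    return None

# Illustrative samples only: a fast start (5k items in 10 s) followed by ~3 items/s.
samples = [(0, 0), (10, 5000), (70, 5180), (130, 5360)]
print(interval_rates(samples))  # [500.0, 3.0, 3.0]
print(first_dropoff(samples))   # 1
```

The flagged interval gives a time window to correlate with Velero server logs and resource usage, which is one way to narrow down the root cause.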

      Shubham Pampattiwar (spampatt@redhat.com)
      Wes Hayutin (wnstb)