  1. OpenShift API for Data Protection
  2. OADP-6175

ClusterResourceQuotas can cause Restores to fail due to PVC request.storage



      Description of problem:

      When a ClusterResourceQuota with a limit on requests.storage covers both the OADP/Velero namespace and the namespace of the application being restored, the Restore can fail because the PVC in the OADP/Velero namespace cannot be allocated.

       

      During a restore with kopia datamovers, a template PVC is first created in the application namespace. Its requests.storage field is required. This PVC stays in Pending until the Restore of the volume finishes, because it has a spec.selector and no PV matches it.

      During datamovement the node-agent uses the application PVC as a template. The actual PVC used for the Restore is created in the OADP/Velero namespace.

      The PVCs in both namespaces have a requests.storage field set. If both the OADP/Velero and application namespaces are matched by the ClusterResourceQuota's namespace label/annotation selectors, the storage request is counted twice, even though no extra storage is actually allocated or in use.

      If the total amount requested exceeds the hard limit, the PVC allocation is rejected.
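
      The double counting described above can be illustrated with a minimal sketch. This is a simplified model of a shared requests.storage quota, not the OpenShift quota controller; the function name, the PVC names, and the assumption that the application requests exactly 250Gi are hypothetical and chosen to match the reproduction steps.

```python
# Simplified model of a requests.storage quota shared by several namespaces.
# Illustration only: this is NOT the OpenShift quota controller.

def admit_pvcs(hard_limit_gi, pvc_requests):
    """Admit PVCs in creation order against one shared hard limit.

    pvc_requests is a list of (name, requested_gi) tuples.
    Returns (admitted, rejected) lists of PVC names.
    """
    used_gi = 0
    admitted, rejected = [], []
    for name, requested_gi in pvc_requests:
        if used_gi + requested_gi > hard_limit_gi:
            # The request would push usage over the hard limit.
            rejected.append(name)
        else:
            used_gi += requested_gi
            admitted.append(name)
    return admitted, rejected


# Assumed values: a 250Gi hard limit and an application requesting 250Gi.
# Both PVC names are hypothetical.
admitted, rejected = admit_pvcs(250, [
    ("test-app/data", 250),               # template PVC: Pending, still counted
    ("openshift-adp/restore-data", 250),  # PVC the data mover writes to
])
print("admitted:", admitted)
print("rejected:", rejected)
```

      In this model the template PVC consumes the whole quota while it is still Pending, so the second PVC is rejected although only 250Gi of storage would ever be provisioned.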

      The DataDownload will report the error in the status.message field.
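
      One way to collect those messages is sketched below, assuming the `oc` CLI is available and that OADP runs in the `openshift-adp` namespace (both are assumptions, as are the helper names). The sketch only reads `status.phase` and `status.message` from the JSON that `oc get ... -o json` returns.

```python
import json
import subprocess


def datadownload_messages(datadownload_list):
    """Return (name, phase, message) for every DataDownload with a message.

    datadownload_list is the parsed JSON of a Kubernetes list object.
    """
    results = []
    for item in datadownload_list.get("items", []):
        status = item.get("status", {})
        message = status.get("message", "")
        if message:
            name = item.get("metadata", {}).get("name", "")
            results.append((name, status.get("phase", ""), message))
    return results


def fetch_datadownloads(namespace="openshift-adp"):
    """Fetch DataDownload objects with `oc`; the namespace is an assumption."""
    output = subprocess.run(
        ["oc", "get", "datadownloads.velero.io", "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(output)


# Usage (requires cluster access, so it is left commented out):
# for name, phase, message in datadownload_messages(fetch_datadownloads()):
#     print(name, phase, message)
```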

      Version: 1.4.4

      How reproducible:

       

      Steps to Reproduce:
      1. Install OADP and an application that uses PVCs in a separate namespace, e.g. "test-app".
      2. Configure OADP to use kopia datamovers and CSI snapshots.
      3. Label both the OADP namespace and the application namespace with "application: test".
      4. Create a ClusterResourceQuota with a requests.storage limit equal to the amount requested by the "test-app" namespace.

      apiVersion: quota.openshift.io/v1
      kind: ClusterResourceQuota
      metadata:
        name: example
      spec:
        quota:
          hard:
            requests.storage: 250Gi
        selector:
          labels:
            matchLabels:
              application: test

      5. Back up the application.
      6. Delete the namespace "test-app"
      7. Perform a Restore.

       

      Actual results:

      PVC creation fails. See attached image. The error message from the DataDownload is shown below.

       

      Expected results:

      Restore to succeed.

      Additional info:

      I don't see an easy way of fixing this. The PVC template system is essential for Velero kopia-based restores.

      The only other fix I can see is modifying the OpenShift quota system to count the status.capacity value, which is the actual storage in use, rather than the requests.storage value.

      Workarounds:

      1. Temporarily remove the ClusterResourceQuota
      2. Do not include the OADP/Velero namespaces in ClusterResourceQuotas that limit requests.storage.
      3. Temporarily increase the ClusterResourceQuota values so that the Restore may succeed.

      Severity

      Because multiple workarounds are available, the severity is moderate.

      It is not low severity because every workaround requires manual intervention.

              wnstb Wes Hayutin
              msfrucht_rh Michael Fruchtman (Inactive)