  1. OpenShift API for Data Protection
  2. OADP-5739

XFS/Ext4 PVC restore fails if volume usage is 100%

    • 3
    • False
    • False
    • ToDo
    • Moderate
    • 1.667
    • Very Likely
    • 0
    • Customer Escalated, Customer Facing
    • 5
    • None
    • Unset
    • Unknown
    • None

      Dear team,
      When restoring a PVC that was at 100% disk utilization at backup time, the restore fails with a "disk full" error.

      Steps to reproduce:
      1. Create an app/pod with a PVC.

      2. Fill this PVC to 100% usage, e.g. with "dd".

      3. Take a backup using OADP.

      4. Restore from the backup into a new namespace or the same namespace.

      5. The restore fails with a "disk full" error message, and the pod using this PVC hangs in the "restore-wait" init process.
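Step 2 of the reproduction can be sketched in Python for anyone who prefers a script over "dd". This is only an illustration: the file name `fill.bin`, the function name `fill_volume`, and the `max_bytes` safety cap are assumptions and not part of the original report.

```python
import errno
import os


def fill_volume(mount_path, chunk_size=1024 * 1024, max_bytes=None):
    """Write zeros into <mount_path>/fill.bin until the filesystem is full.

    Returns the number of bytes written. max_bytes is a safety cap for
    dry runs; pass None only when mount_path is a disposable test volume.
    """
    target = os.path.join(mount_path, "fill.bin")  # hypothetical file name
    chunk = b"\0" * chunk_size
    written = 0
    # buffering=0 so every write() goes straight to the filesystem and
    # ENOSPC is raised by write() itself, not later on close().
    with open(target, "wb", buffering=0) as f:
        while max_bytes is None or written < max_bytes:
            data = chunk if max_bytes is None else chunk[: max_bytes - written]
            try:
                written += f.write(data)
            except OSError as exc:
                if exc.errno == errno.ENOSPC:
                    break  # volume is full, which is the state we want
                raise
    return written
```

Run inside the pod against the PVC mount path with `max_bytes=None` to reach 100% usage; the capped form is meant for trying the script out safely.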

      Workaround:
      1. Kill the hanging pod. It will respawn and come up fine, since the "restore-wait" init process was killed and no longer blocks pod startup.

       

      Reason:
      1. PVCs are recreated via stored config

      2. Data is copied to these PVCs from the backup files

      3. HERE IT HAPPENS: a "done" file has to be written to a hidden ".velero" directory in the root path of the PVC. And the "restore-wait" process is waiting and looking for this "done" file.

      4. Since the PVC is at 100% after the data restore, there is no space left on the device to create/store this "done" file.
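The interaction described in the steps above can be modeled roughly as follows. This is a simplified sketch, not Velero's actual code: the marker name "done" is taken from this report, while the function names, polling interval, and timeout are hypothetical.

```python
import errno
import os
import time


def write_done_marker(volume_root, marker_name="done"):
    """Try to create <volume_root>/.velero/<marker_name>.

    Returns True on success and False when the volume has no space left.
    """
    marker_dir = os.path.join(volume_root, ".velero")
    try:
        os.makedirs(marker_dir, exist_ok=True)
        with open(os.path.join(marker_dir, marker_name), "w"):
            pass  # an empty file is enough to act as a marker
    except OSError as exc:
        if exc.errno == errno.ENOSPC:
            return False  # the failure mode described in this report
        raise
    return True


def wait_for_done_marker(volume_root, marker_name="done",
                         timeout=5.0, interval=0.1):
    """Poll for the marker file; return True if it appears before timeout."""
    path = os.path.join(volume_root, ".velero", marker_name)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.exists(path):
            return True
        time.sleep(interval)
    return os.path.exists(path)
```

If the writer hits "no space left on device", the marker never appears, and a waiter with no timeout blocks indefinitely. That matches the hanging "restore-wait" init process observed here.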

       

      Solution:
      Separate user data on disks from restore information needed by the restore process.

      Mitigation in Lab Setup:
      1. Mount PVC to pod

      2. Create and mount an "emptyDir" at PVCroot/.velero

      3. User data at 100% usage is restored to the recreated PVC. The Velero "done" file is written to the emptyDir directory and is therefore not affected by the original PVC being at 100% usage.
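The lab mitigation could be expressed as a pod manifest along these lines. This is a sketch under assumptions: the pod, container, volume, and image names and the `/data` mount path are hypothetical, and only the idea of mounting an emptyDir at the `.velero` directory under the PVC root comes from the steps above. Whether this layout is sufficient in a given cluster has not been verified here.

```python
import json


def pod_with_velero_emptydir(pod_name, pvc_name, image, mount_path="/data"):
    """Build a Pod manifest that mounts a PVC at mount_path and an
    emptyDir at <mount_path>/.velero, so the marker directory does not
    consume space on the PVC itself."""
    marker_path = mount_path.rstrip("/") + "/.velero"
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": pod_name},
        "spec": {
            "containers": [
                {
                    "name": "app",  # hypothetical container name
                    "image": image,
                    "volumeMounts": [
                        {"name": "data", "mountPath": mount_path},
                        {"name": "velero-marker", "mountPath": marker_path},
                    ],
                }
            ],
            "volumes": [
                {"name": "data",
                 "persistentVolumeClaim": {"claimName": pvc_name}},
                {"name": "velero-marker", "emptyDir": {}},
            ],
        },
    }


if __name__ == "__main__":
    manifest = pod_with_velero_emptydir(
        "demo", "demo-pvc", "registry.example.com/app:latest"
    )
    # JSON manifests are accepted wherever YAML manifests are.
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied like any other manifest.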

       

      Thanks, Chris

       

              wnstb Wes Hayutin
              rhn-support-ctawfik Chris Tawfik
