Bug | Critical | Resolution: Unresolved | Status: ToDo | Labels: Customer Escalated, Customer Facing
Dear team,
When restoring a PVC that was at 100% disk utilization at backup time, the restore fails with a "disk full" error.
Steps to reproduce:
1. Create an app/pod with a PVC
2. Fill this PVC to 100% usage with "dd" or a similar tool
3. Take a backup using OADP
4. Restore from the backup into a new namespace (or the same one)
5. The restore fails with a "disk full" error message, and the pod using this PVC hangs in the "restore-wait" init container
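The reproduction above can be sketched roughly as follows. All names (namespace "demo", pod "app-0", mount path /data, backup name) are placeholders, and the exact Velero flags may differ depending on the OADP/Velero version in use:

```shell
# 1.-2. Fill the PVC (assumed mounted at /data in pod "app-0") to 100% usage
kubectl -n demo exec app-0 -- sh -c 'dd if=/dev/zero of=/data/filler bs=1M || true'

# 3. Back up the namespace with file-system backup of volumes
velero backup create full-pvc-backup \
  --include-namespaces demo \
  --default-volumes-to-fs-backup

# 4. Restore into a new namespace
velero restore create --from-backup full-pvc-backup \
  --namespace-mappings demo:demo-restored

# 5. Observe the pod stuck on its init container
kubectl -n demo-restored get pods
kubectl -n demo-restored describe pod app-0
```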
Workaround:
1. Kill the hanging pod. It respawns and comes up fine, since the "restore-wait" init container was killed and no longer blocks pod startup.
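In practice the workaround is a single pod deletion (namespace and pod name are hypothetical):

```shell
# Delete the stuck pod; its controller recreates it and the pod starts normally
kubectl -n demo-restored delete pod app-0
```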
Reason:
1. PVCs are recreated from the stored config
2. Data is copied to these PVCs from the backup files
3. HERE IT HAPPENS: a "done" file has to be written to a hidden ".velero" directory in the root path of the PVC, and the "restore-wait" init container waits for this "done" file to appear
4. Since the PVC is at 100% usage after the data restore, there is no space left on the device to create this "done" file
Solution:
Separate the user data on the volume from the signaling files needed by the restore process.
Mitigation in Lab Setup:
1. Mount the PVC into the pod
2. Create and mount an "emptyDir" volume at PVCroot/.velero
3. The user data at 100% usage is restored to the recreated PVC, while Velero's "done" file is written to the emptyDir mount and is therefore unaffected by the original PVC being at 100% usage
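The lab mitigation can be sketched as a pod spec fragment, assuming the PVC is mounted at /data; the image, volume names, and claim name are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-0
spec:
  containers:
  - name: app
    image: registry.example.com/app:latest
    volumeMounts:
    - name: data
      mountPath: /data            # user data, restored to 100% usage
    - name: velero-signal
      mountPath: /data/.velero    # emptyDir shadows .velero on the full PVC
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: app-data
  - name: velero-signal
    emptyDir: {}                  # the "done" file lands here instead
```

Because the more specific mount path (/data/.velero) shadows the PVC's own .velero directory, the "done" file is written to the emptyDir and never competes with user data for space on the PVC.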
Thanks, Chris