Story
Resolution: Unresolved
This issue tracks the upstream Velero GitHub issue #9225 which is part of the Velero v1.18 milestone.
Description
I would like a better way to keep track of maintenance Job successes and failures.
Currently, the way to do this is to inspect the retained Job objects, whose number is set by --keep-latest-maintenance-jobs. Maintenance job failures eventually cause severe performance degradation of backups over time, even while kopia-based backups continue to succeed.
This argument is going away in Velero 1.17. I would like to propose an alternative solution.
The proposal is to publish Prometheus metrics as a way of keeping track of maintenance job successes and failures.
In addition, some cloud providers provide software to trigger emails and other alerts off of Prometheus metrics.
Prometheus metrics are already published for dataupload/datadownload.
Interested values:
- success, in the context of which maintenance job corresponds to which BackupRepository
- failure, in the same context as success
- execution time of the Job
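To make the three values above concrete, here is a minimal sketch of how they could be exposed in the Prometheus text exposition format, keyed by BackupRepository. It is an illustration only, not Velero's implementation: the metric names, the `repository` label, the `MaintenanceMetrics` class, and the repository name in the usage note are all hypothetical. A real implementation would most likely use a Prometheus client library and a histogram for execution time rather than a plain sum of seconds.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class MaintenanceMetrics:
    """Per-BackupRepository maintenance counters (all names are hypothetical)."""

    success: dict = field(default_factory=lambda: defaultdict(int))
    failure: dict = field(default_factory=lambda: defaultdict(int))
    duration_sum: dict = field(default_factory=lambda: defaultdict(float))

    def record(self, repository: str, succeeded: bool, seconds: float) -> None:
        # Count the outcome and accumulate the run time for this repository.
        outcome = self.success if succeeded else self.failure
        outcome[repository] += 1
        self.duration_sum[repository] += seconds

    def render(self) -> str:
        # Emit one sample per repository in the Prometheus text exposition format.
        series = [
            ("velero_repo_maintenance_success_total", self.success),
            ("velero_repo_maintenance_failure_total", self.failure),
            ("velero_repo_maintenance_duration_seconds_total", self.duration_sum),
        ]
        lines = []
        for name, values in series:
            lines.append(f"# TYPE {name} counter")
            for repo in sorted(values):
                lines.append(f'{name}{{repository="{repo}"}} {values[repo]}')
        return "\n".join(lines) + "\n"
```

As a usage example, calling `record("default-ns1-kopia", True, 12.5)` followed by `record("default-ns1-kopia", False, 3.0)` and then `render()` yields one success, one failure, and an accumulated duration for that repository. An alerting rule could then fire when the failure counter increases.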
Upstream Details
- GitHub Issue: https://github.com/vmware-tanzu/velero/issues/9225
- Status: Open
- Assignee: shubham-pampattiwar
- Labels: Metrics
- Created: 2025-09-04T15:40:36Z
- Updated: 2025-09-19T06:23:42Z
This addition makes sense in light of the upcoming Velero 1.17 changes.