
      This issue tracks the upstream Velero GitHub issue #9225 which is part of the Velero v1.18 milestone.

      Description

      I would like a better way to keep track of maintenance Job success and failures.

      Currently, this is done by tracking the success of Job objects, with the number of retained Jobs set by --keep-latest-maintenance-jobs. Maintenance job failures eventually cause severe performance degradation of backups over time, even while kopia-based backups continue to succeed.

      This argument is going away in Velero 1.17. I would like to propose an alternative solution.

      The proposal is to publish Prometheus metrics as a way of keeping track of maintenance job successes and failures.

      In addition, some cloud providers provide software to trigger emails and other alerts off of Prometheus metrics.

      Prometheus metrics are already published for dataupload/datadownload.

      Interested values:

      • success, in the context of which maintenance job corresponds to which BackupRepository
      • failure, in the same context as success
      • execution time of the Job
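
      As a rough illustration of the three values above, here is a minimal Python sketch that tallies outcomes per BackupRepository and renders them in the Prometheus text exposition format. It is not Velero's implementation: the metric names, the `repository` label, and the `MaintenanceMetrics` class are all hypothetical, and the execution time is exposed as a simple last-run gauge where a real exporter might prefer a histogram.

```python
from collections import defaultdict


def escape_label(value):
    """Escape a label value for the Prometheus text exposition format."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


class MaintenanceMetrics:
    """In-memory tally of maintenance job outcomes per BackupRepository.

    Hypothetical helper for illustration only; names are assumptions.
    """

    def __init__(self):
        self.success = defaultdict(int)   # repository -> successful runs
        self.failure = defaultdict(int)   # repository -> failed runs
        self.last_duration = {}           # repository -> seconds of last run

    def record(self, repository, succeeded, duration_seconds):
        """Record one finished maintenance job for the given repository."""
        if succeeded:
            self.success[repository] += 1
        else:
            self.failure[repository] += 1
        self.last_duration[repository] = float(duration_seconds)

    def render(self):
        """Return all recorded values as Prometheus exposition text."""
        families = [
            # Metric names below are hypothetical, not Velero's.
            ("repo_maintenance_success_total", "counter", self.success),
            ("repo_maintenance_failure_total", "counter", self.failure),
            ("repo_maintenance_duration_seconds", "gauge", self.last_duration),
        ]
        lines = []
        for name, kind, values in families:
            lines.append(f"# TYPE {name} {kind}")
            for repo in sorted(values):
                label = escape_label(repo)
                lines.append(f'{name}{{repository="{label}"}} {values[repo]}')
        return "\n".join(lines) + "\n"


if __name__ == "__main__":
    metrics = MaintenanceMetrics()
    metrics.record("example-repo", succeeded=True, duration_seconds=12.5)
    metrics.record("example-repo", succeeded=False, duration_seconds=3)
    print(metrics.render(), end="")
```

      Labelling each series by repository keeps the job-to-BackupRepository correspondence visible, so an alert rule can fire on a rising failure counter for one repository without being masked by successes elsewhere.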

      Upstream Details

      This addition makes sense in light of the upcoming Velero 1.17 changes.

              wnstb Wes Hayutin