OpenShift API for Data Protection / OADP-1720

Metrics can be confusing when there are several Velero instances.


      Description of problem:

      I have a customer scenario where we see:

      velero get backups
      cluster-backup-full-20221213074843          PartiallyFailed  18  0  2022-12-13 02:48:43 -0500 EST  6d   velero-backup-1  <none>
      cluster-backup-full-20221212081142          PartiallyFailed  25  0  2022-12-12 03:11:47 -0500 EST  5d   velero-backup-1  <none>
      cluster-backup-full-20221207055243          Completed        0   0  2022-12-07 00:52:43 -0500 EST  21h  velero-backup-1  <none>
      etcd-backup-velero-schedule-20221213074843  Completed        0   0  2022-12-13 02:51:57 -0500 EST  29d  velero-backup-1

      However, because metrics are shown only in the latest instance, we see something like this:

      velero_backup_partial_failure_total{schedule=""} 0
      velero_backup_partial_failure_total{schedule="cluster-backup-full"} 1
      velero_backup_partial_failure_total{schedule="etcd-backup-velero-schedule"} 0

      This causes confusion: the backup list shows two PartiallyFailed backups named after cluster-backup-full, while the metric exposed to Prometheus reports only 1.
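      The mismatch can be checked mechanically. The following is only a minimal sketch, not part of Velero or OADP: it parses the backup listing and the metric samples quoted above and reports the schedules where the two disagree. The function names (schedule_of, listed_partial_failures, metric_values, mismatches) are hypothetical, and the sketch assumes that scheduled backups are named <schedule>-<14-digit timestamp>, as in the listing.

```python
import re
from collections import Counter

# Sample data copied from this report.
BACKUP_LISTING = """\
cluster-backup-full-20221213074843 PartiallyFailed 18 0 2022-12-13 02:48:43 -0500 EST 6d velero-backup-1 <none>
cluster-backup-full-20221212081142 PartiallyFailed 25 0 2022-12-12 03:11:47 -0500 EST 5d velero-backup-1 <none>
cluster-backup-full-20221207055243 Completed 0 0 2022-12-07 00:52:43 -0500 EST 21h velero-backup-1 <none>
etcd-backup-velero-schedule-20221213074843 Completed 0 0 2022-12-13 02:51:57 -0500 EST 29d velero-backup-1
"""

METRICS = """\
velero_backup_partial_failure_total{schedule=""} 0
velero_backup_partial_failure_total{schedule="cluster-backup-full"} 1
velero_backup_partial_failure_total{schedule="etcd-backup-velero-schedule"} 0
"""


def schedule_of(backup_name):
    """Strip the trailing -YYYYMMDDhhmmss suffix (assumed naming scheme)."""
    return re.sub(r"-\d{14}$", "", backup_name)


def listed_partial_failures(listing):
    """Count PartiallyFailed backups per schedule in the listing text."""
    counts = Counter()
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == "PartiallyFailed":
            counts[schedule_of(fields[0])] += 1
    return counts


def metric_values(text, metric="velero_backup_partial_failure_total"):
    """Read `metric{schedule="..."} value` samples into a dict."""
    pattern = re.compile(
        r"^" + re.escape(metric) + r'\{schedule="([^"]*)"\}\s+(\S+)$'
    )
    values = {}
    for line in text.splitlines():
        match = pattern.match(line.strip())
        if match:
            values[match.group(1)] = float(match.group(2))
    return values


def mismatches(listing, metrics_text):
    """Return {schedule: (listed_count, metric_value)} where they differ."""
    listed = listed_partial_failures(listing)
    exposed = metric_values(metrics_text)
    return {
        schedule: (listed.get(schedule, 0), value)
        for schedule, value in exposed.items()
        if schedule and listed.get(schedule, 0) != value
    }


if __name__ == "__main__":
    for schedule, (listed, value) in mismatches(BACKUP_LISTING, METRICS).items():
        print(f"{schedule}: {listed} PartiallyFailed listed, metric reports {value}")
```

      With the data from this report, only cluster-backup-full is flagged: two PartiallyFailed backups are listed while the counter reads 1.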

      Version-Release number of selected component (if applicable):

      How reproducible:

      Steps to Reproduce:
      1.
      2.
      3.

      Actual results:

      Expected results:

      Additional info:

              rhn-engineering-mpryc Michal Pryc
              rhn-support-gparente German Parente