  1. OpenShift Bugs
  2. OCPBUGS-1803

compliance_operator_compliance_scan_error_total metric has a problematic error label


    • Bug
    • Resolution: Done
    • Normal
    • None
    • 4.12
    • Compliance Operator
    • None
    • 2
    • CMP Sprint 61
    • 1
    • False

      Description of problem:

      The compliance_operator_compliance_scan_error_total metric has an "error" label whose value is the error message, so every distinct message creates a new time series. This goes against good instrumentation practices because:
      1) The cardinality of the metric is impossible to predict (i.e. how many label key/value combinations can exist at the same time for the compliance_operator_compliance_scan_error_total metric). Unbounded metrics like this can put a heavy memory load on Prometheus.
      2) The label value can be very long, which causes issues when users want to push the metric to other systems (see https://issues.redhat.com/browse/OBSDA-205).
      In practice, alerting on the metric is also likely to be complicated.
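
      The cardinality concern can be illustrated with a small sketch. It is not the operator's code: it only models a counter as a mapping from label sets to values, and the "name" label, the scan name, and the error messages are all made up for illustration.

```python
from collections import Counter
from typing import Optional


def record(series: Counter, scan: str, error: Optional[str] = None) -> None:
    """Increment the time series identified by its full label set."""
    labels = {"name": scan}  # "name" is an assumed label, for illustration only
    if error is not None:
        labels["error"] = error
    # Each distinct label set is a separate time series.
    series[frozenset(labels.items())] += 1


# Hypothetical error messages: each one differs slightly from the others.
errors = [f"pod scan-worker-{i} failed: timeout after {i}s" for i in range(100)]

with_error_label = Counter()
without_error_label = Counter()
for msg in errors:
    record(with_error_label, "example-scan", msg)
    record(without_error_label, "example-scan")

print(len(with_error_label))     # one series per distinct message
print(len(without_error_label))  # a single series, whatever the messages are
```

      With the message as a label value, the number of series grows with the number of distinct messages; without it, the same 100 failures land in one series.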
      

      Version-Release number of selected component (if applicable):

      4.12 (but probably applies to previous versions too)
      

      How reproducible:

      Always
      

      Steps to Reproduce:

      1. Trigger a scan that will fail.
      2. Go to OCP console > metrics page and query "compliance_operator_compliance_scan_error_total".
      

      Actual results:

      The compliance_operator_compliance_scan_error_total metric is exposed with an "error" label containing an error message.
      

      Expected results:

      No "error" label.
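
      One possible shape for this, sketched under assumptions rather than taken from the operator: count the failure without the message and send the message to the log instead. The class name, the per-scan key, and the logger name below are hypothetical.

```python
import logging

logger = logging.getLogger("scan-errors")  # hypothetical logger name


class ScanErrorCounter:
    """Counts scan errors per scan name; the error text never becomes a label."""

    def __init__(self) -> None:
        self.totals = {}

    def observe(self, scan_name: str, err: Exception) -> None:
        # The full message goes to the log, where its length and variety are harmless.
        logger.error("scan %s failed: %s", scan_name, err)
        # The metric only records that an error happened for this scan.
        self.totals[scan_name] = self.totals.get(scan_name, 0) + 1


counter = ScanErrorCounter()
counter.observe("example-scan", RuntimeError("timeout after 30s"))
counter.observe("example-scan", RuntimeError("result server unreachable"))
print(counter.totals)
```

      Two different error messages still increment the same counter, so the number of series stays bounded by the number of scans.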
      

      Additional info:

      https://prometheus.io/docs/practices/naming/#labels
      https://github.com/openshift/compliance-operator/blob/master/doc/usage.md#metrics
      https://issues.redhat.com/browse/OBSDA-205
      

            lbragsta@redhat.com Lance Bragstad
            spasquie@redhat.com Simon Pasquier
            Xiaojie Yuan Xiaojie Yuan