OpenShift Logging / LOG-7274

Create an alert that signals errors during rule evaluation

    • Story
    • Resolution: Unresolved
    • Normal
    • Logging 6.4.z
    • Logging 6.0.9, Logging 6.1.7, Logging 6.2.3, Logging 6.3.0, Logging 6.4.0
    • Log Storage
    • None
    • Future Sustainability
    • 2
    • False
    • False
    • NEW
    • ASSIGNED
    • Feature
    • Logging - Sprint 277, Logging - Sprint 279

      Note: This issue has been converted from a bug to a story. Some information might still refer to its bug status.

      Description of problem: 

      When an AlertingRule is created with a non-aggregated query, the rule does not fire when dataModel is 'Otel', and evaluation fails with a 'Maximum of series (500) reached for a single query' error. The same query fires when dataModel is 'Viaq'.

      Version-Release number of selected component (if applicable):

      Logging 6.3

      How reproducible:

      Always

      Steps to Reproduce:

      1. Forward logs to LokiStack using CLO v6.3.
      2. Create the AlertingRule shown under Additional info.
      3. Observe the Ruler logs and the console for firing alerts.

      Actual results:

      The alert does not fire with the Otel data model but does fire with the Viaq data model.

      Expected results:

      The alert should fire with both the Otel and Viaq data models.
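
      Since the title of this story asks for an alert on rule evaluation errors, the following is a rough sketch of what such an alert might look like as a PrometheusRule. It rests on several assumptions not confirmed in this issue: that the Loki ruler exposes a counter named loki_prometheus_rule_evaluation_failures_total, that the series-limit error increments it, and that the rule lives in the openshift-logging namespace. The resource name, alert name, window, and severity are hypothetical.

      ```yaml
      apiVersion: monitoring.coreos.com/v1
      kind: PrometheusRule
      metadata:
        name: loki-rule-evaluation-failures   # hypothetical name
        namespace: openshift-logging          # assumed LokiStack namespace
      spec:
        groups:
          - name: loki-ruler-evaluation
            rules:
              - alert: LokiRuleEvaluationFailures   # hypothetical alert name
                annotations:
                  summary: Loki ruler is failing to evaluate rules.
                  description: One or more rule evaluations failed in the last 10 minutes.
                # Assumed metric name; verify it against the ruler's /metrics output.
                expr: sum(increase(loki_prometheus_rule_evaluation_failures_total[10m])) > 0
                for: 10m
                labels:
                  severity: warning
      ```

      The expression sums over all label values on purpose, so the sketch does not depend on any particular label being present on the metric.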

      Additional info:

      AlertingRule:

      apiVersion: loki.grafana.com/v1
      kind: AlertingRule
      metadata:
        labels:
          openshift.io/cluster-monitoring: 'true'
        name: my-workload-alert
        namespace: my-app
      spec:
        groups:
          - interval: 1m
            name: MyApplication
            rules:
              - alert: MyApplicationLogVolumeIsHigh
                annotations:
                  description: My application has high amount of logs.
                  summary: project "my-app" log volume is high.
                expr: >
                  count_over_time({k8s_namespace_name="my-app"}[2m])
                  > 10
                for: 5m
                labels:
                  severity: info
                  project: my-app
        tenantID: application

      When the AlertingRule is updated to use an aggregated query, the rule starts to fire with the Otel data model.

      AlertingRule fires for:

      sum(count_over_time({log_type="application", k8s_namespace_name="my-app"}[5m])) by (k8s_namespace_name) > 2

      AlertingRule does not fire for:

      count_over_time({k8s_namespace_name="my-app"}[2m]) > 10
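
      For reference, a minimal sketch of the AlertingRule above with its expression switched to the aggregated form reported to fire. The query, window, and threshold are copied from the firing example; everything else is carried over unchanged from the original rule. Whether aggregation is the intended fix or only a workaround is not stated in this issue.

      ```yaml
      apiVersion: loki.grafana.com/v1
      kind: AlertingRule
      metadata:
        labels:
          openshift.io/cluster-monitoring: 'true'
        name: my-workload-alert
        namespace: my-app
      spec:
        groups:
          - interval: 1m
            name: MyApplication
            rules:
              - alert: MyApplicationLogVolumeIsHigh
                annotations:
                  description: My application has high amount of logs.
                  summary: project "my-app" log volume is high.
                # Aggregating by namespace collapses the per-stream results
                # into one series per namespace.
                expr: >
                  sum(count_over_time({log_type="application", k8s_namespace_name="my-app"}[5m]))
                  by (k8s_namespace_name) > 2
                for: 5m
                labels:
                  severity: info
                  project: my-app
        tenantID: application
      ```

      Without the aggregation, count_over_time returns one series per log stream, which is presumably how the 500-series limit is reached under the Otel data model.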

              Unassigned
              Kabir Bharti (rhn-support-kbharti)
              Kabir Bharti