OCPBUGS-61262 (OpenShift Bugs)

Duplicate `prometheusrules.monitoring.coreos.com`

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Undefined
    • 4.16
    • Monitoring
    • Quality / Stability / Reliability
    • Low

      Description of problem:

       
      Two "duplicate" PrometheusRule objects (the bis-custom-alertingrules-* entries) are observed in the openshift-monitoring namespace:
      
      ~~~    
      oc get -n openshift-monitoring prometheusrules.monitoring.coreos.com
      NAME                                           AGE
      alertmanager-main-rules                        2y149d
      bis-custom-alertingrules-bd731a                153d
      bis-custom-alertingrules-d4367e                509d
      ~~~
      
      One of the two bis-custom-alertingrules objects has these labels:
      
      ~~~
      labels:
          app.kubernetes.io/component: alerting-rules-controller
          app.kubernetes.io/name: cluster-monitoring-operator
          app.kubernetes.io/part-of: openshift-monitoring
          app.kubernetes.io/version: 4.16.38
          prometheus: k8s
          role: alerting-rules
      ~~~
      
      The other one has these labels:
      
      ~~~
      labels:
          app.kubernetes.io/component: alerting-rules-controller
          app.kubernetes.io/name: cluster-monitoring-operator
          app.kubernetes.io/part-of: openshift-monitoring
          app.kubernetes.io/version: 4.14.29
          prometheus: k8s
          role: alerting-rules
      ~~~
      
      Both objects have an ownerReferences entry pointing to the same UID, 8a7ef0f2-db18-4c10-9a56-3df02e4885a7, and their labels differ only in app.kubernetes.io/version (4.16.38 versus 4.14.29).
      The rules have changed over time, and the customer observes that the operator-managed prometheus-k8s-rulefiles-0 ConfigMap contains two versions of some of those rules.
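
To check other clusters for the same condition, one option is to group the PrometheusRule objects by owner UID and flag any UID that owns more than one object. The following is a minimal sketch, not a supported tool. It assumes the input has the shape of `oc get -n openshift-monitoring prometheusrules.monitoring.coreos.com -o json`. The `SAMPLE` data is a hand-written illustration built from the names and UID in this report, and the function names are hypothetical. The ownerReferences of alertmanager-main-rules are not shown in the report, so the sample omits them.

```python
from collections import defaultdict

def group_by_owner(rule_list):
    """Map each ownerReferences UID to the names of the objects that reference it."""
    owners = defaultdict(list)
    for item in rule_list.get("items", []):
        meta = item.get("metadata", {})
        for ref in meta.get("ownerReferences", []):
            owners[ref["uid"]].append(meta["name"])
    return dict(owners)

def shared_owners(rule_list):
    """Keep only the owner UIDs referenced by more than one PrometheusRule."""
    return {
        uid: sorted(names)
        for uid, names in group_by_owner(rule_list).items()
        if len(names) > 1
    }

# Illustrative data only: names and UID are taken from this report,
# everything else about the objects is left out.
OWNER_UID = "8a7ef0f2-db18-4c10-9a56-3df02e4885a7"
SAMPLE = {
    "items": [
        {"metadata": {"name": "alertmanager-main-rules"}},
        {"metadata": {"name": "bis-custom-alertingrules-bd731a",
                      "ownerReferences": [{"uid": OWNER_UID}]}},
        {"metadata": {"name": "bis-custom-alertingrules-d4367e",
                      "ownerReferences": [{"uid": OWNER_UID}]}},
    ]
}

for uid, names in shared_owners(SAMPLE).items():
    print(uid, names)
```

On a live cluster the same functions could be fed with `json.load(sys.stdin)` after piping the `oc` output into the script. Any UID reported here would indicate the condition described above.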

      Version-Release number of selected component (if applicable):

       4.16    

      How reproducible:

         Unable to reproduce  

      Steps to Reproduce:

         
          

      Actual results:

      Example: the bis-PersistentVolumeUsageCritical alert definition.
      It is defined only once in the AlertingRule config, but appears twice in the prometheus-k8s-rulefiles-0 ConfigMap.
      The order of the duplicate entries in the prometheus-k8s-rulefiles-0 ConfigMap differs between clusters, so some clusters effectively use the new version of the rule while others still use the old version.
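
The duplication can be confirmed from the rule file contents by listing every alert name that is defined more than once. The following is a rough sketch under stated assumptions: it takes the ConfigMap data already parsed into Python dictionaries (for example with a YAML parser), keyed by file name. The file keys, the second alert name, and the `<old expression>` / `<new expression>` placeholders are invented for illustration and are not taken from the customer's cluster.

```python
from collections import defaultdict

def duplicate_alerts(rule_files):
    """Return {alert name: [(file key, expr), ...]} for alerts defined more than once."""
    seen = defaultdict(list)
    for key, content in rule_files.items():
        for group in content.get("groups", []):
            for rule in group.get("rules", []):
                # Recording rules have no "alert" field and are skipped.
                if "alert" in rule:
                    seen[rule["alert"]].append((key, rule.get("expr")))
    return {name: defs for name, defs in seen.items() if len(defs) > 1}

# Hypothetical parsed rule files; keys and expressions are placeholders.
SAMPLE_FILES = {
    "rules-old.yaml": {
        "groups": [{
            "name": "bis-custom",
            "rules": [
                {"alert": "bis-PersistentVolumeUsageCritical", "expr": "<old expression>"},
            ],
        }],
    },
    "rules-new.yaml": {
        "groups": [{
            "name": "bis-custom",
            "rules": [
                {"alert": "bis-PersistentVolumeUsageCritical", "expr": "<new expression>"},
                {"alert": "bis-HypotheticalOtherAlert", "expr": "<some expression>"},
            ],
        }],
    },
}

for name, defs in duplicate_alerts(SAMPLE_FILES).items():
    print(name, defs)
```

Comparing the `expr` values of each reported pair shows whether the two definitions are identical or whether an old and a new version coexist.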

      Expected results:

      Updated rules should replace the existing rules instead of being added alongside them.

      Additional info:

          

              Jan Fajerski (jfajersk@redhat.com)
              Nigel Smith (rhn-support-nigsmith)
              Junqi Zhao