OpenShift Monitoring / MON-1949

Improve prometheus-adapter consistency

    • NEW
    • VERIFIED
    • Monitoring - Sprint 208, Sprint 224, MON Sprint 225

      The current integration of prometheus-adapter in OpenShift uses the platform Prometheus as its metrics backend. The problem with this design is that the platform Prometheus runs as 2 separate instances whose data is not replicated, and each PromQL query issued by prometheus-adapter may be executed on either instance. Two requests sent to prometheus-adapter at the same time can therefore yield different results, and we end up with inconsistent data across autoscaling requests.

      This can be easily tested by running:

      $ while true; do date; oc adm top pod -n openshift-monitoring prometheus-k8s-0; echo; sleep 1; done
      
      Mon Jul 26 03:55:07 EDT 2021
      NAME               CPU(cores)   MEMORY(bytes)   
      prometheus-k8s-0   208m         4879Mi          
      
      Mon Jul 26 03:55:08 EDT 2021                               
      NAME               CPU(cores)   MEMORY(bytes)   
      prometheus-k8s-0   246m         4877Mi          
      
      Mon Jul 26 03:55:09 EDT 2021                               
      NAME               CPU(cores)   MEMORY(bytes)   
      prometheus-k8s-0   208m         4879Mi          
      
      Mon Jul 26 03:55:10 EDT 2021
      NAME               CPU(cores)   MEMORY(bytes)   
      prometheus-k8s-0   246m         4877Mi          
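
To make the alternation easier to spot in a longer capture, the loop output can be summarised. The following is only a sketch: `distinct_readings` is a hypothetical helper, not part of `oc`, and it assumes the column layout shown above (pod name, CPU, memory separated by whitespace).

```shell
# Hypothetical helper (not an oc subcommand): reads captured
# `oc adm top pod` output on stdin and prints every distinct
# "CPU MEMORY" pair reported for the given pod, prefixed by how
# often that pair was seen.
distinct_readings() {
  pod="$1"
  awk -v pod="$pod" '
    $1 == pod { seen[$2 " " $3]++ }           # count each CPU/memory pair
    END { for (k in seen) print seen[k], k }  # "<count> <cpu> <memory>"
  ' | sort -k2
}
```

Saving the loop output to a file and running `distinct_readings prometheus-k8s-0 < top.log` on the four samples above would report two pairs, `208m 4879Mi` and `246m 4877Mi`, each seen twice. More than one pair within a few seconds suggests that the answers alternate between the two backends.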
      

      This isn't a bug in itself since it was designed that way, but we could do better by using thanos-querier as the backend instead of the platform Prometheus: it deduplicates the metrics from both instances and serves one consistent result based on the data it gets from the Prometheus instances.
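
The idea of serving one merged view can be illustrated with a toy model. This is a deliberately simplified sketch and does not reproduce Thanos's actual deduplication algorithm. The function name `dedup_samples`, the `<replica> <timestamp> <value>` input format and the "preferred replica" rule are all assumptions made for illustration.

```shell
# Toy model only, NOT the Thanos implementation.
# Input lines on stdin: "<replica> <timestamp> <value>".
# For each timestamp the sample from the preferred replica wins;
# samples from other replicas are only used to fill gaps.
dedup_samples() {
  preferred="$1"
  awk -v pref="$preferred" '
    !($2 in val) || $1 == pref { val[$2] = $3 }
    END { for (t in val) print t, val[t] }   # "<timestamp> <value>"
  ' | sort -n
}
```

Because every caller receives the same merged series, two simultaneous queries no longer depend on which replica happened to answer, which is the property the adapter currently lacks.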

      DoD:

      • Use thanos-querier as a backend for prometheus-adapter
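
One way to verify this item is to check which backend URL the adapter was started with. The sketch below assumes the adapter is configured through the upstream `--prometheus-url` flag. `adapter_backend` is a hypothetical helper, and the deployment name, namespace and container index in the usage comment are assumptions about the cluster rather than confirmed details.

```shell
# Hypothetical helper: reads container arguments on stdin, one per line,
# and prints the value of --prometheus-url if it is present.
adapter_backend() {
  sed -n 's/^[[:space:]]*--prometheus-url=//p'
}

# Possible usage against a live cluster (names are assumptions):
#   oc -n openshift-monitoring get deployment prometheus-adapter \
#     -o jsonpath='{range .spec.template.spec.containers[0].args[*]}{@}{"\n"}{end}' \
#     | adapter_backend
```

If the printed URL points at the thanos-querier service rather than at the platform Prometheus service, the adapter is using the intended backend.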

        1. 2022-08-18-142920_1595x794.png
          97 kB
          Jan Fajerski
        2. 2022-08-18-145803_1597x795.png
          82 kB
          Jan Fajerski
        3. cpu usage with different honortimestamps.png
          107 kB
          Simon Pasquier
        4. real timestamps with honortimestamps true.png
          94 kB
          Simon Pasquier
        5. staleness.png
          100 kB
          Simon Pasquier

              jfajersk@redhat.com Jan Fajerski
              dgrisonn@redhat.com Damien Grisonnet
              Tai Gao Tai Gao