- Story
- Resolution: Duplicate
- Normal
The current integration of prometheus-adapter in OpenShift uses the platform Prometheus as its metrics backend. The problem with this design is that the platform Prometheus runs as 2 separate instances whose data is not replicated between them, so two queries sent to prometheus-adapter at the same time might yield different results, because the underlying PromQL queries might be executed on different Prometheus servers. As a consequence, data is inconsistent across multiple autoscaling requests.
This can be easily tested by running:
$ while true ; do date; oc adm top pod -n openshift-monitoring prometheus-k8s-0 ; echo; sleep 1 ;done
Mon Jul 26 03:55:07 EDT 2021
NAME               CPU(cores)   MEMORY(bytes)
prometheus-k8s-0   208m         4879Mi

Mon Jul 26 03:55:08 EDT 2021
NAME               CPU(cores)   MEMORY(bytes)
prometheus-k8s-0   246m         4877Mi

Mon Jul 26 03:55:09 EDT 2021
NAME               CPU(cores)   MEMORY(bytes)
prometheus-k8s-0   208m         4879Mi

Mon Jul 26 03:55:10 EDT 2021
NAME               CPU(cores)   MEMORY(bytes)
prometheus-k8s-0   246m         4877Mi
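The flapping in the output above can be summarized by counting how often each CPU reading appears. This is a minimal sketch, assuming the `oc adm top pod` column layout shown above; `distinct_cpu_readings` is a hypothetical helper name, not part of `oc`.

```shell
#!/bin/sh
# Hypothetical helper: reads captured `oc adm top pod` output on stdin and
# prints each distinct CPU reading for the given pod with its occurrence count.
distinct_cpu_readings() {
    pod="$1"
    # Keep only rows whose first column is the pod name, then tally column 2.
    awk -v pod="$pod" '
        $1 == pod { seen[$2]++ }
        END { for (v in seen) print v, seen[v] }
    ' | sort
}

# Sample rows copied from the output above; on a live cluster, pipe the
# output of the loop into the helper instead.
sample='NAME CPU(cores) MEMORY(bytes)
prometheus-k8s-0 208m 4879Mi
NAME CPU(cores) MEMORY(bytes)
prometheus-k8s-0 246m 4877Mi
NAME CPU(cores) MEMORY(bytes)
prometheus-k8s-0 208m 4879Mi
NAME CPU(cores) MEMORY(bytes)
prometheus-k8s-0 246m 4877Mi'

printf '%s\n' "$sample" | distinct_cpu_readings prometheus-k8s-0
```

For the sample, this prints two readings (208m and 246m) with two occurrences each. That pattern is consistent with requests alternating between two backends.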
This isn't a bug in itself since it was designed that way, but we could do better by using thanos-querier as a backend instead of the platform Prometheus, because it deduplicates the metrics from both instances and serves one consistent result based on the data it gets from the Prometheus instances.
DoD:
- Use thanos-querier as a backend for prometheus-adapter
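One way to check the DoD item above is to look at the backend URL that prometheus-adapter is started with. This is a rough sketch under several assumptions that this ticket does not confirm: that the adapter runs as a Deployment named `prometheus-adapter` in `openshift-monitoring`, that its backend is set through the upstream `--prometheus-url` flag, and that the Thanos service name contains `thanos-querier`. The helper names and the sample arguments are illustrative only.

```shell
#!/bin/sh
# Extract the value of --prometheus-url from container args (one per line).
backend_url() {
    sed -n 's/^--prometheus-url=//p'
}

# Report whether a backend URL points at thanos-querier.
# Matching on the substring "thanos-querier" is an assumption about the
# service name.
check_backend() {
    case "$1" in
        *thanos-querier*) echo "ok: $1" ;;
        *) echo "not thanos-querier: $1"; return 1 ;;
    esac
}

# On a live cluster (resource names are assumptions):
#   oc -n openshift-monitoring get deployment prometheus-adapter \
#     -o jsonpath='{range .spec.template.spec.containers[0].args[*]}{@}{"\n"}{end}' \
#     | backend_url
#
# Offline demonstration with illustrative args:
args='--secure-port=6443
--prometheus-url=https://thanos-querier.openshift-monitoring.svc:9091'

check_backend "$(printf '%s\n' "$args" | backend_url)"
```

Keeping the parsing separate from the `oc` call lets the check be exercised without cluster access.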
- clones
  - OBSDOCS-64 Improve prometheus-adapter consistency (Closed)
- is duplicated by
  - OBSDOCS-215 Explicitly document how to enable dedicated Service Monitors (Closed)