Task
Resolution: Done
Normal
OBSDOCS (May 6 - May 28) #254, OBSDOCS (May 27 - Jun 17) #255
Our documentation suggests creating an alert after configuring scrape sample limits.
That PrometheusRule object has two alerts configured within it [1]:
- `ApproachingEnforcedSamplesLimit`
- `TargetDown`
The `TargetDown` alert is designed to fire after `ApproachingEnforcedSamplesLimit`, because the target is dropped once the enforced sample limit is reached.
The `TargetDown` alert is creating false positives: it fires for reasons other than pods in the namespace reaching their enforced sample limit (e.g. the metrics endpoint may be down).
User-defined monitoring should provide out-of-the-box metrics that help with troubleshooting:
- Update the user-workload Prometheus to enable additional scrape metrics [2].
- Rewrite the `ApproachingEnforcedSamplesLimit` alert expression in the OCP documentation as `(scrape_samples_post_metric_relabeling / (scrape_sample_limit > 0)) > 0.9`, which reads as "alert when the number of ingested samples reaches 90% of the configured limit".
- Document how a user would know that a target has hit the limit (e.g. the Targets page should show this information).
[2] - https://prometheus.io/docs/prometheus/latest/feature_flags/#extra-scrape-metrics
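For illustration, the revised expression could be wired into a PrometheusRule along these lines (a minimal sketch; the rule name, namespace, `for` duration, and severity are assumptions, and the `scrape_sample_limit` metric is only exposed when Prometheus runs with the `extra-scrape-metrics` feature flag [2]):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: sample-limit-alerts   # illustrative name
  namespace: ns1              # illustrative user namespace
spec:
  groups:
  - name: sample-limit
    rules:
    - alert: ApproachingEnforcedSamplesLimit
      # Fires when ingested samples reach 90% of the enforced limit.
      # scrape_sample_limit requires --enable-feature=extra-scrape-metrics,
      # hence the ask above to enable it for user-workload Prometheus.
      expr: (scrape_samples_post_metric_relabeling / (scrape_sample_limit > 0)) > 0.9
      for: 10m
      labels:
        severity: warning
      annotations:
        message: >-
          Target {{ $labels.instance }} is approaching its enforced sample limit.
```

Dividing by `(scrape_sample_limit > 0)` makes the expression return no result for targets with no limit configured, so the alert stays silent for them instead of producing false positives.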
documents: MON-3256 Improve scrape sample alerts (Closed)