scrape sample limit alerts (Epic)
- Resolution: Done
- Priority: Normal
- Status: To Do
- Progress: 0% To Do, 0% In Progress, 100% Done
Our documentation suggests creating an alert after configuring scrape sample limits.
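For context, the limit itself is configured on the user-workload Prometheus. A minimal sketch, assuming the documented `enforcedSampleLimit` field of the `user-workload-monitoring-config` ConfigMap (the value of 50000 is illustrative):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: user-workload-monitoring-config
  namespace: openshift-user-workload-monitoring
data:
  config.yaml: |
    prometheus:
      # Scrapes that would ingest more than this many samples are failed
      # and their data is dropped, so the target eventually shows up as down.
      enforcedSampleLimit: 50000
```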
The PrometheusRule object in that documentation has two alerts configured within it [1] (a sketch follows the list):
- `ApproachingEnforcedSamplesLimit`
- `TargetDown`
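A minimal sketch of that PrometheusRule; the names, expressions, and thresholds below are illustrative rather than the exact ones in the documentation:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: monitoring-stack-alerts   # illustrative name
  namespace: ns1                  # a user-defined project
spec:
  groups:
  - name: general.rules
    rules:
    # Warns while a target approaches the enforced sample limit (50000 hard-coded here).
    - alert: ApproachingEnforcedSamplesLimit
      expr: scrape_samples_scraped / 50000 > 0.8
      for: 10m
      labels:
        severity: warning
    # Fires once the target's scrapes start failing, e.g. after the limit is exceeded.
    - alert: TargetDown
      expr: up == 0
      for: 10m
      labels:
        severity: warning
```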
The `TargetDown` alert is designed to fire after `ApproachingEnforcedSamplesLimit` because the target is dropped once the enforced sample limit is reached.
However, the `TargetDown` alert is creating false positives: it fires for reasons other than pods in the namespace reaching their enforced sample limit (for example, the metrics endpoint may simply be down).
User-defined monitoring should provide out-of-the-box metrics that will help with troubleshooting:
- Update Prometheus user-workload to enable additional scrape metrics [2]
- Rewrite the `ApproachingEnforcedSamplesLimit` alert expression in the OCP documentation to something like `(scrape_samples_post_metric_relabeling / (scrape_sample_limit > 0)) > 0.9`, which reads as "alert when the number of ingested samples reaches 90% of the configured limit" (see the sketch after the reference below).
- Document how a user would know that a target has hit the limit (e.g. the Targets page should have the information).
[2] - https://prometheus.io/docs/prometheus/latest/feature_flags/#extra-scrape-metrics
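The rewritten rule depends on `scrape_sample_limit`, which Prometheus only exposes when started with `--enable-feature=extra-scrape-metrics` [2]; that is what the first bullet asks of the user-workload instance. A sketch under that assumption (the manifest names are illustrative):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: sample-limit-alerts       # illustrative name
  namespace: ns1
spec:
  groups:
  - name: sample-limit.rules
    rules:
    # Fires when ingested samples reach 90% of the configured limit.
    # The "> 0" filter keeps only targets that actually have a limit set,
    # so targets without a limit cannot trigger the alert.
    - alert: ApproachingEnforcedSamplesLimit
      expr: scrape_samples_post_metric_relabeling / (scrape_sample_limit > 0) > 0.9
      for: 10m
      labels:
        severity: warning
```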
- is documented by OBSDOCS-977 Document improved scrape sample alerts (Closed)
- relates to OBSDA-394 Improve scrape sample alerts (In Progress)
- links to