scrape sample limit alerts (Epic)
- Resolution: Done
- Priority: Normal
- Status: To Do
- Progress: 0% To Do, 0% In Progress, 100% Done
Our documentation suggests creating an alert after configuring scrape sample limits.
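For context, the limit itself is configured on the user-workload Prometheus. A minimal sketch, assuming the documented `enforcedSampleLimit` field of the `user-workload-monitoring-config` ConfigMap (the value of 50000 is illustrative):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: user-workload-monitoring-config
  namespace: openshift-user-workload-monitoring
data:
  config.yaml: |
    prometheus:
      # Scrapes that would ingest more than this many samples are failed
      # and their data is dropped, so the target eventually shows up as down.
      enforcedSampleLimit: 50000
```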
The PrometheusRule object in that documentation has two alerts configured within it [1] (a sketch follows the list):
- `ApproachingEnforcedSamplesLimit`
- `TargetDown`
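A minimal sketch of that PrometheusRule; the names, expressions, and thresholds below are illustrative rather than the exact ones in the documentation:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: monitoring-stack-alerts   # illustrative name
  namespace: ns1                  # a user-defined project
spec:
  groups:
  - name: general.rules
    rules:
    # Warns while a target approaches the enforced sample limit (50000 hard-coded here).
    - alert: ApproachingEnforcedSamplesLimit
      expr: scrape_samples_scraped / 50000 > 0.8
      for: 10m
      labels:
        severity: warning
    # Fires once the target's scrapes start failing, e.g. after the limit is exceeded.
    - alert: TargetDown
      expr: up == 0
      for: 10m
      labels:
        severity: warning
```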
The `TargetDown` alert is designed to fire after `ApproachingEnforcedSamplesLimit` because the target is dropped once the enforced sample limit is reached.
However, the `TargetDown` alert is creating false positives: it fires for reasons other than pods in the namespace reaching their enforced sample limit (for example, the metrics endpoint may simply be down).
User-defined monitoring should provide out-of-the-box metrics that will help with troubleshooting:
- Update Prometheus user-workload to enable additional scrape metrics [2]
- Rewrite the `ApproachingEnforcedSamplesLimit` alert expression in the OCP documentation to something like `(scrape_samples_post_metric_relabeling / (scrape_sample_limit > 0)) > 0.9`, which reads as "alert when the number of ingested samples reaches 90% of the configured limit" (see the sketch after the reference below).
- Document how a user would know that a target has hit the limit (e.g. the Targets page should have the information).
[2] - https://prometheus.io/docs/prometheus/latest/feature_flags/#extra-scrape-metrics
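The rewritten rule depends on `scrape_sample_limit`, which Prometheus only exposes when started with `--enable-feature=extra-scrape-metrics` [2]; that is what the first bullet asks of the user-workload instance. A sketch under that assumption (the manifest names are illustrative):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: sample-limit-alerts       # illustrative name
  namespace: ns1
spec:
  groups:
  - name: sample-limit.rules
    rules:
    # Fires when ingested samples reach 90% of the configured limit.
    # The "> 0" filter keeps only targets that actually have a limit set,
    # so targets without a limit cannot trigger the alert.
    - alert: ApproachingEnforcedSamplesLimit
      expr: scrape_samples_post_metric_relabeling / (scrape_sample_limit > 0) > 0.9
      for: 10m
      labels:
        severity: warning
```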
- is documented by OBSDOCS-977 Document improved scrape sample alerts (Closed)
- relates to OBSDA-394 Improve scrape sample alerts (In Progress)
- links to