OpenShift Monitoring / MON-2358

Add runbook for PrometheusOperatorRejectedResources alert


    • Type: Task
    • Resolution: Done
    • Priority: Major
    • Component: prometheus-operator
    • Sprint: MON Sprint 246

      In OCP 4.10, the PrometheusOperatorRejectedResources alert (along with the other Prometheus operator alerts) has been extended to cover the openshift-user-workload-monitoring namespace.

      The CCX team has seen that about 5% of 4.10 clusters have the alert firing for openshift-user-workload-monitoring. In practice this means that some user-defined pod/service monitors are invalid, for example because of invalid scrape interval values or references to missing secrets used for scrape authentication.
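
      The alert is based on the operator's prometheus_operator_managed_resources metric, which counts accepted and rejected resources per state. A quick way to see where rejections happen is to query it through the Prometheus HTTP API; the sketch below assumes a reachable querier URL and a bearer token with query permissions (both are placeholders, not part of this ticket).

          import requests

          # Placeholders: a reachable Thanos querier / Prometheus URL and a token
          # that is allowed to run queries.
          PROM_URL = "https://thanos-querier.example.com"
          TOKEN = "sha256~REDACTED"

          # prometheus_operator_managed_resources reports, per resource type and
          # state, how many objects the operator accepted or rejected.
          query = 'sum by (namespace, resource) (prometheus_operator_managed_resources{state="rejected"}) > 0'

          resp = requests.get(
              f"{PROM_URL}/api/v1/query",
              params={"query": query},
              headers={"Authorization": f"Bearer {TOKEN}"},
          )
          resp.raise_for_status()

          # Print which namespace/resource combinations currently have rejections.
          for item in resp.json()["data"]["result"]:
              labels = item["metric"]
              print(f'{labels.get("namespace")}: {labels.get("resource")} rejected = {item["value"][1]}')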

      Eventually we want the Prometheus operator to be more user-friendly and provide direct feedback to the users:

      1. Do more with OpenAPI spec validations.
      2. Implement/configure validating webhooks for things that can't be modeled directly with OpenAPI (see the sketch after this list).
      3. Implement the status subresource for service/pod monitors.
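
      As an illustration of item 2, a validating webhook could reject a pod/service monitor whose scrape interval is not a valid Prometheus duration, something OpenAPI format validation alone can't fully express. The following is only a simplified sketch of such a check, not the operator's actual code; the duration regex is an approximation of Prometheus' syntax.

          import re

          # Approximation of the Prometheus duration syntax ("30s", "1m30s", "2h", ...).
          # The real parser in Prometheus / prometheus-operator is stricter.
          DURATION_RE = re.compile(r"^([0-9]+(ms|s|m|h|d|w|y))+$")

          def validate_endpoint(endpoint: dict) -> list:
              """Return human-readable errors for one pod/service monitor endpoint."""
              errors = []
              for field in ("interval", "scrapeTimeout"):
                  value = endpoint.get(field)
                  if value is not None and not DURATION_RE.match(value):
                      errors.append(f"invalid {field}: {value!r}")
              return errors

          # The kind of user-defined endpoint that gets rejected today, with only
          # the generic PrometheusOperatorRejectedResources alert as feedback.
          print(validate_endpoint({"port": "web", "interval": "30 seconds"}))
          # -> ["invalid interval: '30 seconds'"]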

      But in the meantime, the alert's description should be improved to include more details about the cause and how to mitigate the issue. Similarly, we need to add a runbook in github.com/openshift/runbooks and link it from the CMO alert.

      [1] https://github.com/openshift/cluster-monitoring-operator/pull/1370

       

      DoD

      • Improved the description & summary annotations of the upstream alerts.
      • Dedicated runbook in openshift/runbooks.
      • Everything pulled together in CMO.
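
      Once the runbook and CMO changes land, a rough way to verify the last two items is to check that the rendered alerting rule carries the improved description and a runbook_url annotation pointing at openshift/runbooks. A minimal sketch against the Prometheus rules API (URL and token are again placeholders):

          import requests

          PROM_URL = "https://thanos-querier.example.com"  # placeholder
          TOKEN = "sha256~REDACTED"                         # placeholder

          resp = requests.get(
              f"{PROM_URL}/api/v1/rules",
              params={"type": "alert"},
              headers={"Authorization": f"Bearer {TOKEN}"},
          )
          resp.raise_for_status()

          # Look up the alerting rule and print its annotations.
          for group in resp.json()["data"]["groups"]:
              for rule in group["rules"]:
                  if rule.get("name") == "PrometheusOperatorRejectedResources":
                      annotations = rule.get("annotations", {})
                      print("description:", annotations.get("description"))
                      print("runbook_url:", annotations.get("runbook_url"))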

            Haoyu Sun (hasun@redhat.com)
            Simon Pasquier (spasquie@redhat.com)
            Junqi Zhao