Type: Bug
Resolution: Done
Priority: Normal
Fix Version/s: None
Affects Version/s: ACM 2.13.0
Component/s: Observability
Labels:

Activity Type: Quality / Stability / Reliability
Story Points: 3
Blocked: False
Blocked Reason: None
Ready: False
Acceptance Criteria: Provide the required acceptance criteria using this template. ...
Intelligence Requested:
Market:

Sprint: Observability Sprint 39, Observability Sprint 40
Severity: Important
Regression: None

SFDC Cases Links:
SFDC Cases Open:
SFDC Cases Counter:

Context

Many clients manage the cluster-monitoring-config through GitOps and apply updates that conflict with what the endpoint-operator adds. Each side then keeps overwriting the other's changes, which triggers competing reconcile loops on both ends. Every configuration change restarts Prometheus, disrupting monitoring and alerting.

What

Define a way to surface these reconcile loops to the user.

Possible solutions:

Add a platform ServiceMonitor for the endpoint-monitor, with an alert rule that fires in this case. This is possibly the best solution, but it might not work if Prometheus is constantly restarting and cannot scrape metrics.
Detect these loops from inside the operator and degrade the addon state with a relevant message.

Acceptance criteria:

Implement a solution that detects these loops from inside the operator and degrades the addon state with a relevant message.
Write troubleshooting steps for what to do in this situation (ACM docs or KCS).

Assignee: Thibault Mange

Reporter: Thibault Mange

Votes: 0

Watchers: 4

Created: 2025/02/07 10:28 AM

Updated: 2025/09/08 12:19 PM

Resolved: 2025/04/15 11:15 AM
