Type: Bug
Resolution: Done
Priority: Normal
Fix Version/s: None
Affects Version/s: None
Component/s: Notifications
Labels:
- platform-integrations

Activity Type: Quality / Stability / Reliability
Blocked: False
Blocked Reason: None
Ready: False
Acceptance Criteria: None
BZ requires_doc_text: Unset
Regression: None
BZ Keywords:
- Unset

The user_provider_get_users metrics include the org-id as a label. This drives up the cardinality of these metrics and is causing issues for the Prometheus / Grafana servers. It triggered an app-interface/app-sre alert:

https://redhat-internal.slack.com/archives/CCRND57FW/p1770395587714289

The issue appears to have been triggered by a surge in messages from RBAC on 02/02/2026. Because the org-id is a label, each org produced its own unique set of user_provider_get_users_* metrics. The count continued to climb until it reached 50k, which triggered the alert.

Had we redeployed this week, the metrics would have reset, so we likely would not have triggered the alert and would have missed this issue.

This looks like where the metric is used: https://github.com/RedHatInsights/notifications-backend/blob/master/recipients-resolver/src/main/java/com/redhat/cloud/notifications/recipients/resolver/FetchUsersFromExternalServices.java#L185
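
To illustrate why the org-id label is the problem, here is a minimal sketch of how a per-org label multiplies the number of series. It does not use the real metrics library or the code linked above: `SeriesRegistry` is a hypothetical stand-in that keeps one entry per unique metric name plus label set, and the label name `org_id` and the integer org values are assumptions made for the example.

```java
import java.util.HashSet;
import java.util.Set;

public class CardinalitySketch {

    /** Hypothetical stand-in for a metrics registry: one entry per unique name + label set. */
    static final class SeriesRegistry {
        private final Set<String> series = new HashSet<>();

        /** Records an observation; a new series is created only for an unseen name/label combination. */
        void record(String name, String... labelPairs) {
            series.add(name + "{" + String.join(",", labelPairs) + "}");
        }

        int seriesCount() {
            return series.size();
        }
    }

    /** Labels every observation with the org (assumed label name: org_id). */
    static int seriesWithOrgLabel(int orgCount) {
        SeriesRegistry registry = new SeriesRegistry();
        for (int org = 0; org < orgCount; org++) {
            registry.record("user_provider_get_users", "org_id=" + org);
        }
        return registry.seriesCount();
    }

    /** Records the same observations without the org label. */
    static int seriesWithoutOrgLabel(int orgCount) {
        SeriesRegistry registry = new SeriesRegistry();
        for (int org = 0; org < orgCount; org++) {
            registry.record("user_provider_get_users");
        }
        return registry.seriesCount();
    }

    public static void main(String[] args) {
        // One series per org when the org is a label.
        System.out.println(seriesWithOrgLabel(50_000));    // prints 50000
        // A single series when the label is dropped.
        System.out.println(seriesWithoutOrgLabel(50_000)); // prints 1
    }
}
```

In this sketch the series count grows linearly with the number of distinct orgs seen since startup, which matches the observation that the count kept climbing after the RBAC surge and would reset on a redeploy.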

Revert the change that dropped the problematic metrics:
https://gitlab.cee.redhat.com/service/app-interface/-/blob/master/resources/insights-prod/notifications-prod/service-monitor/notifications-recipients-resolver.servicemonitor.yml?ref_type=heads#L13-16

is caused by: RHCLOUD-44724 Fix the notifications for changes of system roles (Backlog)

mentioned on: Merge request - Notifications: Upgrading prod

Assignee: Guillaume Duval
Reporter: Derek Horton
Votes: 0
Watchers: 4
Created: 2026/02/06 7:46 PM
Updated: 2026/02/19 1:14 PM
Resolved: 2026/02/10 4:07 PM
