Type: Spike
Resolution: Unresolved
Priority: Normal
Fix Version/s: ConsoleDot CY26Q2
Affects Version/s: None
Component/s: None
Labels:
- platform-accessmanagement

Blocked:
False
Blocked Reason:
None
Ready:
False
Acceptance Criteria:

No more error logs like this:
Failed to produce messages to topic-partition TopicPartition(topic='platform.notifications.ingress', partition=2) with base offset -1 log start offset None and error KafkaTimeoutError: Batch for TopicPartition(topic='platform.notifications.ingress', partition=2) containing 44 record(s) expired: 30 seconds have passed since batch creation plus linger time.
Expired 4 batches in accumulator

BZ requires_doc_text:
Unset
Regression:
None
BZ Keywords:
- Unset
Intelligence Requested:
Market:

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Description:

When system roles change, a notification is sent to every org, which adds up to a very large number of messages. However, the logs contain errors like this:

Failed to produce messages to topic-partition TopicPartition(topic='platform.notifications.ingress', partition=2) with base offset -1 log start offset None and error KafkaTimeoutError: Batch for TopicPartition(topic='platform.notifications.ingress', partition=2) containing 44 record(s) expired: 30 seconds have passed since batch creation plus linger time.
Expired 4 batches in accumulator

That's because producer.send() is asynchronous and the batches are never flushed, so they accumulate in the producer and expire after 30 seconds (the default timeout).

So the messages are never actually sent; they are silently lost. That explains why the notifications team has not been overwhelmed by us.
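A minimal sketch of what the diagnosis above points to, assuming the producer is kafka-python's `KafkaProducer` (the log formatting suggests it, but that is an assumption): flush periodically so batches do not sit in the accumulator, and attach an errback to each send future so expired batches are collected instead of only being logged. The 30-second window appears to match kafka-python's default `request_timeout_ms` of 30000. `send_with_flush`, `chunk_size`, `FakeProducer`, and `_FakeFuture` are hypothetical names; the fake producer only exists so the sketch runs without a broker.

```python
from typing import Any, Iterable, List, Optional


def send_with_flush(producer: Any, topic: str, payloads: Iterable[bytes],
                    chunk_size: int = 500) -> List[BaseException]:
    """Send payloads, flushing every `chunk_size` records.

    Assumes a kafka-python-style producer: send(topic, value=...) returns a
    future with add_errback(), and flush() blocks until every buffered
    record has been delivered or has failed.
    """
    errors: List[BaseException] = []
    pending = 0
    for payload in payloads:
        future = producer.send(topic, value=payload)
        # Collect delivery failures (e.g. a KafkaTimeoutError for an expired
        # batch) so the caller can count or retry them.
        future.add_errback(errors.append)
        pending += 1
        if pending >= chunk_size:
            producer.flush()  # drain the accumulator before queueing more
            pending = 0
    producer.flush()  # flush whatever is left in the last partial chunk
    return errors


class _FakeFuture:
    """Hypothetical stand-in for a send future; fails immediately if exc is set."""

    def __init__(self, exc: Optional[BaseException] = None) -> None:
        self._exc = exc

    def add_errback(self, fn):
        if self._exc is not None:
            fn(self._exc)
        return self


class FakeProducer:
    """Hypothetical in-memory producer; every `fail_every`-th send fails."""

    def __init__(self, fail_every: int = 0) -> None:
        self.sent = []
        self.flushes = 0
        self.fail_every = fail_every

    def send(self, topic: str, value: Optional[bytes] = None) -> _FakeFuture:
        self.sent.append((topic, value))
        failed = self.fail_every and len(self.sent) % self.fail_every == 0
        return _FakeFuture(TimeoutError("batch expired") if failed else None)

    def flush(self, timeout: Optional[float] = None) -> None:
        self.flushes += 1
```

For example, `send_with_flush(FakeProducer(fail_every=4), "platform.notifications.ingress", [b"m"] * 10, chunk_size=3)` queues 10 records, flushes 4 times (after records 3, 6 and 9, then once at the end) and returns 2 errors. Flushing per chunk trades some throughput for bounded buffering; `chunk_size` is a tuning knob, not a known-good value.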

We might have to ask the notifications team whether they can handle that much load.
Check how many tenants we are sending notifications to (e.g., tenants with ready=True). Is there a way to send only to orgs that care about this change? Maybe we can find orgs that have subscribed to it?
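To size the fan-out and narrow it, here is a sketch assuming each tenant record exposes a `ready` flag (as mentioned above) plus some per-org subscription data. `Tenant`, `subscribed_events`, `count_ready`, `notification_targets`, and the `"system-role-changed"` event name are all hypothetical; where subscription data actually lives is the open question in this ticket.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class Tenant:
    """Hypothetical tenant record holding only what this sketch needs."""
    org_id: str
    ready: bool
    subscribed_events: Set[str] = field(default_factory=set)


def count_ready(tenants: Iterable[Tenant]) -> int:
    """Size of the fan-out if every ready tenant is notified."""
    return sum(1 for t in tenants if t.ready)


def notification_targets(tenants: Iterable[Tenant], event_type: str) -> List[str]:
    """Org IDs of ready tenants that subscribed to `event_type`."""
    return [t.org_id for t in tenants
            if t.ready and event_type in t.subscribed_events]
```

Comparing `count_ready` with `len(notification_targets(...))` would show how much the subscription filter reduces the load.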

Issue Links:
- causes: RHCLOUD-44952 [notifications] high cardinality issue with user_provider_get_users metrics (Closed)

Assignee: Unassigned
Reporter: Jay Zeng
Votes: 0
Watchers: 3
Created: 2026/02/02 8:18 PM
Updated: 2026/02/10 4:07 PM
