Spike
Resolution: Unresolved
Normal
When system roles change, notifications are sent out to all orgs, which should produce a very large number of messages. However, the log contains entries like this:
Failed to produce messages to topic-partition TopicPartition(topic='platform.notifications.ingress', partition=2) with base offset -1 log start offset None and error KafkaTimeoutError: Batch for TopicPartition(topic='platform.notifications.ingress', partition=2) containing 44 record(s) expired: 30 seconds have passed since batch creation plus linger time. Expired 4 batches in accumulator
That's because producer.send() is asynchronous and the batches are never flushed, so they accumulate and expire after 30 seconds (the default timeout).
So the messages are never actually sent and are lost. That explains why the notifications team is not being overwhelmed by us.
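A minimal sketch of one possible fix, assuming the producer is kafka-python's KafkaProducer (the log format suggests this, but the ticket does not confirm it). The names send_all and flush_every are hypothetical. The idea is to flush periodically and check each future returned by send(), so that expired batches surface as errors instead of disappearing silently:

```python
from typing import Any, Iterable, List


def _drain(producer: Any, futures: List[Any], timeout: float) -> int:
    """Block until buffered records are sent, then check each result."""
    # flush() blocks until all buffered records have been sent or failed.
    producer.flush(timeout=timeout)
    for future in futures:
        # get() re-raises the delivery error (e.g. KafkaTimeoutError),
        # so a lost batch is no longer silent.
        future.get(timeout=timeout)
    return len(futures)


def send_all(
    producer: Any,
    topic: str,
    payloads: Iterable[Any],
    flush_every: int = 500,  # hypothetical batch size, tune as needed
    timeout: float = 30.0,
) -> int:
    """Send every payload and return the number confirmed as delivered."""
    futures: List[Any] = []
    delivered = 0
    for payload in payloads:
        # send() only enqueues the record; it does not wait for delivery.
        futures.append(producer.send(topic, value=payload))
        if len(futures) >= flush_every:
            delivered += _drain(producer, futures, timeout)
            futures = []
    # Flush whatever is left after the loop.
    delivered += _drain(producer, futures, timeout)
    return delivered
```

With a real producer this would be called as send_all(KafkaProducer(bootstrap_servers=...), 'platform.notifications.ingress', payloads). Note that once the messages are actually delivered, the load concern for the notifications team becomes real.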
We might have to ask the notifications team whether they can handle that much load.
Check how many tenants we are sending notifications to (e.g., ready=True). Can we find a way to send only to the orgs that care about this? Maybe we can find the orgs that are subscribed to something?
causes: RHCLOUD-44952 [notifications] high cardinality issue with user_provider_get_users metrics (Closed)