Bug
Resolution: Done
Major
MK - Sprint 221
WHAT
As discovered in MGDSTRM-8698 (https://github.com/strimzi/strimzi-canary/issues/188), the canary records incorrect message latencies. The recorded latency is inflated by up to the consumer max wait time (Sarama Consumer.MaxWaitTime).
This happens because the canary internally shares a single client between the produce and consume sides, so there is only one connection to each broker. The broker answers requests on a connection in order. As a result, if the canary produces a message while a fetch request is still pending, the produce response cannot be received until the fetch response completes.
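The effect can be illustrated with a simplified, hypothetical model (not canary or Sarama code): requests sharing one connection are answered strictly in order, and a fetch that finds no data is parked by the broker for the full max wait time. The names request, responseTimes and produceLatency, and the 500 ms / 5 ms timings, are assumptions chosen only for illustration.

```go
package main

import "fmt"

// request is a simplified, hypothetical model of one Kafka request on a
// broker connection.
type request struct {
	name    string
	arrival int // ms at which the request reaches the broker
	service int // ms the broker spends before it can send the response
}

// responseTimes returns when each response is sent if all requests share one
// connection and the broker answers them strictly in arrival order.
func responseTimes(reqs []request) []int {
	done := make([]int, len(reqs))
	prev := 0
	for i, r := range reqs {
		start := r.arrival
		if prev > start {
			// Head-of-line blocking: wait for the previous response first.
			start = prev
		}
		done[i] = start + r.service
		prev = done[i]
	}
	return done
}

// produceLatency returns the latency of the last request, assumed to be the
// produce request.
func produceLatency(reqs []request) int {
	done := responseTimes(reqs)
	last := len(reqs) - 1
	return done[last] - reqs[last].arrival
}

func main() {
	const maxWait = 500                   // hypothetical Consumer.MaxWaitTime in ms
	fetch := request{"fetch", 0, maxWait} // no data yet, so the broker parks it
	produce := request{"produce", 10, 5}

	shared := []request{fetch, produce} // one client: one connection per broker
	separate := []request{produce}      // separate producer client and connection

	fmt.Println("shared connection produce latency (ms):", produceLatency(shared))
	fmt.Println("separate connection produce latency (ms):", produceLatency(separate))
}
```

In this model the produce latency on the shared connection is 495 ms, almost the whole max wait time, while on a separate connection it is 5 ms, which is the motivation for giving the producer and consumer their own clients.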
WHY
This impacts RHOSAK because it uses end-to-end message latencies for service alerts, and we also want to expose an internal SLI for message latency.
HOW
Fix the canary to use separate Sarama clients for producing and consuming. Ping the Strimzi team to make it clear that we are going to work on this, in case someone has already started on a fix.
- relates to MGDSTRM-8698 Canary 0.98 percentile end-to-end and publish latencies appear to be a function of Sarama Consumer.MaxWaitTime (Closed)