-
Bug
-
Resolution: Not a Bug
While testing the Canary 0.3.0 RC, we discovered an odd relationship between the Sarama client config Consumer.MaxWaitTime and the 0.98 end-to-end (and producer) latencies reported by Prometheus.
This was brought to light by https://github.com/Shopify/sarama/pull/2227 in Sarama, which increased the default Consumer.MaxWaitTime from 250ms to 500ms. With this configuration, the latencies shown in the Prometheus graphs jumped from ~250ms to ~500ms.
Thus far, we haven't been able to explain why the two things are related. We did rule out the new Sarama client and the new Canary as potential root causes.
This is impactful to RHOSAK as we wish to use canary metrics for an internal SLI. Having trustworthy latency numbers is therefore vital.
The chart below demonstrates the problem. This particular chart dates back to a comparison of Strimzi Canary 0.2.0 and the 0.3.0 RC2 (the images were switched at 13:00), but later git bisection identified the source of the difference in behaviour as #2227.
We've worked around the problem in the canary with https://github.com/strimzi/strimzi-canary/pull/183; however, we should still establish the root cause. This is the purpose of this defect report.
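For reference, a minimal sketch of pinning Consumer.MaxWaitTime explicitly in a Sarama client config instead of inheriting the library default. This is an illustration only and not necessarily what the workaround PR does; the 250ms value is simply the previous default mentioned above.

```go
package main

import (
	"fmt"
	"time"

	"github.com/Shopify/sarama"
)

func main() {
	// Start from Sarama's defaults, then override the fetch long-poll wait.
	cfg := sarama.NewConfig()

	// Assumption for illustration: pin the value to the pre-#2227 default
	// so that a change in the library default cannot shift the metrics.
	cfg.Consumer.MaxWaitTime = 250 * time.Millisecond

	fmt.Println("Consumer.MaxWaitTime =", cfg.Consumer.MaxWaitTime)
}
```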
Examining the canary of a developer instance (single broker) and/or running the canary with the connection establishment check turned off might be useful approaches to narrow the problem space.
- is related to: MGDSTRM-8798 Canary incorrectly records producer/end-to-end latency owing to use of shared client (and thus shared connection) for publish/consumer (Closed)