Story
Resolution: Done
Minor
8
NetObserv - Sprint 250, NetObserv - Sprint 251, NetObserv - Sprint 252, NetObserv - Sprint 253, NetObserv - Sprint 254, NetObserv - Sprint 255
Our recommendation table here [1] shows a configuration without Kafka for 10-node clusters, and with Kafka for clusters of 25 nodes and above. Although there are various reasons to recommend using Kafka anyway, it would be good to have numbers backing this recommendation in terms of performance and resource consumption.
So, we should run scale-test jobs for each of the mentioned cluster sizes (10, 25, 65 and 120 nodes), with and without Kafka. This gives the following runs/configurations:
- 10 nodes, no Kafka
- 10 nodes, Kafka, 6 replicas, 12 partitions
- 25 nodes, no Kafka
- 25 nodes, Kafka, 12 replicas, 24 partitions*
- 25 nodes, Kafka, 24 replicas, 48 partitions
- 65 nodes, no Kafka
- 65 nodes, Kafka, 24 replicas, 48 partitions
- 120 nodes, no Kafka
- 120 nodes, Kafka, 24 replicas, 48 partitions
*: We currently recommend 24 replicas on a 25-node cluster, which is perhaps too much (almost one per node). I would like to verify whether it is beneficial, and how it compares with, for instance, just 12 replicas.
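To sanity-check the run matrix above, here is a minimal sketch that encodes each run and derives two ratios: replicas per node (the "almost 1 per node" concern) and partitions per replica. The names `Run` and `RUNS` are hypothetical helpers for illustration only; they are not part of any NetObserv or scale-test tooling, and "replicas" simply mirrors the counts listed in this ticket.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Run:
    """One scale-test run from the matrix (hypothetical helper, not a real API)."""
    nodes: int
    kafka: bool
    replicas: Optional[int] = None    # only set for Kafka runs
    partitions: Optional[int] = None  # only set for Kafka runs

    def __post_init__(self) -> None:
        # A Kafka run must specify both replicas and partitions.
        if self.kafka and (self.replicas is None or self.partitions is None):
            raise ValueError("Kafka runs need replicas and partitions")

    def replicas_per_node(self) -> Optional[float]:
        # Ratio of replicas to cluster nodes; None when Kafka is not used.
        return self.replicas / self.nodes if self.kafka else None

    def partitions_per_replica(self) -> Optional[float]:
        # How many topic partitions each replica would consume; None without Kafka.
        return self.partitions / self.replicas if self.kafka else None


# The nine runs listed in the ticket.
RUNS = [
    Run(10, False),
    Run(10, True, 6, 12),
    Run(25, False),
    Run(25, True, 12, 24),
    Run(25, True, 24, 48),
    Run(65, False),
    Run(65, True, 24, 48),
    Run(120, False),
    Run(120, True, 24, 48),
]

if __name__ == "__main__":
    for run in RUNS:
        if run.kafka:
            print(f"{run.nodes:>3} nodes, Kafka, {run.replicas} replicas, "
                  f"{run.partitions} partitions: "
                  f"{run.replicas_per_node():.2f} replicas/node, "
                  f"{run.partitions_per_replica():.0f} partitions/replica")
        else:
            print(f"{run.nodes:>3} nodes, no Kafka")
```

Running it shows the 24-replica option on 25 nodes at 0.96 replicas per node versus 0.48 for the 12-replica option, and that every Kafka configuration in the matrix keeps the same ratio of two partitions per replica.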
PS: I don't know which of the test scripts (ingress-perf? node-density? cluster-density?) provides the most realistic workload. We need traffic distributed across a variety of workloads (i.e. involving different deployments); I think cluster-density does that, but perhaps the others do as well.
Goal: depending on our findings, we may adapt our recommendation doc and/or add more detail, for example stating that we measured xx% to yy% additional resource usage in a given mode, so that users can make a more informed choice.
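One way to derive an "xx% to yy%" figure is to compute, for each cluster size, the relative overhead of one mode over the other for a given metric (for example CPU or memory of the observability components), then report the minimum and maximum across sizes. The sketch below assumes that approach; `relative_overhead_pct` and `overhead_range` are hypothetical helpers, and which mode is treated as the baseline is a choice left to the analysis.

```python
from typing import Iterable, Tuple


def relative_overhead_pct(baseline: float, candidate: float) -> float:
    """Extra usage of `candidate` over `baseline`, in percent (hypothetical helper)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (candidate - baseline) / baseline * 100.0


def overhead_range(pairs: Iterable[Tuple[float, float]]) -> Tuple[float, float]:
    """(min, max) overhead over (baseline, candidate) pairs, e.g. one pair per cluster size."""
    values = [relative_overhead_pct(b, c) for b, c in pairs]
    if not values:
        raise ValueError("need at least one measurement pair")
    return min(values), max(values)
```

Each metric would be processed separately, so the doc could state one range for CPU and another for memory rather than mixing units.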