-
Epic
-
Resolution: Unresolved
-
Refine the concept of streaming units and machine types for Enterprise Kafka
-
To Do
-
MGDSRVS-145 - RHOSAK Enterprise Plan: RHOSAK on customer-owned OSD/ROSA/ARO
---
The streaming unit concept may not be as applicable to Enterprise Kafka. There are a couple of directions we could take:
- Like Confluent, keep the concept of a streaming unit. Note from their docs https://docs.confluent.io/cloud/current/clusters/cluster-types.html#dimensions-with-a-recommended-guideline that most of the dimensions we treat as service limits are categorized there as guidelines rather than hard limits, and it's not clear whether they are enforced via quotas. Most of their hard limits are on measures we are not enforcing. This approach gets more complicated if the machine type is allowed to vary: we'd have to predetermine performance characteristics for a range of possible instance types and expected storage classes, or else run something like the instance profiler to capture them after the cluster is created. It gets harder still with GCP supporting custom machine sizes. Even if we get past that, the mapping of streaming unit to machine type is problematic. If the customer is using m5.4xlarge, for example, and we don't allow alternative sizings / topologies, we'll end up supporting 1 streaming unit over 3 nodes with half of those resources wasted. If we do allow the sizings to change for m5.4xlarge but stay at 1 broker per node, then we should only support even multiples of streaming units.
- Like AWS, provide the ability to create Kafka instances on any given machine type and scale them horizontally or vertically as needed (https://docs.aws.amazon.com/msk/latest/developerguide/bestpractices.html#brokers-per-cluster https://docs.aws.amazon.com/msk/latest/developerguide/msk-update-broker-type.html), with even storage auto-scaling (https://docs.aws.amazon.com/msk/latest/developerguide/msk-autoexpand.html). There's really no need for the concept of a streaming unit here: you are right-sizing the cluster directly or, ideally, through automated paths.
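To make the m5.4xlarge arithmetic in the first option concrete, here is a rough sketch. It assumes, as the "half of those resources wasted" estimate implies, that one such node can carry the broker share of 2 streaming units, and that nodes are added in groups of 3 with 1 broker per node. The function names and both parameters are hypothetical and not taken from any existing tooling.

```python
import math
from fractions import Fraction

# Assumption: an m5.4xlarge node can hold the broker share of 2 streaming
# units (inferred from "half of those resources wasted" at 1 streaming unit).
SU_PER_NODE = 2
# Assumption: brokers run 1 per node and nodes are added 3 at a time.
NODES_PER_GROUP = 3


def nodes_required(streaming_units: int) -> int:
    """Nodes needed when each group of 3 nodes carries up to SU_PER_NODE units."""
    groups = math.ceil(streaming_units / SU_PER_NODE)
    return groups * NODES_PER_GROUP


def node_utilization(streaming_units: int) -> Fraction:
    """Fraction of the provisioned broker capacity that is actually used."""
    groups = math.ceil(streaming_units / SU_PER_NODE)
    return Fraction(streaming_units, groups * SU_PER_NODE)


if __name__ == "__main__":
    for su in range(1, 5):
        used = float(node_utilization(su))
        print(f"{su} SU -> {nodes_required(su)} nodes, {used:.0%} utilized")
```

Under these assumptions only even streaming unit counts fill the nodes completely; 1 streaming unit leaves half of each node idle and 3 streaming units leave a quarter idle.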
Related to this: for streaming units the presumption is that we're "fully utilizing" the nodes. That's not quite correct. By logically collocating the non-broker pods we add quite a bit of padding, which gets proportionally worse for larger Kafka instances. We may eventually change the topology, or even expect the non-broker pods to run on a different machine pool (of the customer's general cluster).
- clones
-
MGDSTRM-9649 Verify ROSA capacity needed to host new {x} Streaming Units OpenShift Streams instance
- Closed