-
Enhancement
-
Resolution: Unresolved
-
Major
-
3.2.0.Final
-
None
Which use case/requirement will be addressed by the proposed feature?
The dynamic partition key assignment implemented as part of DBZ-9195 and released in v3.2.0-Final negates any partition routing performed by the PartitionRouting SMT.
Example
Using the following transforms config with sink.eventhubs.partitionid and sink.eventhubs.partitionkey unset
transforms: - config: partition.payload.fields: change.ParentID partition.topic.num: 20 predicate: child type: io.debezium.transforms.partitions.PartitionRouting - config: partition.payload.fields: change.ID partition.topic.num: 20 predicate: parent type: io.debezium.transforms.partitions.PartitionRouting
Prior to v3.2.0-Final, the SMT hashes the payload values and assigns a partition to each record. When those events are picked up by the EventHubsChangeConsumer, they're batched by the record partition ID and published to the Event Hub in up to 20 batches (total number of partitions).
Since v3.2.0-Final, the dynamic partition key assignment takes precedence over the assigned partition if the record key is not null.
Not only does this break the recommended configuration for complex partition routing described in the docs, but using a partition key at event production time to Event Hubs isn't optimal for use cases where there are often many small changes for different IDs in a short time frame. When using a partition key, this results in many small (often single event) batches being produced instead of all events being batched up into a max of 20 batches.
Implementation ideas
Add option to Azure Event Hubs sink config named, e.g. debezium.sink.eventhubs.dynamicpartitionrouting. The options would be:
- default - existing behaviour
- key - If record key is not null, apply key as batch partition key. Otherwise use round-robin
- partitionid - If record partition id is not null, produce events in batches to specific partition id. Otherwise use round-robin
The debezium.sink.eventhubs.dynamicpartitionrouting config would only apply in both of the following options are unset:
- debezium.sink.eventhubs.partitionid
- debezium.sink.eventhubs.partitionkey