Uploaded image for project: 'Debezium'
  1. Debezium
  2. DBZ-9572

Support configuring how to apply dynamic partition routing in Azure Event Hubs sink

XMLWordPrintable

    • Icon: Enhancement Enhancement
    • Resolution: Unresolved
    • Icon: Major Major
    • 3.4.0.Alpha2
    • 3.2.0.Final
    • debezium-server
    • None
    • Important

      Which use case/requirement will be addressed by the proposed feature?

      The dynamic partition key assignment implemented as part of DBZ-9195 and released in v3.2.0-Final negates any partition routing performed by the PartitionRouting SMT.

      Example

      Using the following transforms config with sink.eventhubs.partitionid and sink.eventhubs.partitionkey unset

       

      transforms:
          - config:
              partition.payload.fields: change.ParentID
              partition.topic.num: 20
            predicate: child
            type: io.debezium.transforms.partitions.PartitionRouting
          - config:
              partition.payload.fields: change.ID
              partition.topic.num: 20
            predicate: parent
            type: io.debezium.transforms.partitions.PartitionRouting 

        
      Prior to v3.2.0-Final, the SMT hashes the payload values and assigns a partition to each record. When those events are picked up by the EventHubsChangeConsumer, they're batched by the record partition ID and published to the Event Hub in up to 20 batches (total number of partitions).

       

      Since v3.2.0-Final, the dynamic partition key assignment takes precedence over the assigned partition if the record key is not null.

      Not only does this break the recommended configuration for complex partition routing described in the docs, but using a partition key at event production time to Event Hubs isn't optimal for use cases where there are often many small changes for different IDs in a short time frame. When using a partition key, this results in many small (often single event) batches being produced instead of all events being batched up into a max of 20 batches.

      Implementation ideas

      Add option to Azure Event Hubs sink config named, e.g. debezium.sink.eventhubs.dynamicpartitionrouting. The options would be:

      • default - existing behaviour
      • key - If record key is not null, apply key as batch partition key. Otherwise use round-robin
      • partitionid - If record partition id is not null, produce events in batches to specific partition id. Otherwise use round-robin

      The debezium.sink.eventhubs.dynamicpartitionrouting config would only apply in both of the following options are unset:

      • debezium.sink.eventhubs.partitionid
      • debezium.sink.eventhubs.partitionkey

              Unassigned Unassigned
              callum.rigby Callum Rigby
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

                Created:
                Updated: