- Enhancement
- Resolution: Done
- Major
Feature request or enhancement
We have experienced very low throughput with the current Pulsar sink implementation. The root cause appears to be that it uses the synchronous send method, waiting for each message to be delivered and written to Pulsar before sending the next one.
Which use case/requirement will be addressed by the proposed feature?
Producing a large volume of change messages to a Pulsar topic can significantly increase the `MilliSecondsBehindSource` metric because of the low throughput of the current synchronous send implementation. Asynchronous sending could significantly improve the message rate.
Implementation ideas (optional)
For a much higher message rate, it is recommended to use sendAsync for each item in the batch and wait for all of the sends to complete before marking the batch as finished.
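The pattern described above can be sketched roughly as follows. This is a minimal illustration, not the actual Debezium sink code: the class `AsyncBatchSketch`, the method `sendBatch`, and the `sendAsync` function parameter are hypothetical stand-ins. The only assumption about Pulsar is that the producer's `sendAsync` returns a `CompletableFuture` per message, which is why a plain `Function` returning a future is used in its place here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

public class AsyncBatchSketch {

    // Dispatches every record without blocking, then waits once for the whole batch.
    // `sendAsync` is a hypothetical stand-in for a producer call that returns one future per message.
    static <T, R> List<R> sendBatch(List<T> records,
                                    Function<T, CompletableFuture<R>> sendAsync) {
        List<CompletableFuture<R>> pending = new ArrayList<>(records.size());
        for (T record : records) {
            // No waiting here: the next send starts before the previous one is acknowledged.
            pending.add(sendAsync.apply(record));
        }

        // Block until all sends have completed; join() throws a
        // CompletionException if any of them failed.
        CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();

        // All futures are complete at this point, so these joins return immediately.
        List<R> acks = new ArrayList<>(pending.size());
        for (CompletableFuture<R> future : pending) {
            acks.add(future.join());
        }
        return acks;
    }

    public static void main(String[] args) {
        List<String> batch = List.of("a", "b", "c");
        // Simulated asynchronous send that acknowledges each message.
        List<String> acks = sendBatch(batch,
                msg -> CompletableFuture.supplyAsync(() -> "ack-" + msg));
        System.out.println(acks);
    }
}
```

Only after `sendBatch` returns without throwing would the caller mark the batch as finished, so a failed send still prevents the batch from being committed.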
A configuration option was introduced to optionally enable the async behavior; the default is false. To enable it, add:
debezium.sink.pulsar.async=true
Measured Debezium performance (M1 MacBook Pro):
- async=false: sending messages one by one using send: ~600 msg/sec
- async=true: sending messages in batches using sendAsync: ~35000 msg/sec
Rates were measured using the pulsar-perf CLI tool.
- links to: RHEA-2023:120698 Red Hat build of Debezium 2.3.4 release