-
Bug
-
Resolution: Done
-
Major
-
0.9.5.Final, 1.0.0.Final, 1.1.0.Alpha1
-
None
We've run into an issue where events are lost when replicating from Postgres to Kafka if Kafka Connect restarts. This happens because `commit()` can be called before the events returned by `poll()` have actually been written to Kafka. If Kafka Connect is terminated before some of those events are produced, it resumes from the committed offset on restart, so the events that were never produced are skipped. This leads to data loss.
The underlying problem, these callbacks being invoked out of order, is tracked as an open issue against Kafka Connect: https://issues.apache.org/jira/browse/KAFKA-5716
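To make the ordering problem concrete, here is a minimal toy model of the race. It is only a sketch: `ToySourceTask`, `lost_after_crash`, and the integer offsets are hypothetical names invented for illustration, and the method names only loosely mirror the `poll()`, `commitRecord()`, and `commit()` callbacks on a Kafka Connect source task. It compares committing the last polled offset with committing only the last offset that was acknowledged as produced. The second variant is one possible mitigation, not necessarily the fix adopted in Debezium.

```python
from dataclasses import dataclass


@dataclass
class ToySourceTask:
    """Hypothetical stand-in for a source task; NOT the Kafka Connect API."""
    commit_acked_only: bool
    last_polled: int = 0   # highest offset handed out by poll()
    last_acked: int = 0    # highest offset confirmed as written to Kafka
    committed: int = 0     # offset confirmed back to the source database

    def poll(self, offsets):
        # Hand a batch to the framework and remember how far we have read.
        self.last_polled = max(offsets)
        return list(offsets)

    def commit_record(self, offset):
        # Called once a single record has been acknowledged by Kafka.
        self.last_acked = max(self.last_acked, offset)

    def commit(self):
        # Unsafe variant: trust that everything polled has been produced.
        # Safe variant: only commit what was actually acknowledged.
        if self.commit_acked_only:
            self.committed = self.last_acked
        else:
            self.committed = self.last_polled


def lost_after_crash(commit_acked_only):
    """Return the offsets that are neither delivered nor replayed."""
    task = ToySourceTask(commit_acked_only)
    batch = task.poll([1, 2, 3, 4, 5])

    delivered = []
    for offset in batch[:2]:  # only two records are produced in time
        delivered.append(offset)
        task.commit_record(offset)

    task.commit()  # commit fires, then the worker is terminated

    # On restart the source resumes after the committed offset.
    replayed = [o for o in batch if o > task.committed]
    return [o for o in batch if o not in delivered and o not in replayed]


print(lost_after_crash(commit_acked_only=False))  # [3, 4, 5]
print(lost_after_crash(commit_acked_only=True))   # []
```

In the unsafe variant the committed offset runs ahead of what reached Kafka, which is the data loss described above. In the safe variant the unproduced events are replayed after the restart.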
Here's a repo with steps to reproduce the issue: https://github.com/mmarvick-convoy/debezium-restart-issue
Note that in the repro steps, we use a custom build of Debezium with some additional logging during the commit and poll callbacks. You can edit the Kafka Connect Dockerfile to use the latest stable version of Debezium 1.0 without the custom build, and you'll still be able to reproduce the issue. We've been running on Debezium 0.9.5 and can reproduce it there as well, so it's not a recent regression. Because of the underlying issue in Kafka Connect, this has probably been a problem from the beginning.