Type: Bug
Resolution: Done
Priority: Major
Bug report
Discussed in this issue here: https://debezium.zulipchat.com/#narrow/stream/350571-community-dbz-server/topic/Debezium.20dropping.20messages.20sent.20to.20Pub.2FSub
It appears that when certain errors occur and Debezium Server is restarted several times in a row, the offset is advanced even though the sink reported errors, and the affected messages are never sent to Pub/Sub.
In production, I have previously seen this happen when the Pub/Sub UNAVAILABLE error is encountered. In these cases, our data engineers restart the connector multiple times because Google lists this as a transient condition: https://cloud.google.com/pubsub/docs/reference/error-codes
What Debezium connector do you use and what version?
My tests were done using Debezium 2.6.2, but we have observed this behaviour across different versions.
What is the connector configuration?
The config is attached.
What is the captured database version and mode of deployment?
My test instance uses MySQL as the source, but we have also observed this issue when using the Oracle connector with Oracle 19c (all on-prem).
What behavior do you expect?
If a message fails to send to Pub/Sub, Debezium should not advance its low watermark past that batch, so that no messages are lost.
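To make the expected behaviour concrete, here is a minimal Python simulation of "advance the offset only after a successful publish". It is a sketch, not Debezium's implementation: `PublishError`, `deliver`, and `make_sink` are hypothetical names, and the in-memory list stands in for the Pub/Sub topic.

```python
class PublishError(Exception):
    """Hypothetical stand-in for a sink failure such as NOT_FOUND or UNAVAILABLE."""


def deliver(batches, publish, start_offset=0):
    """Publish batches in order and return the offset that is safe to persist.

    The offset moves past a batch only after publish() returned without
    raising, so a failed batch is sent again after the next restart.
    """
    committed = start_offset
    for index in range(start_offset, len(batches)):
        try:
            publish(batches[index])
        except PublishError:
            break  # stop here; do not advance past the failed batch
        committed = index + 1
    return committed


def make_sink(received, failures):
    """Build a publish() that fails `failures` times, then appends to `received`."""
    remaining = {"n": failures}

    def publish(batch):
        if remaining["n"] > 0:
            remaining["n"] -= 1
            raise PublishError("topic not found")
        received.extend(batch)

    return publish


# Simulate three failed starts followed by one successful start.
batches = [[1, 2], [3, 4], [5, 6]]
received = []
publish = make_sink(received, failures=3)

offset = 0
restarts = 0
while offset < len(batches):
    offset = deliver(batches, publish, offset)  # one "run" of the server
    restarts += 1
```

With this rule, the number of restarts does not matter: every batch is eventually delivered, because the persisted offset never passes a batch that failed to publish.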
What behavior do you see?
Messages are lost after multiple restarts of Debezium Server.
Do you see the same behaviour using the latest released Debezium version?
Not yet tested.
Do you have the connector logs, ideally from start till finish?
Yes, see the attached trace logs, which include all restarts.
How to reproduce the issue
1) I create a topic in Pub/Sub called all_topics.
2) In a local MySQL instance, I have a procedure that inserts rows into a table every 10 seconds for 5 minutes (so 300 total rows).
3) I start up Debezium Server for the first time and let everything initialize, then I stop Debezium Server.
4) I start my procedure that inserts the 300 rows over 5 minutes.
5) I change the config to point to a Pub/Sub topic that hasn't been created yet (so something I know will error).
6) When Debezium Server crashes due to the NOT_FOUND error, I restart Debezium Server several times (simulating an automatic restart process).
7) I correct the config issue and start up Debezium Server again (about 2.5 minutes into the procedure running).
8) I check my Pub/Sub subscription and only 170 messages are shown (300 expected).
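To see which rows were dropped rather than only how many, the ids pulled from the subscription can be compared against the ids that were inserted. The sketch below assumes the test table has a sequential integer id running from 1 to 300, which the report does not state. The helper names are hypothetical and the sample ids are made up for illustration, not taken from the attached logs.

```python
def missing_ids(expected_count, received_ids):
    """Return the sorted ids in 1..expected_count that never arrived."""
    seen = set(received_ids)
    return [i for i in range(1, expected_count + 1) if i not in seen]


def summarize(expected_count, received_ids):
    """Count distinct received ids and missing ids for a quick report."""
    missing = missing_ids(expected_count, received_ids)
    return {
        "expected": expected_count,
        "received": len(set(received_ids)),
        "missing": len(missing),
    }


# Illustrative data only: 170 distinct ids with one contiguous gap.
sample_received = list(range(1, 101)) + list(range(231, 301))
report = summarize(300, sample_received)
gap = missing_ids(300, sample_received)
print(report)
print("first missing id:", gap[0], "last missing id:", gap[-1])
```

A contiguous gap that lines up with the restart window would point at the offset being advanced during the failed runs, whereas scattered gaps would suggest a different cause.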