Status: Closed
Affects Version/s: 1.1.2.Final, 1.2.0.Final
Fix Version/s: 1.3.0.Alpha1
Steps to Reproduce:
- Run the PostgreSQL connector v1.1.2
- Have some transactions that produce a lot of WAL
- Have a lot of small transactions at the same time
- Kill the connector while it is processing one of the large transactions
Git Pull Request:
Thanks for all your work on Debezium. It is truly an amazing tool. Thank you!
We are running the Debezium PostgreSQL connector 1.1.2 against an RDS database. The other day we restarted Kafka Connect and something strange happened. After the connector was restarted (following an unclean shutdown), it started to log these messages:
I am unsure why this happened, but the effects are bad:
- The replication slot is not advanced, so the PostgreSQL server keeps its WAL files (and a restart while catching up erases the progress)
- The connector is not producing new messages, so no changes arrive at downstream systems
- Catching up is extremely slow, so at times the gap even grew
While this was going on, the connector was not producing any production-relevant data, so we replaced it and created a new snapshot.
After reading through the source code I think I was able to pinpoint the problem to this method:
If I am reading this correctly, whenever messages are skipped this is treated as "no pending messages", so the connector does not advance the replication slot and instead waits for the poll interval before checking for new messages.
I think the method should look like this:
(Patch is attached)
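To illustrate the reasoning independently of the attached patch, here is a minimal, self-contained sketch of the behaviour as I understand it. It is not the Debezium source: the class `SkipAwareStreamingSketch`, the `ReadResult` enum, and the `skippedCountsAsIdle` flag are all hypothetical names made up for this example, and the replication stream is simulated with a plain queue of LSNs. The only point is to show how treating a skipped message as "nothing pending" adds one poll-interval pause per skipped message.

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;

// Hypothetical illustration only; none of these names come from Debezium.
public class SkipAwareStreamingSketch {

    /** Outcome of one read attempt on the simulated replication stream. */
    enum ReadResult { NOTHING_PENDING, SKIPPED, PROCESSED }

    /** Counters collected while draining the simulated stream. */
    static final class Stats {
        int processed; // messages handed to downstream
        int pauses;    // times the loop would have slept for the poll interval
    }

    /**
     * Reads one message if available. Messages at or below lastProcessedLsn
     * were already handled before the restart and are therefore skipped.
     */
    static ReadResult readPending(Queue<Long> stream, long lastProcessedLsn, Stats stats) {
        Long lsn = stream.poll();
        if (lsn == null) {
            return ReadResult.NOTHING_PENDING;
        }
        if (lsn <= lastProcessedLsn) {
            return ReadResult.SKIPPED;
        }
        stats.processed++;
        return ReadResult.PROCESSED;
    }

    /**
     * Drains the simulated stream. When skippedCountsAsIdle is true, a skipped
     * message is handled like an empty stream (the behaviour I suspect);
     * when false, only a truly empty stream counts as idle.
     */
    static Stats drain(List<Long> lsns, long lastProcessedLsn, boolean skippedCountsAsIdle) {
        Queue<Long> stream = new ArrayDeque<>(lsns);
        Stats stats = new Stats();
        while (true) {
            ReadResult result = readPending(stream, lastProcessedLsn, stats);
            if (result == ReadResult.NOTHING_PENDING) {
                break; // the simulation ends here; a real loop would pause and poll again
            }
            if (result == ReadResult.SKIPPED && skippedCountsAsIdle) {
                stats.pauses++; // one poll-interval wait per skipped message
            }
        }
        return stats;
    }

    public static void main(String[] args) {
        List<Long> lsns = List.of(1L, 2L, 3L, 4L, 5L);
        Stats suspected = drain(lsns, 3L, true);
        Stats proposed = drain(lsns, 3L, false);
        System.out.println("suspected: pauses=" + suspected.pauses + " processed=" + suspected.processed);
        System.out.println("proposed:  pauses=" + proposed.pauses + " processed=" + proposed.processed);
    }
}
```

In this toy run, three of the five messages were already processed before the restart. The first variant pauses once for each of them, while the second processes the same two new messages without pausing. With a large backlog of skipped messages, those pauses would add up to the slow catch-up described above.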
Thank you for reading and considering this.