Bug
Resolution: Done
Minor
Postgres replication slots contain the field confirmed_flush_lsn, which holds the LSN that the previous connection to that slot confirmed as flushed. When no offset storage is present, the PostgresConnector should use this value to resume the stream of changes more accurately.
Currently, when offset storage is not present, the PostgresConnector resumes from the LSN of the latest committed transaction. Depending on the snapshot policy configured, this can result in either data loss or unnecessary duplicate events. See the startup logs below for the data loss scenario:
INFO Postgres|test_server|postgres-connector-task Obtained valid replication slot ReplicationSlot [active=false, latestFlushedLsn=44088400, catalogXmin=611] [io.debezium.connector.postgresql.connection.PostgresConnection]
INFO Postgres|test_server|postgres-connector-task No previous offset found [io.debezium.connector.postgresql.PostgresConnectorTask]
...
INFO Postgres|test_server|postgres-connector-task Creating initial offset context [io.debezium.connector.postgresql.PostgresSnapshotChangeEventSource]
INFO Postgres|test_server|postgres-connector-task Read xlogStart at '0/2A0BE30' from transaction '622' [io.debezium.connector.postgresql.PostgresSnapshotChangeEventSource]
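The two positions in the log above use different notations: the slot's latestFlushedLsn is printed as a decimal number, while xlogStart uses the Postgres "hi/lo" hexadecimal form. The following minimal sketch converts between the two so they can be compared. It assumes that latestFlushedLsn is the plain 64-bit value of the LSN; the class and method names are hypothetical and not part of Debezium.

```java
// Hypothetical helper for comparing the two LSN notations seen in the log.
final class LsnGap {

    /** Parses a Postgres LSN in "hi/lo" hexadecimal form into a 64-bit value. */
    static long parseLsn(String text) {
        String[] parts = text.split("/");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected hi/lo form, got: " + text);
        }
        long hi = Long.parseLong(parts[0], 16); // upper 32 bits
        long lo = Long.parseLong(parts[1], 16); // lower 32 bits
        return (hi << 32) | lo;
    }

    /** Formats a 64-bit LSN back into the "hi/lo" hexadecimal form. */
    static String formatLsn(long lsn) {
        return String.format("%X/%X", lsn >>> 32, lsn & 0xFFFFFFFFL);
    }

    public static void main(String[] args) {
        long slotLsn = 44088400L;                 // latestFlushedLsn from the log
        long snapshotLsn = parseLsn("0/2A0BE30"); // xlogStart from the log

        System.out.println(formatLsn(slotLsn));    // prints 0/2A0BC50
        System.out.println(snapshotLsn);           // prints 44088880
        System.out.println(snapshotLsn - slotLsn); // prints 480
    }
}
```

Under that assumption, the connector starts 480 bytes of WAL past the position the slot had confirmed, so any changes recorded in that range would not be streamed.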
Enhancing the Postgres connector to resume from the confirmed_flush_lsn would be very beneficial for users of Debezium in embedded engine mode. It would remove the need for persistent offset storage, because the connector would resume from the latest flushed LSN stored in the replication slot on the Postgres database server.
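The proposed behaviour amounts to a fallback order for the start position. The sketch below illustrates that order only. All names in it are hypothetical, it is not the connector's actual code, and the final fallback to the current WAL position is an assumption based on the existing behaviour described above.

```java
import java.util.OptionalLong;

// Hypothetical illustration of the proposed fallback order; not Debezium code.
final class ResumePosition {

    enum Source { STORED_OFFSET, SLOT_CONFIRMED_FLUSH_LSN, CURRENT_WAL_POSITION }

    final Source source;
    final long lsn;

    private ResumePosition(Source source, long lsn) {
        this.source = source;
        this.lsn = lsn;
    }

    /**
     * Picks the LSN to resume from:
     * 1. a stored offset, if one exists;
     * 2. otherwise the LSN confirmed in the replication slot, if available;
     * 3. otherwise the server's current WAL position (assumed fallback).
     */
    static ResumePosition choose(OptionalLong storedOffsetLsn,
                                 OptionalLong slotConfirmedFlushLsn,
                                 long currentWalLsn) {
        if (storedOffsetLsn.isPresent()) {
            return new ResumePosition(Source.STORED_OFFSET, storedOffsetLsn.getAsLong());
        }
        if (slotConfirmedFlushLsn.isPresent()) {
            return new ResumePosition(Source.SLOT_CONFIRMED_FLUSH_LSN,
                    slotConfirmedFlushLsn.getAsLong());
        }
        return new ResumePosition(Source.CURRENT_WAL_POSITION, currentWalLsn);
    }

    public static void main(String[] args) {
        // No stored offset, but the slot reports a confirmed LSN.
        // The third argument is an arbitrary illustrative value.
        ResumePosition p = choose(OptionalLong.empty(), OptionalLong.of(55603880L), 55604000L);
        System.out.println(p.source + " " + p.lsn); // prints SLOT_CONFIRMED_FLUSH_LSN 55603880
    }
}
```

Keeping the stored offset first means deployments that already persist offsets would be unaffected; only the case with no offset changes.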
This PR is an example of the changes that may be necessary. It would result in the following startup behaviour:
INFO Postgres|test_server|postgres-connector-task Obtained valid replication slot ReplicationSlot [active=false, latestFlushedLsn=55603880, catalogXmin=667] [io.debezium.connector.postgresql.connection.PostgresConnection]
INFO Postgres|test_server|postgres-connector-task No previous offset found [io.debezium.connector.postgresql.PostgresConnectorTask]
INFO Postgres|test_server|postgres-connector-task Resuming offset context from Replication Slot's confirmed_flushed_lsn '55603880' [io.debezium.connector.postgresql.PostgresSnapshotChangeEventSource]