Type: Bug
Resolution: Done
Priority: Major
In order to make your issue reports as actionable as possible, please provide the following information, depending on the issue type.
Bug report
For bug reports, provide this information, please:
What Debezium connector do you use and what version?
PostgreSQL connector (io.debezium.connector.postgresql.PostgresConnector), version 3.0.8.Final.
What is the connector configuration?
connector.class = io.debezium.connector.postgresql.PostgresConnector
max.queue.size = 3
slot.name = slot1
record.processing.shutdown.timeout.ms = 1000
publication.name = publication1
signal.enabled.channels = in-process
record.processing.order = ORDERED
topic.prefix = topic1
offset.storage.file.filename = itc-3bcf4a43-c0be-4adf-8962-40adedf7449b.offsets
record.processing.threads =
errors.retry.delay.initial.ms = 300
value.converter = org.apache.kafka.connect.json.JsonConverter
key.converter = org.apache.kafka.connect.json.JsonConverter
publication.autocreate.mode = filtered
database.user = test
database.dbname = test
offset.storage = org.apache.kafka.connect.storage.FileOffsetBackingStore
offset.flush.timeout.ms = 5000
errors.retry.delay.max.ms = 10000
database.port = 32769
plugin.name = pgoutput
offset.flush.interval.ms = 1000
internal.task.management.timeout.ms = 8000000
record.processing.with.serial.consumer = false
errors.max.retries = -1
database.hostname = localhost
database.password = ********
name = issue-test-connector
table.include.list = public.table1
skipped.operations = none
max.batch.size = 2
snapshot.mode = initial
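The record.processing.* properties indicate this runs on the embedded (async) Debezium engine rather than a Kafka Connect deployment. For context, a minimal sketch of how a configuration like the one above is typically run with the engine API; the properties file path, consumer, and lifecycle handling are illustrative, not the reporter's actual test harness:

import io.debezium.engine.ChangeEvent;
import io.debezium.engine.DebeziumEngine;
import io.debezium.engine.format.Json;

import java.io.InputStream;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class EngineRunner {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Load the configuration shown above (resource name is hypothetical)
        try (InputStream in = EngineRunner.class.getResourceAsStream("/connector.properties")) {
            props.load(in);
        }

        try (DebeziumEngine<ChangeEvent<String, String>> engine =
                     DebeziumEngine.create(Json.class)
                             .using(props)
                             .notifying(record -> System.out.println(record.value()))
                             .build()) {
            ExecutorService executor = Executors.newSingleThreadExecutor();
            executor.execute(engine);
            // Let the connector capture changes; closing the engine on exit stops it
            Thread.sleep(60_000);
            executor.shutdown();
        }
    }
}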
What is the captured database version and mode of deployment?
(E.g. on-premises, with a specific cloud provider, etc.)
PostgreSQL 14
What behavior do you expect?
No data loss, even when the connector restarts after a failed ad-hoc blocking snapshot.
What behavior do you see?
The connector permanently loses a portion of the data that was inserted while it was offline.
Do you see the same behaviour using the latest released Debezium version?
Yes, I have tested with 3.2.0.Final and 3.3.0.Alpha1.
Do you have the connector logs, ideally from start till finish?
Yes
How to reproduce the issue using our tutorial deployment?
Steps to reproduce:
- Setup: Create two PostgreSQL tables (table1, table2), with table2 containing some "bad" records that will cause processing failures.
- Initial run: Start the Debezium connector monitoring only table1, let it process the initial data, then stop it.
- Offline data insertion: While the connector is stopped, bulk-insert data into table1.
- Restart with an additional table: Restart the connector monitoring both tables (table1, table2) and trigger an ad-hoc blocking snapshot of table2, which fails because of the bad data (see the signal sketch after this list).
- Final restart: Restart the connector once more, insert some more data (to start streaming), and observe that the portion of data inserted while the connector was offline is lost.
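For illustration, the ad-hoc blocking snapshot in the fourth step can be triggered by an execute-snapshot signal such as the one below. This sketch assumes the documented signal-table channel with a hypothetical public.debezium_signal table (i.e. signal.enabled.channels=source and signal.data.collection=public.debezium_signal); the reporter's configuration uses the in-process channel instead. Connection details are taken from the configuration above, and the password is a placeholder:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class TriggerBlockingSnapshot {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:postgresql://localhost:32769/test", "test", "<password>");
             PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO debezium_signal (id, type, data) VALUES (?, ?, ?)")) {
            ps.setString(1, "ad-hoc-1");
            ps.setString(2, "execute-snapshot");
            // Request an ad-hoc BLOCKING snapshot of the newly added table
            ps.setString(3, "{\"data-collections\": [\"public.table2\"], \"type\": \"blocking\"}");
            ps.executeUpdate();
        }
    }
}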
Here is a reproducible test case demonstrating the issue that can be run independently: Test Case
The issue appears to be related to offset handling when snapshot processing fails. When the connector restarts, it resumes from this incorrect offset, causing it to skip streaming events that occurred during the downtime.
The test above also includes a potential workaround (not an actual fix) that tries to clear the pending snapshot offset so that streaming can resume from where it left off.
On restart, the connector uses an offset position like the following as its starting point, resulting in data loss:
{ "last_snapshot_record": false, "lsn": 3491303656, "txId": 14100, "ts_usec": 1747335133620186, "snapshot": "BLOCKING", "snapshot_completed": false }
Additional points
- The test case file also contains additional comments and observations about other potential issues and questions that, while not directly related to this bug, would be valuable to address as well.
Do let me know if you need any additional information or clarification on this issue.
relates to:
- DBZ-9410 Improve blocking snapshot test resilience (Closed)

links to:
- RHEA-2025:154266 Red Hat build of Debezium 3.2.4 release