DBZ-8312: Debezium Postgres connector silent data loss on connector restart


      In order to make your issue reports as actionable as possible, please provide the following information, depending on the issue type.

      Bug report

      For bug reports, provide this information, please:

      What Debezium connector do you use and what version?

      Debezium Postgres connector v2.3.5.Final

      What is the connector configuration?

              "connector.class": "io.debezium.connector.postgresql.PostgresConnector",

              "snapshot.mode": "never",

              "plugin.name": "pgoutput",

              "producer.override.compression.type": "gzip",

              "producer.override.batch.size": "65536",

              "producer.override.max.request.size": "10485760",

              "producer.override.acks": "all",

              "max.queue.size": "81290",

              "max.batch.size": "20480",

              "heartbeat.interval.ms": "1000",

              "heartbeat.action.query": "<>;", 

              "tombstones.on.delete": "false",

              "decimal.handling.mode": "double",

              "time.precision.mode": "connect",

              "interval.handling.mode": "string",

              "topic.prefix": "<some-prefix>",

           

      What is the captured database version and mode of deployment?

      PostgreSQL 13 on AWS RDS

      Background

      To build confidence in the Debezium connector's data capture, we monitor DB operations on a dedicated table and verify that they line up with the records on the destination topic.

      We use a simple heartbeat table:

      CREATE TABLE audit_heartbeat (
          id    BIGINT PRIMARY KEY,  -- sequence number, primary key
          ts_ms BIGINT NOT NULL      -- timestamp in milliseconds
      );

      We insert records into this table at regular intervals (e.g., every 1 s); including it in the connector's table.include.list ensures these records are captured in Kafka, and monitoring their sequence on the topic helps detect gaps or data loss in the capture process (see the sketch below).
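
      A minimal sketch of the heartbeat writer and of a gap check, assuming a plain SQL job and a sink table with the same shape as the source (neither job is part of this report):

      -- Heartbeat writer, run every second (assumed implementation):
      INSERT INTO audit_heartbeat (id, ts_ms)
      VALUES (
          (SELECT COALESCE(MAX(id), 0) + 1 FROM audit_heartbeat),
          (EXTRACT(EPOCH FROM clock_timestamp()) * 1000)::BIGINT
      );

      -- Gap check, run against wherever the topic is materialized:
      -- any id whose predecessor is missing marks a lost event.
      SELECT h.id
      FROM audit_heartbeat h
      WHERE h.id > (SELECT MIN(id) FROM audit_heartbeat)
        AND NOT EXISTS (SELECT 1 FROM audit_heartbeat p WHERE p.id = h.id - 1);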

      What behavior do you expect?

      On connector restart, Debezium should resume from the last processed LSN, ensuring no events are skipped to prevent data loss.

      It's acceptable for Debezium to reprocess events from the 'last processed LSN,' which may result in duplicates, but the primary focus must be on ensuring that no events are lost.

      What behavior do you see?

      After a connector restart, we noticed that sequence 808024 was lost: the topic jumps from id 808023 directly to id 808025.

      {"after":

      {"id":"808023","tsMs":"1728097285846"}

      ,"source":

      {"version":"2.3.5.Final","connector":"postgresql","name":"<redacted>","tsMs":"1728097285846","snapshot":"false","db":"<redacted>","schema":"public","table":"audit_heartbeat","txId":"3712864487","lsn":"354650320202872","xmin":"0"}

      ,"op":"c","tsMs":"1728097286246"}

      {"after":

      {"id":"808025","tsMs":"1728097287653"}

      ,"source":

      {"version":"2.3.5.Final","connector":"postgresql","name":"<redacted>","tsMs":"1728097287653","snapshot":"false","db":"<redacted>","schema":"public","table":"audit_heartbeat","txId":"3712864501","lsn":"354650320282016","xmin":"0"}

      ,"op":"c","tsMs":"1728097290303"}

      Do you see the same behaviour using the latest released Debezium version?

      (Ideally, also verify with latest Alpha/Beta/CR version)

      Did not test

      Do you have the connector logs, ideally from start till finish?

      (You might be asked later to provide DEBUG/TRACE level log)

      Offset topic entry

      {"transaction_id":null,"lsn_proc":354650320202872,"messageType":"INSERT","lsn_commit":354650320199344,"lsn":354650320202872,"txId":3712864487,"ts_usec":1728097285846965}

      Note that:

      • LSN 354650320202872 == 1428D/765AEC78 (sequence 808023)
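
      For reference, the decimal LSN stored in the offsets topic is simply the 64-bit value of the textual pg_lsn, which can be confirmed directly in PostgreSQL:

      -- Convert the textual LSN from the connector log to the decimal form stored in the offsets topic:
      SELECT pg_wal_lsn_diff('1428D/765AEC78', '0/0') AS lsn_decimal;
      -- => 354650320202872  (the lsn_proc / lsn value of the offset entry above)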

      Connector Logs:

      [2024-10-05 03:01:30,047] INFO Starting streaming (io.debezium.pipeline.ChangeEventSourceCoordinator)
      [2024-10-05 03:01:30,047] INFO Retrieved latest position from stored offset 'LSN{1428D/765AEC78}' (io.debezium.connector.postgresql.PostgresStreamingChangeEventSource)
      [2024-10-05 03:01:30,048] INFO Looking for WAL restart position for last commit LSN 'LSN{1428D/765ADEB0}' and last change LSN 'LSN{1428D/765AEC78}' (io.debezium.connector.postgresql.connection.WalPositionLocator)
      [2024-10-05 03:01:30,048] INFO Initializing PgOutput logical decoder publication (io.debezium.connector.postgresql.connection.PostgresReplicationConnection)
      [2024-10-05 03:01:30,114] INFO Obtained valid replication slot ReplicationSlot [active=false, latestFlushedLsn=LSN{1428D/765ADEB0}, catalogXmin=3712864412] (io.debezium.connector.postgresql.connection.PostgresConnection)
      [2024-10-05 03:01:30,117] INFO Connection gracefully closed (io.debezium.jdbc.JdbcConnection)
      [2024-10-05 03:01:30,119] INFO Seeking to LSN{1428D/765ADEB0} on the replication slot with command SELECT pg_replication_slot_advance(<redacted>, '1428D/765ADEB0') (io.debezium.connector.postgresql.connection.PostgresReplicationConnection)
      [2024-10-05 03:01:30,139] INFO Requested thread factory for connector PostgresConnector, id = <redacted> named = keep-alive (io.debezium.util.Threads)
      [2024-10-05 03:01:30,139] INFO Creating thread debezium-postgresconnector-<redacted>-keep-alive (io.debezium.util.Threads)
      [2024-10-05 03:01:30,180] INFO Searching for WAL resume position (io.debezium.connector.postgresql.PostgresStreamingChangeEventSource)
      [2024-10-05 03:01:30,182] INFO First LSN 'LSN{1428D/765AEC40}' received (io.debezium.connector.postgresql.connection.WalPositionLocator)
      [2024-10-05 03:01:30,194] INFO LSN after last stored change LSN 'LSN{1428D/765AED30}' received (io.debezium.connector.postgresql.connection.WalPositionLocator)
      [2024-10-05 03:01:30,194] INFO WAL resume position 'LSN{1428D/765AED30}' discovered (io.debezium.connector.postgresql.PostgresStreamingChangeEventSource)
      [2024-10-05 03:01:30,196] INFO Connection gracefully closed (io.debezium.jdbc.JdbcConnection)
      [2024-10-05 03:01:30,198] INFO Connection gracefully closed (io.debezium.jdbc.JdbcConnection)
      [2024-10-05 03:01:30,211] INFO Initializing PgOutput logical decoder publication (io.debezium.connector.postgresql.connection.PostgresReplicationConnection)
      [2024-10-05 03:01:30,213] INFO Seeking to LSN{1428D/765ADEB0} on the replication slot with command SELECT pg_replication_slot_advance(<redacted>, '1428D/765ADEB0') (io.debezium.connector.postgresql.connection.PostgresReplicationConnection)
      [2024-10-05 03:01:30,225] INFO Requested thread factory for connector PostgresConnector, id = <redacted> named = keep-alive (io.debezium.util.Threads)
      [2024-10-05 03:01:30,225] INFO Creating thread debezium-postgresconnector-<redacted>-keep-alive (io.debezium.util.Threads)
      [2024-10-05 03:01:30,225] INFO Processing messages (io.debezium.connector.postgresql.PostgresStreamingChangeEventSource)
      [2024-10-05 03:01:30,227] INFO Streaming requested from LSN LSN{1428D/765AEC78}, received LSN LSN{1428D/765AEC40} identified as already processed (io.debezium.connector.postgresql.connection.AbstractMessageDecoder)
      [2024-10-05 03:01:30,247] INFO Message with LSN 'LSN{1428D/765AED30}' arrived, switching off the filtering (io.debezium.connector.postgresql.connection.WalPositionLocator)
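
      The slot state that the connector reads at startup (latestFlushedLsn and catalogXmin in the log above) can be inspected directly on the database; the slot name is redacted here as it is in the logs:

      SELECT slot_name, active, restart_lsn, confirmed_flush_lsn, catalog_xmin
      FROM pg_replication_slots
      WHERE slot_name = '<redacted>';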

      How to reproduce the issue using our tutorial deployment?

      <Your answer>

      Feature request or enhancement

      For feature requests or enhancements, provide this information, please:

      Which use case/requirement will be addressed by the proposed feature?

      Prevent silent data loss

      Implementation ideas (optional)

      1. Introduce a connector config param to optionally disable `skipMessage` in WalPositionLocator.
      2. Update `resumeFromLsn` in WalPositionLocator so that streaming always resumes exactly from the last change LSN, even if that means reprocessing previously processed LSNs, to prevent data loss.

              Assignee: Unassigned
              Reporter: Nitesh Dhanpal (ndhanpal19)