  Debezium / DBZ-2550

Catch up streaming before snapshot may duplicate messages upon resuming streaming


    • Type: Bug
    • Resolution: Done
    • Priority: Minor
    • Fix Version: 1.3.0.CR1
    • Affects Version: 1.3.0.Beta2
    • Component: postgresql-connector
    • None
    • False
    • False
    • Undefined

      1. Create a custom snapshot SPI where shouldStreamEventsStartingFromSnapshot returns false.

      2. Start the connector and stream some data.

      3. Stop the connector.

      4. Write some changes to the db.

      5. Restart the connector.

      6. If the catch up streaming phase finishes quickly, before the framework has committed offsets again, the normal streaming phase is likely to produce duplicate data.


      When the Postgres connector is used with a custom snapshot SPI that performs catch up streaming, duplicate messages may be sent during the normal (non-catch up) streaming phase.

      Normally, when a connector shuts down gracefully, the Connect framework attempts to commit offsets so that the latest committed state is acknowledged on the replication stream. While the connector is running, the framework commits offsets periodically; Debezium does not control when a commit is triggered. When the catch up streaming phase ends, there may be uncommitted state, and the connector cannot determine when the next commit will occur because commit timing is managed externally. If no commit is triggered between the end of the catch up streaming phase and the start of the normal streaming phase that follows the snapshot, the connector may produce duplicate messages.
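
      The timing problem described above can be illustrated with a toy model. This is a sketch only, not Debezium code: the Slot class, the stream and run methods, and the LSN values are all hypothetical, and the "slot" here simply remembers the last position acknowledged by an offset commit. It assumes both streaming phases start reading from that acknowledged position.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model, not Debezium code: shows how an un-acked catch up phase leads to replays. */
class CatchUpDuplicateDemo {

    /** Hypothetical stand-in for a replication slot: remembers only the last acked LSN. */
    static final class Slot {
        long confirmedLsn;
        Slot(long confirmedLsn) { this.confirmedLsn = confirmedLsn; }
    }

    /** Emits every change whose LSN is greater than the slot's confirmed LSN. */
    static List<Long> stream(Slot slot, long[] walLsns) {
        List<Long> emitted = new ArrayList<>();
        for (long lsn : walLsns) {
            if (lsn > slot.confirmedLsn) {
                emitted.add(lsn);
            }
        }
        return emitted;
    }

    /** Runs catch up streaming, then normal streaming; commitBetween models the external commit timing. */
    static List<Long> run(boolean commitBetween) {
        long[] wal = {101, 102, 103};   // changes written while the connector was stopped
        Slot slot = new Slot(100);      // last position acked before shutdown

        // Catch up streaming phase: reads everything after the acked position.
        List<Long> out = new ArrayList<>(stream(slot, wal));

        if (commitBetween) {
            // The framework happened to commit offsets, so the slot position advances.
            slot.confirmedLsn = out.get(out.size() - 1);
        }

        // (The snapshot would run here.)

        // Normal streaming phase: starts again from the slot's acked position.
        out.addAll(stream(slot, wal));
        return out;
    }

    public static void main(String[] args) {
        System.out.println("no commit between phases: " + run(false));
        System.out.println("commit between phases:    " + run(true));
    }
}
```

      In this model, run(false) emits each of the three changes twice, while run(true) emits each once. Whether duplicates appear depends only on whether a commit lands between the two phases, which the connector does not control.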

      Although the replication stream may be out of date, the internal OffsetContext is aware of the latest processed position. When the snapshot phase creates a new offset after catch up streaming, the previous offset still has access to that latest state.
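
      One way to use that observation, sketched under assumptions: when normal streaming resumes, start from whichever position is further ahead, the slot's acknowledged position or the position held by the previous offset. This is not necessarily how the issue was resolved in Debezium. The Offset class and the resumeStreaming method below are hypothetical stand-ins, not the real OffsetContext API.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model, not Debezium code: resume streaming from the furthest known position. */
class OffsetCarryOverDemo {

    /** Hypothetical stand-in for an offset context; tracks only the last processed LSN. */
    static final class Offset {
        final long lastProcessedLsn;
        Offset(long lastProcessedLsn) { this.lastProcessedLsn = lastProcessedLsn; }
    }

    /**
     * Emits changes after the later of the slot's acked LSN and the previous
     * offset's last processed LSN. A null previous offset means no carried-over state.
     */
    static List<Long> resumeStreaming(long slotConfirmedLsn, Offset previous, long[] walLsns) {
        long start = slotConfirmedLsn;
        if (previous != null) {
            start = Math.max(start, previous.lastProcessedLsn);
        }
        List<Long> emitted = new ArrayList<>();
        for (long lsn : walLsns) {
            if (lsn > start) {
                emitted.add(lsn);
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        long[] wal = {101, 102, 103, 104};
        // Slot is stale at 100, but catch up streaming already processed up to 103.
        System.out.println("with previous offset:    " + resumeStreaming(100, new Offset(103), wal));
        System.out.println("without previous offset: " + resumeStreaming(100, null, wal));
    }
}
```

      With the previous offset carried over, only the change at 104 is emitted. Without it, the changes at 101 through 103 are emitted again even though the catch up phase already produced them.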

              Assignee: Unassigned
              Reporter: Grant Cooksey (grant.cooksey, Inactive)
              Votes: 0
              Watchers: 2
