Debezium / DBZ-8244

An aborted ad-hoc blocking snapshot leaves the connector in a broken state


    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • 3.0.0.Final
    • None
    • core-library
    • None

      What Debezium connector do you use and what version?

      2.6.2.Final

      Enhancement

      If an ad-hoc blocking snapshot is aborted due to an error, e.g.

      [ocking-snapshot] .d.r.RelationalSnapshotChangeEventSource : Error during snapshot
      java.util.concurrent.ExecutionException: org.apache.kafka.connect.errors.ConnectException: Snapshotting of table perftest.actor_3 failed
      ...
      Caused by: org.postgresql.util.PSQLException: ERROR: relation "perftest.actor_3" does not exist
      ...
      2024-09-16T12:35:03.907Z  INFO 30 --- [ocking-snapshot] .d.p.s.AbstractSnapshotChangeEventSource : Snapshot - Final stage
      2024-09-16T12:35:03.908Z  WARN 30 --- [ocking-snapshot] .d.p.s.AbstractSnapshotChangeEventSource : Snapshot was not completed successfully, it will be re-executed upon connector restart

      the connector is left alive but no longer snapshotting or streaming data. I would expect either the error to be fatal or the connector to resume streaming changes.
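
For context, ad-hoc blocking snapshots are typically requested by sending an `execute-snapshot` signal whose data sets `"type": "blocking"`, for example through a row in the signaling table. The report does not say how the snapshot was triggered, so the following is only a sketch of one possible request: the signaling table name `public.debezium_signal` and the signal id `adhoc-1` are assumptions, and `perftest.actor_3` is taken from the log above.

```java
import java.util.List;
import java.util.stream.Collectors;

public class BlockingSnapshotSignal {

    // Builds the JSON "data" payload of an execute-snapshot signal
    // that asks for a blocking (rather than incremental) snapshot.
    public static String signalData(List<String> dataCollections) {
        String collections = dataCollections.stream()
                .map(name -> "\"" + name + "\"")
                .collect(Collectors.joining(", "));
        return "{\"data-collections\": [" + collections + "], \"type\": \"blocking\"}";
    }

    // Builds an INSERT for a signaling table with columns (id, type, data).
    // The table name is deployment-specific, so it is passed in.
    // String concatenation is used only to keep the sketch short;
    // real code should use a PreparedStatement with bound parameters.
    public static String insertStatement(String signalTable, String id, List<String> dataCollections) {
        return "INSERT INTO " + signalTable + " (id, type, data) VALUES ('"
                + id + "', 'execute-snapshot', '" + signalData(dataCollections) + "')";
    }

    public static void main(String[] args) {
        // Assumed table name and signal id; the data collection comes from the log.
        System.out.println(insertStatement("public.debezium_signal", "adhoc-1",
                List.of("perftest.actor_3")));
    }
}
```

Requesting a table that does not exist, or that is dropped before the snapshot reaches it, is one plausible way to hit the `PSQLException` shown in the log.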

      I believe streaming is never resumed because the snapshot is in an aborted state when this line is reached: https://github.com/debezium/debezium/blob/main/debezium-core/src/main/java/io/debezium/pipeline/ChangeEventSourceCoordinator.java#L252C18-L252C72
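
The suspected control flow can be illustrated with a toy model. This is not Debezium's code: the class, enums, and method names below are hypothetical, and the model only assumes that streaming is paused for the duration of a blocking snapshot and resumed only when the snapshot does not end in an aborted state. The `AbortPolicy` values contrast the reported behavior with the two outcomes expected above.

```java
public class BlockingSnapshotFlow {

    // Hypothetical outcome of a snapshot attempt.
    public enum SnapshotStatus { COMPLETED, SKIPPED, ABORTED }

    // Hypothetical ways of handling an aborted blocking snapshot.
    public enum AbortPolicy { STAY_PAUSED, RESUME_STREAMING, FAIL }

    private boolean streamingPaused = false;

    public boolean isStreamingPaused() {
        return streamingPaused;
    }

    public void runBlockingSnapshot(SnapshotStatus result, AbortPolicy policy) {
        // Streaming is paused while the blocking snapshot runs.
        streamingPaused = true;

        if (result != SnapshotStatus.ABORTED) {
            // Normal path: the snapshot finished, so streaming resumes.
            streamingPaused = false;
            return;
        }

        switch (policy) {
            case STAY_PAUSED:
                // Reported behavior: nothing resumes streaming, nothing fails.
                break;
            case RESUME_STREAMING:
                // Expected option 1: carry on streaming despite the failed snapshot.
                streamingPaused = false;
                break;
            case FAIL:
                // Expected option 2: make the error fatal so it is visible.
                throw new IllegalStateException("Blocking snapshot aborted");
        }
    }

    public static void main(String[] args) {
        BlockingSnapshotFlow flow = new BlockingSnapshotFlow();
        flow.runBlockingSnapshot(SnapshotStatus.ABORTED, AbortPolicy.STAY_PAUSED);
        System.out.println("paused after aborted snapshot: " + flow.isStreamingPaused());
    }
}
```

Under the `STAY_PAUSED` branch the model ends up alive but idle, which matches the symptom described in this report.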

      Given the existence of https://issues.redhat.com/browse/DBZ-7903, there seem to be quite a few issues around the reuse of the initial snapshot logic. Perhaps a rework is in order.

            rh-ee-mvitale Mario Fiore Vitale
            peterhmatillion Peter Hamer