  1. Debezium
  2. DBZ-4107

Incremental snapshot doesn't work without primary key


Details

    Description

      I have two Postgres tables, each with well over 100k rows. One of them has a primary key; the other has no primary key but has a couple of unique indexes.

      I have set incremental.snapshot.chunk.size to 20000.

      Then I send a signal to start the incremental snapshot of each table, for example:

      INSERT INTO public.debezium_signal
      VALUES('123', 'execute-snapshot', '{"data-collections": ["public.my_table"]}');
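
      For anyone reproducing this from a script, here is a minimal sketch of building the same signal row programmatically. The helper name build_execute_snapshot_signal and the %s placeholder style (as used by psycopg2) are assumptions for illustration; only the table name, the signal type and the JSON payload come from the statement above.

```python
import json


def build_execute_snapshot_signal(signal_id, tables, signal_table="public.debezium_signal"):
    """Return (sql, params) for an execute-snapshot signal row.

    Hypothetical helper. signal_table is interpolated into the SQL text,
    so it must come from trusted configuration, not from user input.
    """
    # The payload mirrors the JSON used in the report.
    payload = json.dumps({"data-collections": list(tables)})
    sql = f"INSERT INTO {signal_table} VALUES (%s, %s, %s)"
    return sql, (signal_id, "execute-snapshot", payload)


sql, params = build_execute_snapshot_signal("123", ["public.my_table"])
```

      The returned pair could then be passed to a cursor's execute method; passing the payload as a parameter avoids hand-quoting the JSON inside the SQL string.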

      I expect that in both cases all rows of the table eventually end up as events in the Kafka topics. I assume that incremental.snapshot.chunk.size only affects performance, memory consumption and so on, and that all rows should still be emitted as events regardless of its value.
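
      To make that expectation concrete, here is a small sketch of key-ordered chunked reading. It is not Debezium's implementation; it uses an in-memory SQLite table as a stand-in, and the function name read_in_chunks and the table layout are assumptions. It only illustrates that, when rows can be ordered by a key, the chunk size changes the number of queries but not the number of rows read.

```python
import sqlite3


def read_in_chunks(conn, table, key, chunk_size):
    """Yield every row of `table` in key order, one chunk at a time."""
    last_key = None
    while True:
        if last_key is None:
            # First chunk: no lower bound yet.
            query = f"SELECT * FROM {table} ORDER BY {key} LIMIT ?"
            args = (chunk_size,)
        else:
            # Later chunks: continue strictly after the last key seen.
            query = f"SELECT * FROM {table} WHERE {key} > ? ORDER BY {key} LIMIT ?"
            args = (last_key, chunk_size)
        rows = conn.execute(query, args).fetchall()
        if not rows:
            return  # no data returned, so the table is finished
        yield rows
        last_key = rows[-1][key]


conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # allows rows[-1][key] lookups by column name
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [(i, f"row-{i}") for i in range(1, 101)],
)


def count_rows(chunk_size):
    """Total number of rows seen across all chunks."""
    return sum(len(chunk) for chunk in read_in_chunks(conn, "my_table", "id", chunk_size))
```

      In this sketch the loop depends on having a key to order by and to resume from; without one there is no obvious way to ask for "the next chunk".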

      What I see is that for the table with the primary key everything works as expected. For the table with unique indexes and no primary key, it seems that only 20000 rows (the value of incremental.snapshot.chunk.size) are added as events.

       

      Logs for primary key table:

      [2021-10-04 19:59:56,210] INFO Incremental snapshot for table 'public.my_table' will end at position [530203] (io.debezium.pipeline.source.snapshot.incremental.AbstractIncrementalSnapshotChangeEventSource:243)

      [2021-10-04 20:00:44,476] INFO WorkerSourceTask{id=debezium-pg-connector-0} flushing 8806 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask:510)

      [2021-10-04 20:01:03,263] INFO No data returned by the query, incremental snapshotting of table 'public.my_table' finished (io.debezium.pipeline.source.snapshot.incremental.AbstractIncrementalSnapshotChangeEventSource:249)

       

      Logs for unique key table:

      [2021-10-04 20:21:09,729] INFO Incremental snapshot for table 'public.my_table' will end at position [null] (io.debezium.pipeline.source.snapshot.incremental.AbstractIncrementalSnapshotChangeEventSource:243)

      [2021-10-04 20:21:13,749] INFO No data returned by the query, incremental snapshotting of table 'public.my_table' finished (io.debezium.pipeline.source.snapshot.incremental.AbstractIncrementalSnapshotChangeEventSource:249)

       

      It seems to me that incremental snapshotting should be able to work without a primary key, using just a unique index.
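
      As a sketch of what is being asked for, the following picks the columns to chunk by: the primary key if there is one, otherwise the first unique index whose columns are all NOT NULL. Everything here (the UniqueIndex type, choose_chunk_key, the example index names) is hypothetical and is not taken from Debezium. The NOT NULL restriction is an assumption added because a Postgres unique index allows multiple NULL values, so a nullable index does not identify or order every row.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class UniqueIndex:
    """Hypothetical description of one unique index on a table."""
    name: str
    columns: Tuple[str, ...]
    all_not_null: bool  # True when every indexed column is NOT NULL


def choose_chunk_key(
    primary_key: Sequence[str],
    unique_indexes: Sequence[UniqueIndex],
) -> Optional[Tuple[str, ...]]:
    """Return the columns to order chunks by, or None if nothing is usable."""
    if primary_key:
        return tuple(primary_key)
    for index in unique_indexes:
        # Skip indexes with nullable columns: they may not order every row.
        if index.all_not_null:
            return index.columns
    return None


indexes = [
    UniqueIndex("my_table_email_key", ("email",), all_not_null=False),
    UniqueIndex("my_table_tenant_code_key", ("tenant", "code"), all_not_null=True),
]
chosen = choose_chunk_key([], indexes)
```

      Returning None when nothing qualifies would let the caller report the table as unsupported up front instead of finishing the snapshot early.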

          People

            Unassigned Unassigned
            jsyrjala2 Juha Syrjälä
            Votes: 4
            Watchers: 6
