  1. Debezium
  2. DBZ-1216

Debezium does not handle connector restart for large TX with GTID enabled databases


    • Type: Bug
    • Resolution: Won't Do
    • Priority: Major
    • None
    • 0.9.3.Final
    • mysql-connector
    • None

      Suppose a command such as DELETE FROM table is issued that removes a large number of rows.

      The binlog will then contain a sequence of events:

      • start GTID
      • update metadata table
      • delete rows
      • delete rows
        ...
      • COMMIT

      So multiple delete rows events are emitted for a single transaction.
      Every event and offset has a row field that identifies the position of the delete within its batch. This counter is reset for each batch, so row is unique per batch but not per transaction.
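
The per-batch counter can be illustrated with a minimal sketch. This is a simplified model, not Debezium code: the `Position` class, the `positions` helper, the placeholder GTID `uuid:42` and the batch sizes are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Position:
    gtid: str   # transaction the row belongs to (placeholder value below)
    batch: int  # index of the delete rows event inside the transaction
    row: int    # index of the row inside its batch; restarts at 0 per batch


def positions(gtid, batch_sizes):
    """Yield one Position per deleted row, resetting `row` for every batch."""
    for batch, size in enumerate(batch_sizes):
        for row in range(size):
            yield Position(gtid, batch, row)


# One transaction whose DELETE is split into three delete rows events.
tx = list(positions("uuid:42", [3, 3, 2]))
rows = [p.row for p in tx]
print(rows)  # [0, 1, 2, 0, 1, 2, 0, 1]

# (gtid, row) alone is ambiguous: several rows share the same pair.
distinct_pairs = {(p.gtid, p.row) for p in tx}
print(len(distinct_pairs), len(tx))  # 3 8
```

In this model a `(gtid, row)` pair does not identify a single row; only the batch index, which is an assumption of the sketch and not something the report says is stored, makes it unique.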

      The last event emitted will carry the GTID of the transaction, but its offset will not, because the GTID set in the offset is only updated when the commit is processed.

      If no more events are emitted and the connector is restarted, Debezium will replay the transaction again, as it does not know that it was completed.
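
Why the restart replays the transaction can be sketched as follows. This is a toy model under stated assumptions, not the connector's implementation: the function `run_transaction`, the placeholder GTIDs `uuid:41` and `uuid:42`, and the idea that COMMIT emits no record of its own are assumptions made for the illustration.

```python
def run_transaction(gtid, batch_sizes, committed_gtids):
    """Return (source_gtid, offset_gtids) for every record emitted.

    Assumption of this sketch: the offset's GTID set only gains `gtid`
    when COMMIT is processed, and COMMIT itself emits no record.
    """
    records = []
    for size in batch_sizes:
        for _ in range(size):
            # Snapshot the GTID set as it is stored with this record's offset.
            records.append((gtid, frozenset(committed_gtids)))
    committed_gtids.add(gtid)  # happens on COMMIT, after the last record
    return records


committed = {"uuid:41"}  # transactions completed before this one
records = run_transaction("uuid:42", [3, 2], committed)
last_source_gtid, last_offset_gtids = records[-1]

# The last stored offset does not list the transaction that produced it,
# so a restart from that offset would ask for uuid:42 again.
needs_replay = last_source_gtid not in last_offset_gtids
print(needs_replay)  # True
```

The in-memory set does contain `uuid:42` after the commit, but because no further record is emitted, that state never reaches a stored offset in this model.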

      It will start from the first batch again and:

      • skip the number of rows stored in the offset, although that number relates to the last batch
      • resend all remaining events from the first batch and all events from subsequent batches

      This causes two problems:

      • Debezium generates an unnecessary number of duplicates
      • The initial skip count can be larger than the number of events available in the first batch
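
Both problems can be reproduced with a small simulation of the replay. The `replay` function, the batch sizes and the skip counts are hypothetical. The report does not say what the connector does when the skip count exceeds the first batch, so the clamping used here is purely an assumption of the sketch.

```python
def replay(batch_sizes, rows_to_skip):
    """Return the (batch, row) pairs re-sent after a restart.

    The skip count is applied to the first batch only; every later batch
    is re-sent in full. Clamping the skip to the batch size is an
    assumption, not documented connector behaviour.
    """
    first, rest = batch_sizes[0], batch_sizes[1:]
    resent = [(0, row) for row in range(min(rows_to_skip, first), first)]
    for batch, size in enumerate(rest, start=1):
        resent.extend((batch, row) for row in range(size))
    return resent


# Problem 1: all 8 rows were delivered before the restart, and the offset
# holds a skip count of 2 that was recorded while reading the last batch.
resent = replay([3, 3, 2], rows_to_skip=2)
print(len(resent))  # 6, every one of them a duplicate

# Problem 2: the skip count (5) is larger than the first batch (2 rows).
overshoot = replay([2, 5], rows_to_skip=5)
print(len(overshoot))  # 5
```

In the first case only two rows are skipped although all eight had already been delivered, so six records are duplicated. In the second case the stored count cannot be applied to the first batch at all.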

              Assignee: Unassigned
              Reporter: Jiri Pechanec (jpechane)