Uploaded image for project: 'Debezium'
  1. Debezium
  2. DBZ-210

MySQL snapshotter is not guaranteed to give a consistent snapshot

XMLWordPrintable

    • Icon: Bug Bug
    • Resolution: Done
    • Icon: Blocker Blocker
    • 0.6
    • None
    • mysql-connector
    • None

      Currently, the first few steps of the mysql snapshotter are:

      1. START TRANSACTION WITH CONSISTENT SNAPSHOT https://github.com/debezium/debezium/blob/06a25f8fd9b8d1f5580dfddf1d79e078ff7b6ce5/debezium-connector-mysql/src/main/java/io/debezium/connector/mysql/SnapshotReader.java#L182
      2. FLUSH TABLES WITH READ LOCK https://github.com/debezium/debezium/blob/06a25f8fd9b8d1f5580dfddf1d79e078ff7b6ce5/debezium-connector-mysql/src/main/java/io/debezium/connector/mysql/SnapshotReader.java#L195
      3. SHOW MASTER STATUS https://github.com/debezium/debezium/blob/06a25f8fd9b8d1f5580dfddf1d79e078ff7b6ce5/debezium-connector-mysql/src/main/java/io/debezium/connector/mysql/SnapshotReader.java#L580

      This snapshot is supposed to be consistent with respect to the binlog coordinates obtained in step 3. But if some transactions are executed between steps 1 and 2, the snapshot will not contain their data. Furthermore, when the binlog reader begins reading the binlog after the snapshot finishes, it will start reading the binlog in a position after those transactions. Thus, the data from these transactions will be forever missing.

      I believe the solution here is to simple swap the order of steps 1 and 2. In particular, we should be doing:

      1. FLUSH TABLES WITH READ LOCK
      2. START TRANSACTION WITH CONSISTENT SNAPSHOT
      3. SHOW MASTER STATUS
      4. continue existing steps...

      Since FLUSH TABLES WITH READ LOCK blocks all updates to the tables, it guarantees that the binlog coordinates for which the transaction is consistent with respect to are the same ones that are returned to us by SHOW MASTER STATUS.

      We can see that this is the same strategy employed by the mysqldump tool when the options -single-transaction and -master-data are used (which is what we would want for a consistent snapshot with binlog coordinates):

      1. FLUSH TABLES WITH READ LOCK https://github.com/mysql/mysql-server/blob/9930e2f54ae48279c1f1d8b0d386a36f10aa2398/client/mysqldump.c#L6024
      2. START TRANSACTION WITH CONSISTENT SNAPSHOT https://github.com/mysql/mysql-server/blob/9930e2f54ae48279c1f1d8b0d386a36f10aa2398/client/mysqldump.c#L6055
      3. SHOW MASTER STATUS https://github.com/mysql/mysql-server/blob/9930e2f54ae48279c1f1d8b0d386a36f10aa2398/client/mysqldump.c#L6068

              jpechane Jiri Pechanec
              dasl_jira David Leibovic (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

                Created:
                Updated:
                Resolved: