Debezium / DBZ-188

Allow a Debezium MySQL connector to filter the DML events it produces into Kafka by the MySQL server UUID in each event's GTID

Details

    • Feature Request
    • Resolution: Done
    • Major
    • 0.4.1
    • 0.4
    • mysql-connector
    • None

    Description

      Consider a master-master MySQL setup with sides A and B. We would like to connect one Debezium instance to each side:

      [ mysql A ] <---------> [ mysql B ]
            |                       |
            |                       |
            V                       V
      [ debezium A ]          [ debezium B ]
      

      A note on terminology: we will say a transaction was written to side A when side A is the side that originally executed the transaction. A transaction that is written to side A will eventually be replicated to side B and vice versa.

      In the setup described above, each transaction will be produced into Kafka twice. If a transaction is written to MySQL side A, Debezium A will produce it into Kafka. When the same transaction is replicated to side B, Debezium B will produce it into Kafka again. The same applies, with the roles reversed, to transactions written to MySQL side B.

      We would like to avoid this duplicate production of events into Kafka. Debezium currently has no way to deal with the above scenario.

      With GTIDs enabled, each transaction in the binlog is preceded by a GTID event, which gives us access to the GTID of the transaction. The GTID has the format source_id:transaction_id, where source_id is the UUID of the MySQL server on which the transaction was originally executed (the server it was "written to" in the terminology above).
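
To make the GTID format concrete, here is a minimal sketch that splits a single GTID into its two parts. The names `Gtid` and `parse_gtid` are hypothetical and are not part of Debezium; the sample UUID is only an illustration.

```python
import uuid
from typing import NamedTuple


class Gtid(NamedTuple):
    """One GTID split into its parts (hypothetical helper type)."""
    source_id: str       # UUID of the server that originally executed the transaction
    transaction_id: int  # sequence number assigned by that server


def parse_gtid(gtid: str) -> Gtid:
    """Parse a single GTID of the form source_id:transaction_id."""
    source_id, sep, txn = gtid.strip().partition(":")
    if not sep:
        raise ValueError(f"not a GTID (missing ':'): {gtid!r}")
    # uuid.UUID raises ValueError on malformed input; str() yields the
    # canonical lowercase, hyphenated form.
    canonical = str(uuid.UUID(source_id))
    # int() raises ValueError for ranges such as "1-5", which belong to
    # GTID sets rather than to a single GTID.
    return Gtid(canonical, int(txn))


parsed = parse_gtid("3E11FA47-71CA-11E1-9E33-C80AA9429562:23")
```

Only `source_id` matters for the filtering proposed in this issue; `transaction_id` is parsed here just to show the full format.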

      I propose allowing a Debezium instance to be configured with a UUID pattern that is checked before DML events are produced into Kafka. Debezium would produce a DML event into Kafka if and only if the source UUID in the event's GTID matches the configured pattern.
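
The proposed check could be sketched as follows. This assumes the "UUID pattern" is a regular expression matched against the whole source UUID, which the description does not specify; `make_dml_filter` and `should_produce` are hypothetical names, not Debezium APIs.

```python
import re
from typing import Callable

# Made-up server UUIDs, used only for illustration.
UUID_A = "aaaaaaaa-0000-0000-0000-000000000001"
UUID_B = "bbbbbbbb-0000-0000-0000-000000000002"


def make_dml_filter(uuid_pattern: str) -> Callable[[str], bool]:
    """Build a predicate that decides whether a DML event should be produced.

    Assumption: the pattern is a regex that must match the entire source UUID.
    """
    compiled = re.compile(uuid_pattern, re.IGNORECASE)

    def should_produce(gtid: str) -> bool:
        # The source UUID is everything before the first ':' in the GTID.
        source_id = gtid.partition(":")[0]
        return compiled.fullmatch(source_id) is not None

    return should_produce


# A filter as the instance attached to side A might be configured.
filter_a = make_dml_filter(UUID_A)
keep_local = filter_a(f"{UUID_A}:7")       # transaction written to side A
keep_replicated = filter_a(f"{UUID_B}:7")  # transaction replicated from side B
```

Using a full match rather than a substring search avoids accidentally accepting a UUID that merely contains the configured value.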

      In our master-master setup, Debezium A would be configured with the UUID of MySQL A, and Debezium B with the UUID of MySQL B. Debezium A would then produce into Kafka only the DML events for transactions written to side A, and would ignore DML events that were replicated to side A from side B. The same applies, with the roles reversed, to Debezium B. Kafka would no longer contain duplicate events.
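
As a rough illustration of the outcome, the toy simulation below models both binlogs after replication has caught up and applies an exact-UUID filter on each side. The UUIDs, the binlog contents, and the helper `produced_by` are all made up for this sketch; real binlogs and the connector's internals are considerably more involved.

```python
# Made-up server UUIDs for the two sides.
UUID_A = "aaaaaaaa-0000-0000-0000-000000000001"
UUID_B = "bbbbbbbb-0000-0000-0000-000000000002"

# Once replication has caught up, each side's binlog holds every transaction,
# though not necessarily in the same interleaving.
binlog_a = [f"{UUID_A}:1", f"{UUID_B}:1", f"{UUID_A}:2"]
binlog_b = [f"{UUID_B}:1", f"{UUID_A}:1", f"{UUID_A}:2"]


def produced_by(configured_uuid: str, binlog: list[str]) -> list[str]:
    """GTIDs whose DML events an instance would produce, given its configured UUID."""
    return [g for g in binlog if g.split(":", 1)[0] == configured_uuid]


# Without filtering, both instances produce everything: each GTID appears twice.
unfiltered = binlog_a + binlog_b

# With filtering, each instance produces only locally written transactions.
filtered = produced_by(UUID_A, binlog_a) + produced_by(UUID_B, binlog_b)

duplicates_before = len(unfiltered) - len(set(unfiltered))
duplicates_after = len(filtered) - len(set(filtered))
```

In this toy model every transaction still reaches Kafka exactly once, because each GTID's source UUID matches exactly one of the two configured instances.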

          People

            dasl_jira David Leibovic (Inactive)