  1. Debezium
  2. DBZ-1091

Follow-up tasks for filter config change handling

    Details

    • Type: Task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: mysql-connector
    • Labels:
      None

      Description

      This follows up on DBZ-175 (support for filter config changes for MySQL). These things still need to be done (we might extract separate issues as needed):

      • Add integration tests
      • Mockito tests fail with current JDKs
      • Ensure that no threads are leaked, regardless of the situation in which the connector is shut down; there are currently different cases in which this may happen:
        • the connector is still running the ReconcilingBinlogReader
        • when the "unified" binlog reader is running, its "upon completion" handler (which would have shut down the schema) is overwritten with the "upon completion" handler set in the constructor of ChainedReader (i.e. the call to readerCompletedPolling())
      • Revisit the "RESTART_" offset properties: are they truly needed, or could the existing offsets simply be written for all emitted messages until the single unified binlog reader has been set up?
      • Eventually enable the support for filter config changes by default
      • ParallelSnapshotReader should be truly parallel, i.e. do the snapshot of "new" tables and log reading of "old" tables concurrently
      • Add a new option for "snapshot.new.tables", e.g. named "PARALLEL_SNAPSHOT"; its behavior would be like this:
        • When detecting a table list config change, the connector snapshots the newly whitelisted tables (at time T1) and continues log reading the previous ones (from the point where it left off before)
        • When the binlog reader reaches T1, it stops
        • When the snapshot is complete, a new log reader is set up that reads the binlog for all tables starting at T1
        • The motivation for this is to prevent the risk of an incorrectly ordered DB history topic (as discussed in the comments of DBZ-175) by avoiding parallel binlog reading. On the downside, changes to the "old" tables that occur after the snapshot of the "new" ones has begun will only be emitted once that snapshot is done. But I think that's an acceptable trade-off for the sake of history consistency.
        • I also expect this implementation to avoid the issue of only reaching the reconciliation stage after "some more" commits, as is the case currently; as soon as the snapshot of the "new" tables is done, there'll be just a single binlog reader. This will help to prevent any leaking threads when shutting down the connector (see above)
        • In terms of implementation, there'd be a variation of ParallelSnapshotReader that has a SnapshotReader for the "new" tables and a BinlogReader for the "old" tables. The latter receives a halting predicate that makes it stop as soon as it has reached the offset of the snapshot reader. In addition, there'd be a ChainedReader that comprises this parallel reader and a "unified" binlog reader
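
      The proposed "PARALLEL_SNAPSHOT" flow can be sketched roughly as follows. This is only an illustration under stated assumptions, not Debezium code: Event, readBinlog, run and demo are hypothetical names, binlog positions are modelled as plain long values, and the halting predicate is assumed to stop the "old" reader just before T1 so that the unified reader picks up at T1 itself. For the sake of a deterministic example the two halves of the parallel phase run one after the other here, whereas the proposal has them run concurrently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.LongPredicate;

public class HaltingPredicateSketch {

    /** Hypothetical stand-in for a binlog event: a log position and the table it affects. */
    static final class Event {
        final long position;
        final String table;

        Event(long position, String table) {
            this.position = position;
            this.table = table;
        }
    }

    /**
     * Reads the log in order starting at 'from', keeps events of the given tables,
     * and stops at the first event for which the halting predicate is true.
     */
    static List<Event> readBinlog(List<Event> binlog, long from,
                                  List<String> tables, LongPredicate halt) {
        List<Event> emitted = new ArrayList<>();
        for (Event e : binlog) {
            if (e.position < from) {
                continue;                 // before the resume point
            }
            if (halt.test(e.position)) {
                break;                    // halting predicate fired: stop reading
            }
            if (tables.contains(e.table)) {
                emitted.add(e);           // table filter
            }
        }
        return emitted;
    }

    /** Runs the two phases and returns a textual trace of what would be emitted. */
    static List<String> run(List<Event> binlog, long resumeFrom, long t1,
                            List<String> oldTables, List<String> newTables) {
        List<String> out = new ArrayList<>();

        // Phase 1a: the reader for the "old" tables resumes where it left off
        // and halts once it reaches T1 (assumption: T1 itself is not read here).
        for (Event e : readBinlog(binlog, resumeFrom, oldTables, pos -> pos >= t1)) {
            out.add("old-binlog:" + e.table + "@" + e.position);
        }

        // Phase 1b: snapshot of the newly added tables, taken at T1.
        for (String table : newTables) {
            out.add("snapshot:" + table + "@" + t1);
        }

        // Phase 2: a single unified reader for all tables, starting at T1.
        List<String> allTables = new ArrayList<>(oldTables);
        allTables.addAll(newTables);
        for (Event e : readBinlog(binlog, t1, allTables, pos -> false)) {
            out.add("unified:" + e.table + "@" + e.position);
        }
        return out;
    }

    /** Small made-up scenario: table "a" is already captured, table "b" is newly added. */
    static List<String> demo() {
        List<Event> binlog = Arrays.asList(
                new Event(1, "a"), new Event(2, "a"), new Event(3, "b"),
                new Event(4, "a"), new Event(5, "b"), new Event(6, "a"));
        return run(binlog, 2, 4, Arrays.asList("a"), Arrays.asList("b"));
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

      Since only one reader consumes the binlog at any point in this arrangement, there is no reconciliation step between two binlog readers, which is the property the proposal relies on for keeping the DB history topic ordered.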

                People

                • Assignee: Unassigned
                • Reporter: Gunnar Morling (gunnar.morling)
                • Votes: 3
                • Watchers: 8