  1. Debezium
  2. DBZ-747

History compaction tool


Details

    • Task
    • Resolution: Unresolved
    • Major
    • 2.4-backlog
    • 1.7.0.CR1
    • mysql-connector
    • None

    Description

      Restarting a Debezium connector with a schema history topic can take a long time, depending on the number of schema history records to be recovered. For example, there have been reports of this taking about 40 minutes for 100K schema history records. This can be drastically sped up by discarding all old records that are no longer needed for restarting the connector from its latest persisted offset.

      To address this concern, there should be a tool which copies the existing DB history topic to a topic with a given new name, copying only those entries which are needed as per the connector's current offset. Here is an example:

       

      offset  | DDL                                  | notes
      ------------------------------------------------------
      001     | CREATE TABLE test ...                |
      002     | ALTER TABLE test ADD COLUMN foo ...  |
      003     | ALTER TABLE test ADD COLUMN bar ...  | connector gets stopped at this offset
      004     | ALTER TABLE test ADD COLUMN baz ...  |
      

       
      When the connector gets stopped at offset 003, upon restart it currently has to parse statements 001, 002, and 003 in order to build up the full table schema valid at 003. The idea is to "compact" this information by persisting the table state valid as of 003, i.e. including the "foo" and "bar" columns, so that it can be read back from this single event. After such compaction, the history topic would look like this (note that the JSON representation should be used instead of DDL statements, which are used here just for illustration purposes):

       

      offset  | DDL                                  | notes
      ------------------------------------------------------
      003     | CREATE TABLE test (foo, bar) ...     |
      004     | ALTER TABLE test ADD COLUMN baz ...  |
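
      The compaction idea can be sketched roughly as follows. This is only an illustration of the folding logic, not Debezium code: the `HistoryRecord` type, the simplified `op` values, and the `compact` function are hypothetical stand-ins for the real history records and their "tableChanges" representation, and the `id` column is assumed because the original CREATE statement is elided.

```python
from dataclasses import dataclass, field


@dataclass
class HistoryRecord:
    """Hypothetical, simplified stand-in for one schema history record."""
    offset: int                      # source position the change belongs to
    table: str
    op: str                          # "CREATE", "ADD_COLUMN" or "DROP_TABLE"
    columns: list[str] = field(default_factory=list)


def compact(history, connector_offset):
    """Fold every record up to connector_offset into one snapshot per table.

    Records after connector_offset are kept unchanged, since the connector
    has not processed them yet.
    """
    tables = {}   # table name -> columns valid as of connector_offset
    tail = []     # records newer than the connector's offset
    for rec in sorted(history, key=lambda r: r.offset):
        if rec.offset > connector_offset:
            tail.append(rec)
        elif rec.op == "CREATE":
            tables[rec.table] = list(rec.columns)
        elif rec.op == "ADD_COLUMN":
            tables.setdefault(rec.table, []).extend(rec.columns)
        elif rec.op == "DROP_TABLE":
            tables.pop(rec.table, None)
    # One synthetic CREATE per surviving table, stamped with the connector offset.
    snapshots = [
        HistoryRecord(connector_offset, name, "CREATE", cols)
        for name, cols in tables.items()
    ]
    return snapshots + tail


# The example from this issue; the "id" column is an assumption.
history = [
    HistoryRecord(1, "test", "CREATE", ["id"]),
    HistoryRecord(2, "test", "ADD_COLUMN", ["foo"]),
    HistoryRecord(3, "test", "ADD_COLUMN", ["bar"]),
    HistoryRecord(4, "test", "ADD_COLUMN", ["baz"]),
]
compacted = compact(history, connector_offset=3)
```

      With the connector stopped at offset 003, the four records collapse into two: one snapshot of `test` including "foo" and "bar", plus the untouched record 004.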
      

      The tool would be used like this:

      • Stop the connector
      • Run the tool to copy the history into a new, compacted history topic
        • There should be an option which recreates the JSON-based representation ("tableChanges") by re-parsing the DDL field
      • Point connector to the new DB history topic
      • (Delete the old topic or keep it for analysis purposes)
      • Restart the connector
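
      The "point connector to the new topic" step could look roughly like the sketch below, which rewrites a connector configuration held as a plain dictionary. The helper `point_to_compacted_topic` and both topic names are hypothetical. The property `database.history.kafka.topic` is the one used by Debezium 1.x; in 2.x it is named `schema.history.internal.kafka.topic`, so the sketch checks for either.

```python
# Property holding the history topic name: 1.x name first, then the 2.x name.
HISTORY_TOPIC_KEYS = (
    "database.history.kafka.topic",
    "schema.history.internal.kafka.topic",
)


def point_to_compacted_topic(connector_config, new_topic):
    """Return a copy of the config whose history topic is new_topic."""
    updated = dict(connector_config)  # leave the caller's config untouched
    for key in HISTORY_TOPIC_KEYS:
        if key in updated:
            updated[key] = new_topic
            return updated
    raise KeyError("no schema history topic property found in connector config")


# Hypothetical connector config and topic names, for illustration only.
config = {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.history.kafka.topic": "dbhistory.inventory",
}
new_config = point_to_compacted_topic(config, "dbhistory.inventory.compacted")
```

      The old topic name stays available in the original `config`, which fits the optional step of keeping the old topic around for analysis.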

      Connector start-up times will be improved, as many old entries won't be processed again.

            People

              Assignee: Unassigned
              Reporter: Gunnar Morling (gunnar.morling)