  1. Debezium
  2. DBZ-4959

Improve performance of DB history recovery


Details

    • Task
    • Resolution: Unresolved
    • Major
    • 2.4-backlog
    • core-library

    Description

      There have been repeated reports of historized connectors taking very long to recover the schema history after a restart, in some cases hours. In general, this is caused by large histories with many thousands or even millions of entries. There are two angles for improvement:

      • Reduce the number of entries which need to be recovered:
        • The schema history compaction tool (DBZ-747) will help with that, pruning superfluous old entries which are no longer relevant;
        • Other means of reducing the number of entries to parse are the options for tracking only filter-included tables and for excluding views (in the latter case, we could abort parsing early, as soon as we reach any of the VIEW-specific listeners);
        • We may also add another field to schema history entries, designating their type ("table", "view", "procedure", etc.), which would allow us to skip, for instance, stored procedure definitions, which are typically long and thus costly to parse (and which we don't care about anyway); better yet, don't persist procedure definitions and other irrelevant DDL events in the history to begin with
      • Speed up parsing itself; here I'm not so sure what we can do:
        • Switch to the JSON-based flavour by default; this should definitely help
        • Optimize the grammar (see e.g. this post)
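
      The idea of a per-entry type field can be sketched as below. This is only an illustration under assumptions: the `Entry` class, its `type` values and the `ddlToParse` helper are hypothetical and are not part of Debezium's actual history record format or API. The point is that entries are filtered by type before any DDL text reaches the parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class TypedHistoryRecovery {

    // Hypothetical history entry: a "type" tag stored next to the DDL text.
    static final class Entry {
        final String type;
        final String ddl;

        Entry(String type, String ddl) {
            this.type = type;
            this.ddl = ddl;
        }
    }

    // Assumed set of entry types that can affect table schemas.
    static final Set<String> RELEVANT_TYPES = Set.of("table", "view");

    // Returns only the DDL statements that would still be handed to the parser.
    // Procedure/function entries are dropped without being parsed at all;
    // view entries are dropped as well when views are not tracked.
    static List<String> ddlToParse(List<Entry> history, boolean trackViews) {
        List<String> result = new ArrayList<>();
        for (Entry entry : history) {
            if (!RELEVANT_TYPES.contains(entry.type)) {
                continue; // e.g. "procedure": long, costly, irrelevant
            }
            if ("view".equals(entry.type) && !trackViews) {
                continue;
            }
            result.add(entry.ddl);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Entry> history = List.of(
                new Entry("table", "CREATE TABLE a (id INT)"),
                new Entry("procedure", "CREATE PROCEDURE p() BEGIN END"),
                new Entry("view", "CREATE VIEW v AS SELECT id FROM a"));
        System.out.println(ddlToParse(history, false));
        System.out.println(ddlToParse(history, true));
    }
}
```

      Filtering on a cheap string comparison keeps the cost per skipped entry constant, regardless of how long the skipped DDL statement is.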

      Update: Based on some preliminary testing, using JSON during recovery speeds things up substantially. Note that this requires a) persisting view definitions as JSON (if tracking views is enabled, so as to support subsequent CREATE TABLE table_foo AS SELECT from view_bar... statements) and b) omitting function/procedure definitions from the history, as suggested above. This combination should largely address the problem.
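
      A minimal sketch of JSON-first recovery follows, assuming a simplified entry model: the `Entry` class, its `tableJson` field and the `recover` method are hypothetical stand-ins, not Debezium's real classes. It shows the intended control flow only: use the stored structure when an entry has one, and fall back to the expensive DDL parser otherwise.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class JsonFirstRecovery {

    // Hypothetical history entry: the DDL text plus an optional serialized
    // table structure (null when the entry was stored without one).
    static final class Entry {
        final String ddl;
        final String tableJson;

        Entry(String ddl, String tableJson) {
            this.ddl = ddl;
            this.tableJson = tableJson;
        }
    }

    // Rebuilds one schema description per entry. The ddlParser function is
    // only invoked for entries that lack a serialized structure.
    static List<String> recover(List<Entry> history, Function<String, String> ddlParser) {
        List<String> schemas = new ArrayList<>();
        for (Entry entry : history) {
            if (entry.tableJson != null) {
                schemas.add(entry.tableJson); // cheap path: no DDL parsing
            } else {
                schemas.add(ddlParser.apply(entry.ddl)); // fallback path
            }
        }
        return schemas;
    }

    public static void main(String[] args) {
        int[] parserCalls = {0};
        // Stand-in for a real DDL parser; it only counts its invocations.
        Function<String, String> countingParser = ddl -> {
            parserCalls[0]++;
            return "parsed:" + ddl;
        };
        List<Entry> history = List.of(
                new Entry("CREATE TABLE a (id INT)", "{\"table\":\"a\"}"),
                new Entry("ALTER TABLE a ADD c INT", null));
        System.out.println(recover(history, countingParser));
        System.out.println("DDL parser calls: " + parserCalls[0]);
    }
}
```

      With view definitions also persisted as JSON and procedure/function entries left out of the history, the fallback path would be needed rarely, which is where the speed-up reported above would come from.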


          People

            Assignee: Unassigned
            Reporter: Gunnar Morling (gunnar.morling)