Details
- Task
- Resolution: Unresolved
- Major
- 1.7.0.CR1
- None
Description
Restarting a Debezium connector with a schema history topic can take a long time, depending on the number of schema history records to be recovered. For example, there have been reports of this taking about 40 min for 100K schema history records. This can be drastically sped up by discarding all old records that are no longer needed for restarting the connector from its latest persisted offset.
To address this concern, there should be a tool which copies the existing DB history topic to a topic with a given new name, only copying those entries which are needed as per the connector's current offset. Here is an example:
offset | DDL | notes
------------------------------------------------------
001 | CREATE TABLE test ... |
002 | ALTER TABLE test ADD COLUMN foo ... |
003 | ALTER TABLE test ADD COLUMN bar ... | connector gets stopped at this offset
004 | ALTER TABLE test ADD COLUMN baz ... |
When the connector gets stopped at offset 003, upon restarting it currently has to parse statements 001, 002, and 003 in order to build up the full table schema valid at 003. The idea is to "compact" this information by persisting the table state valid as of 003, i.e. including the "foo" and "bar" columns, which can then be read back from this single event. After such compaction, the history topic would look like this (note that the JSON representation should be used instead of DDL statements, which are used here just for illustration purposes):
offset | DDL | notes
------------------------------------------------------
003 | CREATE TABLE test (foo, bar) ... |
004 | ALTER TABLE test ADD COLUMN baz ... |
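The compaction idea can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the proposed tool: `HistoryRecord` and `compact` are hypothetical names, each record is assumed to already carry the full column list of its table (standing in for the JSON "tableChanges" representation), and dropped tables are not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HistoryRecord:
    """Hypothetical, simplified stand-in for one schema history entry."""
    offset: int
    table: str
    columns: tuple  # full column list of the table after this change


def compact(history, stop_offset):
    """Keep one record per table holding its state as of stop_offset,
    followed by every record written after stop_offset."""
    latest_state = {}  # table name -> last record at or before stop_offset
    tail = []          # records after stop_offset, kept unchanged
    for rec in sorted(history, key=lambda r: r.offset):
        if rec.offset <= stop_offset:
            latest_state[rec.table] = rec
        else:
            tail.append(rec)
    snapshot = sorted(latest_state.values(), key=lambda r: r.offset)
    return snapshot + tail


# The four entries from the example table above.
history = [
    HistoryRecord(1, "test", ()),
    HistoryRecord(2, "test", ("foo",)),
    HistoryRecord(3, "test", ("foo", "bar")),
    HistoryRecord(4, "test", ("foo", "bar", "baz")),
]

for rec in compact(history, stop_offset=3):
    print(rec.offset, rec.table, rec.columns)
```

With the connector stopped at offset 003, only the entries for 003 and 004 remain, and the entry for 003 alone describes the table with its "foo" and "bar" columns.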
The tool would be used like this:
- Stop the connector
- Run the tool to copy the existing history topic into the new, compacted history topic
- There should be an option which recreates the JSON-based representation ("tableChanges") by re-parsing the DDL field
- Point connector to the new DB history topic
- (Delete the old topic or keep it for analysis purposes)
- Re-start connector
Connector start-up times will be improved, as many old entries won't be processed again.
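The "point connector to the new DB history topic" step amounts to changing one property in the connector configuration. The sketch below assumes a MySQL connector on a Debezium 1.x release, where that property is `database.history.kafka.topic`; the helper name `repoint_history_topic` and the topic and server names are made up for illustration.

```python
def repoint_history_topic(connector_config, new_topic):
    """Return a copy of the connector config that uses the compacted topic.

    The original dict is left untouched so the old topic name stays
    available, e.g. for later analysis or deletion of the old topic.
    """
    updated = dict(connector_config)
    updated["database.history.kafka.topic"] = new_topic
    return updated


# Hypothetical connector configuration; topic and server names are examples.
old_config = {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.history.kafka.bootstrap.servers": "kafka:9092",
    "database.history.kafka.topic": "dbhistory.inventory",
}

new_config = repoint_history_topic(old_config, "dbhistory.inventory.compacted")
print(new_config["database.history.kafka.topic"])
```

The updated configuration would then be submitted to Kafka Connect while the connector is still stopped, before restarting it.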
Issue Links
- is related to DBZ-1854 mysql history database topic growing too large causing reboot debezium take too much time. (Closed)