[DBZ-4903] Allow adhoc snapshot on tables whose schemas have not been captured

    • Type: Feature Request
    • Resolution: Done
    • Priority: Minor
    • 3.0.5.Final
    • None
    • core-library
    • None

      Currently, ad hoc snapshots work well for existing tables. However, when a new table is added for which Debezium has not yet captured a schema, a warning is logged by isTableInvalid in the AbstractIncrementalSnapshotChangeEventSource class:

      LOGGER.warn("Schema not found for table '{}', known tables {}", currentTableId, databaseSchema.tableIds());
      

      To avoid this, we either need to wait for records to be ingested from this table before running the snapshot or perform a schema_only_recovery to refresh all of the captured schemas.

      It would be useful to have an option in the signal to force schema creation so that ad hoc snapshots can be done on these tables. Suggested structure below:

      INSERT INTO debezium.debezium_signal VALUES('key-1', 'execute-snapshot', '{"data-collections": ["TABLE-1"], "create-schema": "true"}');
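For illustration only, here is a minimal Java sketch of how a client might assemble the data payload for such a signal. SignalPayloadSketch and its methods are hypothetical helpers, not part of Debezium, and "create-schema" is the option proposed in this issue rather than an existing signal field.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Debezium: builds the JSON "data" value for an
// execute-snapshot signal, including the "create-schema" flag proposed in this issue.
final class SignalPayloadSketch {

    // Wraps a value in double quotes, escaping backslashes and embedded quotes.
    static String jsonString(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Produces e.g. {"data-collections": ["TABLE-1"], "create-schema": "true"}
    static String build(List<String> dataCollections, boolean createSchema) {
        String tables = dataCollections.stream()
                .map(SignalPayloadSketch::jsonString)
                .collect(Collectors.joining(", "));
        // The flag is emitted as a string ("true"/"false") to mirror the suggested structure.
        return "{\"data-collections\": [" + tables + "], \"create-schema\": \""
                + createSchema + "\"}";
    }

    public static void main(String[] args) {
        System.out.println(build(List.of("TABLE-1"), true));
    }
}
```

The resulting string would be bound as the third column of the signal table insert, alongside the id and the 'execute-snapshot' type.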

      I've experimented with adding a createSchemaForTable method in AbstractIncrementalSnapshotChangeEventSource (snippet attached). It's a bit hacked together as I don't fully understand how everything fits together.

      In the attached snippet I tried running dispatcher.dispatchSchemaChangeEvent for each invalid table, but at this level it doesn't look like I can get hold of the DDL, so the schema that gets stored is missing the create table statement (the bottom two entries in the attached file_database_history file are the synthetic schema events I generated). I'm guessing this approach isn't going to work? (Note: I'm testing with the Oracle connector.)

            Debezium Builder added a comment - Released

            Gunnar Morling added a comment - I think it's a good idea in general and we should support this. Could you perhaps create a (Draft) PR with your change, so we can discuss it in context on GitHub? Instead of adding a new command, couldn't we simply (TM) capture the schema when it's required when running the ad-hoc snapshot?
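            To make the "capture the schema when it's required" idea concrete, here is a rough, self-contained Java sketch. It is a simplified model, not Debezium code: OnDemandSchemaSketch, knownSchemas, structureReader and prepareForSnapshot are all hypothetical names, and the sketch only covers the in-memory lookup. How a real connector would obtain the DDL for the schema history (the problem described above) is left open.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Simplified model (not Debezium code): look up a table's schema before an
// ad hoc snapshot and try to capture it on demand if it is not yet known.
final class OnDemandSchemaSketch {

    // Stand-in for the connector's in-memory schema: table id -> column names.
    private final Map<String, List<String>> knownSchemas = new HashMap<>();
    // Stand-in for reading a table's structure from the source database.
    private final Function<String, Optional<List<String>>> structureReader;
    // Collected warnings, mirroring the message logged today for unknown tables.
    final List<String> warnings = new ArrayList<>();

    OnDemandSchemaSketch(Function<String, Optional<List<String>>> structureReader) {
        this.structureReader = structureReader;
    }

    boolean hasSchema(String tableId) {
        return knownSchemas.containsKey(tableId);
    }

    // Returns true if the table has a schema (already known or captured just now).
    boolean prepareForSnapshot(String tableId) {
        if (knownSchemas.containsKey(tableId)) {
            return true;
        }
        Optional<List<String>> structure = structureReader.apply(tableId);
        if (structure.isPresent()) {
            knownSchemas.put(tableId, structure.get());
            return true;
        }
        // Only warn when the schema cannot be captured at all.
        warnings.add("Schema not found for table '" + tableId
                + "', known tables " + knownSchemas.keySet());
        return false;
    }

    public static void main(String[] args) {
        Function<String, Optional<List<String>>> reader = (String id) -> {
            if (id.equals("TABLE-1")) {
                return Optional.of(List.of("ID", "NAME"));
            }
            return Optional.empty();
        };
        OnDemandSchemaSketch sketch = new OnDemandSchemaSketch(reader);
        System.out.println(sketch.prepareForSnapshot("TABLE-1"));
        System.out.println(sketch.prepareForSnapshot("MISSING"));
        System.out.println(sketch.warnings);
    }
}
```

            With this shape no extra signal option is needed: the warning only fires for tables whose structure cannot be read at all.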

              Assignee: Unassigned
              Reporter: Nathan Smit (nathan-smit-1)
              Votes: 1
              Watchers: 5