Debezium / DBZ-9454

Incremental Snapshot: database name required in data collection name

      Hello. I'm using Debezium 3.1.2.Final, running locally in embedded mode. For some tables, an incremental snapshot only works if the data collection name includes the database name in addition to the schema and table name. For example:

      {
          "data-collections": [
              "my_database.public.tobjectstorage",
              "my_database.public.tlinkstorage"
          ],
          "type": "incremental",
          "additional-conditions": [
              {
                  "data-collection": "my_database.public.tobjectstorage",
                  "filter": ""
              },
              {
                  "data-collection": "my_database.public.tlinkstorage",
                  "filter": ""
              }
          ]
      }
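
      Until the naming is consistent, one workaround is to build the signal payload with every name already prefixed by the value of `database.dbname`. The following is a minimal Java sketch under stated assumptions: the class and method names are hypothetical (not part of Debezium), names are assumed to need no JSON escaping, and the empty filters mirror the request above.

```java
import java.util.List;
import java.util.stream.Collectors;

public class SnapshotSignal {

    // Hypothetical helper: prefix "schema.table" with the database name from
    // database.dbname, unless it already starts with that prefix.
    // Assumption: no schema is itself named after the database.
    static String qualify(String dbName, String schemaAndTable) {
        String prefix = dbName + ".";
        return schemaAndTable.startsWith(prefix) ? schemaAndTable : prefix + schemaAndTable;
    }

    // Builds the "data" JSON of an execute-snapshot signal with qualified names.
    // Assumption: names need no JSON escaping; filters are empty as in the report.
    static String buildData(String dbName, List<String> tables) {
        List<String> qualified = tables.stream()
                .map(t -> qualify(dbName, t))
                .collect(Collectors.toList());
        String collections = qualified.stream()
                .map(t -> "\"" + t + "\"")
                .collect(Collectors.joining(", "));
        String conditions = qualified.stream()
                .map(t -> "{\"data-collection\": \"" + t + "\", \"filter\": \"\"}")
                .collect(Collectors.joining(", "));
        return "{\"data-collections\": [" + collections + "], "
                + "\"type\": \"incremental\", "
                + "\"additional-conditions\": [" + conditions + "]}";
    }

    public static void main(String[] args) {
        System.out.println(buildData("my_database",
                List.of("public.tobjectstorage", "public.tlinkstorage")));
    }
}
```

      Names that are already prefixed are left unchanged, so the helper can be applied to a mixed list.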

      The reason is a difference in how the table schema is obtained. If the schema was obtained by the initial snapshot, the database name must be specified explicitly. If the schema was obtained by the streaming process (for example, after restarting an application that already completed a snapshot, so the offsets file is present), the database name is not needed.
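
      The "known tables" lists in the logs of this report contain both spellings of the same table (for example my_database.public.debezium_signal and public.debezium_signal). A minimal Java sketch of why such a lookup can miss, assuming the schema cache compares identifiers on all three parts (catalog, schema, table); `TableKey` is a hypothetical stand-in, not Debezium's `TableId`, and this is not Debezium's actual lookup code:

```java
import java.util.HashMap;
import java.util.Map;

public class TableKeyLookup {

    // Hypothetical identifier with an optional catalog (database) part.
    // NOT io.debezium.relational.TableId; used only to illustrate the mismatch.
    record TableKey(String catalog, String schema, String table) {
        // "catalog.schema.table", or "schema.table" when catalog is null,
        // matching the two spellings seen in the "known tables" log lines.
        @Override
        public String toString() {
            return catalog == null ? schema + "." + table : catalog + "." + schema + "." + table;
        }
    }

    public static void main(String[] args) {
        Map<TableKey, String> schemas = new HashMap<>();
        // Schema registered during the initial snapshot: catalog is set.
        schemas.put(new TableKey("my_database", "public", "tobjectstorage"), "tobjectstorage schema");

        // Signal names the table as "public.tobjectstorage": catalog is null.
        TableKey requested = new TableKey(null, "public", "tobjectstorage");

        // Record equality compares every component, so the null catalog is a miss.
        System.out.println(schemas.containsKey(requested)); // false
        System.out.println(schemas.containsKey(
                new TableKey("my_database", "public", "tobjectstorage"))); // true
    }
}
```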

      Logs for the case where the initial snapshot ran before streaming started (first launch of the app) and a snapshot request was later sent without the database name:

      2025-09-04T15:42:26.064+03:00  INFO 31404 --- [rce-coordinator] ractIncrementalSnapshotChangeEventSource : Schema not found for table 'public.tobjectstorage', known tables [my_database.public.debezium_signal, my_database.public.tobjectstorage, my_database.public.tlinkstorage, public.debezium_signal]. Will attempt to retrieve this schema
      2025-09-04T15:42:26.065+03:00  WARN 31404 --- [rce-coordinator] ractIncrementalSnapshotChangeEventSource : Failed to retrieve schema for public.sometable
      io.debezium.DebeziumException: Failed to populate table with schema for public.tobjectstorage
          at io.debezium.pipeline.source.snapshot.incremental.AbstractIncrementalSnapshotChangeEventSource.readSchemaForTable(AbstractIncrementalSnapshotChangeEventSource.java:767) ~[debezium-core-3.1.2.Final.jar:3.1.2.Final]
          at io.debezium.pipeline.source.snapshot.incremental.AbstractIncrementalSnapshotChangeEventSource.retrieveAndRefreshSchema(AbstractIncrementalSnapshotChangeEventSource.java:394) ~[debezium-core-3.1.2.Final.jar:3.1.2.Final]
          at io.debezium.pipeline.source.snapshot.incremental.AbstractIncrementalSnapshotChangeEventSource.isTableInvalid(AbstractIncrementalSnapshotChangeEventSource.java:366) ~[debezium-core-3.1.2.Final.jar:3.1.2.Final]
          at io.debezium.pipeline.source.snapshot.incremental.AbstractIncrementalSnapshotChangeEventSource.readChunk(AbstractIncrementalSnapshotChangeEventSource.java:267) ~[debezium-core-3.1.2.Final.jar:3.1.2.Final]
          at io.debezium.pipeline.source.snapshot.incremental.AbstractIncrementalSnapshotChangeEventSource.addDataCollectionNamesToSnapshot(AbstractIncrementalSnapshotChangeEventSource.java:491) ~[debezium-core-3.1.2.Final.jar:3.1.2.Final]
          at io.debezium.pipeline.signal.actions.snapshotting.ExecuteSnapshot.arrived(ExecuteSnapshot.java:78) ~[debezium-core-3.1.2.Final.jar:3.1.2.Final] 

      Logs for the case where no initial snapshot ran after launch (the offsets file exists) and a single change had been made to the tobjectstorage table. That table now appears in the known-tables list without the database prefix (public.tobjectstorage), while tlinkstorage is still listed only as my_database.public.tlinkstorage because no changes had been made to it:

      2025-09-05T16:28:51.574+03:00  INFO 54140 --- [rce-coordinator] ractIncrementalSnapshotChangeEventSource : Schema not found for table 'public.tlinkstorage', known tables [my_database.public.debezium_signal, public.tobjectstorage, my_database.public.tlinkstorage, my_database.public.tobjectstorage, public.debezium_signal]. Will attempt to retrieve this schema
      2025-09-05T16:28:51.575+03:00  WARN 54140 --- [rce-coordinator] ractIncrementalSnapshotChangeEventSource : Failed to retrieve schema for public.tlinkstorage
      io.debezium.DebeziumException: Failed to populate table with schema for public.tlinkstorage
          at io.debezium.pipeline.source.snapshot.incremental.AbstractIncrementalSnapshotChangeEventSource.readSchemaForTable(AbstractIncrementalSnapshotChangeEventSource.java:767) ~[debezium-core-3.1.2.Final.jar:3.1.2.Final]
          at io.debezium.pipeline.source.snapshot.incremental.AbstractIncrementalSnapshotChangeEventSource.retrieveAndRefreshSchema(AbstractIncrementalSnapshotChangeEventSource.java:394) ~[debezium-core-3.1.2.Final.jar:3.1.2.Final]
          at io.debezium.pipeline.source.snapshot.incremental.AbstractIncrementalSnapshotChangeEventSource.isTableInvalid(AbstractIncrementalSnapshotChangeEventSource.java:366) ~[debezium-core-3.1.2.Final.jar:3.1.2.Final]
          at io.debezium.pipeline.source.snapshot.incremental.AbstractIncrementalSnapshotChangeEventSource.readChunk(AbstractIncrementalSnapshotChangeEventSource.java:267) ~[debezium-core-3.1.2.Final.jar:3.1.2.Final]

      After I made a single change to the tlinkstorage table, no further exceptions occurred.

      What Debezium connector do you use and what version?

      Debezium PostgreSQL Connector 3.1.2.Final in embedded mode

      What is the connector configuration?

      {
          "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
          "transforms.replicated_from.static.field": "replicated_from",
          "slot.name": "mySlot",
          "publication.name": "myPub",
          "transforms": "replicated_from",
          "provide.transaction.metadata": "true",
          "transforms.replicated_from.type": "org.apache.kafka.connect.transforms.InsertField$Value",
          "tombstones.on.delete": "false",
          "topic.prefix": "prefix",
          "offset.storage.file.filename": "/kafka/offsets_tobjectstorage_tlinkstorage.dat",
          "signal.data.collection": "public.debezium_signal",
          "copy.existing": "false",
          "transforms.replicated_from.static.value": "a3698112c0a8017143e78ed6204ddf2c",
          "publication.autocreate.mode": "all_tables",
          "database.user": "***",
          "database.dbname": "my_database",
          "offset.storage": "org.apache.kafka.connect.storage.FileOffsetBackingStore",
          "database.port": "5432",
          "plugin.name": "pgoutput",
          "offset.flush.interval.ms": "0",
          "database.hostname": "***",
          "database.password": "***",
          "name": "sc_tobjectstorage_tlinkstorage",
          "snapshot.mode": "initial"
      } 

      What is the captured database version and mode of deployment?

      On-premise PostgreSQL 15

      What behavior do you expect?

      I expect consistent behavior in both cases, whether the schema was obtained by the snapshot or by the streaming process. Failing that, the requirement to specify the database name should at least be documented.
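
      One way the expected behavior could look is a resolver that accepts a data-collection name with or without the database prefix. This is a hypothetical Java sketch of that idea, not Debezium code; the method name and the fallback order (exact match first, then the `database.dbname` prefix) are assumptions:

```java
import java.util.Optional;
import java.util.Set;

public class ConsistentResolution {

    // Hypothetical resolver: accept "schema.table" or "database.schema.table".
    // Not Debezium code; illustrates the consistency requested above.
    static Optional<String> resolve(Set<String> knownTables, String dbName, String requested) {
        if (knownTables.contains(requested)) {
            return Optional.of(requested);           // exact spelling is known
        }
        String qualified = dbName + "." + requested;  // try the database-prefixed form
        if (knownTables.contains(qualified)) {
            return Optional.of(qualified);
        }
        return Optional.empty();                      // unknown under either spelling
    }

    public static void main(String[] args) {
        // Known tables copied from the first log excerpt (initial snapshot case).
        Set<String> known = Set.of(
                "my_database.public.debezium_signal",
                "my_database.public.tobjectstorage",
                "my_database.public.tlinkstorage",
                "public.debezium_signal");
        System.out.println(resolve(known, "my_database", "public.tobjectstorage"));
        // Optional[my_database.public.tobjectstorage]
        System.out.println(resolve(known, "my_database", "public.missing"));
        // Optional.empty
    }
}
```

      Trying the exact spelling first leaves names that are already fully qualified unchanged.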

      Do you see the same behaviour using the latest released Debezium version?

      Unfortunately, I cannot test it with the latest version.

        Attachments (uploaded by Distant Blue):
        1. logs-no-offsets.txt (26 kB)
        2. logs-with-offsets.txt (21 kB)
        3. image-2025-09-15-17-49-48-361.png (81 kB)

      Assignee: Unassigned
      Reporter: Distant Blue (distantblue)
      Votes: 0
      Watchers: 2