Debezium / DBZ-8567

Incremental snapshot - wrong table name in offsets

    • Priority: Important

      Bug report

      What Debezium connector do you use and what version?

      Debezium SQL Server Connector 2.5.4

      What is the connector configuration?

      {
          "name": "<CONNECTOR_NAME_PLACEHOLDER>",
          "connector.class": "io.debezium.connector.sqlserver.SqlServerConnector",
          "tasks.max": "1",
          "topic.prefix": "<TOPIC_PREFIX_PLACEHOLDER>",
          "key.converter": "org.apache.kafka.connect.json.JsonConverter",
          "value.converter": "org.apache.kafka.connect.json.JsonConverter",
          "database.encrypt": "false",
          "database.user": "<DATABASE_USER_PLACEHOLDER>",
          "database.names": "<DATABASE_NAME_PLACEHOLDER>",
          "database.hostname": "<DATABASE_HOSTNAME_PLACEHOLDER>",
          "database.password": "<DATABASE_PASSWORD_PLACEHOLDER>",
          "table.include.list": "dbo.Process,dbo.Sensor",
          "snapshot.mode": "schema_only",
          "snapshot.isolation.mode": "snapshot",
          "incremental.snapshot.chunk.size": "50000",
          "signal.enabled.channels": "source",
          "signal.data.collection": "<SIGNAL_DATA_COLLECTION_PLACEHOLDER>",
          "signal.kafka.bootstrap.servers": "<KAFKA_BOOTSTRAP_SERVERS_PLACEHOLDER>",
          "binary.handling.mode": "hex",
          "transforms": "addTopicSuffix,unwrap",
          "transforms.addTopicSuffix.type": "org.apache.kafka.connect.transforms.RegexRouter",
          "transforms.addTopicSuffix.replacement": "$0.raw",
          "transforms.addTopicSuffix.regex": ".*",
          "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
          "transforms.unwrap.delete.tombstone.handling.mode": "rewrite",
          "schema.history.internal.kafka.topic": "<SCHEMA_HISTORY_TOPIC_PLACEHOLDER>",
          "schema.history.internal.kafka.bootstrap.servers": "<KAFKA_BOOTSTRAP_SERVERS_PLACEHOLDER>",
          "schema.history.internal.producer.sasl.mechanism": "PLAIN",
          "schema.history.internal.producer.security.protocol": "SASL_SSL",
          "schema.history.internal.producer.sasl.jaas.config": "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"<KAFKA_USERNAME_PLACEHOLDER>\" password=\"<KAFKA_PASSWORD_PLACEHOLDER>\";",
          "schema.history.internal.consumer.sasl.mechanism": "PLAIN",
          "schema.history.internal.consumer.security.protocol": "SASL_SSL",
          "schema.history.internal.consumer.sasl.jaas.config": "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"<KAFKA_USERNAME_PLACEHOLDER>\" password=\"<KAFKA_PASSWORD_PLACEHOLDER>\";"
      }

      What is the captured database version and mode of deployment?

      On-premises, MS SQL Server 2017/2019

      What behavior do you expect?

      When initiating an incremental snapshot (by inserting a signal into the database signal table), the connector should snapshot the specified table and write the offsets correctly to the Kafka Connect offset topic. Upon inserting another signal for a different table, the second table should be snapshotted, and its offsets should also be written correctly to the Kafka Connect offset topic.

      <I'll add image to show 2 consecutive signals and their offset events>
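      For reference, a signal like the ones described above is inserted as a row in the configured signal table. This is a minimal sketch, not the exact statement used in this setup: the table and database names reuse this report's placeholders, the id value is arbitrary, and the (id, type, data) column layout follows Debezium's documented source signal channel:

      ```sql
      -- Hypothetical signal insert; <SIGNAL_DATA_COLLECTION_PLACEHOLDER> stands for
      -- the signal table configured via signal.data.collection, and the fully
      -- qualified collection name uses this report's database placeholder.
      INSERT INTO <SIGNAL_DATA_COLLECTION_PLACEHOLDER> (id, type, data)
      VALUES (
          'ad-hoc-snapshot-1',                -- arbitrary unique signal id
          'execute-snapshot',                 -- signal type for ad-hoc snapshots
          N'{"data-collections": ["<DATABASE_NAME_PLACEHOLDER>.dbo.Process"], "type": "incremental"}'
      );
      ```

      The connector picks the signal up from the signal table's change stream and begins chunked snapshotting of the listed collection; a second insert naming dbo.Sensor triggers the second snapshot described above.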

      What behavior do you see?

      When snapshotting multiple tables consecutively, the Kafka Connect offset topic records the correct (max/last) primary key but retains the table name of the first table, even when a different table is snapshotted.

      This is usually not a problem, but when the connector is restarted, a couple of problems can occur, because it tries to recover by reading the last message from the Kafka Connect offset topic, and that message contains the correct primary key of the table being read but the wrong table name:

      1. Errors related to a primary key mismatch, usually caused by a composite key being partially or incorrectly applied (examples of both are in the "Snapshot errors" attachment):
        1. index out of bounds exception
        2. value not set for the parameter x
      2. If the primary keys of the two tables are similar in structure, the connector might begin reading the first table from the point where the second table left off, instead of continuing to read the second table.
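      To make the mismatch concrete, a sketch of what such an offset record might look like. This is purely illustrative: the incremental-snapshot field names reflect my understanding of Debezium's offset bookkeeping and every value is invented. The point is that the key watermarks belong to dbo.Sensor while the collections list still names dbo.Process:

      ```json
      {
          "event_serial_no": 1,
          "commit_lsn": "<HYPOTHETICAL_LSN>",
          "change_lsn": "<HYPOTHETICAL_LSN>",
          "incremental_snapshot_maximum_key": "<serialized max key of dbo.Sensor>",
          "incremental_snapshot_primary_key": "<serialized last-chunk key of dbo.Sensor>",
          "incremental_snapshot_collections": "[{\"incremental_snapshot_collections_id\":\"<DATABASE_NAME_PLACEHOLDER>.dbo.Process\"}]"
      }
      ```

      On restart, the connector resumes from incremental_snapshot_primary_key but applies it to the collection still listed in incremental_snapshot_collections, producing the two failure modes above.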

       

      Do you see the same behaviour using the latest released Debezium version?

      Haven't tried yet.

      Do you have the connector logs, ideally from start till finish?

      (You might be asked later to provide DEBUG/TRACE level log)

      <Will add later>

      Added examples (attached in image and txt format) of different exceptions occurring after the connector has been restarted during an incremental snapshot of a (second) table.

      How to reproduce the issue using our tutorial deployment?

      <Will check later>

              Mario Fiore Vitale
              Jelena Bole