Debezium / DBZ-8704

Duplicate events from the snapshot and streaming phases



      Bug report

      What Debezium connector do you use and what version?

      SQL Server source connector, managed by Confluent Cloud.

      What is the connector configuration?

       

      {
        "name": "Migrations_20250219094933011",
        "connector.class": "SqlServerCdcSourceV2",
        "kafka.auth.mode": "KAFKA_API_KEY",
        "kafka.api.key": "<key>",
        "kafka.api.secret": "<secret>",
        "database.hostname": "mydb.database.windows.net",
        "database.port": "1433",
        "database.user": "<db-user>",
        "database.password": "<db-password>",
        "database.names": "db",
        "snapshot.isolation.mode": "snapshot",
        "table.include.list": "dbo.Migrations_20250219094933011",
        "tasks.max": "1",
        "topic.prefix": "migrations",
        "decimal.handling.mode": "double",
        "time.precision.mode": "connect",
        "max.batch.size": "1",  
        "output.data.format": "AVRO",
        "output.key.format": "AVRO",
        "key.converter": "io.confluent.connect.avro.AvroConverter",
        "key.converter.schema.registry.url": "https://psrc-6k76o2.germanywestcentral.azure.confluent.cloud",
        "key.converter.basic.auth.credentials.source": "USER_INFO",
        "key.converter.basic.auth.user.info": "<key>:<secret>",
        "value.converter": "io.confluent.connect.avro.AvroConverter",
        "value.converter.schema.registry.url": "https://psrc-6k76o2.germanywestcentral.azure.confluent.cloud",
        "value.converter.basic.auth.credentials.source": "USER_INFO",
        "value.converter.basic.auth.user.info": "<key>:<secret>"
      } 

       

      What is the captured database version and mode of deployment?

      Azure SQL Database (Hyperscale).

      What behavior do you expect?

      No duplicates published by Debezium.

      What behavior do you see?

      I am running the following experiment and see duplicate messages pretty often (in ~20% of runs):

      1. Create a table (Id int, BitColumn bit)
      2. Enable CDC on the table
      3. Create a topic
      4. Create a SQL Server Debezium connector (using REST API)
      5. Start a background topic consumer task
      6. Insert 10 rows into the table (IDs 1-10)
      7. Wait for the first message consumed from the topic
      8. Drop the BitColumn column from the table
      9. Insert 10 rows into the table (IDs 11-20)
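
      Steps 1–2 above correspond roughly to the following T-SQL, shown here as a sketch only (the table name matches this experiment's config; `sys.sp_cdc_enable_db` and `sys.sp_cdc_enable_table` are SQL Server's standard CDC stored procedures, and the statements can be executed with any SQL Server client):

      ```python
      def repro_sql(table: str) -> list:
          """Return the statements that create the table and enable CDC on it."""
          return [
              # Step 1: create the table with the two columns used in the experiment.
              f"CREATE TABLE dbo.{table} (Id int, BitColumn bit)",
              # CDC must first be enabled at the database level (once per database).
              "EXEC sys.sp_cdc_enable_db",
              # Step 2: enable CDC for the table itself.
              (f"EXEC sys.sp_cdc_enable_table @source_schema = N'dbo', "
               f"@source_name = N'{table}', @role_name = NULL"),
          ]

      for stmt in repro_sql("Migrations_20250219094933011"):
          print(stmt)
      ```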

      At the end, I compare what I've written to the table with what the topic consumer has read.

      Pretty often, I see 30 messages sent to the topic, even though only 20 rows were inserted into the table.

      In those instances where 30 messages are published, the messages for rows 1-10 are sent to the topic twice, with different offsets.
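
      The comparison at the end of the experiment can be sketched as follows (a minimal illustration, assuming the consumed message keys carry the row Id; the function name is hypothetical):

      ```python
      from collections import Counter

      def find_duplicates(consumed_ids):
          """Return the ids that appeared more than once among consumed messages."""
          counts = Counter(consumed_ids)
          return sorted(id_ for id_, n in counts.items() if n > 1)

      # In the failing runs, rows 1-10 arrive twice (snapshot + streaming)
      # while rows 11-20 arrive once, giving 30 messages total:
      consumed = list(range(1, 11)) + list(range(1, 21))
      assert len(consumed) == 30
      assert find_duplicates(consumed) == list(range(1, 11))
      ```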

      I tried:

      • setting max.batch.size to 1
      • setting exactly.once.support to required
      • setting streaming.delay.ms to 5000

      None of these helped. The only change that seems to eliminate the duplicates is to wait for the connector to reach the RUNNING state after creating it in step 4 above.
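
      The wait-for-RUNNING workaround can be sketched like this. On a self-managed Kafka Connect cluster the state comes from `GET /connectors/<name>/status`; Confluent Cloud exposes its own management API instead, so the status fetcher is injected here as a callable to keep the polling logic self-contained:

      ```python
      import time

      def wait_for_running(get_state, timeout_s=60.0, poll_s=1.0):
          """Poll get_state() until it returns 'RUNNING' or the timeout expires.

          get_state is any callable returning the connector state string, e.g.
          a wrapper around GET /connectors/<name>/status on a self-managed
          Kafka Connect REST API (Confluent Cloud's API differs).
          """
          deadline = time.monotonic() + timeout_s
          while time.monotonic() < deadline:
              if get_state() == "RUNNING":
                  return True
              time.sleep(poll_s)
          return False

      # Stubbed usage: the connector reports RUNNING on the third poll.
      states = iter(["UNASSIGNED", "PROVISIONING", "RUNNING"])
      assert wait_for_running(lambda: next(states), timeout_s=5.0, poll_s=0.0)
      ```

      Only after this returns True would the producer start inserting rows (step 6), which avoids racing the connector's snapshot phase.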

      I understand Debezium guarantees at-least-once delivery and duplicates may happen, but I'd expect them in the face of connection drops or restarts, which, judging by the frequency I am observing, is not the explanation here.

      Attached are:

      • The messages published by Debezium.
      • Content of the CDC change table.

      Do you see the same behaviour using the latest released Debezium version?

      Did not verify.

      Do you have the connector logs, ideally from start till finish?

      No.

      How to reproduce the issue using our tutorial deployment?

      I don't know.

              Assignee: Unassigned
              Reporter: Standa K (Inactive)
              Votes: 0
              Watchers: 3