Debezium / DBZ-6834

Provide INSERT/DELETE semantics for incremental snapshot watermarking


    • Type: Enhancement
    • Resolution: Done
    • Priority: Major
    • Fix Version: 2.5.0.CR1
    • Component: core-library

      What Debezium connector do you use and what version?

      Postgres connector, version 2.3.1.Final, running in a Docker container

      What is the connector configuration?

      "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
      "plugin.name": "pgoutput",
      "slot.name": "debezium_connector",
      "snapshot.mode": "initial",
      "schema.include.list": "my_schema",
      "topic.prefix": "cdc",
      "topic.transaction": "my_schema.transaction",
      "provide.transaction.metadata": true,
      "topic.creation.default.replication.factor": 3,
      "topic.creation.default.partitions": 3,
      "topic.creation.default.retention.ms": 2592000000,
      "topic.creation.default.retention.bytes": -1,
      "topic.creation.default.cleanup.policy": "delete",
      "topic.creation.default.compression.type": "zstd",
      "signal.data.collection": "my_schema.debezium_signal",
      "key.converter": "org.apache.kafka.connect.json.JsonConverter",
      "key.converter.replace.null.with.default": "false",
      "key.converter.schemas.enable": "false",
      "value.converter": "org.apache.kafka.connect.json.JsonConverter",
      "value.converter.replace.null.with.default": "false",
      "value.converter.schemas.enable": "false",
      "skip.messages.without.change": true,
      "decimal.handling.mode": "string",
      "errors.retry.timeout": 600000,
      "snapshot.fetch.size": 1000000

      What is the captured database version and mode of deployment?

      PostgreSQL 13.10, running on AWS RDS

      What behaviour do you expect?

      The signaling table should contain only the signals that I create.

      What behaviour do you see?

      Millions of rows with types "snapshot-window-open" and "snapshot-window-close" are added to the table, at a rate of hundreds per second. My tables contain several billion rows and incremental snapshots generate a huge amount of waste in the signaling table.

      This is not a new issue; I found a previous discussion: https://groups.google.com/g/debezium/c/2vM9daemReE. There were apparently plans to switch the watermarking from INSERT/INSERT to INSERT/DELETE, but those plans seem to have been dropped.
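      To illustrate the difference between the two strategies, here is a minimal sketch (not Debezium internals) of watermark bookkeeping per snapshot chunk. It assumes that under INSERT/DELETE the close of a chunk's window is signaled by deleting the open-watermark row instead of inserting a second row:

```python
# Minimal sketch (not Debezium code): model the signaling table as a list of
# rows and compare the two watermarking strategies per snapshot chunk.

def snapshot_chunks(n_chunks, strategy, table):
    for chunk in range(n_chunks):
        table.append(("snapshot-window-open", chunk))   # open the chunk's window
        # ... chunk rows would be read and deduplicated against the log here ...
        if strategy == "insert-insert":
            # current behavior: a second permanent row per chunk
            table.append(("snapshot-window-close", chunk))
        else:
            # proposed INSERT/DELETE: remove the open marker instead
            table.remove(("snapshot-window-open", chunk))
    return table

# Two rows are left behind per chunk with INSERT/INSERT, none with INSERT/DELETE.
print(len(snapshot_chunks(1000, "insert-insert", [])))  # 2000
print(len(snapshot_chunks(1000, "insert-delete", [])))  # 0
```

      For a multi-billion-row table snapshotted in small chunks, those two leftover rows per chunk are exactly the millions of waste rows described above.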

      Do you see the same behaviour using the latest released Debezium version?

      Yes, 2.3.1.Final is the latest stable version. No relevant changes were announced in the 2.4 alpha release.

      Do you have the connector logs, ideally from start till finish?

      No

      How to reproduce the issue using our tutorial deployment?

      1. Prepopulate some tables with a large amount of data.
      2. Set up signaling for incremental snapshots.
      3. Send a signal to initiate incremental snapshot.
      4. Observe the signaling table growing without bound.
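      Until INSERT/DELETE watermarking is available, one possible workaround (my assumption, not something suggested in this issue) is to periodically purge the connector-written watermark rows while keeping user-created signals. A sketch, using an in-memory SQLite stand-in for the Postgres signaling table (the id/type/data column layout follows the Debezium signaling-table convention):

```python
import sqlite3

# Stand-in for my_schema.debezium_signal; columns follow the Debezium
# signaling-table convention (id, type, data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE debezium_signal (id TEXT PRIMARY KEY, type TEXT, data TEXT)")

# One user-created signal plus the watermark rows the connector leaves behind.
conn.execute("INSERT INTO debezium_signal VALUES ('sig-1', 'execute-snapshot', '{}')")
conn.executemany(
    "INSERT INTO debezium_signal VALUES (?, ?, ?)",
    [(f"wm-{i}", "snapshot-window-open" if i % 2 == 0 else "snapshot-window-close", "{}")
     for i in range(10_000)],
)

# Purge only the watermark rows; user signals survive.
conn.execute(
    "DELETE FROM debezium_signal "
    "WHERE type IN ('snapshot-window-open', 'snapshot-window-close')"
)
remaining = conn.execute("SELECT type FROM debezium_signal").fetchall()
print(remaining)  # [('execute-snapshot',)]
```

      Note that deleting rows while an incremental snapshot is in flight could race with an open window, so such a cleanup would be safest between snapshots.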

       

              Assignee: Mario Fiore Vitale (rh-ee-mvitale)
              Reporter: Oleg Anashkin (Inactive)
              Votes: 0
              Watchers: 6
