
Provide INSERT/DELETE semantics for incremental snapshot watermarking

    • Type: Enhancement
    • Resolution: Done
    • Priority: Major
    • Fix Version: 2.5.0.CR1
    • Component: core-library

      What Debezium connector do you use and what version?

      Postgres, version 2.3.1.Final, running as a docker container

      What is the connector configuration?

      "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
      "plugin.name": "pgoutput",
      "slot.name": "debezium_connector",
      "snapshot.mode": "initial",
      "schema.include.list": "my_schema",
      "topic.prefix": "cdc",
      "topic.transaction": "my_schema.transaction",
      "provide.transaction.metadata": True,
      "topic.creation.default.replication.factor": 3,
      "topic.creation.default.partitions": 3,
      "topic.creation.default.retention.ms": 2592000000,
      "topic.creation.default.retention.bytes": -1,
      "topic.creation.default.cleanup.policy": "delete",
      "topic.creation.default.compression.type": "zstd",
      "signal.data.collection": "my_schema.debezium_signal",
      "key.converter": "org.apache.kafka.connect.json.JsonConverter",
      "key.converter.replace.null.with.default": "false",
      "key.converter.schemas.enable": "false",
      "value.converter": "org.apache.kafka.connect.json.JsonConverter",
      "value.converter.replace.null.with.default": "false",
      "value.converter.schemas.enable": "false",
      "skip.messages.without.change": True,
      "decimal.handling.mode": "string",
      "errors.retry.timeout": 600_000,
      "snapshot.fetch.size": 1_000_000, 

      What is the captured database version and mode of deployment?

      RDS running in AWS, postgres version 13.10

      What behaviour do you expect?

      Signaling table to contain only the signals that I create.

      What behaviour do you see?

      Millions of rows with types "snapshot-window-open" and "snapshot-window-close" are added to the table, at a rate of hundreds per second. My tables contain several billion rows and incremental snapshots generate a huge amount of waste in the signaling table.

      This is not a new issue; I found a previous discussion: https://groups.google.com/g/debezium/c/2vM9daemReE. There were supposedly some plans to switch from INSERT/INSERT to INSERT/DELETE, but those plans got lost somehow?

      Do you see the same behaviour using the latest released Debezium version?

      Yes, 2.3.1.Final is the latest stable version. No relevant changes were announced in the 2.4 alpha release.

      Do you have the connector logs, ideally from start till finish?

      No

      How to reproduce the issue using our tutorial deployment?

      1. Prepopulate some tables with a large amount of data.
      2. Set up signaling for incremental snapshots.
      3. Send a signal to initiate an incremental snapshot.
      4. Observe the signaling table growing without bound.
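      Step 3 can be sketched roughly as follows. This is a minimal illustration, not the reporter's actual code: it assumes the signaling table has the usual id/type/data columns, that the statement is executed through a driver that accepts %s placeholders (psycopg style), and it uses a hypothetical table name my_schema.big_table.

```python
import json
import uuid


def build_execute_snapshot_signal(signal_table, data_collections):
    """Return (sql, params) for an ad hoc incremental snapshot signal.

    Assumes the signaling table has the columns (id, type, data).
    """
    sql = f"INSERT INTO {signal_table} (id, type, data) VALUES (%s, %s, %s)"
    payload = json.dumps(
        {"data-collections": list(data_collections), "type": "incremental"}
    )
    # The id only needs to be unique; a random UUID is one simple choice.
    return sql, (str(uuid.uuid4()), "execute-snapshot", payload)


# "my_schema.big_table" is a hypothetical table used only for illustration.
sql, params = build_execute_snapshot_signal(
    "my_schema.debezium_signal", ["my_schema.big_table"]
)
print(sql)
print(params)
```

      The returned pair would be passed to something like cursor.execute(sql, params) on a connection to the captured database.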

       

            [DBZ-6834] Provide INSERT/DELETE semantics for incremental snapshot watermarking

            Errata Tool added a comment -

            Since the problem described in this issue should be resolved in a recent advisory, it has been closed.

            For information on the advisory (Red Hat build of Debezium 2.5.4 release), and where to find the updated files, follow the link below.

            If the solution does not work for you, open a new bug report.
            https://access.redhat.com/errata/RHEA-2024:1726


            Debezium Builder added a comment -

            Released

            Mario Fiore Vitale added a comment -

            "The user just needs to have the DELETE privilege in addition to the INSERT privilege for the signalling table."

            Yes, I just want to be sure it is fine.

            Jiri Pechanec added a comment -

            rh-ee-mvitale I am not sure I follow the question/problem. The user just needs to have the DELETE privilege in addition to the INSERT privilege for the signalling table. Did I miss anything?

            Mario Fiore Vitale added a comment -

            jpechane With INSERT/DELETE semantics we need to use the connection to execute the delete. What about the user permissions?
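            The privilege requirement discussed in these comments could be expressed as a GRANT statement along these lines. This is only a sketch: the role name debezium_user and the helper function are hypothetical, and the strategy labels are used here purely as illustrative switches.

```python
def build_signal_table_grants(signal_table, role, strategy="insert_delete"):
    """Build a GRANT statement for the connector's database user.

    INSERT is needed in both modes; DELETE is added only when the
    window is closed by deleting the open marker.
    """
    privileges = ["INSERT"]
    if strategy == "insert_delete":
        privileges.append("DELETE")
    return f"GRANT {', '.join(privileges)} ON {signal_table} TO {role};"


# "debezium_user" is a hypothetical role name.
print(build_signal_table_grants("my_schema.debezium_signal", "debezium_user"))
```

            In PostgreSQL, a DELETE whose WHERE clause reads column values also needs SELECT on those columns, so it may be worth checking that the connector user has it as well.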

            Oleg Anashkin (Inactive) added a comment -

            anmohant It's not deleted; there is an extra "." at the end which should not be there. Here is the link: https://groups.google.com/g/debezium/c/2vM9daemReE

            Anisha Mohanty added a comment - edited

            oleg.anashkin@gmail.com Hi, the link to the Google Groups discussion seems to be deleted. Could you share the correct discussion link again? Thanks.

            Jiri Pechanec added a comment -

            It should be possible to switch from the INSERT/INSERT semantics we currently use for the open/close window signals to INSERT/DELETE semantics, which would reduce the pressure on the signalling table.
            Both modes should remain available so that auditing/tracking is still possible.
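            The difference between the two modes can be illustrated with a small simulation. This is a sketch under stated assumptions, not the connector's implementation: it uses an in-memory SQLite table as a stand-in for the signalling table, hypothetical marker ids, and it assumes that in INSERT/DELETE mode the window is closed by deleting the open marker. The property name in the config fragment is the one I understand Debezium 2.5 to use for this feature; please verify it against the documentation for your version.

```python
import sqlite3


def run_chunks(strategy, chunks):
    """Simulate watermark writes for `chunks` snapshot chunks.

    Returns the number of rows left in the signalling table afterwards.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE debezium_signal (id TEXT PRIMARY KEY, type TEXT, data TEXT)"
    )
    for i in range(chunks):
        open_id = f"chunk-{i}-open"  # hypothetical id format
        conn.execute(
            "INSERT INTO debezium_signal VALUES (?, 'snapshot-window-open', NULL)",
            (open_id,),
        )
        # ... the snapshot chunk would be read here ...
        if strategy == "insert_insert":
            # Close the window with a second marker row.
            conn.execute(
                "INSERT INTO debezium_signal VALUES (?, 'snapshot-window-close', NULL)",
                (f"chunk-{i}-close",),
            )
        elif strategy == "insert_delete":
            # Close the window by removing the open marker.
            conn.execute("DELETE FROM debezium_signal WHERE id = ?", (open_id,))
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    remaining = conn.execute("SELECT COUNT(*) FROM debezium_signal").fetchone()[0]
    conn.close()
    return remaining


print(run_chunks("insert_insert", 1000))  # 2000 rows left behind
print(run_chunks("insert_delete", 1000))  # 0 rows left behind

# Assumed connector property for selecting the mode (check your version's docs).
watermarking_config = {
    "incremental.snapshot.watermarking.strategy": "insert_delete",
}
```

            With INSERT/INSERT every chunk leaves two audit rows behind, which is the growth described in the report; INSERT/DELETE trades that audit trail for a signalling table that stays small.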

              Assignee: Mario Fiore Vitale (rh-ee-mvitale)
              Reporter: Oleg Anashkin (Inactive) (oleg.anashkin@gmail.com)