• Enhancement
    • Resolution: Done
    • Major
    • 2.4.0.Alpha2
    • None
    • core-library
    • None
    • False
    • None
    • False

      In some scenarios it is necessary to re-execute an initial-like snapshot, and the user is content to have streaming stopped while it runs.
      We should provide a new blocking ad-hoc snapshot that works like the traditional initial snapshot but can be triggered on demand.
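Debezium triggers ad-hoc snapshots with the `execute-snapshot` signal. The following is a minimal sketch, assuming the signaling-table layout (`id`, `type`, `data` columns) and a `"type": "blocking"` field in the signal data, as described in the Debezium signaling documentation for releases that include this feature. The helper name, signal ID, and table name are hypothetical and only illustrate the shape of the request.

```python
import json


def blocking_snapshot_signal(signal_id: str, data_collections: list[str]) -> tuple[str, str, str]:
    """Build an (id, type, data) row for a Debezium signaling table.

    Assumes the execute-snapshot signal with "type": "blocking" requests a
    blocking ad-hoc snapshot of the listed data collections.
    """
    if not data_collections:
        raise ValueError("at least one data collection is required")
    data = json.dumps({"data-collections": data_collections, "type": "blocking"})
    return signal_id, "execute-snapshot", data


# Hypothetical signal ID and table name, for illustration only.
row = blocking_snapshot_signal("ad-hoc-blocking-1", ["inventory.customers"])
print(row)
```

The row would then be written to the connector's signaling table (the one named by `signal.data.collection`), ideally with a parameterized `INSERT` rather than string concatenation.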

            [DBZ-6566] Support blocking ad-hoc snapshots

            Errata Tool added a comment -

            Since the problem described in this issue should be resolved in a recent advisory, it has been closed.

            For information on the advisory (Red Hat build of Debezium 2.5.4 release), and where to find the updated files, follow the link below.

            If the solution does not work for you, open a new bug report.
            https://access.redhat.com/errata/RHEA-2024:1726

            Debezium Builder added a comment - Released

            Jiri Pechanec added a comment -

            rk3rn3r Hi, this is more about snapshotting new tables and less about re-snapshotting.
            The main use case: I have a new table with a few million rows. It will take a couple of hours with an incremental snapshot and a couple of minutes with a regular one. I am fine with having streaming paused for a while, so I want to get this finished fast.

            Mario Fiore Vitale added a comment -

            I think the typical use case is adding new tables: if you don't want to snapshot while streaming, you can use the blocking snapshot, which stops streaming. I think it's just another option.

            jpechane do you have more details about specific use cases?

            René Kerner added a comment -

            Reading that discussion, I have one question: do ad-hoc blocking (aka INITIAL-style) snapshots make sense? Wouldn't this require draining/clearing the Kafka topic before starting that kind of re-snapshot to avoid duplication issues like the ones you are describing here? If so, such snapshots wouldn't make a lot of sense, because I would need to stop the connector anyway, clear/re-create my topic, and then restart the connector, forcing a normal snapshot. But you have been thinking about this longer than I have, and I might be overlooking something.

            Jiri Pechanec added a comment -

            Right now, that is correct. We need to explain in the docs that a blocking snapshot might not be consistent, and create a follow-up Jira that would handle deduplication in case of the 46-58 overlap.
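Deduplication is deferred to a follow-up Jira here, so the following is only one hypothetical consumer-side approach, not Debezium behavior: drop a streamed create event (`op` = `c`) when the blocking snapshot already emitted a read event (`op` = `r`) for the same key with an identical after-state. Events are simplified to dicts with `op`, `key`, and `after` fields; the function name and event shape are assumptions, and real Debezium envelopes carry more metadata.

```python
from typing import Any, Iterable, Iterator

_MISSING = object()  # sentinel: key was not emitted by the snapshot


def drop_overlap_duplicates(events: Iterable[dict[str, Any]]) -> Iterator[dict[str, Any]]:
    """Yield events, skipping a streamed insert that repeats a snapshotted row."""
    snapshot_rows: dict[Any, Any] = {}
    for event in events:
        if event["op"] == "r":
            # Remember what the blocking snapshot emitted for this key.
            snapshot_rows[event["key"]] = event["after"]
            yield event
            continue
        # Only the first streamed event per snapshotted key is compared.
        seen = snapshot_rows.pop(event["key"], _MISSING)
        if event["op"] == "c" and seen == event["after"]:
            continue  # overlap: the snapshot already captured this insert
        yield event


events = [
    {"op": "r", "key": 1046, "after": {"n": 1046}},
    {"op": "r", "key": 1047, "after": {"n": 1047}},
    {"op": "c", "key": 1046, "after": {"n": 1046}},  # duplicate from the overlap
    {"op": "c", "key": 1058, "after": {"n": 1058}},  # genuinely new insert
]
print([e["key"] for e in drop_overlap_duplicates(events)])
```

The design trades memory for simplicity: one entry is held per snapshotted key until that key is streamed again, which is acceptable for a short overlap but not for very large tables.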

            Mario Fiore Vitale added a comment - edited

            jpechane

            I am testing this scenario and would like to confirm that it sounds right.

            Table a has a column that contains the record number and holds 1000 records. The connector starts and snapshots these 1000 records. Then, while streaming, another 1000 records are inserted into the same table, and at a certain point a signal to execute a blocking snapshot is sent. Streaming stops at record #1045. The blocking snapshot snapshots the table again, reading the first 1000 records (the same records as in the initial snapshot) and then ~45 more (inserts continue until the blocking snapshot actually starts), let's say up to record #1057. After the blocking snapshot finishes, streaming resumes from record #1046. Is that right, or should it start from record #1058?
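The scenario above can be modeled in a few lines. This is a hypothetical sketch (the function and parameter names are invented for illustration), assuming records are numbered by insert order, that the blocking snapshot reads the whole table as of when it starts, and that streaming resumes from the position where it was paused. Under those assumptions records #1046 through #1057 are emitted twice: once by the blocking snapshot and once by the resumed stream.

```python
def blocking_snapshot_overlap(paused_after: int, snapshot_last: int) -> tuple[range, int, range]:
    """Model the scenario with records numbered by insert order.

    paused_after:  last record streamed before the signal pauses streaming
    snapshot_last: last record visible when the blocking snapshot reads the table
    Returns (records read by the snapshot, record streaming resumes from,
             records emitted by both the snapshot and the resumed stream).
    """
    if snapshot_last < paused_after:
        raise ValueError("the snapshot cannot see fewer records than were streamed")
    snapshot_reads = range(1, snapshot_last + 1)    # full table as of snapshot time
    resume_from = paused_after + 1                  # stream continues from the paused position
    overlap = range(resume_from, snapshot_last + 1)  # read by the snapshot, then streamed again
    return snapshot_reads, resume_from, overlap


reads, resume_from, overlap = blocking_snapshot_overlap(paused_after=1045, snapshot_last=1057)
print(len(reads), resume_from, overlap.start, overlap.stop - 1)  # 1057 1046 1046 1057
```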

            Jiri Pechanec added a comment -

            rh-ee-mvitale Yes, exactly. You always snapshot the full table (or a subset, depending on the filter setting).
            The typical use case is snapshotting new tables, so this concern will be much smaller.

            Mario Fiore Vitale added a comment -

            jpechane Just to understand how this should work.

            As I understand it, a connector is configured to do an initial snapshot. When the connector starts for the first time, it locks the database to snapshot the current state, reading data from the configured tables. After the snapshot finishes, streaming starts from the log position read before the snapshot started.

            Suppose that in the snapshot phase 10 records are produced for a specific table. Then streaming runs for a while and produces another 10 records. If I then start a `blocking snapshot`, should it produce 20 records (10+10)?
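The arithmetic behind the question, as a minimal sketch: assuming the blocking snapshot reads the table's full current contents (no filter), it re-reads the 10 rows from the initial snapshot plus the 10 rows inserted while streaming. The function names are hypothetical, and the delete adjustment is an added assumption not discussed in the thread.

```python
def blocking_snapshot_reads(initial_rows: int, streamed_inserts: int, streamed_deletes: int = 0) -> int:
    """Rows a blocking snapshot reads for one table, assuming it captures the
    table's full current contents (no filter) and nothing else changes."""
    return initial_rows + streamed_inserts - streamed_deletes


def events_on_topic(initial_rows: int, streamed_inserts: int) -> int:
    """Events for the table after the initial snapshot, streamed inserts,
    and one blocking snapshot (inserts only)."""
    return initial_rows + streamed_inserts + blocking_snapshot_reads(initial_rows, streamed_inserts)


print(blocking_snapshot_reads(10, 10), events_on_topic(10, 10))  # 20 40
```

Because the earlier snapshot and streaming events stay on the topic, it would then hold 10 + 10 + 20 = 40 events for the table, which is the duplication concern raised in this thread.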

            Jiri Pechanec added a comment -

            Yes, but let's use the word "blocking" or similar, as "initial" sounds like something done first.

              rh-ee-mvitale Mario Fiore Vitale
              jpechane Jiri Pechanec
              Votes:
              0
              Watchers:
              5