[DBZ-7903] Improve blocking snapshot reliability in case of restart

    • Type: Enhancement
    • Resolution: Done
    • Priority: Major
    • 3.0.2.Final
    • None
    • core-library
    • None

      Given the following scenario:

      snapshot.mode=never is configured and a blocking snapshot is triggered. If the snapshot does not finish and the connector is restarted, the connector throws an error, because snapshot.mode does not permit a snapshot. This happens because the blocking snapshot shares its snapshot context with the initial snapshot, so the restart detects an incomplete snapshot.

      We want to improve this behavior by adding a specific flag that marks a blocking snapshot, and by skipping the check (validation of the snapshot mode) when a blocking snapshot is in progress.
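
A minimal sketch of the proposed check, to make the intended behavior concrete. Every name here (BlockingSnapshotCheckSketch, SnapshotMode, RestoredState, shouldFailOnRestart) is hypothetical and chosen for illustration only; this is not Debezium's actual code, and the real snapshot modes and offset state are richer than this.

```java
// Hypothetical sketch: none of these names are Debezium's actual classes.
class BlockingSnapshotCheckSketch {

    // Simplified stand-in for the snapshot.mode setting.
    enum SnapshotMode { NEVER, INITIAL }

    // Simplified stand-in for the snapshot state restored on restart.
    static final class RestoredState {
        final boolean snapshotInProgress; // an unfinished snapshot was recorded
        final boolean blockingSnapshot;   // proposed flag: that snapshot was a blocking one

        RestoredState(boolean snapshotInProgress, boolean blockingSnapshot) {
            this.snapshotInProgress = snapshotInProgress;
            this.blockingSnapshot = blockingSnapshot;
        }
    }

    // Returns true when the restart should be rejected with an error.
    static boolean shouldFailOnRestart(SnapshotMode mode, RestoredState state) {
        if (!state.snapshotInProgress) {
            return false; // nothing unfinished, nothing to validate
        }
        if (state.blockingSnapshot) {
            return false; // proposed behavior: skip the snapshot mode validation
        }
        // An unfinished non-blocking snapshot that the mode does not permit.
        return mode == SnapshotMode.NEVER;
    }

    public static void main(String[] args) {
        RestoredState interruptedBlocking = new RestoredState(true, true);
        RestoredState interruptedInitial = new RestoredState(true, false);
        System.out.println(shouldFailOnRestart(SnapshotMode.NEVER, interruptedBlocking)); // false
        System.out.println(shouldFailOnRestart(SnapshotMode.NEVER, interruptedInitial));  // true
    }
}
```

With the flag in place, an interrupted blocking snapshot no longer trips the validation, while an unfinished non-blocking snapshot under snapshot.mode=never is still reported.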


            Errata Tool added a comment -

            Since the problem described in this issue should be resolved in a recent advisory, it has been closed.

            For information on the advisory (Red Hat build of Debezium 3.0.8 release), and where to find the updated files, follow the link below.

            If the solution does not work for you, open a new bug report.
            https://access.redhat.com/errata/RHEA-2025:3803

            Debezium Builder added a comment -

            Released

            Mario Fiore Vitale added a comment -

            "you need to do it in a single transaction"

            Yeah, forgot about it.

            Jiri Pechanec added a comment -

            Well, in fact it is not possible to do it for a blocking snapshot. If you want to have a consistent view of the data, then you need to do it in a single transaction. That's the reason why the initial snapshot also cannot be resumed and is re-executed from the start.

            Mario Fiore Vitale added a comment -

            Hi peterhmatillion,

            there is no particular reason behind this decision, and there is always room for improvement. I think it could be a good enhancement, so can you please log an enhancement issue and link it to this one? Thanks.

            Peter Hamer added a comment -

            Hi rh-ee-mvitale,

            You have stated "since the blocking snapshot is an on-demand procedure, it will not have automatic retry in case of failure/restart."

            However, when an ad-hoc incremental snapshot fails, the connector can recover and retry the process from where it left off using the offset. Can you explain why the two have a different policy around retrying on a restart?

            Peter Hamer added a comment -

            rh-ee-mvitale Sounds good. I would just want to avoid an automatic snapshot of all my tables upon connector restart. I'm happy for the process to be that I just need to re-trigger the ad-hoc snapshot at my own leisure.

            Mario Fiore Vitale added a comment -

            Hi peterhmatillion,

            since the blocking snapshot is an on-demand procedure, it will not have automatic retry in case of failure/restart. The scope of this issue is to prevent a blocking snapshot from interfering with the initial one.

            As of now, if you issue, for example, a blocking snapshot for two tables and the second one fails, you just need to manually send another signal for the failing table.

            So if you think that a smart retry mechanism would be worth having, feel free to open a new enhancement. I think it should be something like the incremental snapshot.
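
The manual re-trigger described in the comment above could look roughly like the sketch below, which builds an execute-snapshot signal for only the failed table. The class and method names, the signal table name debezium_signal, the id, and the table inventory.orders are all hypothetical. The (id, type, data) columns and the "data-collections" and "type" fields follow the signaling format as I understand it from the Debezium documentation; treat the exact spelling of the "blocking" value as an assumption to check against the docs for your version.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: builds the signal row for re-running a blocking snapshot.
class BlockingSnapshotSignalSketch {

    // JSON payload listing only the tables that still need to be snapshotted.
    // Table names are not escaped here; this sketch assumes plain identifiers.
    static String signalData(List<String> tables) {
        String collections = tables.stream()
                .map(t -> "\"" + t + "\"")
                .collect(Collectors.joining(", "));
        return "{\"data-collections\": [" + collections + "], \"type\": \"blocking\"}";
    }

    // SQL statement that writes the signal into the signaling table.
    static String insertStatement(String signalTable, String id, List<String> tables) {
        return "INSERT INTO " + signalTable + " (id, type, data) VALUES ('"
                + id + "', 'execute-snapshot', '" + signalData(tables) + "')";
    }

    public static void main(String[] args) {
        // Only the table whose snapshot failed is listed, not the original set.
        System.out.println(insertStatement("debezium_signal", "retry-1",
                List.of("inventory.orders")));
    }
}
```

Because the signal names its tables explicitly, re-sending it for the failed table alone avoids re-snapshotting tables that already completed.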

            Peter Hamer added a comment (edited) -

            It's worth noting that an ad-hoc blocking snapshot can be started with a subset of the captured tables. I would expect the connector to retry the snapshot for that subset of tables, rather than the entire set.

            Perhaps the set of tables covered by the blocking snapshot could be recorded within the offset?

            Mario Fiore Vitale added a comment -

            Yeah, maybe I got your point. Do you mean something like this https://github.com/debezium/debezium/pull/5938?

              Assignee: rh-ee-mvitale Mario Fiore Vitale
              Reporter: rh-ee-mvitale Mario Fiore Vitale