[DBZ-7903] Improve blocking snapshot reliability in case of restart

Type: Enhancement
Resolution: Done
Priority: Major
Fix Version/s: 3.0.2.Final
Affects Version/s: None
Component/s: core-library
Labels: None
Blocked: False
Blocked Reason: None
Ready: False
Git Pull Request: https://github.com/debezium/debezium/pull/5938

Given the following scenario:

The connector runs with snapshot.mode=never and a blocking snapshot is triggered. If the snapshot does not finish and the connector is restarted, the connector throws an error because snapshot.mode does not permit a snapshot. This happens because the blocking snapshot uses the same snapshot context as the initial snapshot, so on restart an incomplete snapshot is detected.

We want to enhance this behavior by adding a specific flag that marks a blocking snapshot, and by skipping the check (validation of the snapshot mode) when a blocking snapshot is in progress.
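The proposed check can be sketched roughly as follows. This is a minimal illustration, not Debezium's actual code: the `SnapshotOffset` class, its field names, the `validate_on_restart` function, and the returned action names are all hypothetical, and it assumes the connector simply resumes streaming when the unfinished snapshot was a blocking one.

```python
from dataclasses import dataclass


@dataclass
class SnapshotOffset:
    """Hypothetical stand-in for the snapshot state kept in the connector offset."""
    snapshot_in_progress: bool = False
    # Hypothetical flag: set only while the running snapshot is a blocking one.
    blocking_snapshot_in_progress: bool = False


def validate_on_restart(snapshot_mode: str, offset: SnapshotOffset) -> str:
    """Return an illustrative action name for a connector restart."""
    if not offset.snapshot_in_progress:
        return "stream"
    if offset.blocking_snapshot_in_progress:
        # Skip the snapshot-mode validation: the unfinished snapshot was an
        # on-demand blocking snapshot, not the initial one (assumed behavior).
        return "stream"
    if snapshot_mode == "never":
        # Without the flag, every unfinished snapshot ends up here.
        raise RuntimeError(
            "unfinished snapshot detected, but snapshot.mode=never does not permit a snapshot"
        )
    return "restart_initial_snapshot"
```

Without the blocking flag, an interrupted blocking snapshot is indistinguishable from an interrupted initial snapshot, which is why the `never` case fails in the scenario described.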

is related to

DBZ-8244 An aborted ad-hoc blocking snapshot leaves the connector in a broken state

Closed

relates to

DBZ-8335 Improve behavior of blocking snapshot in case of failures

Open

links to

RHEA-2025:147677 Red Hat build of Debezium 3.0.8 release

Errata Tool added a comment - 2025/04/10 11:35 AM

Since the problem described in this issue should be resolved in a recent advisory, it has been closed.

For information on the advisory (Red Hat build of Debezium 3.0.8 release), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHEA-2025:3803

Debezium Builder added a comment - 2024/11/15 12:14 PM

Released

Mario Fiore Vitale added a comment - 2024/10/24 9:54 AM

"you need to do it in a single transaction"

Yeah, forgot about it.

Jiri Pechanec added a comment - 2024/10/24 8:14 AM

Well, in fact it is not possible to do this for a blocking snapshot. If you want a consistent view of the data, you need to do it in a single transaction. That's the reason why an initial snapshot also cannot be resumed and is re-executed from the start.

Mario Fiore Vitale added a comment - 2024/10/24 8:02 AM

Hi peterhmatillion,

There is no particular reason behind this decision, and there is always room for improvement. I think it would be a good enhancement, so can you please log an enhancement issue and link it to this one? Thanks.

Peter Hamer added a comment - 2024/10/22 3:11 PM

Hi rh-ee-mvitale,

You have stated "since the blocking snapshot is an on demand procedure it will not has the retry automatism in case of failure/restart."

However, when an ad-hoc incremental snapshot fails, the connector can recover and retry the process from where it left off using the offset. Can you explain why the two have a different policy around retrying on a restart?

Peter Hamer added a comment - 2024/10/17 9:41 AM

rh-ee-mvitale Sounds good. I just want to avoid an automatic snapshot of all my tables upon connector restart. I'm happy for the process to be that I need to re-trigger the ad-hoc snapshot at my own leisure.

Mario Fiore Vitale added a comment - 2024/10/17 8:50 AM

Hi peterhmatillion,

Since the blocking snapshot is an on-demand procedure, it will not have automatic retry in case of failure/restart. The scope of this issue is to prevent a blocking snapshot from interfering with the initial one.

As of now, if you issue, for example, a blocking snapshot for two tables and the second one fails, you just need to manually send another signal for the failing table.

So if you think that a smart retry mechanism would be a good thing to have, feel free to open a new enhancement. I think it should be something like the incremental snapshot.
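Manually re-sending a signal for the failing table could look roughly like the sketch below. It builds the column values for an `execute-snapshot` signal row with `"type": "blocking"`. The helper name `build_blocking_snapshot_signal` and the table name `inventory.orders` are hypothetical, and the row still has to be inserted into whatever signaling table the connector is configured to watch.

```python
import json
import uuid


def build_blocking_snapshot_signal(tables):
    """Return (id, type, data) column values for one signal-table row."""
    data = {"type": "blocking", "data-collections": list(tables)}
    # A bare UUID is 36 characters, short enough for a small id column.
    return (str(uuid.uuid4()), "execute-snapshot", json.dumps(data))


# Re-trigger the snapshot only for the table that failed (hypothetical name).
signal_row = build_blocking_snapshot_signal(["inventory.orders"])
```

Listing only the failed table in `data-collections` keeps the retry limited to that table rather than repeating the whole original request.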

Peter Hamer added a comment - 2024/10/17 8:14 AM (edited by Peter Hamer - 2024/10/17 8:28 AM)

It's worth noting that an ad-hoc blocking snapshot can be started with a subset of the captured tables. I would expect the connector to retry the snapshot for that subset of tables, rather than the entire set.

Perhaps the set of tables covered by the blocking snapshot could be stored in the offset?

Mario Fiore Vitale added a comment - 2024/10/16 12:50 PM

Yeah, maybe I got your point. Do you mean something like this https://github.com/debezium/debezium/pull/5938?

Assignee: Mario Fiore Vitale
Reporter: Mario Fiore Vitale
Votes: 0
Watchers: 5
Created: 2024/05/28 12:14 PM
Updated: 2025/04/10 11:35 AM
Resolved: 2024/10/31 11:07 AM
