Debezium / DBZ-2053

Support snapshotting new tables added to existing postgres publication/connector



    Description

Below I propose three potential solutions to this issue. We are likely to use Debezium in a way that will require adding tables to it at least dozens of times per year, so this particular feature is an important one for us.

1. With initial snapshot mode, we could snapshot new tables using a workflow similar to how Postgres handles snapshotting new tables added to an existing logical replication publication/subscription.

For example, you can add a table to replication with a command on the publisher and a refresh on the subscriber:
      Publisher:
      ```
      ALTER PUBLICATION foo ADD TABLE bar;
      ```
      Subscriber:
      ```
      ALTER SUBSCRIPTION foo_sub REFRESH PUBLICATION WITH (copy_data = true);
      ```

      What happens then is explained well in this brief summary:
      https://www.postgresql.org/docs/12/logical-replication-architecture.html#LOGICAL-REPLICATION-SNAPSHOT

      "The initial data in existing subscribed tables are snapshotted and copied in a parallel instance of a special kind of apply process. This process will create its own temporary replication slot and copy the existing data. Once existing data is copied, the worker enters synchronization mode, which ensures that the table is brought up to a synchronized state with the main apply process by streaming any changes that happened during the initial data copy using standard logical replication. Once the synchronization is done, the control of the replication of the table is given back to the main apply process where the replication continues as normal."

      So in terms of what could be done in Debezium:

• We create another, temporary replication slot to queue changes during the copy. At this point we also start ignoring published changes for this table from the main replication slot.
      • Then we copy the new table's data into Kafka.
      • Once that finishes, we read the copy slot to its end.
      • At this point we will have two LSN events for each write to the table, one in each replication slot (main and copy). I believe we would need to record the LSN of the last event read from the copy slot, resume reading from the main slot only after that LSN, and then drop the copy slot. Of course, we would need to verify that we actually saw that same last event in the main slot, so that we can be sure we did not skip data.
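      The slot hand-off in the last step can be sketched in simplified code. This is only an illustration of the dedup logic, assuming LSNs can be compared as plain longs and that both slots saw the same committed changes for the table; none of these class or method names exist in Debezium:
      ```
      import java.util.ArrayList;
      import java.util.List;

      public class SlotHandoff {
          // Resume emitting from the main slot, skipping everything that was
          // already emitted while draining the temporary copy slot.
          static List<Long> resumeFromMainSlot(List<Long> mainSlotLsns, long lastCopySlotLsn) {
              List<Long> emitted = new ArrayList<>();
              for (long lsn : mainSlotLsns) {
                  if (lsn <= lastCopySlotLsn) {
                      continue; // duplicate: already emitted from the copy slot
                  }
                  emitted.add(lsn);
              }
              return emitted;
          }

          public static void main(String[] args) {
              // Main slot saw LSNs 10..50; the copy slot was drained through LSN 30,
              // so only 40 and 50 should still be emitted.
              List<Long> main = List.of(10L, 20L, 30L, 40L, 50L);
              System.out.println(resumeFromMainSlot(main, 30L)); // [40, 50]
          }
      }
      ```
      The "verify we saw the same last event" check would amount to asserting that `lastCopySlotLsn` actually appears in the main slot's stream before dropping the copy slot.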

This is a complex operation, and it is one solution to the problem posed by DBZ-1961, which is really a workaround for not having this snapshot-new-tables functionality built in.

2. A second, different solution to this problem would be to support something like `snapshot.table.whitelist`. The idea is that, in the same connector as the main one, you could specify a separate list of tables to snapshot rather than inheriting it from the other configs. This would allow me, for example, to add a table to the Debezium config and then do a one-time snapshot by restarting the connector with `snapshot.mode = always` and `snapshot.table.whitelist = my_new.table`.
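      To make this concrete, a hypothetical connector configuration for such a one-time snapshot might look like the following (`snapshot.table.whitelist` is the property being proposed here and does not exist today; the table names are illustrative):
      ```
      # Existing streaming config, with the new table added:
      table.whitelist=public.existing_table,my_new.table
      # Proposed: restart once with these to snapshot only the new table
      snapshot.mode=always
      snapshot.table.whitelist=my_new.table
      ```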

3. A third solution is what Jiri suggested in the same ticket: support another snapshot mode that lets me create a "snapshot only" connector, where I do not want to stream changes at all after the snapshot.
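      Sketched as configuration, again hypothetically (no such mode exists; the name `snapshot_only` is purely illustrative):
      ```
      # Separate, short-lived connector used only for the snapshot:
      snapshot.mode=snapshot_only   # proposed: snapshot, then stop; never stream
      table.whitelist=my_new.table
      ```
      Such a connector would be started alongside the main one, left to finish its snapshot, and then deleted.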

      I really appreciate any input here. Thank you!

People

              Assignee: Unassigned
              Reporter: Jeremy Finzel (jfinzel)