- Bug
- Resolution: Done
- Major
- 0.9.1.Final
- None
If the primary key is updated, there are two corresponding entries in the capture instance: a delete and an insert, both with exactly the same __$start_lsn and __$seqval.
create table users (name varchar(50) NOT NULL PRIMARY KEY, age int NOT NULL);

exec sys.sp_cdc_enable_table @source_schema = 'dbo', @source_name = 'users', @role_name = NULL, @supports_net_changes = 0;

insert into users values ('bar', 111);

-- start connector

update users set name = 'newbar' where name = 'bar';

__$start_lsn           __$end_lsn  __$seqval              __$operation  __$update_mask  name    age  __$command_id
---------------------- ----------  ---------------------- ------------  --------------  ------  ---  -------------
0x00000026000008F00006 NULL        0x00000026000008F00002 1             0x03            bar     111  1
0x00000026000008F00006 NULL        0x00000026000008F00002 2             0x03            newbar  111  2

(2 rows affected)
The SQL Server connector keeps track of (commitLsn, changeLsn) as offsets.
It may happen that, before a connector crash, only the first record has been put into Kafka and its offset has been remembered. In such a case the second record is skipped after a restart.
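A minimal sketch of why this happens, assuming a simplified offset model (the Offset class, field names, and comparison logic below are illustrative, not the actual connector code): if the offset comparison only looks at (commitLsn, changeLsn), the insert half of a primary-key update compares as already processed, because it carries exactly the same LSN pair as the delete half.

// Illustrative sketch only; names and structure are assumptions, not the actual
// Debezium connector sources. LSNs are kept as hex strings of equal length, so
// lexicographic comparison matches numeric comparison.
public class SkipDemo {

    // Offset as the connector conceptually records it: (commitLsn, changeLsn).
    static final class Offset implements Comparable<Offset> {
        final String commitLsn;  // __$start_lsn of the transaction
        final String changeLsn;  // __$seqval of the individual change

        Offset(String commitLsn, String changeLsn) {
            this.commitLsn = commitLsn;
            this.changeLsn = changeLsn;
        }

        @Override
        public int compareTo(Offset o) {
            int c = commitLsn.compareTo(o.commitLsn);
            return c != 0 ? c : changeLsn.compareTo(o.changeLsn);
        }
    }

    public static void main(String[] args) {
        // The delete (operation=1) and the insert (operation=2) produced by one
        // primary-key update carry the same __$start_lsn and __$seqval.
        Offset stored = new Offset("0x00000026000008F00006", "0x00000026000008F00002");     // delete half, already in Kafka
        Offset insertHalf = new Offset("0x00000026000008F00006", "0x00000026000008F00002"); // insert half, not yet delivered

        // After a restart the connector resumes from the stored offset and skips
        // everything that compares lower than or equal to it, so the insert half is dropped.
        System.out.println("insert half skipped: " + (insertHalf.compareTo(stored) <= 0)); // true
    }
}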
EDIT: In fact, more CDC records can be lost. If more than one primary key is modified in a single transaction, the order of the CDC records is as follows:
select [__$start_lsn], [__$seqval], [__$operation] from cdc.fn_cdc_get_all_changes_dbo_capture_instance(..., ..., 'all update old');
__$start_lsn           __$seqval              __$operation
---------------------- ---------------------- ------------
0x0006C5C1009E39CB0125 0x0006C5C1009E38DB0021 1
0x0006C5C1009E39CB0125 0x0006C5C1009E38DB0025 1
0x0006C5C1009E39CB0125 0x0006C5C1009E38DB0028 1
0x0006C5C1009E39CB0125 0x0006C5C1009E38DB002B 1
0x0006C5C1009E39CB0125 0x0006C5C1009E38DB002E 1
0x0006C5C1009E39CB0125 0x0006C5C1009E38DB0031 1
0x0006C5C1009E39CB0125 0x0006C5C1009E38DB0021 2
0x0006C5C1009E39CB0125 0x0006C5C1009E38DB0025 2
0x0006C5C1009E39CB0125 0x0006C5C1009E38DB0028 2
0x0006C5C1009E39CB0125 0x0006C5C1009E38DB002B 2
0x0006C5C1009E39CB0125 0x0006C5C1009E38DB002E 2
0x0006C5C1009E39CB0125 0x0006C5C1009E38DB0031 2
Please note that the records are ordered by 1) commitLsn, 2) operation, 3) changeLsn.
It may happen that some of the records with `operation=2` are lost. Example scenario:
- Kafka Connect has processed the first 4 records. The offset saved is `0x0006C5C1009E39CB0125 0x0006C5C1009E38DB002B`
- Kafka Connect is restarted.
- Kafka Connect skips all records with an offset lower than or equal to `0x0006C5C1009E39CB0125 0x0006C5C1009E38DB002B`. As a consequence, the following rows will be skipped, even though they should not be (see the sketch after these rows):
0x0006C5C1009E39CB0125 0x0006C5C1009E38DB0021 2
0x0006C5C1009E39CB0125 0x0006C5C1009E38DB0025 2
0x0006C5C1009E39CB0125 0x0006C5C1009E38DB0028 2
0x0006C5C1009E39CB0125 0x0006C5C1009E38DB002B 2
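One way to see the problem, and a possible direction for a fix, is to make the offset comparison mirror the ordering actually returned by fn_cdc_get_all_changes_* with 'all update old', i.e. (commitLsn, operation, changeLsn). The sketch below is only an assumption about such an approach, not the actual fix shipped in the connector; class and field names are illustrative.

// Sketch of an ordering-aware offset: (commitLsn, operation, changeLsn).
// Illustrative names; not the actual Debezium implementation.
final class OrderedOffset implements Comparable<OrderedOffset> {
    final String commitLsn;   // __$start_lsn
    final int operation;      // __$operation: 1 = delete (old PK row), 2 = insert (new PK row)
    final String changeLsn;   // __$seqval

    OrderedOffset(String commitLsn, int operation, String changeLsn) {
        this.commitLsn = commitLsn;
        this.operation = operation;
        this.changeLsn = changeLsn;
    }

    @Override
    public int compareTo(OrderedOffset o) {
        int c = commitLsn.compareTo(o.commitLsn);
        if (c != 0) {
            return c;
        }
        c = Integer.compare(operation, o.operation);
        if (c != 0) {
            return c;
        }
        return changeLsn.compareTo(o.changeLsn);
    }
}

With such an offset, the first operation=2 row (..., 2, 0x0006C5C1009E38DB0021) no longer compares as lower than the stored (..., 1, 0x0006C5C1009E38DB002B), so the rows above would not be skipped after a restart.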
Is related to: DBZ-1483 Add column_id column to metadata section in messages in Kafka topic (Closed)