Enhancement
Resolution: Done
Major
The following customer suggestions refer to the Debezium connector for SQL Server doc topic in the upstream Debezium 1.3 documentation.
===================================================================================================
1. v1.3 should be stable, I believe?
[BobR] A header in the Debezium 1.3 topic directs readers to the latest stable release. AFAIK, this header is not part of the documentation source. Is it auto-generated? Although this is moot for the 1.3 version, which no longer displays the header, it would be good to know where the header originates, because it is now rendered for the master version.
2. Clarify which table the following statement refers to:
Step 2: "Obtain a lock on each of the monitored tables to ensure that no structural changes can occur to any of the tables. The level of the lock is determined by snapshot.isolation.mode configuration option."
Does this mean the CDC table or the "source" table, so to speak?
Response from engineering: "Yes, this refers to the source table."
Step 6: "Scan all of the relevant database tables and schemas as valid at the LSN position read in step 3, and generate a READ event for each row and write that event to the appropriate table-specific Kafka topic."
As far as I can tell from the connector logs (convoluted scenario: Debezium within Kubernetes managed by strimzi.io), it's a SELECT * from the "source" table. There might be no log sequence number (LSN) in the CDC tables: our DBA told me that the CDC tables do have an expiration of sorts. Or what if no changes have been recorded yet (very old lookup tables)? Which table is it again?
I can find out by testing (and we did), but I'd really love to be sure that the "initial snapshot" (done once) sends all rows of the tables subject to CDC as READ events, and that the connector then starts streaming the actual change events.
Of course, I might be misreading it, but I'd be more comfortable with a different wording.
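The snapshot-then-stream ordering the customer describes can be sketched as follows. This is a simplified in-memory model, not Debezium's implementation: the `Change`, `CapturedTable`, and `snapshot_then_stream` names are hypothetical, LSNs are modeled as plain integers, and it assumes (as engineering confirmed for the lock step, and as the customer's logs suggest for the scan step) that the snapshot reads the source tables rather than the change tables. Under that assumption, a table with no entries in its change table still produces one READ ("r") event per row.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Change:
    lsn: int   # simplified log sequence number
    op: str    # "c" (create), "u" (update), "d" (delete)
    row: dict


@dataclass
class CapturedTable:
    # Hypothetical stand-ins: `rows` is the source table as of the snapshot LSN,
    # `changes` holds the entries of its CDC change table.
    rows: List[dict] = field(default_factory=list)
    changes: List[Change] = field(default_factory=list)


def snapshot_then_stream(table: CapturedTable, snapshot_lsn: int) -> List[Tuple[str, dict]]:
    events = []
    # Snapshot phase: one READ ("r") event per source-table row,
    # whether or not the change table holds any entries for that row.
    for row in table.rows:
        events.append(("r", row))
    # Streaming phase: only changes after the snapshot LSN, in LSN order;
    # earlier changes are already reflected in the rows read above.
    for change in sorted(table.changes, key=lambda c: c.lsn):
        if change.lsn > snapshot_lsn:
            events.append((change.op, change.row))
    return events


# An old lookup table with no recorded changes still yields READ events.
lookup = CapturedTable(rows=[{"id": 1, "code": "A"}, {"id": 2, "code": "B"}])
print(snapshot_then_stream(lookup, snapshot_lsn=100))
# [('r', {'id': 1, 'code': 'A'}), ('r', {'id': 2, 'code': 'B'})]
```

Because the snapshot phase never consults the change table in this model, expired change-table entries only affect which changes can be streamed afterwards, not which rows are snapshotted.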
3. Timestamp – Question about callout #10 for the customers table example in the Change event values section, which identifies ts_ms as the time when Debezium processed an event.
10 | ts_ms | Optional field that displays the time at which the connector processed the event. The time is based on the system clock in the JVM running the Kafka Connect task.
However, the source.timestamp.mode entry in the advanced configuration properties table says that, by default, this timestamp is the time of the database operation:
String representing the criteria of the attached timestamp within the source record (ts_ms). commit will set the source timestamp to the instant where the record was committed in the database (default and current behavior). processing will set the source timestamp to the instant where the record was processed by Debezium. This option could be used when either we want to set the top level ts_ms value here or when we want to skip the query to extract the timestamp of that LSN.
Response from engineering:
ts_ms at the envelope level is always [the] processing time, in the sense that it is set to the instant when the Connect record is created from the CDC data.
ts_ms at the source level is set to one of the following, depending on source.timestamp.mode:
- processing: the instant at which Debezium first accesses the row in the change table.
- commit (default): the instant at which the change was committed to the database.
The commit option has an associated cost (an extra query to extract the timestamp for the change's LSN), hence the two options.
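The relationship between the two ts_ms fields, as engineering describes it, can be sketched as below. This is an illustrative model only: `build_change_event`, `lookup_commit_ts`, and the `clock` parameter are hypothetical names, and the event is reduced to the `after`, `source.ts_ms`, and envelope `ts_ms` fields. The `lookup_commit_ts` callable stands in for the extra LSN-to-timestamp query that the property description says the processing mode skips.

```python
import time
from typing import Callable


def build_change_event(
    row: dict,
    lsn: int,
    mode: str,
    lookup_commit_ts: Callable[[int], int],  # hypothetical: LSN -> commit time (ms)
    first_read_ts_ms: int,                   # when the change-table row was first read
    clock: Callable[[], float] = time.time,
) -> dict:
    if mode == "commit":
        # Default mode: costs one extra lookup per change to find the commit time.
        source_ts_ms = lookup_commit_ts(lsn)
    elif mode == "processing":
        # No lookup: reuse the instant the change-table row was first accessed.
        source_ts_ms = first_read_ts_ms
    else:
        raise ValueError(f"unsupported source.timestamp.mode: {mode!r}")
    return {
        "after": row,
        "source": {"ts_ms": source_ts_ms},
        # Envelope-level ts_ms: always the time this record is created.
        "ts_ms": int(clock() * 1000),
    }
```

In either mode the envelope-level ts_ms comes from the clock at record creation; only source.ts_ms changes, and only the commit mode pays for the lookup.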