Uploaded image for project: 'Debezium'
  1. Debezium
  2. DBZ-1483

Add column_id column to metadata section in messages in Kafka topic

    Details

    • Type: Feature Request
    • Status: Closed (View Workflow)
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: 0.10.0.Beta1, 0.10.0.Beta2, 0.10.0.Beta3, 0.10.0.Beta4, 0.10.0.CR1
    • Fix Version/s: 0.10.0.CR2
    • Component/s: sqlserver-connector
    • Labels:
      None

      Description

      The situation that leads to an ambiguous result is as follows.
      1. SQL Server updates the unique field (or primary key field) in the table where CDC is enabled.
      Two records with __$operation = 1 (delete) and __$operation = 2 (insert) appear in the corresponding CDC table.
      The record with __$operation = 1 contains data before the change of the unique field (or primary key), the record with __$operation = 2 is data after the change. __$start_lsn and __$seqval of these records are the same.
      Records are also different in the __$command_id field. This field determines the sequence of operations within the transaction. In this case, the value in the field for a 'delete' operation will be less than for the 'insert' one (first 'delete' then 'insert').

      2. Debezium generates and sends to Kafka three events: delete, tombstone event, insert. Events of deletion and insertion, in this case, differ in the 'op' values ('d' for delete, 'c' for inserts) and probably in 'ts_ms'. The 'commit_lsn' value matches the 'change_lsn' one. It means that the order of operations is lost. At the moment, it can be recovered only based on the values of 'op' (first 'delete' then 'insert') or Kafka offset, that is not very convenient and clear.
      For retaining the right order of operations, it would be more correct to pass the value __$command_id to Kafka. It uniquely determines the order of operations.
      The update of 'Customer Id Name' and 'CustomerIdYomiNameof' was made in CRM for the 'IncidentBase' table. The 'Version Num' field, which is unique, was automatically updated which has led to the generating of 'delete'/'insert' instead of 'update'.

      You can see how data in CDC looks like ni Pic1:
      There are 2 bundles of 'delete'/'insert' in one transaction in total. The value of the fields 'Customer Id Name' and 'CustomerInYomiName' were changed from 'Юлия' to 'Beer, Julie'.

      So, Debezium sent the data to Kafka - Pic2:
      There are 6 events in total: two of each operation (delete/tombstone/insert). The sequence of the events is correct according to the Kafka offset.

      Pic3 shows how the data got into GreenPlum.
      The first record is the original record, the next 4 are the data from CDC we are interested in, but the order of operations here is incorrect (I-D-D-I). There is no field __$command_Id by which we could recover the correct sequence. The application of operations in a sequence connector NiFi Load_Raw_To_ODS Time and results in invalid data in the ods.

      Pic4 depicts what we obtain in the ods scheme.
      You can see that 'CustomerId Name' = 'Юлия' should be 'Beer, Julie' according to CDC.

        Gliffy Diagrams

          Attachments

          1. pic1.png
            pic1.png
            39 kB
          2. pic2.png
            pic2.png
            45 kB
          3. pic3.png
            pic3.png
            40 kB
          4. pic4.png
            pic4.png
            18 kB

            Issue Links

              Activity

                People

                • Assignee:
                  jpechanec Jiri Pechanec
                  Reporter:
                  disserakt Roman Ananyev
                • Votes:
                  1 Vote for this issue
                  Watchers:
                  3 Start watching this issue

                  Dates

                  • Created:
                    Updated:
                    Resolved: