Uploaded image for project: 'Debezium'
  1. Debezium
  2. DBZ-1483

Add column_id column to metadata section in messages in Kafka topic

    XMLWordPrintable

Details

    • Feature Request
    • Resolution: Done
    • Major
    • 0.10.0.CR2
    • 0.10.0.Beta1, 0.10.0.Beta2, 0.10.0.Beta3, 0.10.0.Beta4, 0.10.0.CR1
    • sqlserver-connector
    • None

    Description

      The situation that leads to an ambiguous result is as follows.
      1. SQL Server updates the unique field (or primary key field) in the table where CDC is enabled.
      Two records with __$operation = 1 (delete) and __$operation = 2 (insert) appear in the corresponding CDC table.
      The record with __$operation = 1 contains data before the change of the unique field (or primary key), the record with __$operation = 2 is data after the change. __$start_lsn and __$seqval of these records are the same.
      Records are also different in the __$command_id field. This field determines the sequence of operations within the transaction. In this case, the value in the field for a 'delete' operation will be less than for the 'insert' one (first 'delete' then 'insert').

      2. Debezium generates and sends to Kafka three events: delete, tombstone event, insert. Events of deletion and insertion, in this case, differ in the 'op' values ('d' for delete, 'c' for inserts) and probably in 'ts_ms'. The 'commit_lsn' value matches the 'change_lsn' one. It means that the order of operations is lost. At the moment, it can be recovered only based on the values of 'op' (first 'delete' then 'insert') or Kafka offset, that is not very convenient and clear.
      For retaining the right order of operations, it would be more correct to pass the value __$command_id to Kafka. It uniquely determines the order of operations.
      The update of 'Customer Id Name' and 'CustomerIdYomiNameof' was made in CRM for the 'IncidentBase' table. The 'Version Num' field, which is unique, was automatically updated which has led to the generating of 'delete'/'insert' instead of 'update'.

      You can see how data in CDC looks like ni Pic1:
      There are 2 bundles of 'delete'/'insert' in one transaction in total. The value of the fields 'Customer Id Name' and 'CustomerInYomiName' were changed from 'Юлия' to 'Beer, Julie'.

      So, Debezium sent the data to Kafka - Pic2:
      There are 6 events in total: two of each operation (delete/tombstone/insert). The sequence of the events is correct according to the Kafka offset.

      Pic3 shows how the data got into GreenPlum.
      The first record is the original record, the next 4 are the data from CDC we are interested in, but the order of operations here is incorrect (I-D-D-I). There is no field __$command_Id by which we could recover the correct sequence. The application of operations in a sequence connector NiFi Load_Raw_To_ODS Time and results in invalid data in the ods.

      Pic4 depicts what we obtain in the ods scheme.
      You can see that 'CustomerId Name' = 'Юлия' should be 'Beer, Julie' according to CDC.

      Attachments

        1. pic1.png
          pic1.png
          39 kB
        2. pic2.png
          pic2.png
          45 kB
        3. pic3.png
          pic3.png
          40 kB
        4. pic4.png
          pic4.png
          18 kB

        Issue Links

          Activity

            People

              jpechane Jiri Pechanec
              disserakt Roman Ananyev (Inactive)
              Votes:
              1 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: