Uploaded image for project: 'Debezium'
  1. Debezium
  2. DBZ-226

Add SMT implementation to convert CDC event structure to more traditional row state structure

XMLWordPrintable

    • Icon: Feature Request Feature Request
    • Resolution: Done
    • Icon: Major Major
    • 0.6
    • 0.5
    • core-library

      Provide an SMT that converts Debezium's CDC event structure to a more conventional structure commonly used in other sink and non-CDC source connectors where the message represents the state of the inserted or updated row, or null in the case of a deleted row. This new SMT would make it much easier to combine a Debezium source connector with a third party consumer or sink connector that require the more conventional event structure.

      This new SMT should be similar to the ByLogicalTableRouter created for DBZ-121.

      Background

      The change event messages produced by the various Debezium source connectors all have a common structure consisting of an envelope with a before field for the state of the row before the change, an after field holding the state of the row after the change, a source field holding the source-specific metadata about the event (e.g., transaction ID, database name, table name, etc.), and the timestamp at which the connector produced the event. The before and after fields may be null depending upon whether the change event was a creation / insert, update, or delete operation.

      This change event structure accurately represents as much as possible about the captured change event. However, it is fundamentally different than other kinds of source connectors that are not really CDC and more commonly produce messages that simply reflect the current state of the row or information. Likewise most sink connectors also expect that the messages provided to it represent the current state of the row or information. This difference in message structure between CDC connectors and other sink connectors makes it more difficult to simply combine connectors into a data pipeline.

      As of Kafka Connect 0.10.2.0, a new feature called Single Message Transforms (see KIP-66) makes it much easier to customize the messages produced by source connectors and those consumed by sink connectors. Each connector can be configured with a chain of zero or more SMTs that are each applied successively. For example, the SourceRecord objects produced by a source connector are passed through the first SMT that outputs a new SourceRecord, which is passed through the second SMT to produce a new SourceRecord, etc. For a sink connector, when Kafka Connect constructs SinkRecord objects for each message read from a Kafka topic, it first passes the SinkRecord through the first SMT to produce a new SinkRecord, which is then passed through the second SMT to produce a new SinkRecord, etc., until the SinkRecord is passed to the sink connector. Kafka Connect 0.10.2.0 comes with a small library of SMT implementations, but anyone can implement their own SMTs.

      Although SMTs are configured when deploying a connector, the SMTs are actually independent of the connector implementation. This makes SMTs a very powerful tool to adapt the structure of the source connector outputs and the structure of the sink connector inputs. And, this means it is far easier to mix and match different connectors into a useful data pipeline without having to write much code.

      What this means is that Debezium no longer has to provide its own sink connectors that understand the Debezium CDC event structure. Instead, SMTs can be used with the Debezium source connector or with the third party sink connectors to transform the events into the desired structure.

      For example, Confluent’s ElasticSearch sink connector accepts a simple record structure and writes the fields and nested fields into ElasticSearch. It is possible to create a data pipeline starting with a Debezium source connector and ending with the ElasticSearch connector, with the latter sink connector configured with SMTs that remove the before field and extract the contents of the after field to the top level of the event structure. Note that this might require multiple SMTs, so Debezium may want to offer an out-of-the-box SMT implementation that does this conversion.

              jpechane Jiri Pechanec
              rhauch Randall Hauch (Inactive)
              Votes:
              4 Vote for this issue
              Watchers:
              4 Start watching this issue

                Created:
                Updated:
                Resolved: