DBZ-52 (Debezium)

Add top-level "standard" layer to events for metadata, before/after, and other info


      Currently, Debezium's MySQL connector generates events with a key Struct that contains the primary/unique key of the affected row, plus a value Struct that contains a field for each of the row's columns. This approach is simple, but it leaves no room for additional information in the event, such as metadata (see the snapshot generation ID in DBZ-31) or the before and after values associated with updates (see DBZ-45).
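      To make the current layout concrete, here is a minimal sketch that uses plain Python dicts as stand-ins for Kafka Connect Structs; `flat_event` is a hypothetical helper, not connector code. It splits a row into a key (only the key columns) and a value (one field per column), and its last lines show why putting metadata into that flat value would mean reserving a field name that a real table column could already use.

```python
from typing import Any

# Plain dicts stand in for Kafka Connect Structs in this sketch.
def flat_event(row: dict[str, Any], key_columns: list[str]) -> tuple[dict[str, Any], dict[str, Any]]:
    """Hypothetical helper: build a pre-envelope (key, value) pair from one row.

    The key holds only the primary/unique key columns; the value holds one
    field per column and nothing else.
    """
    key = {col: row[col] for col in key_columns}
    value = dict(row)  # a copy, so later changes don't affect the caller's row
    return key, value


key, value = flat_event({"customerId": 1234, "customerName": "Jane Doe"}, ["customerId"])
print(key)    # {'customerId': 1234}
print(value)  # {'customerId': 1234, 'customerName': 'Jane Doe'}

# Squeezing metadata into the flat value means reserving a field name, which
# collides with any table that happens to have a column of the same name.
_, orders_value = flat_event({"id": 7, "op": "buy"}, ["id"])
orders_value["op"] = "u"  # the row's own "op" column is silently overwritten
```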

      To add other information to the event, we would either need to reserve special fields and risk conflicts with column names in the upstream sources, or rethink the current structure of the event values and introduce, at the top level of the value, a "container" struct with fields for the new values, the old values, the type of event/operation, and other metadata not included in the record itself. For example, a value struct might look like this:

      {
        "before" : {
          "customerId" : 1234,
          "customerName" : "Janie Anne Doe"
        },
        "after" : {
          "customerId" : 1234,
          "customerName" : "Jane Doe"
        },
        "op" : "u"
      }
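      The container above could be built by a small helper like the following sketch. Only the "before", "after", and "op" fields and the "u" code come from the example; the "c" and "d" codes for create and delete, and the helper name `make_envelope`, are assumptions made for illustration. The helper also checks that the before/after states are consistent with the operation.

```python
from typing import Any, Optional

# "u" comes from the example above; "c" and "d" are assumed codes for
# create and delete in this sketch.
OP_CREATE, OP_UPDATE, OP_DELETE = "c", "u", "d"


def make_envelope(op: str,
                  before: Optional[dict[str, Any]],
                  after: Optional[dict[str, Any]]) -> dict[str, Any]:
    """Hypothetical helper: wrap row state in a before/after/op container."""
    if op not in (OP_CREATE, OP_UPDATE, OP_DELETE):
        raise ValueError(f"unknown op: {op!r}")
    # Creates have no prior state; deletes have no new state; updates have both.
    expects_before = op in (OP_UPDATE, OP_DELETE)
    expects_after = op in (OP_CREATE, OP_UPDATE)
    if (before is not None) != expects_before or (after is not None) != expects_after:
        raise ValueError(f"before/after do not match op {op!r}")
    return {"before": before, "after": after, "op": op}


update = make_envelope(
    OP_UPDATE,
    before={"customerId": 1234, "customerName": "Janie Anne Doe"},
    after={"customerId": 1234, "customerName": "Jane Doe"},
)
```

      Rejecting mismatched states early (for example, a delete that carries an "after" value) keeps every event's shape predictable for consumers.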
      

      This structure would allow us to distinguish between INSERT and UPDATE events, include the before state for UPDATE and DELETE events, and store additional metadata such as the type of operation, the source offset information, and snapshot generation identifiers. It leaves room for the connectors to evolve and add more information to the messages over time. It would even allow downstream services to augment messages with additional information (e.g., patches describing exactly what changed between the old and new values) and write them to other topics.
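      As a sketch of the downstream augmentation idea, the service below reads the "op" field to tell an update from an insert and adds a field-level diff of "before" and "after". The "patch" field name and its {"old": ..., "new": ...} layout are hypothetical choices for this example, and writing the result to another topic is left out.

```python
from typing import Any


def compute_patch(envelope: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Return {field: {"old": ..., "new": ...}} for every field whose value changed.

    A missing side (before of a create, after of a delete) is treated as empty,
    so in this sketch an absent field and a field set to None look the same.
    """
    before = envelope.get("before") or {}
    after = envelope.get("after") or {}
    patch = {}
    for name in sorted(before.keys() | after.keys()):
        old, new = before.get(name), after.get(name)
        if old != new:
            patch[name] = {"old": old, "new": new}
    return patch


def augment(envelope: dict[str, Any]) -> dict[str, Any]:
    """Copy the envelope and add a hypothetical "patch" field; the input is not modified."""
    return {**envelope, "patch": compute_patch(envelope)}


event = {
    "before": {"customerId": 1234, "customerName": "Janie Anne Doe"},
    "after": {"customerId": 1234, "customerName": "Jane Doe"},
    "op": "u",
}
print(event["op"] == "u")  # True: an update, not an insert
print(augment(event)["patch"])
# {'customerName': {'old': 'Janie Anne Doe', 'new': 'Jane Doe'}}
```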

      Additionally, all connectors could potentially share a single schema for this top-level container structure (likely with many of the fields being optional), or at the very least define top-level fields with common semantics.
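      One way to picture a shared container with mostly optional fields is the dataclass sketch below. Only "before", "after", and "op" appear in the example above; "patch", "source", and "snapshot_id" are illustrative names assumed here for MongoDB-style change descriptions, source offset information, and snapshot generation IDs. Each connector fills in only the fields it supports.

```python
from dataclasses import asdict, dataclass
from typing import Any, Optional


@dataclass
class ChangeEvent:
    """Hypothetical shared envelope: only "op" is required, everything else is optional.

    Field names other than before/after/op are illustrative assumptions.
    """
    op: str
    before: Optional[dict[str, Any]] = None
    after: Optional[dict[str, Any]] = None
    patch: Optional[dict[str, Any]] = None   # e.g. a MongoDB-style description of the change
    source: Optional[dict[str, Any]] = None  # e.g. source offset information
    snapshot_id: Optional[str] = None        # e.g. a snapshot generation ID (DBZ-31)

    def to_value(self) -> dict[str, Any]:
        # Omit unset fields so each connector emits only what it populates.
        return {k: v for k, v in asdict(self).items() if v is not None}


mysql_update = ChangeEvent(op="u", before={"customerId": 1234}, after={"customerId": 1234})
mongo_update = ChangeEvent(op="u", patch={"customerName": "Jane Doe"})
```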

      (This approach is a bit more like the MongoDB events, although, rather than "before" and "after" fields, MongoDB events might have a "patch" field containing a description of what changed.)

      One concern is what to do with DELETE events once they have a container structure. Currently, DELETE events have a null message value, so Kafka's log compaction recognizes the entity as removed and reclaims its space. A simple approach is to issue the DELETE event and follow it with a tombstone event that has the same key but a null value, though we must ensure that the tombstone is generated in every case.
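      The DELETE-plus-tombstone approach can be sketched as follows, together with a deliberately simplified model of log compaction: keep the latest record per key and drop keys whose latest record has a null value. This is not Kafka's actual implementation (for one thing, Kafka keeps tombstones for `delete.retention.ms` before removing them), and the helper names and the "d" op code are assumptions for illustration.

```python
from typing import Any, Optional

Key = dict[str, Any]
Record = tuple[Key, Optional[dict[str, Any]]]  # (key, value); a None value is a tombstone


def delete_records(key: Key, before: dict[str, Any]) -> list[Record]:
    """Emit the DELETE event followed by a tombstone that has the same key."""
    return [
        (key, {"before": before, "after": None, "op": "d"}),
        (key, None),  # null value: lets log compaction reclaim this key
    ]


def compact(log: list[Record]) -> list[Record]:
    """Simplified model of Kafka log compaction.

    Keeps only the latest record per key and drops keys whose latest record is a
    tombstone. Real Kafka retains tombstones for delete.retention.ms first.
    """
    latest: dict[tuple, Record] = {}
    for key, value in log:
        latest[tuple(sorted(key.items()))] = (key, value)
    return [record for record in latest.values() if record[1] is not None]


row = {"customerId": 1234, "customerName": "Jane Doe"}
log: list[Record] = [
    ({"customerId": 1234}, {"before": None, "after": row, "op": "c"}),
    ({"customerId": 5678}, {"before": None, "after": {"customerId": 5678, "customerName": "John Roe"}, "op": "c"}),
]
log += delete_records({"customerId": 1234}, before=row)
print([key for key, _ in compact(log)])  # [{'customerId': 5678}]
```

      If the tombstone is dropped (for example, `compact(log[:-1])`), the DELETE event itself becomes the latest record for key 1234 and is retained indefinitely, which is why the tombstone must be emitted in every case.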

              rhauch Randall Hauch (Inactive)