  1. Debezium
  2. DBZ-5894

Transition Kafka serialization from the deprecated JsonMode.STRICT mode to the standard, EJSON-based JsonMode.EXTENDED

Details

    • Type: Task
    • Resolution: Unresolved
    • Priority: Major
    • Fix Version: 2.4-backlog
    • Component: mongodb-connector

    Description

      Overview

      Currently, we serialize MongoDB documents on the wire to Kafka using the deprecated STRICT mode rather than the standard, EJSON-based EXTENDED mode.

      The public documentation mentions "extended json serialization strict mode", but the link now leads to EJSON v2, which does not support strict mode. Originally the link led to EJSON v1.

      Presumably this is because the standard EXTENDED mode did not exist when this code was originally written.

      Problem

      There are two problems currently with respect to MongoDB JSON serialization for Kafka publication.

      Deprecation

      STRICT is marked deprecated by MongoDB:

      Deprecated
      The format generated with this mode is no longer considered standard for MongoDB tools. This value is not currently scheduled for removal.

      While this is not an immediate concern, since STRICT is not currently slated for removal, it may become problematic in the future, or for tooling that parses the JSON directly and does not support this mode (see next section).

      Documentation

      Today, only documentation related to the keys is present:

      A change event’s key contains the schema for the changed document’s key and the changed document’s actual key. For a given collection, both the schema and its corresponding payload contain a single id field. The value of this field is the document’s identifier represented as a string that is derived from MongoDB extended JSON serialization strict mode.

      However, there is no public-facing documentation that describes the current format of the values encoded on the wire. This makes it challenging for downstream tools to parse Debezium-encoded Kafka messages reliably. We should at least consider documenting the value encoding.
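
      To illustrate the downstream parsing concern, here is a minimal sketch of a consumer-side helper that decodes the string `id` field of a change event key. It assumes the string holds either a plain JSON value (as STRICT emits for 32-bit integers) or a single-key EJSON wrapper such as `$oid`, `$numberInt`, `$numberLong`, or `$numberDouble` (as EXTENDED emits). The function name `decode_id` and the sample payloads are hypothetical and are not taken from Debezium; the actual wire format should be checked against real connector output.

```python
import json

# Single-key EJSON wrappers this sketch knows how to unwrap.
# Any other BSON type (dates, binary, ...) is returned as parsed JSON.
_WRAPPERS = {
    "$oid": str,
    "$numberInt": int,
    "$numberLong": int,
    "$numberDouble": float,
}


def decode_id(raw):
    """Decode the string `id` of a change event key (hypothetical helper)."""
    value = json.loads(raw)
    if isinstance(value, dict) and len(value) == 1:
        # Exactly one key: check whether it is a wrapper we recognize.
        (name, inner), = value.items()
        convert = _WRAPPERS.get(name)
        if convert is not None:
            return convert(inner)
    # Plain JSON value, or a shape this sketch does not handle.
    return value


if __name__ == "__main__":
    # Assumed sample payloads: the same 32-bit id in both shapes.
    print(decode_id('1004'))
    print(decode_id('{"$numberInt": "1004"}'))
    print(decode_id('{"$oid": "596e275826f08b2730779e1f"}'))
```

      A consumer written this way would tolerate both shapes during a transition, at the cost of treating a plain number and a wrapped number as equivalent.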

      Notes

      We can consider using the following property to ease the transition:

      source.struct.version

      Schema version for the source block in CDC events

      When looking into the actual differences of STRICT vs EXTENDED, there are only 3 of note:

      Which describes this use case pretty closely.

          People

            Assignee: Unassigned
            Reporter: Bob Tiernay (btiernay)