Details
-
Task
-
Resolution: Unresolved
-
Major
Description
Overview
Currently, we use the deprecated STRICT mode instead of the standard, EJSON-based EXTENDED mode for serializing MongoDB documents on the wire to Kafka.
The public documentation mentions "extended json serialization strict mode", but the link now leads to EJSON v2, which doesn't support Strict mode. Originally the link led to EJSON v1.
It is assumed this is because the standard extended mode didn't exist back when this code was originally written.
Problem
There are two problems currently with respect to MongoDB JSON serialization for Kafka publication.
Deprecation
STRICT is marked deprecated by MongoDB:
Deprecated
The format generated with this mode is no longer considered standard for MongoDB tools. This value is not currently scheduled for removal.
While this is not an immediate concern, since the mode is not slated for removal today, it may become problematic in the future, or for tooling that parses the JSON directly and doesn't support this mode (see next section).
Documentation
Today, only documentation related to the keys is present:
A change event’s key contains the schema for the changed document’s key and the changed document’s actual key. For a given collection, both the schema and its corresponding payload contain a single id field. The value of this field is the document’s identifier represented as a string that is derived from MongoDB extended JSON serialization strict mode.
However, there is no public-facing documentation that describes the current format of the values encoded on the wire. This makes it a challenge for downstream tools to parse Debezium-encoded Kafka messages reliably. We should consider at least documenting the value encoding.
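As a rough illustration of what a downstream consumer has to do with the key today, the sketch below decodes an `id` field that carries a JSON document as a string. The sample payload and the helper name `decode_key_id` are hypothetical, and the `$oid` shape used for an ObjectId is an assumption about what strict-mode output looks like for that type; it should be checked against real connector output.

```python
import json


def decode_key_id(key_payload: dict):
    """Parse the string-encoded identifier from a change event key.

    Hypothetical helper: assumes the `id` field holds a JSON document
    serialized as a string, as the documentation quoted above describes.
    """
    raw = key_payload["id"]
    return json.loads(raw)


# Hypothetical key payload; the `$oid` wrapper is an assumed shape.
sample_key = {"id": '{"$oid": "596e275826f08b2730779e1f"}'}
decoded = decode_key_id(sample_key)
print(decoded)
```

Without documentation of the value encoding, a consumer has no equivalent contract to rely on for the rest of the message.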
Notes
We can consider using the following property to ease the transition:
Schema version for the source block in CDC events
When looking into the actual differences of STRICT vs EXTENDED, there are only 3 of note:
- Date-time encoding
- Binary encoding
- Regular expression encoding
Which describes this use case pretty closely.
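To make the three differences concrete, here is a minimal sketch that builds the date-time, binary, and regular expression shapes by hand for both modes. It does not call the MongoDB driver. The EXTENDED shapes follow canonical Extended JSON v2; the STRICT shapes, in particular the numeric `$date` value, are an assumption about what the driver's legacy mode emits and should be verified against actual output. The function names are hypothetical.

```python
import base64


def strict_shapes(millis: int, data: bytes, subtype: int, pattern: str, options: str) -> dict:
    """Assumed STRICT (legacy) shapes for the three differing types."""
    return {
        # Assumption: epoch milliseconds as a plain JSON number.
        "date": {"$date": millis},
        "binary": {
            "$binary": base64.b64encode(data).decode("ascii"),
            "$type": f"{subtype:02x}",
        },
        "regex": {"$regex": pattern, "$options": options},
    }


def extended_shapes(millis: int, data: bytes, subtype: int, pattern: str, options: str) -> dict:
    """Canonical Extended JSON v2 shapes for the same values."""
    return {
        "date": {"$date": {"$numberLong": str(millis)}},
        "binary": {
            "$binary": {
                "base64": base64.b64encode(data).decode("ascii"),
                "subType": f"{subtype:02x}",
            }
        },
        "regex": {"$regularExpression": {"pattern": pattern, "options": options}},
    }


def date_millis(value: dict) -> int:
    """Read epoch milliseconds from either date shape built above."""
    inner = value["$date"]
    if isinstance(inner, dict):
        return int(inner["$numberLong"])
    return int(inner)


args = (1500000000000, b"hi", 0, "^a", "i")
strict = strict_shapes(*args)
extended = extended_shapes(*args)
print(strict)
print(extended)
```

A consumer that has to survive the transition could use a tolerant reader like `date_millis` for each of the three types, so that messages in either encoding are accepted.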