- Feature Request
- Resolution: Done
- Major
Feature request or enhancement
Which use case/requirement will be addressed by the proposed feature?
Debezium should be able to truncate large column values that exceed a configured size limit. Field values below the limit are preserved as is; for values above the limit, only the data of that specific column is truncated. Initially this would apply to string data types, which are the most common large column types (e.g., mediumtext in MySQL and Vitess).
The use cases addressed here are:
- Downstream users do not need the entire content of a large field, only the first N characters. Because the whole content is not needed, a chunking or payload-offloading solution is unnecessary. Most messages will not contain a large value and will therefore pass through untruncated; truncation only handles the long tail of infrequent oversized values.
- Kafka, or any downstream message queue, is not misused. Kafka works best with small messages (<= 1 MiB), and database tables can easily exceed this size with certain common data types. Truncation keeps messages below this threshold so they do not cause problems for the Kafka cluster or the downstream messaging system. This reduces the risk of Debezium causing an incident (e.g., a table that suddenly starts producing many more large messages could cause problems for Kafka).
Implementation ideas (optional)
We propose adding two new configuration options:
- column.truncate.list - An optional, comma-separated list of regular expressions that match the fully-qualified names of columns that should be truncated
- column.truncate.sizes - An optional, comma-separated list of integer values, applied in order to the corresponding entries of column.truncate.list, that give the length to which each matching column is truncated
We will validate that truncation is only applied to string values initially. If we want to extend this to other data types (e.g., byte arrays), we can do that in another ticket.
This would be useful to us as users of the Vitess connector, so it could be implemented there, or it could be implemented in the main Debezium repo if we think other connectors would benefit.
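To make the proposal concrete, here is a minimal Java sketch of how the two options could be paired up and applied. It is an illustration only, not Debezium code: the class `ColumnTruncator` and its methods are hypothetical names, the limit is assumed to be counted in characters (Unicode code points), and the i-th size is assumed to belong to the i-th regular expression. Non-string values are passed through unchanged, matching the string-only restriction described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of Debezium's API. */
final class ColumnTruncator {
    private final List<Pattern> patterns = new ArrayList<>();
    private final List<Integer> sizes = new ArrayList<>();

    /**
     * @param truncateList  value of the proposed column.truncate.list option
     * @param truncateSizes value of the proposed column.truncate.sizes option
     */
    ColumnTruncator(String truncateList, String truncateSizes) {
        // Naive split: a regex that itself contains a comma (e.g. "{1,3}")
        // would need a smarter parser than this sketch provides.
        String[] regexes = truncateList.split(",");
        String[] limits = truncateSizes.split(",");
        if (regexes.length != limits.length) {
            throw new IllegalArgumentException(
                "column.truncate.list and column.truncate.sizes must have the same number of entries");
        }
        for (int i = 0; i < regexes.length; i++) {
            int size = Integer.parseInt(limits[i].trim());
            if (size < 0) {
                throw new IllegalArgumentException("Truncation size must not be negative: " + size);
            }
            patterns.add(Pattern.compile(regexes[i].trim()));
            sizes.add(size);
        }
    }

    /** Returns the value unchanged unless it is a String in a matching column. */
    Object apply(String fullyQualifiedColumn, Object value) {
        if (!(value instanceof String)) {
            return value; // only string values are truncated in this sketch
        }
        for (int i = 0; i < patterns.size(); i++) {
            if (patterns.get(i).matcher(fullyQualifiedColumn).matches()) {
                return truncate((String) value, sizes.get(i)); // first match wins
            }
        }
        return value;
    }

    /** Keeps at most maxChars code points, never splitting a surrogate pair. */
    static String truncate(String text, int maxChars) {
        if (text.codePointCount(0, text.length()) <= maxChars) {
            return text;
        }
        return text.substring(0, text.offsetByCodePoints(0, maxChars));
    }
}
```

With the assumed settings `column.truncate.list=shop\.posts\.body` and `column.truncate.sizes=5`, calling `apply("shop.posts.body", "hello world")` would yield `hello`, while values in other columns, shorter strings, and non-string values pass through untouched. Whether the limit should be measured in characters or bytes is a design choice the ticket leaves open; a byte-based limit would track the Kafka message size more directly.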
- links to RHEA-2024:129636 Red Hat build of Debezium 2.5.4 release