-
Feature Request
-
Resolution: Unresolved
-
Major
-
None
-
None
-
False
-
-
False
Feature request or enhancement
For feature requests or enhancements, provide this information, please:
Which use case/requirement will be addressed by the proposed feature?
When using the Debezium Postgres JDBC sink connector, if a source table's REPLICA IDENTITY is not set to FULL, any large, unchanged columns (i.e., TOASTed values) are represented in the Debezium message as the literal string `__debezium_unavailable_value`.
This creates a significant problem for the JDBC Sink connector when operating in upsert mode.
- The sink connector receives a message containing the __debezium_unavailable_value token.
- It constructs an INSERT ... ON CONFLICT ... DO UPDATE SET ... statement.
- The DO UPDATE SET clause includes all fields from the message, including the one with the __debezium_unavailable_value token.
- This fails at the database level when it tries to assign this literal string to a column of a different type (e.g., JSONB, INT, TIMESTAMP), causing a DataException
Current Workarounds and Their Limitations
- Set REPLICA IDENTITY = FULL on the source table: This is a solution, but it is often highly undesirable as it can cause significant WAL bloat and performance degradation on the source production database.
- On the debezium source connector use the "reselect column" post-processor (https://debezium.io/documentation/reference/stable/post-processors/reselect-columns.html). This can potentially create a lot of load querying the db.
Implementation ideas (optional)
Enhance the JDBC Sink connector to intelligently handle this token. The TOAST value indicates that the value is unchanged, so there is no need to include it in the update. When the connector prepares an UPSERT statement, it should recognize the __debezium_unavailable_value as a "no-op" signal for that field.
If a field in the Kafka record contains the __debezium_unavailable_value token, the connector should omit that column entirely from the SET clause of the DO UPDATE statement.
This leverages standard SQL behavior: any column not explicitly listed in the SET clause of an UPDATE is left untouched, correctly preserving its existing value in the target database.
This could be controlled by a new connector configuration property, e.g.: unavailable.value.handling.mode = "omit" (default could be fail to maintain current behavior).
Example of Proposed Behavior
Assume a target table products with columns id (PK), name (text), and data (jsonb).
Current (Failing) Behavior:
- Inbound Simplified Record:
Unknown macro: {Unknown macro}
}
- Generated Simplified SQL:
INSERT INTO products (id, name, data)
VALUES (123, 'new name', '__debezium_unavailable_value')
ON CONFLICT (id) DO UPDATE SET
name = EXCLUDED.name,
data = EXCLUDED.data; – This line causes the database error - Result: DataException: invalid input syntax for type json
Proposed (Successful) Behavior:
- Inbound Simplified Record:
Unknown macro: {Unknown macro}
}
- Generated Simplified SQL:
INSERT INTO products (id, name, data)
VALUES (123, 'new name', NULL) – (or 'DEFAULT', etc., for the INSERT)
ON CONFLICT (id) DO UPDATE SET
name = EXCLUDED.name; – 'data' column is omitted from the SET clause - Result: Success. The name column is updated, and the existing data value for row 123 is correctly preserved.
This enhancement would make the Debezium Postgres JDBC Sink pipeline more robust and usable for replication scenarios where setting {{REPLICA IDENTITY = FULL }}or using the reselect column post processor are not viable options.