Uploaded image for project: 'Debezium'
  1. Debezium
  2. DBZ-9648

Add behavior to omit __debezium_unavailable_value columns from UPDATE statements

XMLWordPrintable

    • Icon: Feature Request Feature Request
    • Resolution: Unresolved
    • Icon: Major Major
    • Backlog
    • None
    • jdbc-connector
    • None
    • False
    • Hide

      None

      Show
      None
    • False

      Feature request or enhancement

      For feature requests or enhancements, provide this information, please:

      Which use case/requirement will be addressed by the proposed feature?

      When using the Debezium Postgres JDBC sink connector, if a source table's REPLICA IDENTITY is not set to FULL, any large, unchanged columns (i.e., TOASTed values) are represented in the Debezium message as the literal string `__debezium_unavailable_value`.

      This creates a significant problem for the JDBC Sink connector when operating in upsert mode.

      1. The sink connector receives a message containing the __debezium_unavailable_value token.
      1. It constructs an INSERT ... ON CONFLICT ... DO UPDATE SET ... statement.
      1. The DO UPDATE SET clause includes all fields from the message, including the one with the __debezium_unavailable_value token.
      1. This fails at the database level when it tries to assign this literal string to a column of a different type (e.g., JSONB, INT, TIMESTAMP), causing a DataException

      Current Workarounds and Their Limitations

      1. Set REPLICA IDENTITY = FULL on the source table: This is a solution, but it is often highly undesirable as it can cause significant WAL bloat and performance degradation on the source production database.
      1. On the debezium source connector use the "reselect column" post-processor (https://debezium.io/documentation/reference/stable/post-processors/reselect-columns.html). This can potentially create a lot of load querying the db. 

      Implementation ideas (optional)

      Enhance the JDBC Sink connector to intelligently handle this token. The TOAST value indicates that the value is unchanged, so there is no need to include it in the update. When the connector prepares an UPSERT statement, it should recognize the __debezium_unavailable_value as a "no-op" signal for that field.

      If a field in the Kafka record contains the __debezium_unavailable_value token, the connector should omit that column entirely from the SET clause of the DO UPDATE statement.

      This leverages standard SQL behavior: any column not explicitly listed in the SET clause of an UPDATE is left untouched, correctly preserving its existing value in the target database.

      This could be controlled by a new connector configuration property, e.g.: unavailable.value.handling.mode = "omit" (default could be fail to maintain current behavior).

      Example of Proposed Behavior

      Assume a target table products with columns id (PK), name (text), and data (jsonb).

      Current (Failing) Behavior:

      • Inbound Simplified Record:
        Unknown macro: {Unknown macro}

        }

      • Generated Simplified SQL:
        INSERT INTO products (id, name, data)
        VALUES (123, 'new name', '__debezium_unavailable_value')
        ON CONFLICT (id) DO UPDATE SET
        name = EXCLUDED.name,
        data = EXCLUDED.data; – This line causes the database error
      • Result: DataException: invalid input syntax for type json

      Proposed (Successful) Behavior:

      • Inbound Simplified Record:
        Unknown macro: {Unknown macro}

        }

      • Generated Simplified SQL:
        INSERT INTO products (id, name, data)
        VALUES (123, 'new name', NULL) – (or 'DEFAULT', etc., for the INSERT)
        ON CONFLICT (id) DO UPDATE SET
        name = EXCLUDED.name; – 'data' column is omitted from the SET clause
      • Result: Success. The name column is updated, and the existing data value for row 123 is correctly preserved.

      This enhancement would make the Debezium Postgres JDBC Sink pipeline more robust and usable for replication scenarios where setting {{REPLICA IDENTITY = FULL }}or using the reselect column post processor are not viable options. 

              Unassigned Unassigned
              mhmcdonald Mark McDonald
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

                Created:
                Updated: