Uploaded image for project: 'Debezium'
  1. Debezium
  2. DBZ-5571

Inconsistent column type naming between streaming and incremental snapshot

    XMLWordPrintable

Details

    • False
    • None
    • False

    Description

      In order to make your issue reports as actionable as possible, please provide the following information, depending on the issue type.

      Bug report

      For bug reports, provide this information, please:

      What Debezium connector do you use and what version?

      PostgresConnector, v1.9.2.Final

      What is the connector configuration?

       

      {
          "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
          "database.hostname": "{{database_hostname}}",
          "database.port": "5432",
          "database.dbname": "db1",
          "database.user": "postgres",
          "database.password": "{{database_password}}",
          "database.server.name": "db1",
          "slot.name": "cdc_debezium_test",
          "plugin.name": "pgoutput",
          "publication.name": "cdc_dbz",
          "table.include.list": "db1.table1,db1.table2,db1.debezium_signals",
          "signal.data.collection": "db1.debezium_signals",
          "heartbeat.interval.ms": "3000",
          "heartbeat.action.query": "INSERT INTO db1.debezium_heartbeat (id, ts) VALUES ('cdc', now()) ON CONFLICT (id) DO UPDATE SET ts = now();",
          "snapshot.mode": "never",
          "key.converter": "org.apache.kafka.connect.json.JsonConverter",
          "key.converter.schemas.enable": "false",
          "value.converter": "io.confluent.connect.avro.AvroConverter",
          "value.converter.schema.registry.url": "http://host.docker.internal:8081",
          "topic.creation.default.replication.factor": 1,
          "topic.creation.default.partitions": 1,
          "topic.creation.default.cleanup.policy": "compact",
          "transforms": "route",
          "transforms.route.type": "org.apache.kafka.connect.transforms.RegexRouter",
          "transforms.route.regex": ".*",
          "transforms.route.replacement": "cdc_dbz.$0.v1",
          "converters": "financialAsset",
          "financialAsset.column.type.name": "financial_asset",
          "financialAsset.schema.name": "brex.debezium.extensions.data.FinancialAsset",
          "financialAsset.type": "brex.debezium.extensions.converters.FinancialAssetConverter",
          "include.unknown.datatypes": true
      }

       

      What is the captured database version and mode of depoyment?

      (E.g. on-premises, with a specific cloud provider, etc.)

      Database version: PostgreSQL 13

      Mode of deployment: Amazon EKS

      What behaviour do you expect?

      We expect to have consistent column type naming between streaming change events and incremental snapshot change events.

      What behaviour do you see?

      When we test with user user-defined data types (UDT), we are seeing inconsistent column type names between streaming events and incremental snapshot events.

      For streaming events (insert, update and delete), the column type name is <type_name>. But for incremental snapshotting, we are seeing fully qualified name like "<schema_name>"."<type_name>".

       

      Our investigation:

      For incremental snapshot, when the connector reads the table schema via jdbc driver, it returns either fully qualified name or the name without schema name for user defined types. This depends on the search path. If the db user search path includes the schema name of the UDT, then it returns the short type name (code).

      For streaming change events, seems the Debezium pgoutput message decoder drops the schema name of the UDT fully qualified name from the db query results and only uses the short type name as column typeName (code).

      Do you see the same behaviour using the latest relesead Debezium version?

      (Ideally, also verify with latest Alpha/Beta/CR version)

      We did not test with the latest Debezium version, but we suspect to see the same issue.

      Do you have the connector logs, ideally from start till finish?

      (You might be asked later to provide DEBUG/TRACE level log)

      The custom data type is "financial_asset" in "expenses_v2" schema and we got the following DEBUG logs during incremental snapshotting:

      cdc-kafka-connect-connect-5f7f88b75-x4b6r cdc-kafka-connect-connect {"@timestamp":"2022-08-31T00:26:54.490Z","source_host":"cdc-kafka-connect-connect-5f7f88b75-x4b6r","file":"TableSchemaBuilder.java","method":"addField","level":"DEBUG","line_number":"418","thread_name":"debezium-postgresconnector-expenses_v2-change-event-source-coordinator","@version":1,"logger_name":"io.debezium.relational.TableSchemaBuilder","message":"- field 'amount' (BYTES) from column amount \"expenses_v2\".\"financial_asset\"(2147483647, 0) DEFAULT VALUE NULL","class":"io.debezium.relational.TableSchemaBuilder","mdc":{"dbz.connectorContext":"streaming","dbz.connectorType":"Postgres","connector.context":"[expenses-v2-debezium-connector-v2|task-0] ","dbz.taskId":"0","dbz.connectorName":"expenses_v2"}}
      cdc-kafka-connect-connect-5f7f88b75-x4b6r cdc-kafka-connect-connect {"@timestamp":"2022-08-31T00:26:54.491Z","source_host":"cdc-kafka-connect-connect-5f7f88b75-x4b6r","file":"TableSchemaBuilder.java","method":"addField","level":"DEBUG","line_number":"418","thread_name":"debezium-postgresconnector-expenses_v2-change-event-source-coordinator","@version":1,"logger_name":"io.debezium.relational.TableSchemaBuilder","message":"- field 'original_amount' (BYTES) from column original_amount \"expenses_v2\".\"financial_asset\"(2147483647, 0) DEFAULT VALUE NULL","class":"io.debezium.relational.TableSchemaBuilder","mdc":{"dbz.connectorContext":"streaming","dbz.connectorType":"Postgres","connector.context":"[expenses-v2-debezium-connector-v2|task-0] ","dbz.taskId":"0","dbz.connectorName":"expenses_v2"}}
      cdc-kafka-connect-connect-5f7f88b75-x4b6r cdc-kafka-connect-connect {"@timestamp":"2022-08-31T00:26:54.491Z","source_host":"cdc-kafka-connect-connect-5f7f88b75-x4b6r","file":"TableSchemaBuilder.java","method":"addField","level":"DEBUG","line_number":"418","thread_name":"debezium-postgresconnector-expenses_v2-change-event-source-coordinator","@version":1,"logger_name":"io.debezium.relational.TableSchemaBuilder","message":"- field 'billing_amount' (BYTES) from column billing_amount \"expenses_v2\".\"financial_asset\"(2147483647, 0) DEFAULT VALUE NULL","class":"io.debezium.relational.TableSchemaBuilder","mdc":{"dbz.connectorContext":"streaming","dbz.connectorType":"Postgres","connector.context":"[expenses-v2-debezium-connector-v2|task-0] ","dbz.taskId":"0","dbz.connectorName":"expenses_v2"}}
      cdc-kafka-connect-connect-5f7f88b75-x4b6r cdc-kafka-connect-connect {"@timestamp":"2022-08-31T00:26:54.491Z","source_host":"cdc-kafka-connect-connect-5f7f88b75-x4b6r","file":"TableSchemaBuilder.java","method":"addField","level":"DEBUG","line_number":"418","thread_name":"debezium-postgresconnector-expenses_v2-change-event-source-coordinator","@version":1,"logger_name":"io.debezium.relational.TableSchemaBuilder","message":"- field 'budget_amount' (BYTES) from column budget_amount \"expenses_v2\".\"financial_asset\"(2147483647, 0) DEFAULT VALUE NULL","class":"io.debezium.relational.TableSchemaBuilder","mdc":{"dbz.connectorContext":"streaming","dbz.connectorType":"Postgres","connector.context":"[expenses-v2-debezium-connector-v2|task-0] ","dbz.taskId":"0","dbz.connectorName":"expenses_v2"}}

      How to reproduce the issue using our tutorial deployment?

      We use a test database "db1" to reproduce the issue.

      database setup:

      -- connect with Postgres admin user
      CREATE SCHEMA db1;
      CREATE TABLE IF NOT EXISTS db1.debezium_heartbeat (id VARCHAR(255) NOT NULL PRIMARY KEY, ts TIMESTAMP NOT NULL);
      CREATE TABLE IF NOT EXISTS db1.debezium_signals (id VARCHAR(42) PRIMARY KEY, type VARCHAR(32) NOT NULL, data VARCHAR(2048) NULL);
      
      CREATE PUBLICATION cdc_dbz FOR ALL TABLES;
      CREATE TYPE db1.financial_asset AS (quantity NUMERIC, instrument_code TEXT);
      CREATE TABLE db1.table1 (id VARCHAR PRIMARY KEY, amount db1.financial_asset);

      create connector (we implemented our custom converter "brex.debezium.extensions.converters.FinancialAssetConverter" and added it to the connect image):

      # use Postgres admin user for the connector
      curl --location --request POST 'localhost:8083/connectors' \
      --header 'Accept: application/json' \
      --header 'Content-Type: application/json' \
      --data-raw '{
        "name":"cdc_connector_fa1",
        "config":{
          "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
          "database.hostname": "host.docker.internal",
          "database.port": "5432",
          "database.dbname": "db1",
          "database.user": "postgres",
          "database.password": <database_password>,
          "database.server.name": "db1",
          "slot.name": "cdc_debezium_test",
          "plugin.name": "pgoutput",
          "publication.name": "cdc_dbz",
          "table.include.list": "db1.table1,db1.table2,db1.debezium_signals",
          "signal.data.collection": "db1.debezium_signals",
          "heartbeat.interval.ms": "3000",
          "heartbeat.action.query": "INSERT INTO db1.debezium_heartbeat (id, ts) VALUES ('\''cdc'\'', now()) ON CONFLICT (id) DO UPDATE SET ts = now();",
          "snapshot.mode": "never",
          "key.converter": "org.apache.kafka.connect.json.JsonConverter",
          "key.converter.schemas.enable": "false",
          "value.converter": "io.confluent.connect.avro.AvroConverter",
          "value.converter.schema.registry.url": "http://host.docker.internal:8081",
          "topic.creation.default.replication.factor": 1,
          "topic.creation.default.partitions": 1,
          "topic.creation.default.cleanup.policy": "compact",
          "transforms": "route",
          "transforms.route.type": "org.apache.kafka.connect.transforms.RegexRouter",
          "transforms.route.regex": ".*",
          "transforms.route.replacement": "cdc_dbz.$0.v1",
          "converters": "financialAsset",
          "financialAsset.column.type.name": "financial_asset",
          "financialAsset.schema.name": "brex.debezium.extensions.data.FinancialAsset",
          "financialAsset.type": "brex.debezium.extensions.converters.FinancialAssetConverter",
          "include.unknown.datatypes": true
        }
      }' 

      create streaming events and incremental snapshotting:

      -- connect with Postgres admin user
      -- create insert events
      INSERT INTO db1.table1 (id, amount) SELECT '01', '(1.5,usd)';
      -- start incremental snapshot
      INSERT INTO db1.debezium_signals (id, type, data) VALUES ('snapshot-test01', 'execute-snapshot', '{"data-collections": ["db1.table1"]}');

      Please let us know if you need any other information

      Feature request or enhancement

      For feature requests or enhancements, provide this information, please:

      Which use case/requirement will be addressed by the proposed feature?

      <Your answer>

      Implementation ideas (optional)

      <Your answer>

      Attachments

        Activity

          People

            Unassigned Unassigned
            yannickzj Kevin Zhao (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated: