Debezium / DBZ-5743

Support Unicode table names in topic names


      Bug report

      Debezium version: 1.9

      Captured database version: SQL Server 2008

      Problem description:

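      Debezium automatically replaces every character that is not valid in an Avro schema name (here, Chinese characters) with _. When two table names differ only in such characters, and contain the same number of them, both names collapse to the same replacement, even though the original names are distinct.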

      For example:

      JHDL-TOPIC-CDC4S-LOG.dbo.CT检查排序编码.Value
      JHDL-TOPIC-CDC4S-LOG.dbo.CT检查字典修正.Value

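      '检查排序编码' and '检查字典修正' consist only of characters that are invalid in Avro names, but in Chinese they are different words (roughly "examination sort code" and "examination dictionary correction").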

      After replacement, both become CT______, which causes a schema name conflict.
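To make the failure mode concrete, here is a minimal Java sketch that approximates the one-for-one replacement seen in the log below: every character other than an ASCII letter, digit, _ or . becomes _, and a registry rejects a replacement that a different original name already produced. The class and method names (UnderscoreReplacement, replaceInvalid, register) are hypothetical; this is not Debezium's SchemaNameAdjuster code, only an illustration of why its conflict check fires.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration, not Debezium's SchemaNameAdjuster: it only
// approximates the one-for-one '_' replacement visible in the connector log.
final class UnderscoreReplacement {

    // Maps each replacement name back to the original name that produced it.
    private final Map<String, String> originals = new HashMap<>();

    // Replaces every char that is not an ASCII letter, digit, '_' or '.' with '_'.
    // (Simplification: the Avro rule that a name must not start with a digit is ignored.)
    static String replaceInvalid(String name) {
        StringBuilder sb = new StringBuilder(name.length());
        for (char c : name.toCharArray()) {
            boolean valid = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                    || (c >= '0' && c <= '9') || c == '_' || c == '.';
            sb.append(valid ? c : '_');
        }
        return sb.toString();
    }

    // Returns the replacement, or throws if a different original already claimed it.
    String register(String original) {
        String replacement = replaceInvalid(original);
        String previous = originals.putIfAbsent(replacement, original);
        if (previous != null && !previous.equals(original)) {
            throw new IllegalStateException("Replacement '" + replacement
                    + "' for '" + original + "' conflicts with '" + previous + "'");
        }
        return replacement;
    }

    public static void main(String[] args) {
        UnderscoreReplacement adjuster = new UnderscoreReplacement();
        System.out.println(adjuster.register("JHDL-TOPIC-CDC4S-LOG.dbo.CT检查排序编码.Value"));
        try {
            // The second name collapses to the same string, so registration fails.
            adjuster.register("JHDL-TOPIC-CDC4S-LOG.dbo.CT检查字典_修.Value");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Both names map to JHDL_TOPIC_CDC4S_LOG.dbo.CT______.Value, matching the replacement reported in the log, because each Chinese character is replaced by exactly one _.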

      Therefore, I would like the handling of special characters to be improved. For example, when a table name is written in a non-Latin script, it could be transcoded into a unique name that conforms to the Avro schema name rules, instead of replacing each character with _.
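A minimal Java sketch of the kind of transcoding requested here, assuming a reversible escape: every character that is not an ASCII letter (or a digit in a non-leading position), including _ itself, is written as _u followed by its four-digit hexadecimal UTF-16 code. Because the escape can be decoded unambiguously, distinct table names always yield distinct schema names. The class and method names are hypothetical and not part of Debezium.

```java
// Hypothetical helper, not part of Debezium: escapes every character that is
// not an ASCII letter or digit as _uXXXX so that distinct names stay distinct.
final class UnicodeNameEscaper {

    private UnicodeNameEscaper() {
    }

    // Escapes one dot-separated segment of an Avro full name.
    static String escapeSegment(String segment) {
        StringBuilder sb = new StringBuilder(segment.length());
        for (int i = 0; i < segment.length(); i++) {
            char c = segment.charAt(i);
            boolean letter = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
            boolean digit = c >= '0' && c <= '9';
            // Avro names must start with [A-Za-z_]; later characters may also be [0-9].
            // '_' is always escaped too, so the mapping stays reversible.
            if (letter || (digit && i > 0)) {
                sb.append(c);
            } else {
                // Every escape starts with '_', which is a valid first character.
                sb.append(String.format("_u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    // Escapes each namespace segment and keeps the dots between them.
    // (Empty segments, e.g. from "a..b", are not handled by this sketch.)
    static String escapeFullName(String fullName) {
        String[] segments = fullName.split("\\.", -1);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < segments.length; i++) {
            if (i > 0) {
                sb.append('.');
            }
            sb.append(escapeSegment(segments[i]));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String a = escapeFullName("JHDL-TOPIC-CDC4S-LOG.dbo.CT检查排序编码.Value");
        String b = escapeFullName("JHDL-TOPIC-CDC4S-LOG.dbo.CT检查字典修正.Value");
        System.out.println(a);
        System.out.println(b);
        System.out.println(a.equals(b)); // prints false: the names stay distinct
    }
}
```

The trade-off of this design is readability: a Chinese table name becomes a long run of _u escapes, but no two different names can ever collide.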

      What behaviour do you expect?

      JHDL-TOPIC-CDC4S-LOG.dbo.CT检查排序编码.Value
      JHDL-TOPIC-CDC4S-LOG.dbo.CT检查字典_修.Value

      What behaviour do you see?

      JHDL_TOPIC_CDC4S_LOG.dbo.CT______.Value
      JHDL_TOPIC_CDC4S_LOG.dbo.CT______.Value

      Do you see the same behaviour using the latest released Debezium version?

      Yes

      Do you have the connector logs, ideally from start till finish?

      (You might be asked later to provide DEBUG/TRACE level log)

      [2022-10-19 07:57:33,816] WARN Topic 'JHDL-TOPIC-CDC4S-LOG.dbo.CT检查排序编码' name isn't a valid topic name, replacing it with 'JHDL-TOPIC-CDC4S-LOG.dbo.CT______'. (io.debezium.schema.TopicSelector$TopicNameSanitizer:130)
      [2022-10-19 07:57:33,816] ERROR The Kafka Connect schema name 'JHDL-TOPIC-CDC4S-LOG.dbo.CT检查排序编码.Value' is not a valid Avro schema name and its replacement 'JHDL_TOPIC_CDC4S_LOG.dbo.CT______.Value' conflicts with another different schema 'JHDL-TOPIC-CDC4S-LOG.dbo.CT检查字典_修.Value' (io.debezium.util.SchemaNameAdjuster:165)
      [2022-10-19 07:57:33,837] INFO Snapshot - Final stage (io.debezium.pipeline.source.AbstractSnapshotChangeEventSource:88)
      [2022-10-19 07:57:33,838] INFO Removing locking timeout (io.debezium.connector.sqlserver.SqlServerSnapshotChangeEventSource:239)
      [2022-10-19 07:57:33,840] ERROR Producer failure (io.debezium.pipeline.ErrorHandler:35)
      io.debezium.DebeziumException: io.debezium.DebeziumException: org.apache.kafka.connect.errors.ConnectException: The Kafka Connect schema name 'JHDL-TOPIC-CDC4S-LOG.dbo.CT检查排序编码.Value' is not a valid Avro schema name and its replacement 'JHDL_TOPIC_CDC4S_LOG.dbo.CT______.Value' conflicts with another different schema 'JHDL-TOPIC-CDC4S-LOG.dbo.CT检查字典_修.Value'
          at io.debezium.pipeline.source.AbstractSnapshotChangeEventSource.execute(AbstractSnapshotChangeEventSource.java:85)
          at io.debezium.pipeline.ChangeEventSourceCoordinator.doSnapshot(ChangeEventSourceCoordinator.java:155)
          at io.debezium.connector.sqlserver.SqlServerChangeEventSourceCoordinator.executeChangeEventSources(SqlServerChangeEventSourceCoordinator.java:71)
          at io.debezium.pipeline.ChangeEventSourceCoordinator.lambda$start$0(ChangeEventSourceCoordinator.java:109)
          at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
          at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
          at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
          at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
          at java.base/java.lang.Thread.run(Thread.java:829)
      Caused by: io.debezium.DebeziumException: org.apache.kafka.connect.errors.ConnectException: The Kafka Connect schema name 'JHDL-TOPIC-CDC4S-LOG.dbo.CT检查排序编码.Value' is not a valid Avro schema name and its replacement 'JHDL_TOPIC_CDC4S_LOG.dbo.CT______.Value' conflicts with another different schema 'JHDL-TOPIC-CDC4S-LOG.dbo.CT检查字典_修.Value'
          at io.debezium.relational.RelationalSnapshotChangeEventSource.lambda$createSchemaChangeEventsForTables$2(RelationalSnapshotChangeEventSource.java:281)
          at io.debezium.pipeline.EventDispatcher.dispatchSchemaChangeEvent(EventDispatcher.java:302)
          at io.debezium.relational.RelationalSnapshotChangeEventSource.createSchemaChangeEventsForTables(RelationalSnapshotChangeEventSource.java:276)
          at io.debezium.relational.RelationalSnapshotChangeEventSource.doExecute(RelationalSnapshotChangeEventSource.java:119)
          at io.debezium.pipeline.source.AbstractSnapshotChangeEventSource.execute(AbstractSnapshotChangeEventSource.java:76)
          ... 8 more
      Caused by: org.apache.kafka.connect.errors.ConnectException: The Kafka Connect schema name 'JHDL-TOPIC-CDC4S-LOG.dbo.CT检查排序编码.Value' is not a valid Avro schema name and its replacement 'JHDL_TOPIC_CDC4S_LOG.dbo.CT______.Value' conflicts with another different schema 'JHDL-TOPIC-CDC4S-LOG.dbo.CT检查字典_修.Value'
          at io.debezium.util.SchemaNameAdjuster.lambda$create$1(SchemaNameAdjuster.java:151)
          at io.debezium.util.SchemaNameAdjuster.lambda$create$2(SchemaNameAdjuster.java:168)
          at io.debezium.util.SchemaNameAdjuster$ReplacementOccurred.lambda$firstTimeOnly$0(SchemaNameAdjuster.java:103)
          at io.debezium.util.SchemaNameAdjuster.validFullname(SchemaNameAdjuster.java:331)
          at io.debezium.util.SchemaNameAdjuster.lambda$create$6(SchemaNameAdjuster.java:201)
          at io.debezium.relational.TableSchemaBuilder.create(TableSchemaBuilder.java:134)
          at io.debezium.relational.RelationalDatabaseSchema.buildAndRegisterSchema(RelationalDatabaseSchema.java:135)
          at io.debezium.connector.sqlserver.SqlServerDatabaseSchema.applySchemaChange(SqlServerDatabaseSchema.java:53)
          at io.debezium.pipeline.EventDispatcher$SchemaChangeEventReceiver.schemaChangeEvent(EventDispatcher.java:539)
          at io.debezium.relational.RelationalSnapshotChangeEventSource.lambda$createSchemaChangeEventsForTables$2(RelationalSnapshotChangeEventSource.java:278)
          ... 12 more
      [2022-10-19 07:57:33,841] INFO Connected metrics set to 'false' (io.debezium.pipeline.ChangeEventSourceCoordinator:236)
      [2022-10-19 07:57:33,882] ERROR WorkerSourceTask{id=sqlserver-connector-20-0} Task threw an uncaught and unrecoverable exception. Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask:195)
      org.apache.kafka.connect.errors.ConnectException: An exception occurred in the change event producer. This connector will be stopped.
          at io.debezium.pipeline.ErrorHandler.setProducerThrowable(ErrorHandler.java:50)
          at io.debezium.pipeline.ChangeEventSourceCoordinator.lambda$start$0(ChangeEventSourceCoordinator.java:116)
          at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
          at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
          at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
          at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
          at java.base/java.lang.Thread.run(Thread.java:829)
      Caused by: io.debezium.DebeziumException: io.debezium.DebeziumException: org.apache.kafka.connect.errors.ConnectException: The Kafka Connect schema name 'JHDL-TOPIC-CDC4S-LOG.dbo.CT检查排序编码.Value' is not a valid Avro schema name and its replacement 'JHDL_TOPIC_CDC4S_LOG.dbo.CT______.Value' conflicts with another different schema 'JHDL-TOPIC-CDC4S-LOG.dbo.CT检查字典_修.Value'
          at io.debezium.pipeline.source.AbstractSnapshotChangeEventSource.execute(AbstractSnapshotChangeEventSource.java:85)
          at io.debezium.pipeline.ChangeEventSourceCoordinator.doSnapshot(ChangeEventSourceCoordinator.java:155)
          at io.debezium.connector.sqlserver.SqlServerChangeEventSourceCoordinator.executeChangeEventSources(SqlServerChangeEventSourceCoordinator.java:71)
          at io.debezium.pipeline.ChangeEventSourceCoordinator.lambda$start$0(ChangeEventSourceCoordinator.java:109)
          ... 5 more
      Caused by: io.debezium.DebeziumException: org.apache.kafka.connect.errors.ConnectException: The Kafka Connect schema name 'JHDL-TOPIC-CDC4S-LOG.dbo.CT检查排序编码.Value' is not a valid Avro schema name and its replacement 'JHDL_TOPIC_CDC4S_LOG.dbo.CT______.Value' conflicts with another different schema 'JHDL-TOPIC-CDC4S-LOG.dbo.CT检查字典_修.Value'
          at io.debezium.relational.RelationalSnapshotChangeEventSource.lambda$createSchemaChangeEventsForTables$2(RelationalSnapshotChangeEventSource.java:281)
          at io.debezium.pipeline.EventDispatcher.dispatchSchemaChangeEvent(EventDispatcher.java:302)
          at io.debezium.relational.RelationalSnapshotChangeEventSource.createSchemaChangeEventsForTables(RelationalSnapshotChangeEventSource.java:276)
          at io.debezium.relational.RelationalSnapshotChangeEventSource.doExecute(RelationalSnapshotChangeEventSource.java:119)
          at io.debezium.pipeline.source.AbstractSnapshotChangeEventSource.execute(AbstractSnapshotChangeEventSource.java:76)
          ... 8 more
      Caused by: org.apache.kafka.connect.errors.ConnectException: The Kafka Connect schema name 'JHDL-TOPIC-CDC4S-LOG.dbo.CT检查排序编码.Value' is not a valid Avro schema name and its replacement 'JHDL_TOPIC_CDC4S_LOG.dbo.CT______.Value' conflicts with another different schema 'JHDL-TOPIC-CDC4S-LOG.dbo.CT检查字典_修.Value'
          at io.debezium.util.SchemaNameAdjuster.lambda$create$1(SchemaNameAdjuster.java:151)
          at io.debezium.util.SchemaNameAdjuster.lambda$create$2(SchemaNameAdjuster.java:168)
          at io.debezium.util.SchemaNameAdjuster$ReplacementOccurred.lambda$firstTimeOnly$0(SchemaNameAdjuster.java:103)
          at io.debezium.util.SchemaNameAdjuster.validFullname(SchemaNameAdjuster.java:331)
          at io.debezium.util.SchemaNameAdjuster.lambda$create$6(SchemaNameAdjuster.java:201)
          at io.debezium.relational.TableSchemaBuilder.create(TableSchemaBuilder.java:134)
          at io.debezium.relational.RelationalDatabaseSchema.buildAndRegisterSchema(RelationalDatabaseSchema.java:135)
          at io.debezium.connector.sqlserver.SqlServerDatabaseSchema.applySchemaChange(SqlServerDatabaseSchema.java:53)
          at io.debezium.pipeline.EventDispatcher$SchemaChangeEventReceiver.schemaChangeEvent(EventDispatcher.java:539)
          at io.debezium.relational.RelationalSnapshotChangeEventSource.lambda$createSchemaChangeEventsForTables$2(RelationalSnapshotChangeEventSource.java:278)
          ... 12 more
      [2022-10-19 07:57:33,884] INFO Stopping down connector (io.debezium.connector.common.BaseSourceTask:238)
      [2022-10-19 07:57:33,887] INFO Connection gracefully closed (io.debezium.jdbc.JdbcConnection:956)
      [2022-10-19 07:57:33,889] INFO Connection gracefully closed (io.debezium.jdbc.JdbcConnection:956)
      [2022-10-19 07:57:33,889] INFO [Producer clientId=JHDL-TOPIC-CDC4S-LOG-dbhistory] Closing the Kafka producer with timeoutMillis = 30000 ms. (org.apache.kafka.clients.producer.KafkaProducer:1249)
      [2022-10-19 07:57:33,891] INFO Metrics scheduler closed (org.apache.kafka.common.metrics.Metrics:659)
      [2022-10-19 07:57:33,891] INFO Closing reporter org.apache.kafka.common.metrics.JmxReporter (org.apache.kafka.common.metrics.Metrics:663)
      [2022-10-19 07:57:33,891] INFO Metrics reporters closed (org.apache.kafka.common.metrics.Metrics:669)
      [2022-10-19 07:57:33,892] INFO App info kafka.producer for JHDL-TOPIC-CDC4S-LOG-dbhistory unregistered (org.apache.kafka.common.utils.AppInfoParser:83)
      [2022-10-19 07:57:33,892] INFO [Producer clientId=connector-producer-sqlserver-connector-20-0] Closing the Kafka producer with timeoutMillis = 30000 ms. (org.apache.kafka.clients.producer.KafkaProducer:1249)
      [2022-10-19 07:57:33,894] INFO Metrics scheduler closed (org.apache.kafka.common.metrics.Metrics:659)
      [2022-10-19 07:57:33,894] INFO Closing reporter org.apache.kafka.common.metrics.JmxReporter (org.apache.kafka.common.metrics.Metrics:663)
      [2022-10-19 07:57:33,894] INFO Metrics reporters closed (org.apache.kafka.common.metrics.Metrics:669)
      [2022-10-19 07:57:33,895] INFO App info kafka.producer for connector-producer-sqlserver-connector-20-0 unregistered (org.apache.kafka.common.utils.AppInfoParser:83)
      [2022-10-19 08:00:07,706] INFO [AdminClient clientId=adminclient-8] Node -2 disconnected. (org.apache.kafka.clients.NetworkClient:935)

      How to reproduce the issue using our tutorial deployment?

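      Create two tables whose names differ only in special (here, Chinese) characters, with the same number of such characters in each name, so that both names map to the same replacement.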

      For example:

      JHDL-TOPIC-CDC4S-LOG.dbo.CT检查排序编码.Value
      JHDL-TOPIC-CDC4S-LOG.dbo.CT检查字典修正.Value

      enhancement

      • Problem this solves

      This feature would resolve the schema name conflicts that occur when special characters are replaced with _. I don't think this one-size-fits-all replacement is a good idea: for example, I have encountered two schema names that contain the same number of special characters and therefore become identical after replacement.

      • Proposed implementation

      I suggest enhancing the handling of special characters: single (ASCII) special characters can still be replaced with _, while long (multi-byte) characters could be encoded as a whole into an additional value that conforms to the Avro specification and is used for the mapping.
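One possible reading of this suggestion, sketched in Java under stated assumptions: ASCII special characters such as - are still replaced with _, and when any non-ASCII character had to be replaced, a CRC32 checksum of the original segment is appended as the additional value. The class name, the choice of CRC32 and the 8-hex-digit suffix are all hypothetical; a 32-bit checksum can still collide in rare cases, so a longer hash would be safer in practice.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical sketch of one reading of the proposal; not a Debezium API.
final class SuffixedNameAdjuster {

    private SuffixedNameAdjuster() {
    }

    // ASCII special characters become '_'; if any non-ASCII character was
    // replaced, a CRC32 of the original segment is appended as "_xxxxxxxx".
    // (Simplification: a leading digit is not treated as invalid here.)
    static String adjustSegment(String segment) {
        StringBuilder sb = new StringBuilder(segment.length());
        boolean replacedNonAscii = false;
        for (char c : segment.toCharArray()) {
            boolean valid = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                    || (c >= '0' && c <= '9') || c == '_';
            if (valid) {
                sb.append(c);
            } else {
                sb.append('_');
                if (c > 0x7F) {
                    replacedNonAscii = true;
                }
            }
        }
        if (replacedNonAscii) {
            CRC32 crc = new CRC32();
            crc.update(segment.getBytes(StandardCharsets.UTF_8));
            sb.append('_').append(String.format("%08x", crc.getValue()));
        }
        return sb.toString();
    }

    // Adjusts each dot-separated segment and keeps the dots between them.
    static String adjustFullName(String fullName) {
        String[] segments = fullName.split("\\.", -1);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < segments.length; i++) {
            if (i > 0) {
                sb.append('.');
            }
            sb.append(adjustSegment(segments[i]));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(adjustFullName("JHDL-TOPIC-CDC4S-LOG.dbo.CT检查排序编码.Value"));
        System.out.println(adjustFullName("JHDL-TOPIC-CDC4S-LOG.dbo.CT检查字典_修.Value"));
    }
}
```

Compared with escaping every character, this keeps names short and ASCII-only names unchanged, at the cost of a small (non-zero) collision risk and a suffix that cannot be decoded back to the original name.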

              Assignee: Unassigned
              Reporter: simonchou12138 民帅 周 (Inactive)
              Votes: 0
              Watchers: 6
