-
Enhancement
-
Resolution: Done
-
Major
-
1.9.0.Final
-
False
-
None
-
False
-
Medium
Bug report
Debezium version: 1.9
Captured database version: SQL Server 2008
Problem description:
Debezium automatically replaces every special character with _. Names whose special characters differ but have the same length therefore become identical, even though the original characters are distinct and meaningful.
For example:
JHDL-TOPIC-CDC4S-LOG.dbo.CT检查排序编码.Value
JHDL-TOPIC-CDC4S-LOG.dbo.CT检查字典修正.Value
'检查排序编码' and '检查字典修正' are both treated as special characters, but in Chinese they are not the same.
After replacement, both become CT______, which causes a table name conflict.
Therefore, I hope the handling of special characters can be improved. For example, when a table name is written in a particular language, it could be transcoded into a name that conforms to the Avro schema naming rules instead of being masked with _.
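To illustrate the collision described above, here is a minimal Java sketch. It is not Debezium's SchemaNameAdjuster code: the class UnderscoreCollisionDemo and the method sanitize are hypothetical, and the sketch assumes that only ASCII letters, digits, '_' and '.' are kept while every other character is replaced with '_'.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class UnderscoreCollisionDemo {

    // Hypothetical stand-in for the adjuster (an assumption, not Debezium code):
    // keep ASCII letters, digits, '_' and '.', replace anything else with '_'.
    static String sanitize(String name) {
        StringBuilder sb = new StringBuilder(name.length());
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            boolean valid = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                    || (c >= '0' && c <= '9') || c == '_' || c == '.';
            sb.append(valid ? c : '_');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] names = {
            "JHDL-TOPIC-CDC4S-LOG.dbo.CT检查排序编码.Value",
            "JHDL-TOPIC-CDC4S-LOG.dbo.CT检查字典修正.Value"
        };
        // Maps each sanitized name to the first original name that produced it.
        Map<String, String> seen = new LinkedHashMap<>();
        for (String original : names) {
            String adjusted = sanitize(original);
            String previous = seen.putIfAbsent(adjusted, original);
            if (previous != null) {
                System.out.println("Conflict: '" + original + "' and '" + previous
                        + "' both map to '" + adjusted + "'");
            }
        }
    }
}
```

Because each of the six Chinese characters in either table name is replaced one-for-one, both names end in CT______ and the second registration is reported as a conflict.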
What behaviour do you expect?
JHDL-TOPIC-CDC4S-LOG.dbo.CT检查排序编码.Value
JHDL-TOPIC-CDC4S-LOG.dbo.CT检查字典_修.Value
What behaviour do you see?
JHDL_TOPIC_CDC4S_LOG.dbo.CT______.Value
JHDL_TOPIC_CDC4S_LOG.dbo.CT______.Value
Do you see the same behaviour using the latest released Debezium version?
Yes
Do you have the connector logs, ideally from start till finish?
(You might be asked later to provide DEBUG/TRACE level log)
[2022-10-19 07:57:33,816] WARN Topic 'JHDL-TOPIC-CDC4S-LOG.dbo.CT检查排序编码' name isn't a valid topic name, replacing it with 'JHDL-TOPIC-CDC4S-LOG.dbo.CT______'. (io.debezium.schema.TopicSelector$TopicNameSanitizer:130)
[2022-10-19 07:57:33,816] ERROR The Kafka Connect schema name 'JHDL-TOPIC-CDC4S-LOG.dbo.CT检查排序编码.Value' is not a valid Avro schema name and its replacement 'JHDL_TOPIC_CDC4S_LOG.dbo.CT______.Value' conflicts with another different schema 'JHDL-TOPIC-CDC4S-LOG.dbo.CT检查字典_修.Value' (io.debezium.util.SchemaNameAdjuster:165)
[2022-10-19 07:57:33,837] INFO Snapshot - Final stage (io.debezium.pipeline.source.AbstractSnapshotChangeEventSource:88)
[2022-10-19 07:57:33,838] INFO Removing locking timeout (io.debezium.connector.sqlserver.SqlServerSnapshotChangeEventSource:239)
[2022-10-19 07:57:33,840] ERROR Producer failure (io.debezium.pipeline.ErrorHandler:35)
io.debezium.DebeziumException: io.debezium.DebeziumException: org.apache.kafka.connect.errors.ConnectException: The Kafka Connect schema name 'JHDL-TOPIC-CDC4S-LOG.dbo.CT检查排序编码.Value' is not a valid Avro schema name and its replacement 'JHDL_TOPIC_CDC4S_LOG.dbo.CT______.Value' conflicts with another different schema 'JHDL-TOPIC-CDC4S-LOG.dbo.CT检查字典_修.Value'
at io.debezium.pipeline.source.AbstractSnapshotChangeEventSource.execute(AbstractSnapshotChangeEventSource.java:85)
at io.debezium.pipeline.ChangeEventSourceCoordinator.doSnapshot(ChangeEventSourceCoordinator.java:155)
at io.debezium.connector.sqlserver.SqlServerChangeEventSourceCoordinator.executeChangeEventSources(SqlServerChangeEventSourceCoordinator.java:71)
at io.debezium.pipeline.ChangeEventSourceCoordinator.lambda$start$0(ChangeEventSourceCoordinator.java:109)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: io.debezium.DebeziumException: org.apache.kafka.connect.errors.ConnectException: The Kafka Connect schema name 'JHDL-TOPIC-CDC4S-LOG.dbo.CT检查排序编码.Value' is not a valid Avro schema name and its replacement 'JHDL_TOPIC_CDC4S_LOG.dbo.CT______.Value' conflicts with another different schema 'JHDL-TOPIC-CDC4S-LOG.dbo.CT检查字典_修.Value'
at io.debezium.relational.RelationalSnapshotChangeEventSource.lambda$createSchemaChangeEventsForTables$2(RelationalSnapshotChangeEventSource.java:281)
at io.debezium.pipeline.EventDispatcher.dispatchSchemaChangeEvent(EventDispatcher.java:302)
at io.debezium.relational.RelationalSnapshotChangeEventSource.createSchemaChangeEventsForTables(RelationalSnapshotChangeEventSource.java:276)
at io.debezium.relational.RelationalSnapshotChangeEventSource.doExecute(RelationalSnapshotChangeEventSource.java:119)
at io.debezium.pipeline.source.AbstractSnapshotChangeEventSource.execute(AbstractSnapshotChangeEventSource.java:76)
... 8 more
Caused by: org.apache.kafka.connect.errors.ConnectException: The Kafka Connect schema name 'JHDL-TOPIC-CDC4S-LOG.dbo.CT检查排序编码.Value' is not a valid Avro schema name and its replacement 'JHDL_TOPIC_CDC4S_LOG.dbo.CT______.Value' conflicts with another different schema 'JHDL-TOPIC-CDC4S-LOG.dbo.CT检查字典_修.Value'
at io.debezium.util.SchemaNameAdjuster.lambda$create$1(SchemaNameAdjuster.java:151)
at io.debezium.util.SchemaNameAdjuster.lambda$create$2(SchemaNameAdjuster.java:168)
at io.debezium.util.SchemaNameAdjuster$ReplacementOccurred.lambda$firstTimeOnly$0(SchemaNameAdjuster.java:103)
at io.debezium.util.SchemaNameAdjuster.validFullname(SchemaNameAdjuster.java:331)
at io.debezium.util.SchemaNameAdjuster.lambda$create$6(SchemaNameAdjuster.java:201)
at io.debezium.relational.TableSchemaBuilder.create(TableSchemaBuilder.java:134)
at io.debezium.relational.RelationalDatabaseSchema.buildAndRegisterSchema(RelationalDatabaseSchema.java:135)
at io.debezium.connector.sqlserver.SqlServerDatabaseSchema.applySchemaChange(SqlServerDatabaseSchema.java:53)
at io.debezium.pipeline.EventDispatcher$SchemaChangeEventReceiver.schemaChangeEvent(EventDispatcher.java:539)
at io.debezium.relational.RelationalSnapshotChangeEventSource.lambda$createSchemaChangeEventsForTables$2(RelationalSnapshotChangeEventSource.java:278)
... 12 more
[2022-10-19 07:57:33,841] INFO Connected metrics set to 'false' (io.debezium.pipeline.ChangeEventSourceCoordinator:236)
[2022-10-19 07:57:33,882] ERROR WorkerSourceTask{id=sqlserver-connector-20-0} Task threw an uncaught and unrecoverable exception. Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask:195)
org.apache.kafka.connect.errors.ConnectException: An exception occurred in the change event producer. This connector will be stopped.
at io.debezium.pipeline.ErrorHandler.setProducerThrowable(ErrorHandler.java:50)
at io.debezium.pipeline.ChangeEventSourceCoordinator.lambda$start$0(ChangeEventSourceCoordinator.java:116)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: io.debezium.DebeziumException: io.debezium.DebeziumException: org.apache.kafka.connect.errors.ConnectException: The Kafka Connect schema name 'JHDL-TOPIC-CDC4S-LOG.dbo.CT检查排序编码.Value' is not a valid Avro schema name and its replacement 'JHDL_TOPIC_CDC4S_LOG.dbo.CT______.Value' conflicts with another different schema 'JHDL-TOPIC-CDC4S-LOG.dbo.CT检查字典_修.Value'
at io.debezium.pipeline.source.AbstractSnapshotChangeEventSource.execute(AbstractSnapshotChangeEventSource.java:85)
at io.debezium.pipeline.ChangeEventSourceCoordinator.doSnapshot(ChangeEventSourceCoordinator.java:155)
at io.debezium.connector.sqlserver.SqlServerChangeEventSourceCoordinator.executeChangeEventSources(SqlServerChangeEventSourceCoordinator.java:71)
at io.debezium.pipeline.ChangeEventSourceCoordinator.lambda$start$0(ChangeEventSourceCoordinator.java:109)
... 5 more
Caused by: io.debezium.DebeziumException: org.apache.kafka.connect.errors.ConnectException: The Kafka Connect schema name 'JHDL-TOPIC-CDC4S-LOG.dbo.CT检查排序编码.Value' is not a valid Avro schema name and its replacement 'JHDL_TOPIC_CDC4S_LOG.dbo.CT______.Value' conflicts with another different schema 'JHDL-TOPIC-CDC4S-LOG.dbo.CT检查字典_修.Value'
at io.debezium.relational.RelationalSnapshotChangeEventSource.lambda$createSchemaChangeEventsForTables$2(RelationalSnapshotChangeEventSource.java:281)
at io.debezium.pipeline.EventDispatcher.dispatchSchemaChangeEvent(EventDispatcher.java:302)
at io.debezium.relational.RelationalSnapshotChangeEventSource.createSchemaChangeEventsForTables(RelationalSnapshotChangeEventSource.java:276)
at io.debezium.relational.RelationalSnapshotChangeEventSource.doExecute(RelationalSnapshotChangeEventSource.java:119)
at io.debezium.pipeline.source.AbstractSnapshotChangeEventSource.execute(AbstractSnapshotChangeEventSource.java:76)
... 8 more
Caused by: org.apache.kafka.connect.errors.ConnectException: The Kafka Connect schema name 'JHDL-TOPIC-CDC4S-LOG.dbo.CT检查排序编码.Value' is not a valid Avro schema name and its replacement 'JHDL_TOPIC_CDC4S_LOG.dbo.CT______.Value' conflicts with another different schema 'JHDL-TOPIC-CDC4S-LOG.dbo.CT检查字典_修.Value'
at io.debezium.util.SchemaNameAdjuster.lambda$create$1(SchemaNameAdjuster.java:151)
at io.debezium.util.SchemaNameAdjuster.lambda$create$2(SchemaNameAdjuster.java:168)
at io.debezium.util.SchemaNameAdjuster$ReplacementOccurred.lambda$firstTimeOnly$0(SchemaNameAdjuster.java:103)
at io.debezium.util.SchemaNameAdjuster.validFullname(SchemaNameAdjuster.java:331)
at io.debezium.util.SchemaNameAdjuster.lambda$create$6(SchemaNameAdjuster.java:201)
at io.debezium.relational.TableSchemaBuilder.create(TableSchemaBuilder.java:134)
at io.debezium.relational.RelationalDatabaseSchema.buildAndRegisterSchema(RelationalDatabaseSchema.java:135)
at io.debezium.connector.sqlserver.SqlServerDatabaseSchema.applySchemaChange(SqlServerDatabaseSchema.java:53)
at io.debezium.pipeline.EventDispatcher$SchemaChangeEventReceiver.schemaChangeEvent(EventDispatcher.java:539)
at io.debezium.relational.RelationalSnapshotChangeEventSource.lambda$createSchemaChangeEventsForTables$2(RelationalSnapshotChangeEventSource.java:278)
... 12 more
[2022-10-19 07:57:33,884] INFO Stopping down connector (io.debezium.connector.common.BaseSourceTask:238)
[2022-10-19 07:57:33,887] INFO Connection gracefully closed (io.debezium.jdbc.JdbcConnection:956)
[2022-10-19 07:57:33,889] INFO Connection gracefully closed (io.debezium.jdbc.JdbcConnection:956)
[2022-10-19 07:57:33,889] INFO [Producer clientId=JHDL-TOPIC-CDC4S-LOG-dbhistory] Closing the Kafka producer with timeoutMillis = 30000 ms. (org.apache.kafka.clients.producer.KafkaProducer:1249)
[2022-10-19 07:57:33,891] INFO Metrics scheduler closed (org.apache.kafka.common.metrics.Metrics:659)
[2022-10-19 07:57:33,891] INFO Closing reporter org.apache.kafka.common.metrics.JmxReporter (org.apache.kafka.common.metrics.Metrics:663)
[2022-10-19 07:57:33,891] INFO Metrics reporters closed (org.apache.kafka.common.metrics.Metrics:669)
[2022-10-19 07:57:33,892] INFO App info kafka.producer for JHDL-TOPIC-CDC4S-LOG-dbhistory unregistered (org.apache.kafka.common.utils.AppInfoParser:83)
[2022-10-19 07:57:33,892] INFO [Producer clientId=connector-producer-sqlserver-connector-20-0] Closing the Kafka producer with timeoutMillis = 30000 ms. (org.apache.kafka.clients.producer.KafkaProducer:1249)
[2022-10-19 07:57:33,894] INFO Metrics scheduler closed (org.apache.kafka.common.metrics.Metrics:659)
[2022-10-19 07:57:33,894] INFO Closing reporter org.apache.kafka.common.metrics.JmxReporter (org.apache.kafka.common.metrics.Metrics:663)
[2022-10-19 07:57:33,894] INFO Metrics reporters closed (org.apache.kafka.common.metrics.Metrics:669)
[2022-10-19 07:57:33,895] INFO App info kafka.producer for connector-producer-sqlserver-connector-20-0 unregistered (org.apache.kafka.common.utils.AppInfoParser:83)
[2022-10-19 08:00:07,706] INFO [AdminClient clientId=adminclient-8] Node -2 disconnected. (org.apache.kafka.clients.NetworkClient:935)
How to reproduce the issue using our tutorial deployment?
First, you need two tables whose names contain different special characters of the same length.
For example:
JHDL-TOPIC-CDC4S-LOG.dbo.CT检查排序编码.Value
JHDL-TOPIC-CDC4S-LOG.dbo.CT检查字典修正.Value
enhancement
- Requirement addressed
This feature resolves the schema name conflicts that arise when special characters are replaced with underscores. I don't think the current one-size-fits-all replacement is a good idea. For example, I have encountered schema names that become identical after replacement because their special characters have the same length.
- Implementation idea
I suggest enhancing the handling of special characters: a single special character can still be replaced with _, while a longer run of special characters can be encoded as a whole into a value that conforms to the Avro specification and can be mapped back to the original name.
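One possible way to realize this idea is sketched below in Java. It is only an illustration under stated assumptions, not the implementation that was merged: the class UnicodeEscapeSketch and the method encode are hypothetical, and the _uXXXX escape format is an assumed encoding. Unlike the proposal above, the sketch escapes every invalid character (and '_' itself, since it serves as the escape marker) so that two different names can never produce the same result.

```java
final class UnicodeEscapeSketch {

    // Hypothetical encoder (an assumption for illustration only).
    // ASCII letters, digits and '.' are kept; every other UTF-16 code unit,
    // including '_', is written as _uXXXX (four hex digits). A digit at the
    // start of a dot-separated segment is also escaped, because an Avro name
    // must start with a letter or '_'.
    static String encode(String name) {
        StringBuilder sb = new StringBuilder();
        boolean segmentStart = true;
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            boolean letter = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
            boolean digit = c >= '0' && c <= '9';
            if (c == '.') {
                sb.append(c);
                segmentStart = true;
                continue;
            }
            if (letter || (digit && !segmentStart)) {
                sb.append(c);
            } else {
                sb.append(String.format("_u%04x", (int) c));
            }
            segmentStart = false;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("JHDL-TOPIC-CDC4S-LOG.dbo.CT检查排序编码.Value"));
        System.out.println(encode("JHDL-TOPIC-CDC4S-LOG.dbo.CT检查字典修正.Value"));
    }
}
```

Every '_' in the output starts an escape sequence, so the original name can be recovered by decoding each _uXXXX group. The trade-off is that the adjusted names are longer and harder to read than the underscore-masked ones.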
- links to
-
RHEA-2023:120698 Red Hat build of Debezium 2.3.4 release