Type: Enhancement
Resolution: Done
Priority: Major
Fix Version/s: 0.5.1
Affects Version/s: 0.3.1
Component/s: mongodb-connector, mysql-connector, postgresql-connector
Labels: None

Git Pull Request:
https://github.com/debezium/debezium/pull/124, https://github.com/debezium/debezium/pull/129, https://github.com/debezium/debezium/pull/211, https://github.com/debezium/debezium/pull/213


In most business systems, the tables in MySQL are sharded. For convenient analysis, the shard tables need to be merged.

Here is my business scenario:

The table names in MySQL follow these rules:

{base_table}_{tenant}_{YYYYMM}
{base_table}_{YYYYMM}
{base_table}_{seqno}
{base_table}_{tenant}_{seqno}_{YYYYDD}
{base_table}_{tenant}_{YYYYDD}_{seqno}

{base_table} is the base table name for a business domain; it usually contains only letters.
{tenant} is an identifier used to support multi-tenancy.
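
The naming rules above can be checked with a single regular expression. The following is only a minimal sketch: the issue does not say what a tenant identifier looks like, so the pattern assumes {tenant} starts with a letter and that {seqno} and the date parts are purely numeric. The names `SHARD_PATTERN`, `ShardName`, and `parse_shard_name` are hypothetical and not part of Debezium.

```python
import re
from typing import NamedTuple, Optional


class ShardName(NamedTuple):
    base_table: str
    tenant: Optional[str]


# Assumptions (not stated in the issue): base_table is letters only,
# tenant starts with a letter, and seqno / date parts are digits only.
SHARD_PATTERN = re.compile(
    r"(?P<base>[A-Za-z]+)"                     # {base_table}
    r"(?:_(?P<tenant>[A-Za-z][A-Za-z0-9]*))?"  # optional {tenant}
    r"(?:_\d+){1,2}"                           # one or two numeric suffixes
)


def parse_shard_name(table: str) -> Optional[ShardName]:
    """Split a shard table name into base table and tenant, or return None."""
    match = SHARD_PATTERN.fullmatch(table)
    if match is None:
        return None
    return ShardName(match.group("base"), match.group("tenant"))
```

With these assumptions, a name such as `orders_acme_201609` yields the base table `orders` and the tenant `acme`, while a name without a numeric suffix is not treated as a shard.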

For business analysis, we need to merge these shard tables into one table, but to support multi-tenancy we need to store {tenant} in the Kafka topic.

Currently we merge the shard tables using a regex. Merging the shard tables into one Kafka topic by modifying TopicSelector would be very convenient. To support more cases, a configuration option is needed that defines the merge rule as a regex.
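
One way to picture a regex-driven merge rule is an ordered list of (pattern, replacement) pairs applied to the default topic name. This is an illustrative sketch in Python, not the Debezium implementation: `route_topic`, `DEFAULT_RULES`, the `server.database.table` topic layout, and the tenant format are all assumptions made for the example.

```python
import re
from typing import List, Pattern, Tuple

Rule = Tuple[Pattern[str], str]

# Hypothetical rules; the first rule that matches the whole topic name wins.
DEFAULT_RULES: List[Rule] = [
    # Shards that carry a tenant: keep the tenant in the merged topic name.
    (
        re.compile(
            r"(?P<prefix>.+)\.(?P<base>[A-Za-z]+)"
            r"_(?P<tenant>[A-Za-z][A-Za-z0-9]*)(?:_\d+){1,2}"
        ),
        r"\g<prefix>.\g<base>_\g<tenant>",
    ),
    # Shards without a tenant: drop the numeric suffix only.
    (
        re.compile(r"(?P<prefix>.+)\.(?P<base>[A-Za-z]+)(?:_\d+){1,2}"),
        r"\g<prefix>.\g<base>",
    ),
]


def route_topic(topic: str, rules: List[Rule] = DEFAULT_RULES) -> str:
    """Return the merged topic name, or the input unchanged if no rule matches."""
    for pattern, replacement in rules:
        match = pattern.fullmatch(topic)
        if match is not None:
            return match.expand(replacement)
    return topic
```

Under these rules, `dbserver.inventory.orders_acme_201609` and `dbserver.inventory.orders_acme_201610` would both map to `dbserver.inventory.orders_acme`, and a table that is not sharded keeps its original topic.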

Thanks to Randall Hauch for the idea:

In your particular case, we need to consider what happens with the Kafka Connect schemas for each of the tables that belong to the shards. Remember that the connector uses the name of the source table in the schema’s name, so which table name should be used? The actual source table name (e.g., {baseTable}{YYYYMM}) or the logical table name (e.g., {baseTable})?
The first thing that comes to mind for me is that it might be useful to use the source table name - this helps distinguish the events that come from the different shards and it actually allows each shard table’s schema to evolve differently. The latter is very important, because unless you ensure that the shard tables are altered at the same time in a single transaction, it would be possible for the connector to see these alterations at slightly different times relative to each other, and that might break the connector.
This approach would not only be more robust, it would also be really easy to implement since it literally would be a new TopicSelector implementation. The connector’s internal schema model (how it represents the structure of each table by parsing DDL, and used to generate the Kafka Connect schemas and interpret the binlog’s row events) would still use the actual source table names.
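
The separation described above, where the topic uses the logical table name while the schema keeps the actual source table name, could look roughly like this. It is a sketch under stated assumptions: `Routing`, `route`, the `server.database.table` naming layout, and the rule that shard suffixes are trailing numeric parts are hypothetical and only illustrate the idea.

```python
import re
from dataclasses import dataclass

# Assumption: shard suffixes are one or more trailing "_<digits>" parts.
SHARD_SUFFIX = re.compile(r"(?:_\d+)+$")


@dataclass(frozen=True)
class Routing:
    topic: str        # shared by all shards of one logical table
    schema_name: str  # still derived from the actual source table


def route(server: str, database: str, table: str) -> Routing:
    """Map a source table to a merged topic while keeping a per-shard schema name."""
    logical_table = SHARD_SUFFIX.sub("", table)
    return Routing(
        topic=f"{server}.{database}.{logical_table}",
        schema_name=f"{server}.{database}.{table}",
    )
```

Two shards such as `orders_201608` and `orders_201609` would then share one topic but keep distinct schema names, so each shard's schema can evolve independently.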

is related to

DBZ-325 Cover ByLogicalTableRouter SMT in reference documentation

Closed

DBZ-159 Control how the Postgres connector maps tables to topics

Closed

Assignee: David Leibovic (Inactive)

Reporter: RenZhu Zhang (Inactive)

Votes: 0

Watchers: 6

Created: 2016/09/14 3:58 AM

Updated: 2022/09/09 7:09 AM

Resolved: 2017/04/04 5:55 PM
