Debezium / DBZ-121

Control how Debezium connectors map tables to topics for sharding and other use cases

Details

    Description

      In many business systems the tables in MySQL are sharded. For convenient analysis, the shard tables need to be merged.

      Here is my business scenario:

      Table names in MySQL follow one of these patterns:

      {base_table}_{tenant}_{YYYYMM}
      {base_table}_{YYYYMM}
      {base_table}_{seqno}
      {base_table}_{tenant}_{seqno}_{YYYYDD}
      {base_table}_{tenant}_{YYYYDD}_{seqno}
      

      {base_table} is the base table name for a business domain; it usually contains only letters.
      {tenant} is an identifier used to support multi-tenancy.

      For business analysis, we need to merge these shard tables into one table, but to support multi-tenancy we need to keep {tenant} in the Kafka topic.

      We currently merge the shard tables using a regex. Merging the shard tables into one Kafka topic by modifying TopicSelector would be very convenient. To support more cases, a configuration option should be added so that merge rules can be specified as regular expressions.
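The regex-based merge rule described above could be sketched roughly as follows. This is only an illustration: `ShardTopicMapper`, `addRule`, and `logicalName` are hypothetical names, not part of Debezium, and the example rules assume that the base table name contains only letters, that the tenant starts with a letter, and that the date suffix has six digits.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not a Debezium API): maps physical shard table names to logical names. */
public class ShardTopicMapper {
    // Ordered rules: regex applied to the table name -> replacement that builds the logical name.
    private final Map<Pattern, String> rules = new LinkedHashMap<>();

    /** Registers a rule; rules are tried in the order they were added. */
    public ShardTopicMapper addRule(String regex, String replacement) {
        rules.put(Pattern.compile(regex), replacement);
        return this;
    }

    /** Returns the logical name from the first matching rule, or the table name unchanged. */
    public String logicalName(String tableName) {
        for (Map.Entry<Pattern, String> rule : rules.entrySet()) {
            Matcher m = rule.getKey().matcher(tableName);
            if (m.matches()) {
                // Expand group references such as $1 and $2 in the replacement.
                return m.replaceFirst(rule.getValue());
            }
        }
        return tableName;
    }
}
```

With the rules `^([A-Za-z]+)_([A-Za-z][A-Za-z0-9]*)_\d{6}$` -> `$1.$2` and `^([A-Za-z]+)_\d+$` -> `$1`, a table such as `orders_acme_201701` would map to `orders.acme` (tenant preserved), while `orders_201701` would map to `orders`. Tables that match no rule keep their own name.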

      Thanks to Randall Hauch for the idea:

      In your particular case, we need to consider what happens with the Kafka Connect schemas for each of the tables that belong to the shards. Remember that the connector uses the name of the source table in the schema’s name, so which table name should be used? The actual source table name (e.g., {base_table}_{YYYYMM}) or the logical table name (e.g., {base_table})?
      The first thing that comes to mind for me is that it might be useful to use the source table name - this helps distinguish the events that come from the different shards and it actually allows each shard table’s schema to evolve differently. The latter is very important, because unless you ensure that the shard tables are altered at the same time in a single transaction, it would be possible for the connector to see these alterations at slightly different times relative to each other, and that might break the connector.
      This approach would not only be more robust, it would also be really easy to implement since it literally would be a new TopicSelector implementation. The connector’s internal schema model (how it represents the structure of each table by parsing DDL, and used to generate the Kafka Connect schemas and interpret the binlog’s row events) would still use the actual source table names.
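The split suggested in the quote (merged topic name, per-shard schema name) might look something like the sketch below. `ShardNaming` and its methods are hypothetical and do not reflect the actual TopicSelector interface; the `<server>.<database>.<table>` layout and the six-digit `_YYYYMM` suffix are assumptions made for the example.

```java
import java.util.regex.Pattern;

/** Hypothetical naming helper (not a Debezium API): topics are merged, schema names are not. */
public class ShardNaming {
    // Assumed shard suffix: an underscore followed by six digits (YYYYMM).
    private static final Pattern SHARD_SUFFIX = Pattern.compile("_\\d{6}$");

    /** The topic name uses the logical table name, so all shards share one topic. */
    public static String topicName(String serverName, String database, String table) {
        String logicalTable = SHARD_SUFFIX.matcher(table).replaceFirst("");
        return String.join(".", serverName, database, logicalTable);
    }

    /** The schema name keeps the physical table name, so each shard's schema can evolve on its own. */
    public static String valueSchemaName(String serverName, String database, String table) {
        return String.join(".", serverName, database, table, "Value");
    }
}
```

Under these assumptions, events from `orders_201701` and `orders_201702` would land in the same topic but carry different schema names, so a consumer can still tell which shard an event came from.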

            People

              dasl_jira David Leibovic (Inactive)
              renwu58_jira RenZhu Zhang (Inactive)