Uploaded image for project: 'Debezium'
  1. Debezium
  2. DBZ-6231

MySQL treats NCHAR/NVARCHAR differently when processed in snapshot vs streaming phases.

XMLWordPrintable

    • Icon: Bug Bug
    • Resolution: Done
    • Icon: Major Major
    • 2.2.0.Beta1
    • 2.1.3.Final, 2.2.0.Alpha3
    • mysql-connector
    • None
    • False
    • None
    • False

      MySQL treats the NCHAR and NVARCHAR data types as synonyms for CHAR/VARCHAR in JDBC terms with UTF8-based collations.

      During the snapshot phase of the connector, if a table already exists with such columns, the connector resolves the column data type and JDBC type from the JDBC metadata. The driver returns these columns as a "CHAR" or "VARCHAR', mapping the final value as a STRING data type, but propagating the column type without any indication of the collation of the field.

      During the streaming phase, if the table existed during the snapshot where the Table in-memory model was prepared from the JDBC driver metadata, then the streaming phase will emit the value like it was during the snapshot, which in this case is all incorrect.

      However, if the table is created after streaming has started and the include/exclude filters allow the table to be captured. In this case, the table's structure and JDBC type metadata are evaluated from the DDL event itself, and the ANTLR parser maps a NCHAR/NVARCHAR to a JDBC type of NCHAR or NVARCHAR. This means that the final event will specify the schema type as STRING and the column type as either NCHAR or NVARCHAR; which is different than if the table had been captured during snapshot.

      To reproduce, start the MySQL connector with the table.include.list allowing the "test_nc_table". Then, once the connector has started the streaming phase, create the following table, insert a row into the table, and note the field schemas. Then repeat the process except where the following table is created before the connector starts with an existing row. Then, compare the schemas to the previous schemas when the table was created during streaming, and they differ drastically.

      CREATE TABLE test_nc_table ( nc1 nchar, nc2 nchar(5), nc3 nvarchar(25) );
      

      Regardless of when the table is created, data types should be treated consistently.

              Unassigned Unassigned
              ccranfor@redhat.com Chris Cranford
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

                Created:
                Updated:
                Resolved: