Debezium / DBZ-3969

Strings with binary collation shouldn't be parsed as Types.BINARY by MySqlAntlrDdlParser.


Details

    • Bug
    • Status: Closed
    • Minor
    • Resolution: Done
    • 1.6.2.Final
    • 1.7.0.CR1
    • core-library
    • None

    Description

      Strings with binary collation shouldn't be parsed as Types.BINARY by MySqlAntlrDdlParser.

      As the MySQL documentation page on binary collations (charset-binary-collations) says:

      The BINARY and VARBINARY data types are distinct from the CHAR BINARY and VARCHAR BINARY data types. For the latter types, the BINARY attribute does not cause the column to be treated as a binary string column. Instead, it causes the binary (_bin) collation for the column character set (or the table default character set if no column character set is specified) to be used, and the column itself stores nonbinary character strings rather than binary byte strings. For example, if the default character set is utf8mb4, CHAR(5) BINARY is treated as CHAR(5) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin. This differs from BINARY(5), which stores 5-byte binary strings that have the binary character set and collation. For information about the differences between the binary collation of the binary character set and the _bin collations of nonbinary character sets, see Section 10.8.5, “The binary Collation Compared to _bin Collations”.

      Nonbinary strings (as stored using the CHAR, VARCHAR, and TEXT data types) have a character set and collation other than binary. A given nonbinary character set can have several collations, each of which defines a particular comparison and sort order for the characters in the set. For most character sets, one of these is the binary collation, indicated by a _bin suffix in the collation name. For example, the binary collation for utf8 and latin1 is named utf8_bin and latin1_bin, respectively. utf8mb4 is an exception that has two binary collations, utf8mb4_bin and utf8mb4_0900_bin; see Section 10.10.1, “Unicode Character Sets”.
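      The two rules quoted above can be expressed as a few small checks: the BINARY attribute on CHAR/VARCHAR selects the "<charset>_bin" collation, only the collation named "binary" marks a true binary string, and any "_bin" collation belongs to a nonbinary character set. The Java sketch below is illustrative only; the class BinaryCollationSketch and its methods are hypothetical names, not part of Debezium or MySQL Connector/J. It also assumes the "<charset>_bin" naming rule quoted above, which utf8mb4_0900_bin shows is not the only binary collation a charset may have.

```java
import java.util.Locale;

/** Hypothetical helper illustrating MySQL collation rules; not Debezium's actual API. */
public final class BinaryCollationSketch {

    /** Collation implied by the BINARY attribute on CHAR/VARCHAR: "<charset>_bin" (assumed naming rule). */
    static String collationForBinaryAttribute(String charset) {
        return charset.toLowerCase(Locale.ROOT) + "_bin";
    }

    /** True only for the "binary" collation of the binary character set (BINARY/VARBINARY/BLOB). */
    static boolean isBinaryStringCollation(String collation) {
        return "binary".equalsIgnoreCase(collation);
    }

    /** True for _bin collations of nonbinary character sets, e.g. latin1_bin or utf8mb4_0900_bin. */
    static boolean isNonbinaryBinCollation(String collation) {
        return collation.toLowerCase(Locale.ROOT).endsWith("_bin");
    }

    public static void main(String[] args) {
        System.out.println(collationForBinaryAttribute("utf8mb4")); // utf8mb4_bin
        System.out.println(isBinaryStringCollation("utf8mb4_bin")); // false: still a character string
        System.out.println(isNonbinaryBinCollation("utf8mb4_0900_bin")); // true
    }
}
```

      The point of separating the last two checks is that a "_bin" suffix says nothing about the column being binary; only the "binary" collation does.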

      So columns declared as CHAR BINARY or VARCHAR BINARY are nonbinary string types that merely use a binary collation. Parsing them as Types.BINARY may produce an incorrect default value.
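      A minimal sketch of the intended mapping, assuming a parser that has already extracted the type keyword, whether a trailing BINARY attribute was present, and the applicable default character set. Only BINARY/VARBINARY map to binary JDBC types; CHAR BINARY and VARCHAR BINARY keep Types.CHAR/Types.VARCHAR and only gain a "_bin" collation. StringTypeMappingSketch, ColumnType and resolve are hypothetical names, not the real MySqlAntlrDdlParser code, and only four type keywords are handled.

```java
import java.sql.Types;
import java.util.Locale;

/** Hypothetical sketch of string-type resolution; not Debezium's MySqlAntlrDdlParser. */
public final class StringTypeMappingSketch {

    /** Resolved column: JDBC type plus charset/collation (null collation = charset default, not modeled). */
    record ColumnType(int jdbcType, String charset, String collation) {}

    static ColumnType resolve(String dataType, boolean binaryAttribute, String defaultCharset) {
        // The BINARY attribute only changes the collation of a character column.
        String collation = binaryAttribute ? defaultCharset + "_bin" : null;
        switch (dataType.toUpperCase(Locale.ROOT)) {
            case "BINARY":
                // True binary strings use the binary character set and collation.
                return new ColumnType(Types.BINARY, "binary", "binary");
            case "VARBINARY":
                return new ColumnType(Types.VARBINARY, "binary", "binary");
            case "CHAR":
                // CHAR(5) BINARY is still a character string, e.g. utf8mb4 with utf8mb4_bin.
                return new ColumnType(Types.CHAR, defaultCharset, collation);
            case "VARCHAR":
                return new ColumnType(Types.VARCHAR, defaultCharset, collation);
            default:
                throw new IllegalArgumentException("Not handled in this sketch: " + dataType);
        }
    }

    public static void main(String[] args) {
        // CHAR(5) BINARY with table default utf8mb4
        System.out.println(resolve("CHAR", true, "utf8mb4"));
        // BINARY(5)
        System.out.println(resolve("BINARY", false, "utf8mb4").jdbcType() == Types.BINARY); // true
    }
}
```

      Keeping the collation alongside the JDBC type, rather than folding the BINARY attribute into the type, is what prevents a CHAR BINARY default such as 'abc' from reaching code paths that expect Types.BINARY, the situation the issue says can yield an incorrect default value.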


          People

            Assignee: Unassigned
            Reporter: Jiabao Sun (jiabao-sun, inactive)