MySQL treats the NCHAR and NVARCHAR data types as synonyms for CHAR and VARCHAR with a UTF8-based character set.
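This synonym behavior can be observed directly in MySQL; a minimal sketch (the table name is illustrative, and the exact character set name reported depends on the server version):

```sql
CREATE TABLE nchar_demo (c NVARCHAR(10));

-- SHOW CREATE TABLE nchar_demo reports the column as something like:
--   `c` varchar(10) CHARACTER SET utf8mb3
-- i.e., the NVARCHAR keyword is not preserved in the table definition.
SHOW CREATE TABLE nchar_demo;
```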
During the connector's snapshot phase, if a table with such columns already exists, the connector resolves the column data type and JDBC type from the JDBC metadata. The driver reports these columns as "CHAR" or "VARCHAR", so the final value is mapped to a STRING data type, but the column type is propagated without any indication of the field's character set.
During the streaming phase, if the table existed during the snapshot and its in-memory Table model was built from the JDBC driver metadata, the streaming phase emits the value exactly as the snapshot did, so the two phases are at least consistent with each other.
However, if the table is created after streaming has started and the include/exclude filters allow it to be captured, the table's structure and JDBC type metadata are evaluated from the DDL event itself, and the ANTLR parser maps NCHAR/NVARCHAR to a JDBC type of NCHAR or NVARCHAR. As a result, the final event specifies the schema type as STRING and the column type as either NCHAR or NVARCHAR, which differs from what is emitted when the table is captured during the snapshot.
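Assuming column type propagation is enabled (e.g., via `column.propagate.source.type`), the divergence is visible in the propagated schema parameters; a hypothetical excerpt:

```
# Table captured during snapshot (type resolved from JDBC driver metadata):
"__debezium.source.column.type": "VARCHAR"

# Same column when the table is created during streaming (type resolved
# from the DDL event by the ANTLR parser):
"__debezium.source.column.type": "NVARCHAR"
```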
To reproduce, start the MySQL connector with a table.include.list that allows "test_nc_table". Once the connector has entered the streaming phase, create the following table, insert a row, and note the field schemas. Then repeat the process, this time creating the table (with an existing row) before the connector starts. Comparing the resulting schemas with those from the streaming-created run shows that they differ drastically.
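A minimal sketch of the table and row used in both runs (the column names and lengths are illustrative; only test_nc_table is taken from the filter above):

```sql
CREATE TABLE test_nc_table (
    id      INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    nc_col  NCHAR(10),
    nvc_col NVARCHAR(64)
);

INSERT INTO test_nc_table (nc_col, nvc_col) VALUES ('abc', 'def');
```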
Regardless of when the table is created, data types should be treated consistently.