We recently observed what appears to be a pretty nasty bug in either Debezium or Shyiko's binlog library. Our data quality checkers noticed that some of our rows have what appear to be corrupt binary fields.
An example row follows. The lifecycle of the row is:
- Row is inserted and processed via binlog
- Row is updated and processed via binlog
- Row is re-bootstrapped via a JDBC-based Debezium (DBZ) snapshot (after we detected the corruption in the row)
Note the file_uuid fields. In the initial binlog-based insert and update, the value is ZRrtCDkPSJOy8TaSPnt0. In the JDBC-based snapshot, it is ZRrtCDkPSJOy8TaSPnt0AA==. These are Base64 encodings: the binlog value decodes to 15 bytes, while the snapshot value decodes to 16. We checked the source DB and confirmed that the stored file UUID is the full 16 bytes. Formatted as a standard UUID, the value is 651aed08-390f-4893-b2f1-36923e7b7400. Note the trailing 00, i.e. a final zero byte (0x00). That trailing zero byte is what gets truncated in the binlog path.
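To make the difference concrete, here is a minimal Python sketch that decodes the two values quoted above and compares them. It uses only the standard library; the variable names are ours and do not come from Debezium or the binlog library.

```python
import base64
import uuid

# Values copied from the example row above.
binlog_value = base64.b64decode("ZRrtCDkPSJOy8TaSPnt0")        # binlog insert/update path
snapshot_value = base64.b64decode("ZRrtCDkPSJOy8TaSPnt0AA==")  # JDBC snapshot path

print(len(binlog_value))    # 15
print(len(snapshot_value))  # 16

# The snapshot value is the full UUID stored in the source DB.
print(uuid.UUID(bytes=snapshot_value))  # 651aed08-390f-4893-b2f1-36923e7b7400

# The binlog value is the same data with the final 0x00 byte missing.
print(snapshot_value == binlog_value + b"\x00")  # True
```

The last comparison shows that the two paths agree on the first 15 bytes and differ only by the final zero byte.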
We have reviewed our DB and found the issue to be widespread. It seems that at least any UUID ending in a zero byte is truncated this way. We have not investigated whether binary fields ending in multiple zero bytes lose all of them, but I suspect that is the case as well.
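For anyone who wants to check their own data for the same pattern, the following Python sketch tests whether a value seen on the binlog path equals the source value with one or more trailing zero bytes removed. The function name is hypothetical, and the sketch assumes you can fetch the full value from the source DB for comparison. It only classifies a mismatch; it does not confirm that the binlog path actually drops multiple zero bytes.

```python
def is_trailing_zero_truncation(observed: bytes, full: bytes) -> bool:
    """Return True if `observed` is `full` with only trailing 0x00 bytes missing."""
    # The observed value must be a strictly shorter prefix of the full value.
    if len(observed) >= len(full) or not full.startswith(observed):
        return False
    # Every byte that went missing must be a zero byte.
    missing = full[len(observed):]
    return all(b == 0 for b in missing)


# The UUID from the example row, as stored in the source DB.
full_uuid = bytes.fromhex("651aed08390f4893b2f136923e7b7400")
print(is_trailing_zero_truncation(full_uuid[:15], full_uuid))  # True
print(is_trailing_zero_truncation(full_uuid, full_uuid))       # False
```

Running a check like this across mismatched rows would show whether every discrepancy fits the trailing-zero pattern or whether other kinds of corruption are also present.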
NOTE: the schema for the table is: