  1. Debezium
  2. DBZ-254

Binary fields with trailing "00" are truncated


    • Bug
    • Resolution: Done
    • Critical
    • 0.5.1
    • 0.4
    • mysql-connector
    • None

      We recently observed what appears to be a serious bug in either Debezium or Shyiko's binlog library: our data quality checkers found rows whose binary fields appear to be corrupt.

      An example row follows. The lifecycle of the row is:

      1. Row is inserted and processed via binlog
      2. Row is updated and processed via binlog
      3. Row is re-snapshotted via the Debezium JDBC snapshot (after we detected the corruption in the row)
      # initial insert (binlog)
      {"before":null,"after":{"file_uuid":"ZRrtCDkPSJOy8TaSPnt0","server_file_path":"***","state":"1","service_timestamp":"1493991133000","modify_time":"1493991601","create_time":"1493991601","server_id":"***"},"source":{"name":"***","server_id":"4132478999","ts_sec":"1493991601","gtid":"***:1284003","file":"mysql-bin.000086","pos":"504068621","row":"0","snapshot":null,"thread":"177217","db":"***","table":"response_files"},"op":"c","ts_ms":"1493991601989","kafkaData":{"topic":"***","partition":"1","offset":"1365","insertTime":"1493991602"}}
      
      # update (binlog)
      {"before":{"file_uuid":"ZRrtCDkPSJOy8TaSPnt0","server_file_path":"***","state":"1","service_timestamp":"1493991133000","modify_time":"1493991601","create_time":"1493991601","server_id":"***"},"after":{"file_uuid":"ZRrtCDkPSJOy8TaSPnt0","server_file_path":"***","state":"2","service_timestamp":"1493991133000","modify_time":"1493991604","create_time":"1493991601","server_id":"***"},"source":{"name":"***","server_id":"4132478999","ts_sec":"1493991604","gtid":"***:1284006","file":"mysql-bin.000086","pos":"504340610","row":"0","snapshot":null,"thread":"177217","db":"***","table":"response_files"},"op":"u","ts_ms":"1493991604032","kafkaData":{"topic":"***","partition":"1","offset":"1366","insertTime":"1493991604.038"}}
      
      # re-snapshot (JDBC)
      {"before":null,"after":{"file_uuid":"ZRrtCDkPSJOy8TaSPnt0AA==","server_file_path":"***","state":"2","service_timestamp":"1493991133000","modify_time":"1493991604","create_time":"1493991601","server_id":"***"},"source":{"name":"***","server_id":"0","ts_sec":"0","gtid":null,"file":"mysql-bin.000141","pos":"997974085","row":"0","snapshot":"true","thread":null,"db":"***","table":"response_files"},"op":"c","ts_ms":"1494884115072","kafkaData":{"topic":"***","partition":"0","offset":"1623","insertTime":"1494884126.866"}}
      

      Note the file_uuid fields. In the binlog-based insert and update events, the value is ZRrtCDkPSJOy8TaSPnt0, which Base64-decodes to only 15 bytes. In the JDBC-based snapshot, it is ZRrtCDkPSJOy8TaSPnt0AA==, which decodes to 16 bytes. We checked the source DB and confirmed that the stored file UUID is the full 16 bytes. Formatted as a standard UUID, the value is 651aed08-390f-4893-b2f1-36923e7b7400. Note the trailing "00": that final zero byte is what gets truncated in the binlog path.
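The comparison above can be reproduced with a short script. This is a minimal sketch that assumes the file_uuid strings in the events are Base64-encoded (the "AA==" suffix and the match with the UUID from the source DB suggest so); the variable names are illustrative only.

```python
import base64
import uuid

# file_uuid values copied from the events in this report.
binlog_value = "ZRrtCDkPSJOy8TaSPnt0"        # binlog insert/update events
snapshot_value = "ZRrtCDkPSJOy8TaSPnt0AA=="  # JDBC snapshot event

# Assumption: both strings are standard Base64.
binlog_bytes = base64.b64decode(binlog_value)
snapshot_bytes = base64.b64decode(snapshot_value)

# Print the decoded length and hex form of each value.
print(len(binlog_bytes), binlog_bytes.hex())
print(len(snapshot_bytes), snapshot_bytes.hex())

# The snapshot value is a full 16-byte UUID.
print(uuid.UUID(bytes=snapshot_bytes))

# The only difference between the two is one trailing zero byte.
print(snapshot_bytes == binlog_bytes + b"\x00")
```

The binlog value is one byte short, so it cannot even be parsed as a UUID without restoring the missing byte.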

      We have reviewed our DB and determined that this issue is widespread. It seems that at least any UUID ending in a 00 byte is truncated this way. We have not investigated whether binary fields ending in multiple 00 bytes lose all of them, but I suspect they do.

      NOTE: the schema for the table is:

      field               type           null  key  default
      file_uuid           binary(16)     NO    PRI
      server_file_path    varchar(255)   NO
      state               int(11)        NO
      service_timestamp   bigint(20)     NO
      modify_time         timestamp      NO         CURRENT_TIMESTAMP
      create_time         timestamp      NO         CURRENT_TIMESTAMP
      server_id           varchar(64)    NO    PRI
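Because file_uuid is declared as binary(16), a consumer can flag affected events by comparing the decoded length against the declared width. The following is only a sketch of such a check, not part of Debezium: BINARY_WIDTHS, find_truncated_fields, and pad_to_width are hypothetical names, the values are assumed to be Base64 strings, and the padding helper assumes the missing bytes were all 0x00, which this report suggests but has not confirmed for every case.

```python
import base64

# Hypothetical map of fixed-width BINARY columns to their declared widths,
# taken from the schema above.
BINARY_WIDTHS = {"file_uuid": 16}


def find_truncated_fields(row, widths=BINARY_WIDTHS):
    """Return {column: (actual_len, declared_len)} for values that are too short."""
    short = {}
    for column, width in widths.items():
        value = row.get(column)
        if value is None:
            continue
        actual = len(base64.b64decode(value))
        if actual < width:
            short[column] = (actual, width)
    return short


def pad_to_width(value, width):
    """Right-pad a Base64-encoded value with 0x00 bytes up to the declared width.

    Assumption: the truncated bytes were trailing zeros.
    """
    raw = base64.b64decode(value)
    return base64.b64encode(raw.ljust(width, b"\x00")).decode("ascii")


# Usage with the "after" images from the events in this report.
binlog_after = {"file_uuid": "ZRrtCDkPSJOy8TaSPnt0", "state": "2"}
snapshot_after = {"file_uuid": "ZRrtCDkPSJOy8TaSPnt0AA==", "state": "2"}

print(find_truncated_fields(binlog_after))
print(find_truncated_fields(snapshot_after))
print(pad_to_width(binlog_after["file_uuid"], 16))
```

Padding on the consumer side is a workaround at best; a length check like this is mainly useful for measuring how many rows are affected.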
      

            gunnar.morling Gunnar Morling
            criccomini Chris Riccomini (Inactive)