We are testing Debezium on some fairly large tables (a couple of hundred million rows), and our production databases are even bigger. We are noticing that Debezium seems to hang for quite some time before it starts the actual snapshot of each table.
A couple of thread dumps suggest the hang is caused by the `SELECT COUNT(*) FROM <table>` query issued in io.debezium.connector.mysql.SnapshotReader.execute. This kind of query can be very slow on large InnoDB tables, because InnoDB does not store an exact row count and has to scan an index to compute it.
It would be great to have a configuration option to always use the streaming result set (and skip the SELECT COUNT(*) query), or to have this step optimized so that it obtains an approximate table size more quickly.
For example, MySQL has `SHOW TABLE STATUS LIKE '<tableName>'`, which returns an approximate row count. Perhaps that would be good enough for this use case.
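
To illustrate the idea, here is a rough sketch of reading that estimate over plain JDBC. This is not Debezium code: the class and method names are made up for the example, and it assumes the connection's default database is the one containing the table and that the server runs without NO_BACKSLASH_ESCAPES. For InnoDB the `Rows` column is only an estimate, so it would suit a decision like "is this table large?" rather than exact progress reporting.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper, not part of Debezium.
final class ApproximateRowCount {

    // Builds the SHOW TABLE STATUS statement for a single table.
    // The LIKE argument is a pattern, so '_' and '%' in the table name
    // are escaped to avoid matching other tables by accident.
    static String statusQuery(String tableName) {
        if (tableName.indexOf('\\') >= 0) {
            // Keep the sketch simple: backslashes would need extra escaping.
            throw new IllegalArgumentException("backslash in table name not supported");
        }
        String pattern = tableName
                .replace("'", "''")   // SQL string literal quoting
                .replace("_", "\\_")  // literal underscore in a LIKE pattern
                .replace("%", "\\%"); // literal percent sign in a LIKE pattern
        return "SHOW TABLE STATUS LIKE '" + pattern + "'";
    }

    // Returns the estimated row count, or -1 if the table is not found
    // or the server reports NULL (as it does for views).
    static long approximateRowCount(Connection conn, String tableName) throws SQLException {
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(statusQuery(tableName))) {
            if (!rs.next()) {
                return -1L;
            }
            long rows = rs.getLong("Rows");
            return rs.wasNull() ? -1L : rows;
        }
    }
}
```

The snapshot code could then compare the estimate against a threshold to decide whether to stream the result set, without ever running the full count.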