-
Bug
-
Resolution: Done
-
Major
-
0.8.2.Final, 0.9.0.Alpha1
-
None
If a row contains columns with TOASTed data and an update to that row does not alter the TOASTed data, Debezium's PostgreSQL connector unnecessarily refreshes its local cache of table schemas while processing the update. In the worst case, this results in a query to the database for each update event on that table. The overhead of each refresh is tens to the low hundreds of milliseconds. Multiplied across several thousand or more change events, the performance hit is obvious: where we would expect an update of, say, 3000 rows to be processed in several seconds, processing instead takes almost 10 minutes.
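As a rough sanity check on those numbers, here is a back-of-the-envelope sketch in Python. The function name and the 50 ms and 200 ms inputs are illustrative assumptions picked from the "tens to the low hundreds of milliseconds" range above; they are not measurements, and the model simply assumes one schema-refresh query per update event.

```python
def estimated_refresh_overhead_seconds(events: int, refresh_ms: float) -> float:
    """Estimate the total schema-refresh overhead, in seconds.

    Hypothetical helper: assumes the worst case described above, where
    every update event triggers exactly one refresh query.
    """
    if events < 0 or refresh_ms < 0:
        raise ValueError("events and refresh_ms must be non-negative")
    return events * refresh_ms / 1000.0


if __name__ == "__main__":
    events = 3000
    # Assumed per-refresh costs (ms), spanning the range quoted in the report.
    for refresh_ms in (50, 200):
        seconds = estimated_refresh_overhead_seconds(events, refresh_ms)
        print(f"{events} events x {refresh_ms} ms = {seconds:.0f} s "
              f"({seconds / 60:.1f} min)")
```

At an assumed 200 ms per refresh, 3000 events cost 600 seconds of overhead, which is consistent with the "almost 10 minutes" observed.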
I have confirmed this issue when using `REPLICA IDENTITY FULL`. This setting guarantees that the `before` record will provide all column values, since the WAL record provides the entire row as the key. The setting is rarely used because it is inefficient, which explains why this bug has gone undetected. However, there are valid use cases for it, so the issue cannot be ignored. I suspect that the issue also exists for other `REPLICA IDENTITY` settings, but I have not tested this. If it does (and I am confident it does), this is a serious performance issue that will affect all users of the PostgreSQL connector.