For Postgres, incremental snapshots are not performant for large tables with large indexes. The chunk query lacks the predicates Postgres needs to choose an optimal query plan. For example, consider the following scenario.
Configure `additionalCondition` for the `execute-snapshot` signal as follows, filtering on `updated_at` so that only recently updated rows are snapshotted.
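A minimal sketch of such a signal, sent through the signaling table (the table name `debezium_signal`, the signal id, and the cutoff timestamp are all illustrative assumptions; the exact payload fields depend on the Debezium version):

```sql
-- Hypothetical execute-snapshot signal with an additional condition
-- on updated_at. Table name, id, and timestamp are assumptions.
INSERT INTO debezium_signal (id, type, data)
VALUES (
  'signal-1',
  'execute-snapshot',
  '{"data-collections": ["public.my_table"],
    "type": "incremental",
    "additional-condition": "updated_at > ''2024-01-01''"}'
);
```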
For the initial chunk, Debezium generates a query like the following.
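Roughly, the generated query has this shape (the chunk size of 1024 and the column names are assumptions; the key point is that the first chunk carries no predicate on the PK):

```sql
-- First-chunk shape: only the additional condition, ordered by PK.
SELECT *
FROM my_table
WHERE updated_at > '2024-01-01'
ORDER BY id
LIMIT 1024;
```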
The query planner may decide to scan the `my_table_pkey` index even though scanning `index_my_table_updated_at` is optimal here. If the `my_table_pkey` index is very large (e.g. 100 GB), the query may take a long time to finish or even time out.
One possible solution is to calculate the min and max of the PK range up front and apply that range even to the initial chunk.
For example, the min and max of the PK range can be calculated as follows.
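A minimal sketch, assuming a single integer PK column `id`:

```sql
-- One cheap query per snapshot to bound the PK range;
-- typically satisfied by an index-only scan on the PK index.
SELECT MIN(id) AS min_id, MAX(id) AS max_id
FROM my_table;
```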
The initial chunk query then becomes:
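Here `:min_id` and `:max_id` are placeholders for the values computed above:

```sql
SELECT *
FROM my_table
WHERE id >= :min_id AND id <= :max_id
  AND updated_at > '2024-01-01'
ORDER BY id
LIMIT 1024;
```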
Subsequent chunk queries can be as follows, assuming `last_id` is the `id` of the last row written in the previous chunk.
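Again with placeholder parameters:

```sql
SELECT *
FROM my_table
WHERE id > :last_id AND id <= :max_id
  AND updated_at > '2024-01-01'
ORDER BY id
LIMIT 1024;
```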
This should work well whether the planner chooses the `my_table_pkey` index or `index_my_table_updated_at`, and it helps the use cases where we want to backfill recently updated rows.
Proposed change: calculate the min and max key at the start of the incremental snapshot, where Debezium already determines the maximum key, and use that min/max range for all the chunks.