- Bug
- Resolution: Done
- Major
- 0.9.0.CR1
- None
- AWS RDS Postgres 10.6; Confluent Platform 5.1.0 (running on 2 AWS m5a.xlarge instances, with 3 Kafka brokers and 2 GB heaps, and a Connect instance with a 2 GB heap)
Overview
During the snapshot phase, when reading from a partitioned Postgres table, rows read from the parent table are written to the topic of one of its partition (child) tables.
As described further in the reproduction notes below, let's create a parent partitioned table named `dbz_repo` and two partition tables named `dbz_repo_1_500` and `dbz_repo_501_1000`. Next, we insert random records into the partitioned tables and kick off a Debezium snapshot.
After the snapshot completes, the `dbz_repo` topic contains 0 records, while the `dbz_repo_1_500` topic contains N records from the `dbz_repo_1_500` table PLUS M records from the `dbz_repo` table (i.e. an additional copy of `dbz_repo_1_500` and everything from `dbz_repo_501_1000`). The `dbz_repo_501_1000` topic is the only correct one, containing a single copy of every row in its source table.
Logs
During snapshotting there are errors related to flush failures, but the logs otherwise look clean:
[2019-01-31 18:40:59,085] ERROR WorkerSourceTask{id=debezium_repo-0} Failed to flush, timed out while waiting for producer to flush outstanding 29175 messages (org.apache.kafka.connect.runtime.WorkerSourceTask)
[2019-01-31 18:40:59,091] ERROR WorkerSourceTask{id=debezium_repo-0} Failed to commit offsets (org.apache.kafka.connect.runtime.SourceTaskOffsetCommitter)
Comparison output sample
Attached below is a small Python script that compares the number of records in each database table to the number of records in the corresponding topic. Here's sample output for one run (`diff` is `rows_in_db` minus `records_in_topic`):
| table | rows_in_db | records_in_topic | diff | 
|---|---|---|---|
| dbz_repo | 2000000 | 0 | 2000000 | 
| dbz_repo_1001_1500 | 500077 | 500077 | 0 | 
| dbz_repo_1_500 | 499596 | 2499596 | -2000000 | 
| dbz_repo_1501_2000 | 499529 | 499529 | 0 | 
| dbz_repo_501_1000 | 500798 | 500798 | 0 |
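
The numbers in the table can be cross-checked with a short sketch. This is not the attached comparison script; it only re-derives the `diff` column from the counts shown above and checks that the surplus in one partition topic equals the number of rows missing from the parent topic. The function names `diff_by_table` and `surplus_topics` are hypothetical and exist only for this illustration.

```python
# Counts copied from the comparison table above (not queried live).
rows_in_db = {
    "dbz_repo": 2_000_000,
    "dbz_repo_1001_1500": 500_077,
    "dbz_repo_1_500": 499_596,
    "dbz_repo_1501_2000": 499_529,
    "dbz_repo_501_1000": 500_798,
}
records_in_topic = {
    "dbz_repo": 0,
    "dbz_repo_1001_1500": 500_077,
    "dbz_repo_1_500": 2_499_596,
    "dbz_repo_1501_2000": 499_529,
    "dbz_repo_501_1000": 500_798,
}

PARENT = "dbz_repo"


def diff_by_table(db_counts, topic_counts):
    """Return rows_in_db minus records_in_topic for every table."""
    return {t: db_counts[t] - topic_counts.get(t, 0) for t in db_counts}


def surplus_topics(db_counts, topic_counts):
    """Return topics that hold more records than their source table has rows."""
    diffs = diff_by_table(db_counts, topic_counts)
    return {t: -d for t, d in diffs.items() if d < 0}


diffs = diff_by_table(rows_in_db, records_in_topic)
surplus = surplus_topics(rows_in_db, records_in_topic)

# Rows visible through the parent are the union of all partitions.
partition_total = sum(n for t, n in rows_in_db.items() if t != PARENT)

for table in sorted(diffs):
    print(f"{table}: diff={diffs[table]}")
print(f"surplus topics: {surplus}")
print(f"parent rows == sum of partition rows: {partition_total == rows_in_db[PARENT]}")
```

In this run the only topic with a surplus is `dbz_repo_1_500`, and the surplus matches the parent table's row count, which is consistent with every parent row having been written to that one partition topic.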