Status: Closed
Affects Version/s: 1.3.0.CR1
Fix Version/s: 1.4.0.Alpha2
Steps to Reproduce:
- Declare Debezium connector to Kafka Connect
- Wait for snapshot to end
- Use pgbench to write into the database and trigger the heartbeats
- Write on other databases
- Wait at least the heartbeat interval time to ensure the slot LSN is moved forward
- Do a failover with patronictl or shut down the instance manually
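A minimal sketch of the first step, assuming Kafka Connect's REST API listens on `localhost:8083`. The host, credentials, database, slot and heartbeat table names below are hypothetical, since the exact configuration used in this report is not known. The property names themselves are Debezium PostgreSQL connector options, with heartbeats enabled through `heartbeat.interval.ms` and `heartbeat.action.query`.

```python
import json
import urllib.request

# Assumed Kafka Connect REST endpoint; adjust to your environment.
CONNECT_URL = "http://localhost:8083/connectors"


def build_connector_payload(name: str, slot_name: str) -> dict:
    """Build a Debezium PostgreSQL connector registration with heartbeats enabled."""
    return {
        "name": name,
        "config": {
            "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
            "plugin.name": "wal2json",  # decoding plugin used in this report
            "database.hostname": "pg-primary",  # hypothetical host
            "database.port": "5432",
            "database.user": "debezium",  # hypothetical credentials
            "database.password": "secret",
            "database.dbname": "bench",  # hypothetical database written by pgbench
            "database.server.name": "dbserver1",  # hypothetical logical server name
            "slot.name": slot_name,
            # Emit a heartbeat every 10 s so the slot position can move forward
            # even when the captured tables see no changes.
            "heartbeat.interval.ms": "10000",
            # Hypothetical heartbeat table written on each heartbeat.
            "heartbeat.action.query": "INSERT INTO heartbeat (ts) VALUES (now())",
        },
    }


def make_register_request(payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) the POST that registers the connector."""
    return urllib.request.Request(
        CONNECT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it would then be `urllib.request.urlopen(make_register_request(build_connector_payload("bench-connector", "debezium")))`. Building the request separately from sending it lets the payload be inspected before anything reaches the cluster.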
Patroni is a high-availability operator that controls the PostgreSQL process lifecycle (initdb, start, stop, promote) and manages replication. It constantly tries to acquire a lock in a distributed configuration store (DCS); when it fails to do so, another node acquires the same lock and is promoted.
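The lock-based leader election described above can be pictured with a toy, single-process model. This is not Patroni's code: `LeaderLock` and `try_acquire` are hypothetical names, and a real DCS performs the equivalent check as an atomic compare-and-set on a key with a TTL.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LeaderLock:
    """Toy stand-in for the leader key stored in a DCS (etcd, Consul, ZooKeeper, ...)."""

    owner: Optional[str] = None
    expires_at: float = 0.0

    def try_acquire(self, node: str, now: float, ttl: float) -> bool:
        """Take or renew the lock; return True if `node` holds it afterwards."""
        # The current holder may renew its lease; any node may take an expired lock.
        if self.owner == node or now >= self.expires_at:
            self.owner = node
            self.expires_at = now + ttl
            return True
        return False
```

For example, if node `a` takes the lock at `t=0` with a 30 s TTL and renews it at `t=20`, node `b` is refused until `t=50`. Once `a` stops renewing and the lease expires, `b` takes the lock, which is the point where Patroni would promote it.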
When Patroni is used to manage PostgreSQL failovers and Debezium is connected, a forced switchover gets stuck in the shutting-down state.
No new connections are allowed. The following processes are still alive:
When we kill the Debezium backend process from the operating system, the instance is able to shut down and the failover completes successfully.
Heartbeat is enabled:
The query is executed regularly:
But the lag never goes down to zero:
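One way to measure such a lag, assuming it refers to the WAL that the replication slot has not yet confirmed: on PostgreSQL 9.6, compare `pg_current_xlog_location()` with `confirmed_flush_lsn` from `pg_replication_slots` (the server-side `pg_xlog_location_diff()` performs the same subtraction). The Python helpers below, whose names are hypothetical, show the arithmetic on the `X/Y` text form of an LSN, where the part before the slash holds the upper 32 bits of the position.

```python
def parse_lsn(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '16/B374D848' into a 64-bit byte position."""
    hi, lo = lsn.split("/")
    # The high and low halves are hexadecimal 32-bit values.
    return (int(hi, 16) << 32) | int(lo, 16)


def slot_lag_bytes(current_lsn: str, confirmed_flush_lsn: str) -> int:
    """Bytes of WAL written by the server but not yet confirmed by the slot's consumer."""
    return parse_lsn(current_lsn) - parse_lsn(confirmed_flush_lsn)
```

For example, `slot_lag_bytes("0/3000060", "0/3000000")` returns `96`. A value that never reaches zero while heartbeats are running means the slot's confirmed position keeps trailing the server's current WAL position.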
The shutdown mode used is "fast", which terminates all client connections without waiting for them to disconnect. This also kills the heartbeat connection, but not the WAL streaming connection.
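A toy model of why terminating client sessions is not enough. It rests on an assumption not stated in this report: once shutdown has begun, a PostgreSQL walsender exits only after its client reports having received everything it was sent. If the Debezium client never acknowledges the final position, that walsender, and therefore the whole shutdown, keeps waiting. `WalSender` and `shutdown_blocked` are hypothetical names for illustration only.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class WalSender:
    """Toy model of a WAL streaming backend during shutdown (simplified assumption)."""

    sent_lsn: int  # last WAL position streamed to the client
    confirmed_lsn: int  # last position the client reported back

    def can_exit(self) -> bool:
        # Assumed rule: exit only once the client has acknowledged all sent WAL.
        return self.confirmed_lsn >= self.sent_lsn


def shutdown_blocked(walsenders: Iterable[WalSender]) -> bool:
    """Fast shutdown has already ended client sessions; it still waits on walsenders."""
    return any(not ws.can_exit() for ws in walsenders)
```

With a single walsender whose client confirmed position 100 but was sent up to 200, `shutdown_blocked` returns `True`. Dropping that walsender from the list, which is what killing the process from the operating system amounts to, makes it return `False`.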
The bug happens on:
- Patroni 1.6.5
- PostgreSQL 9.6.19
- Debezium 1.3.0.CR1
- wal2json 2.3-1.pgdg90+1
- Kafka 2.6.0
- Debian 9.13
This issue is similar to DBZ-1727, but it happens on a more recent version.