-
Bug
-
Resolution: Done
-
Major
-
None
-
None
-
None
-
None
-
False
-
False
-
1.4
-
Undefined
-
-
Connector can't reconnect to Postgres to its replication slot, while its free.
Error:
2021-01-19 21:48:43,824 INFO Postgres|ccp_ocm.dev|postgres-connector-task Searching for WAL resume position [io.debezium.connector.postgresql.PostgresStreamingChangeEventSource] 2021-01-19 21:48:48,883 INFO Postgres|ccp_ocm.dev|postgres-connector-task First LSN 'LSN{E0/EE0B8CC8}' received [io.debezium.connector.postgresql.connection.WalPositionLocator] 2021-01-19 21:48:48,883 INFO Postgres|ccp_ocm.dev|postgres-connector-task WAL resume position 'LSN{E0/EE0B8CC8}' discovered [io.debezium.connector.postgresql.PostgresStreamingChangeEventSource] 2021-01-19 21:48:48,892 INFO Postgres|ccp_ocm.dev|postgres-connector-task Initializing PgOutput logical decoder publication [io.debezium.connector.postgresql.connection.PostgresReplicationConnection] 2021-01-19 21:48:48,895 ERROR Postgres|ccp_ocm.dev|postgres-connector-task Producer failure [io.debezium.pipeline.ErrorHandler] io.debezium.DebeziumException: Failed to start replication stream at LSN{E0/EE0B8C90}; when setting up multiple connectors for the same database host, please make sure to use a distinct replication slot name for each. at io.debezium.connector.postgresql.connection.PostgresReplicationConnection.startStreaming(PostgresReplicationConnection.java:308) at io.debezium.connector.postgresql.PostgresStreamingChangeEventSource.execute(PostgresStreamingChangeEventSource.java:143) at io.debezium.pipeline.ChangeEventSourceCoordinator.streamEvents(ChangeEventSourceCoordinator.java:140) at io.debezium.pipeline.ChangeEventSourceCoordinator.lambda$start$0(ChangeEventSourceCoordinator.java:113) at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) at java.base/java.lang.Thread.run(Thread.java:834) Caused by: org.postgresql.util.PSQLException: ERROR: replication slot "cdc_ocm_slot" is active for PID 6899 at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2532) at org.postgresql.core.v3.QueryExecutorImpl.processCopyResults(QueryExecutorImpl.java:1200) at org.postgresql.core.v3.QueryExecutorImpl.startCopy(QueryExecutorImpl.java:882) at org.postgresql.core.v3.replication.V3ReplicationProtocol.initializeReplication(V3ReplicationProtocol.java:58) at org.postgresql.core.v3.replication.V3ReplicationProtocol.startLogical(V3ReplicationProtocol.java:42) at org.postgresql.replication.fluent.ReplicationStreamBuilder$1.start(ReplicationStreamBuilder.java:38) at org.postgresql.replication.fluent.logical.LogicalStreamBuilder.start(LogicalStreamBuilder.java:37) at io.debezium.connector.postgresql.connection.PostgresReplicationConnection.startPgReplicationStream(PostgresReplicationConnection.java:569) at io.debezium.connector.postgresql.connection.PostgresReplicationConnection.createReplicationStream(PostgresReplicationConnection.java:406) at io.debezium.connector.postgresql.connection.PostgresReplicationConnection.startStreaming(PostgresReplicationConnection.java:301) ... 8 more
When i try restart task, it fails with same error. Only PID changes.
It happens allways after connection lost to Postgres or after restart/recreate debezium container with the same connector or restart task.
In postgres there is no connected clients to this slot:
postgres=# SELECT * FROM pg_replication_slots where slot_name='cdc_ocm_slot';
slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn
--------------+----------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------
cdc_ocm_slot | pgoutput | logical | 415842 | ccp_ocm | f | f | | | 94650945 | DF/1921AAE8 | DF/19220DC8
(1 row)
I found only one way to fix it: delete replaction slot in postgres manually with select pg_drop_replication_slot('cdc_ocm_slot') and then restart task.
I can start streaming from this slot manually with and see WAL records in console:
pg_recvlogical --slot=cdc_ocm_slot -v -d ccp_ocm --host db-hostname --username user -o proto_version=1 -o publication_names='dbz_publication' --file - --start
I noticed, that there are 2 PID apears in replication slot after restart consistently: first for a small amount time, and second during all work. And when task failed, it says that slot is busy with first PID.
One restart (when slot is free) then 2 PIDs:
postgres=# SELECT * FROM pg_replication_slots where slot_name='cdc_ocm_slot'; slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn --------------+----------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+--------------------- cdc_ocm_slot | pgoutput | logical | 415842 | ccp_ocm | f | f | | | 96647723 | E0/EEB268C8 | E0/EEB26900 (1 row) postgres=# SELECT * FROM pg_replication_slots where slot_name='cdc_ocm_slot'; slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn --------------+----------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+--------------------- cdc_ocm_slot | pgoutput | logical | 415842 | ccp_ocm | f | t | 7330 | | 96647723 | E0/EEB268C8 | E0/EEB26900 (1 row) postgres=# SELECT * FROM pg_replication_slots where slot_name='cdc_ocm_slot'; slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn --------------+----------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+--------------------- cdc_ocm_slot | pgoutput | logical | 415842 | ccp_ocm | f | t | 7332 | | 96647723 | E0/EEB268C8 | E0/EEB26900 (1 row) postgres=# SELECT * FROM pg_replication_slots where slot_name='cdc_ocm_slot'; slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn --------------+----------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+--------------------- cdc_ocm_slot | pgoutput | logical | 415842 | ccp_ocm | f | t | 7332 | | 96647723 | E0/EEB268C8 | E0/EEB26900 (1 row)
- duplicates
-
DBZ-2952 Postgres connector config validation fails because current connector is occupying replication slot
- Closed