Debezium / DBZ-8722

DB server error didn't cause task failure; the connector kept reconnecting for days without failing


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • 3.1.0.CR1
    • postgresql-connector

      What Debezium connector do you use and what version?

      debezium-connector-postgres 3.0.4.Final

      What is the connector configuration?

      {
            "name":  "connector_name",
            "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
            "plugin.name" : "pgoutput",
            "tasks.max" : "1",
            "slot.name" : "slot_name",
            "publication.name": "publication_name",
            "publication.autocreate.mode": "disabled",
            "topic.prefix": "prefix",
            "table.include.list": "table1,table2",
            "snapshot.mode" : "never",
            "signal.data.collection": "signals_table",
            "database.sslmode": "require",
            "database.hostname": "db-host-name",
            "database.port": "5432",
            "database.dbname":  "db_name",
            "database.user":  "user",
            "database.password": "password",
            "key.converter" : "io.confluent.connect.avro.AvroConverter",
            "key.converter.enhanced.avro.schema.support": true,
            "key.converter.schema.registry.url": "schema-registry-url",
            "key.converter.basic.auth.credentials.source": "USER_INFO",
            "key.converter.basic.auth.user.info": "key:pass",
            "value.converter" : "io.confluent.connect.avro.AvroConverter",
            "value.converter.enhanced.avro.schema.support": true,
            "value.converter.schema.registry.url": "schema-registry-url",
            "value.converter.basic.auth.credentials.source": "USER_INFO",
            "value.converter.basic.auth.user.info": "key:pass",
            "heartbeat.interval.ms": "60000",
            "topic.heartbeat.prefix": "prefix",
            "incremental.snapshot.chunk.size": "4000",
            "column.exclude.list": "somecolumns",
            "skipped.operations": "t",
            "errors.tolerance" : "none",
            "errors.log.enable": "true"
      }
      

      What is the captured database version and mode of deployment?

      AWS RDS PostgreSQL 14

      What behavior do you expect?

      When a non-recoverable connection or server-side error occurs, Debezium should log the error and mark the task as failed.
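      As a rough illustration of the expected behavior (not Debezium's actual implementation), the sketch below retries a failing operation a bounded number of times and then re-raises, so the failure reaches the caller instead of looping indefinitely. The names `run_with_bounded_retries` and `NonRecoverableError`, and all parameters, are hypothetical and exist only for this example.

```python
import logging
import time
from typing import Callable, TypeVar

T = TypeVar("T")
log = logging.getLogger("bounded_retry_sketch")


class NonRecoverableError(Exception):
    """Hypothetical marker for errors that retrying cannot fix."""


def run_with_bounded_retries(
    operation: Callable[[], T],
    max_retries: int = 3,
    delay_seconds: float = 0.0,
) -> T:
    """Run `operation`, retrying at most `max_retries` times before failing.

    A NonRecoverableError is logged and re-raised immediately; any other
    exception is retried until the retry budget is exhausted, then re-raised.
    """
    attempt = 0
    while True:
        try:
            return operation()
        except NonRecoverableError:
            log.exception("Non-recoverable error; failing immediately")
            raise
        except Exception:
            attempt += 1
            if attempt > max_retries:
                log.exception("Giving up after %d retries", max_retries)
                raise
            log.warning("Attempt %d failed; retrying", attempt)
            time.sleep(delay_seconds)
```

      With a bounded retry budget, the exception eventually propagates and the surrounding runtime can mark the task as failed. Debezium documents an `errors.max.retries` connector property for retriable errors; whether it would have bounded the retries in this incident was not tested.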

      What behavior do you see?

      Due to what we assume is a DB server-side bug, Debezium couldn't start streaming data from PostgreSQL. The error was:

      Producer failure
      org.postgresql.util.PSQLException: ERROR: could not create file "pg_replslot/slot_name/state.tmp": File exists
        Where: slot "slot_name", output plugin "pgoutput", in the change callback, associated LSN AAA/ABC12345
      	at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2733)
      	at org.postgresql.core.v3.QueryExecutorImpl.processCopyResults(QueryExecutorImpl.java:1311)
      	at org.postgresql.core.v3.QueryExecutorImpl.readFromCopy(QueryExecutorImpl.java:1210)
      	at org.postgresql.core.v3.CopyDualImpl.readFromCopy(CopyDualImpl.java:49)
      	at org.postgresql.core.v3.replication.V3PGReplicationStream.receiveNextData(V3PGReplicationStream.java:163)
      	at org.postgresql.core.v3.replication.V3PGReplicationStream.readInternal(V3PGReplicationStream.java:128)
      	at org.postgresql.core.v3.replication.V3PGReplicationStream.readPending(V3PGReplicationStream.java:85)
      	at io.debezium.connector.postgresql.connection.PostgresReplicationConnection$1.readPending(PostgresReplicationConnection.java:663)
      	at io.debezium.connector.postgresql.PostgresStreamingChangeEventSource.processMessages(PostgresStreamingChangeEventSource.java:217)
      	at io.debezium.connector.postgresql.PostgresStreamingChangeEventSource.execute(PostgresStreamingChangeEventSource.java:179)
      	at io.debezium.connector.postgresql.PostgresStreamingChangeEventSource.execute(PostgresStreamingChangeEventSource.java:42)
      	at io.debezium.pipeline.ChangeEventSourceCoordinator.streamEvents(ChangeEventSourceCoordinator.java:324)
      	at io.debezium.pipeline.ChangeEventSourceCoordinator.executeChangeEventSources(ChangeEventSourceCoordinator.java:203)
      	at io.debezium.pipeline.ChangeEventSourceCoordinator.lambda$start$0(ChangeEventSourceCoordinator.java:143)
      	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
      	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
      	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
      	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
      	at java.base/java.lang.Thread.run(Thread.java:840)
      

      Debezium kept trying to connect to the DB without success, and the task was never reported as "failed". Because of that, we didn't identify the problem in time (we monitor for failed tasks); the DB filled up with WAL and stopped working.

      After a DB server restart the problem was gone, and Debezium connected successfully.
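      Because the task never reached the failed state, monitoring task status alone could not catch this. A minimal sketch of a complementary check follows, assuming the monitoring job can fetch the Kafka Connect `/connectors/<name>/status` JSON and run a query against `pg_replication_slots`. The helper names, the threshold, and the sample values are hypothetical; fetching the status and executing the query are left out.

```python
from typing import Any, Dict, List, Optional, Tuple

# Query for the captured database (PostgreSQL 10 or later): WAL retained
# on behalf of each replication slot, in bytes.
SLOT_WAL_QUERY = """
SELECT slot_name,
       active,
       pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS retained_bytes
FROM pg_replication_slots
"""

SlotRow = Tuple[str, bool, Optional[int]]


def failed_task_ids(status: Dict[str, Any]) -> List[int]:
    """Return ids of tasks reported as FAILED in a Connect status document."""
    return [
        task["id"]
        for task in status.get("tasks", [])
        if task.get("state") == "FAILED"
    ]


def slots_over_threshold(rows: List[SlotRow], max_retained_bytes: int) -> List[str]:
    """Return names of slots retaining more WAL than the given threshold."""
    return [
        name
        for name, _active, retained in rows
        if retained is not None and retained > max_retained_bytes
    ]


if __name__ == "__main__":
    # Hypothetical sample data resembling this incident: the task still
    # reports RUNNING while the slot retains a large amount of WAL.
    status = {
        "name": "connector_name",
        "connector": {"state": "RUNNING"},
        "tasks": [{"id": 0, "state": "RUNNING"}],
    }
    rows = [("slot_name", False, 50 * 1024**3)]
    print("failed tasks:", failed_task_ids(status))
    print("slots over 10 GiB:", slots_over_threshold(rows, 10 * 1024**3))
```

      Alerting on retained WAL per slot does not depend on how the connector reports its own state, so it would still fire when a task is stuck in a reconnect loop.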

      Do you see the same behaviour using the latest released Debezium version?

      I couldn't test it as the DB problem was solved after a restart, and we don't know how to reproduce that problem.

      Do you have the connector logs, ideally from start till finish?

      Unfortunately the only log I can share is the one above.

      How to reproduce the issue using our tutorial deployment?

      I don't know how to reproduce the problem on the DB server that would trigger this behavior on the Debezium side.

              Assignee: Unassigned
              Reporter: Enzo Cappa (enzo.cappa)