  1. Debezium
  2. DBZ-1245

Postgres connector failing because empty state data is being stored in offsets topic

    Details

    • Steps to Reproduce:
      Seems to randomly happen shortly after new PG connectors using WAL2JSON are created.

      Description

      Sometimes a PG connector task (using the WAL2JSON decoder) ends up in a failed state when it is restarted. The connector status then looks like this:

      {"name":"my-connector","connector":{"state":"RUNNING","worker_id":"localhost:8083"},"tasks":[{"id":0,"state":"FAILED","worker_id":"localhost:8083","trace":"java.lang.NullPointerException\n\tat io.debezium.connector.postgresql.SourceInfo.load(SourceInfo.java:132)\n\tat io.debezium.connector.postgresql.PostgresConnectorTask.start(PostgresConnectorTask.java:109)\n\tat io.debezium.connector.common.BaseSourceTask.start(BaseSourceTask.java:49)\n\tat org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:198)\n\tat org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:175)\n\tat org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:219)\n\tat java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)\n\tat java.util.concurrent.FutureTask.run(FutureTask.java:266)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\n\tat java.lang.Thread.run(Thread.java:748)\n"}],"type":"source"}
      

      Restarting the connector task again does not help. It looks like the state data pulled off of the offsets topic is empty, so there is no LSN to load. I looked through the offsets topic and here is what the message looks like:

      offset 27694: key ["my-connector",{"server":"cluster2"}]: {}

      I can get the connector working by manually writing an LSN of 0 to the partition and restarting the connector.

      I am not sure what is causing this empty data to be written, or whether it is related to the recent change I made to make the heartbeat fire for all events. The empty value should not be written in the first place, but the SourceTask error handling could also be improved so that the connector falls back to getting the LSN from the replication slot when it cannot get one from the offsets topic.
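
A minimal sketch of the fallback I have in mind, to make the suggestion concrete. This is not Debezium code: the class `OffsetFallback`, the method `resolveStartLsn`, and the `"lsn"` field name are all assumptions for illustration, and the `LongSupplier` stands in for whatever query would read the position from the replication slot.

```java
import java.util.Collections;
import java.util.Map;
import java.util.function.LongSupplier;

/** Hypothetical helper for illustration only; not part of Debezium. */
final class OffsetFallback {

    // Assumed name of the LSN field in the stored offset map.
    static final String LSN_KEY = "lsn";

    /**
     * Returns the LSN from the stored offset if one is present,
     * otherwise asks the supplied fallback (e.g. the replication slot).
     */
    static long resolveStartLsn(Map<String, ?> storedOffset, LongSupplier slotLsn) {
        if (storedOffset != null) {
            Object lsn = storedOffset.get(LSN_KEY);
            if (lsn instanceof Number) {
                // Normal case: the offsets topic holds a usable LSN.
                return ((Number) lsn).longValue();
            }
        }
        // Empty or missing offset (the "{}" case above): fall back instead of throwing.
        return slotLsn.getAsLong();
    }

    public static void main(String[] args) {
        // Stand-in for a lookup against the replication slot.
        LongSupplier slot = () -> 12345L;

        // Empty offset map, as seen in the offsets topic: uses the slot value.
        System.out.println(resolveStartLsn(Collections.<String, Object>emptyMap(), slot));

        // Populated offset map: uses the stored value.
        System.out.println(resolveStartLsn(Collections.singletonMap(LSN_KEY, 42L), slot));
    }
}
```

The point is only that an empty map is treated as "no stored position" rather than being dereferenced, so a bad record in the offsets topic would no longer leave the task permanently FAILED.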

    People

    • Assignee: Jiri Pechanec (jpechanec)
    • Reporter: Taylor Rolison (trolison)